
Hindawi Publishing Corporation
International Journal of Distributed Sensor Networks
Volume 2013, Article ID 406316, 24 pages
http://dx.doi.org/10.1155/2013/406316

Review Article
Data Mining Techniques for Wireless Sensor Networks: A Survey

Azhar Mahmood, Ke Shi, Shaheen Khatoon, and Mi Xiao

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Correspondence should be addressed to Ke Shi; keshi@mail.hust.edu.cn

Received 18 February 2013; Revised 15 June 2013; Accepted 20 June 2013

Academic Editor: Marimuthu Palaniswami

Copyright © 2013 Azhar Mahmood et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recently, data management and processing for wireless sensor networks (WSNs) has become a topic of active research in several fields of computer science, such as distributed systems, database systems, and data mining. The main aim of deploying WSN-based applications is to make real-time decisions, which has proved to be very challenging due to the highly resource-constrained computing and communicating capacities and the huge volume of fast-changing data generated by WSNs. This challenge motivates the research community to explore novel data mining techniques for extracting knowledge from large, continuously arriving data from WSNs. Traditional data mining techniques are not directly applicable to WSNs due to the nature of sensor data, their special characteristics, and the limitations of WSNs. This work provides an overview of how traditional data mining algorithms are revised and improved to achieve good performance in a wireless sensor network environment. A comprehensive survey of existing data mining techniques and their multilevel classification scheme is presented. The taxonomy, together with the comparative tables, can be used as a guideline to select a technique suitable for the application at hand. Based on the limitations of the existing techniques, an adaptive data mining framework of WSNs for future research is proposed.

1. Introduction

Advances in wireless communication and microelectronic devices led to the development of low-power sensors and the deployment of large-scale sensor networks. With the capabilities of pervasive surveillance, sensor networks have attracted significant attention in many application domains, such as habitat monitoring [1, 2], object tracking [3, 4], environment monitoring [5–7], military [8, 9], disaster management [10], as well as smart environments. In these applications, real-time and reliable monitoring is an essential requirement. These applications yield a huge volume of dynamic, geographically distributed, and heterogeneous data. This raw data, if efficiently analyzed and transformed into usable information through data mining, can facilitate automated or human-induced tactical/strategic decisions. Therefore, it is essential to develop techniques to mine the sensor data for patterns in order to make intelligent decisions promptly.

Recently, extracting knowledge from sensor data has received a great deal of attention from the data mining community. Different approaches focusing on clustering [11–14], association rules [15, 16], frequent patterns [17–20], sequential patterns [21–23], and classification [24–26] have been successfully used on sensor data. However, the design and deployment of sensor networks creates unique research challenges due to their large size (up to thousands of sensor nodes), random and hazardous deployment, lossy communicating environment, limited power supply, and high failure rate. These challenges make traditional mining techniques inapplicable, because traditional mining is centralized and computationally expensive and focuses on disk-resident transactional data. As a result, new algorithms have been created, and some existing data mining algorithms have been modified to handle the data generated from sensor networks. A plethora of knowledge discovery methodologies, techniques, and algorithms have been proposed during the last ten years.

For example, a considerable amount of work has been done on outlier detection in WSNs, which is presented in [27–29]. Most of the techniques examined in [27, 28] rely heavily on data mining techniques, but their focus is the detection of irregularities in WSNs data rather than information extraction and analysis. A survey [29] presented anomaly detection in multiple domains using data mining as well as statistical, information-theoretic, and spectral techniques.

Table 1: Difference between traditional and sensor data processing.

                          Traditional data    WSNs data
Processing architecture   Centralized         Distributed
Data type                 Static              Dynamic
Memory usage              Unlimited           Restricted
Processing time           Unlimited           Restricted
Computational power       High                Weak
Energy                    No constraints      Limited
Data flow                 Stationary          Continuous
Data length               Bounded             Unbounded
Response time             Non-real-time       Real time
Update speed              Low                 High
Number of passes          Multipass           Single

Since data mining is a broad discipline and can be applied to any domain data, more general surveys on data mining techniques can be found in [30], where the authors examined machine-learning and data mining techniques for analyzing medical data. Since the classification of data mining techniques in this survey is based on frequent pattern mining, clustering, and classification, there are plenty of surveys available on each of these techniques. For example, frequent pattern mining over data streams is presented in [31, 32]. Surveys on clustering algorithms for WSNs are presented in [33, 34]. The clustering techniques examined in those papers focus exclusively on the architecture and management of the network rather than information discovery. A survey on classification methods over data streams is given in [35], where the author examined conventional classification techniques over data streams.

However, none of the above surveys examined data mining techniques that focus on information extraction and analysis from WSNs data. In comparison with the above-mentioned surveys, this paper examines algorithms and approaches specially designed for WSNs data, not only leading to a different classification, evaluation, and discussion on different domains but also presenting different choices of solution. We examine how data mining algorithms can be utilized to make sensor network applications intelligent. The research method consists of a review of data mining techniques for WSNs, such as frequent pattern mining, sequential pattern mining, clustering, and classification. A problem-based taxonomy is presented to classify and compare existing data mining techniques adopted for WSNs. In addition, an evaluation of each technique is presented. Based on the limitations of existing techniques and the special characteristics of WSNs, we propose a new hybrid data mining architecture for WSNs, which combines offline learning with distributed and online data processing.

The rest of the paper is organized as follows. After the introduction in Section 1, Section 2 discusses how the traditional data mining process differs from the data mining process in WSNs, together with the challenges of data mining for WSNs data. In Section 3, a taxonomy for categorizing the existing data mining techniques for WSNs is presented. In Section 4, we analyze a collection of published studies using the taxonomy framework. The comparison of data mining techniques for WSNs is presented in Section 5. The limitations of this work are given in Section 6, and future research directions are presented in Section 7. Finally, the paper ends with the conclusion in Section 8.

2. Fundamentals of Data Mining in WSNs

2.1. Data Mining Process in WSNs. Data mining in sensor networks is the process of extracting application-oriented models and patterns with acceptable accuracy from a continuous, rapid, and possibly unending flow of data streams from sensor networks. In this case, the whole data cannot be stored and must be processed immediately. A data mining algorithm has to be sufficiently fast to process high-speed arriving data. Conventional data mining algorithms are meant to handle static data and use multistep techniques and multiscan mining algorithms for analyzing static data-sets. Therefore, conventional data mining techniques are not suitable for handling the massive quantity, high dimensionality, and distributed nature of the data generated by WSNs. Table 1 summarizes the differences between the traditional and the WSNs data mining process.

It can be observed from Table 1 that traditional data mining is centralized, computationally expensive, and focused on disk-resident transactional data. It directly collects data at the central site, which is not bounded by computational resources. In comparison with traditional data-sets, WSNs data flows continuously in systems with varying update rates. Due to the huge amount of data and the high storage cost, it is impossible to store the entire WSNs data or to scan through it multiple times. These characteristics of sensor data and the special design issues of sensor networks make traditional data mining techniques challenging to apply. Hence, it is crucial to develop data mining techniques that can analyze and process WSNs data in a multidimensional, multilevel, single-pass, and online manner.
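The single-pass, bounded-memory constraint described above can be made concrete with a small sketch. The code below is an illustration only and is not a technique from the surveyed literature: it uses Welford's well-known running-statistics update so that each reading is inspected once and then discarded. The class name `RunningStats` and the helper `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass mean and variance (Welford's method); stores no readings."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, reading: float) -> None:
        # Each reading is seen exactly once and is not kept afterwards.
        self.count += 1
        delta = reading - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reading - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of the readings seen so far.
        return self.m2 / self.count if self.count else 0.0


def summarize(stream):
    """Consume an iterable of readings in one pass and return its statistics."""
    stats = RunningStats()
    for reading in stream:
        stats.update(reading)
    return stats
```

Memory use is constant regardless of stream length, which is the property the "Number of passes" and "Memory usage" rows of Table 1 ask for; a multiscan algorithm would instead need the full data-set to remain available.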

2.2. Challenges. Conventional data mining techniques are challenging to apply to sensor data in WSNs for the following reasons.

(i) Resource Constraint. Sensor nodes are resource constrained in terms of power, memory, communication bandwidth, and computational power. The main challenge faced by data mining techniques for WSNs is to satisfy the mining accuracy requirements while keeping the resource consumption of WSNs to a minimum.

(ii) Fast and Huge Data Arrival. The inherent nature of WSNs data is its high speed. In many domains, data arrives faster than we are able to mine it. Additionally, the spatiotemporal embedding of sensor data plays an important role in WSNs applications. This may cause many classical data processing techniques to perform poorly on spatiotemporal sensor data. The challenge for data mining techniques is how to cope with continuous, rapid, and changing data streams and also how to incorporate user interaction during high-speed data arrival.

(iii) Online Mining. In a WSNs environment, data is geographically distributed, inputs arrive continuously, and newer data items may substantially change the results based on older data. Most data mining techniques that analyze data in an offline manner do not meet the requirement of handling distributed stream data. Thus, a challenge for data mining techniques is how to process distributed streaming data online.

(iv) Modeling Changes of Mining Results over Time. When the data-generating phenomenon changes over time, the extracted model at any time should be up-to-date. Due to the continuity of data streams, some researchers have pointed out that capturing the change of mining results is more important in this area than the mining results themselves. The research issue is how to model this change in the results.

(v) Data Transformation. Since sensor nodes are limited in terms of bandwidth, transferring the original data over the network is not feasible. Knowledge structure transformation is therefore an important issue. After models and patterns are extracted locally from WSNs data, the output is transferred to the base station. The challenge for data mining techniques is how to efficiently represent data and discovered patterns for transmission over the network.

(vi) Dynamic Network Topology. Sensor networks are deployed in potentially harsh, uncertain, heterogeneous, and dynamic environments. Moreover, sensor nodes may move among different locations at any point in time. Such dynamicity and heterogeneity increase the complexity of designing an appropriate data mining technique for WSNs.

To address these challenges, researchers have modified conventional data mining techniques and also proposed new data mining algorithms to handle the data generated from sensor networks. In the following section, we provide a taxonomy of these data mining techniques based on the discipline from which they adopt their ideas.

3. Taxonomy of Data Mining Techniques for WSNs

In this section, a classification scheme for existing approaches designed for mining WSNs data is presented. The highest-level classification is based upon the general data mining classes used, such as frequent pattern mining, sequential pattern mining, clustering, and classification. Most of the frequent pattern mining and sequential pattern mining approaches have adapted traditional frequent mining techniques, such as the Apriori and frequent pattern (FP) growth-based algorithms, to find associations among large WSNs data. Cluster-based approaches have adapted K-means, hierarchical, and data correlation-based clustering based upon the distance among data points, whereas classification-based approaches have adapted traditional classification techniques, such as decision tree, rule-based, nearest neighbor, and support vector machine methods, based on the type of classification model that they use. These algorithms have very different and distinct roles; therefore, in order to choose an algorithm for a WSNs application, one has to decide in terms of these top-level classes.

The second level of classification is based upon each approach's ability to process data in a centralized or distributed manner. Since WSNs nodes are limited in terms of resources such as power, computation, bandwidth, and memory, an approach meant for distributed processing requires one-pass algorithms that complete a part of the data mining locally and then aggregate the results. The objective of using distributed approaches is to limit the number of messages and the communication energy spent by sensor nodes while transferring data to the central server. This also helps to improve the WSNs lifetime and to extract maximum data from the environment. In centralized processing, by contrast, data from the entire network is collected and stored at a central server for analysis. Since the central server is rich in resources, there are no such constraints on choosing an accurate algorithm. This approach is nevertheless generally discouraged, because it generates a huge amount of dataflow and communication, which can create bottlenecks and waste communication bandwidth. These two data processing/storage architectures have a large impact on the type of data mining algorithm to choose; therefore, one has to decide on the processing/storage architecture when choosing the data mining algorithm for a WSNs application.

The third level of classification is selected according to the attitude towards solving a specific problem. Research in the WSNs area has focused on two separate aspects, namely, WSNs performance issues and application issues. As WSNs nodes are usually constrained in resources such as energy, communication bandwidth, and memory, resource-aware algorithms are needed to maximize the WSNs performance. On the other hand, a WSNs application requires data precision and accuracy, fault tolerance, event prediction, scalability, and robustness, and it often needs abundant use of energy, communication, and redundancies. This leads to a resource tradeoff: either one sacrifices the application's performance in favor of network efficiency, or one aims for the best application performance and deals with network resource issues such as energy in some other way (larger batteries or renewable sources with the nodes). For this reason, WSNs performance-oriented versus application-specific-oriented approaches have been selected as the lowest-level classification criterion. The taxonomy of data mining techniques for WSNs is presented in Figure 1.

4. State of the Art of Data Mining Techniques for WSNs

In this section, data mining techniques designed for WSNs are classified using the taxonomy framework presented in Section 3, and the characteristics and performance analysis of each technique are discussed.


Figure 1: Taxonomy of data mining techniques for sensor networks. [Figure not reproduced. The tree is rooted at "Data mining techniques for WSNs" and branches into frequent mining, sequential mining, clustering, and classification. Each class splits into distributed and centralized approaches, and each of those into "WSN performance" and "application based" approaches. The leaves are the individual techniques reviewed in this section, for example, DSARM, CARM, SP-tree, and in-network data mining.]

4.1. Frequent Pattern Mining. In this section, we review some of the works that have been proposed for mining frequent patterns from WSNs data. Frequent pattern mining is used to find groups of variables that co-occur frequently in the data-set. The aim is to find the most interesting relations between variables. Traditional frequent pattern mining algorithms [36–39] are CPU and I/O intensive, making it very expensive to mine dynamic WSN data. Unlike the mining of static databases, the dynamic nature of WSNs data led to the study of online mining of frequent itemsets. As a result, traditional frequent pattern mining algorithms have been modified according to the nature of WSNs data.

The basic frequent pattern mining technique is association rule mining. The first known association rule mining algorithm is Apriori [40]. It is based on a level-wise candidate generation-and-test methodology that makes several scans over the database. In each iteration, the patterns found to be frequent are used to generate possible frequent patterns (the candidates) to be counted in the next iteration. Therefore, the Apriori technique finds the frequent patterns of length $k$ from the set of already generated candidate patterns of length $k - 1$. In the subsequent step, the association rules are generated by computing the support and confidence of each frequent item in the given database $D$, which are defined as follows:

$$\mathrm{Support}(A) = \frac{\mathrm{Sup}(A)}{|D|}, \tag{1}$$

where $\mathrm{Sup}(A)$ is the number of occurrences of $A$ in database $D$ and $|D|$ is the number of transactions in $D$. Consider the following:

$$\mathrm{Confidence}(A \rightarrow B) = \frac{\mathrm{Sup}(A \cup B)}{\mathrm{Sup}(A)}. \tag{2}$$

This multiscan approach is impractical in the context of sensor networks, as it implies that all data has to be stored somewhere. However, recently there has been a growing amount of work on discovering frequent item-sets from a data stream of transactions such that every transaction is considered only once and can be deleted afterwards.
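The level-wise Apriori search and the support and confidence measures of (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of [40]: transactions are assumed to be Python sets held in memory, the threshold `min_count` is an absolute occurrence count, and all function names are hypothetical.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Sup(A): number of transactions that contain every item of A."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Equation (1): Sup(A) / |D|."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Equation (2): Sup(A u B) / Sup(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support_count(both, transactions) / support_count(a, transactions)


def apriori(transactions, min_count):
    """Level-wise search: k-item candidates come from frequent (k-1)-itemsets."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support_count([i], transactions) >= min_count}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support_count(s, transactions)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k - 1))}
        # Count step: one more scan over the whole database per level.
        level = {c for c in candidates
                 if support_count(c, transactions) >= min_count}
    return frequent
```

Every call to `support_count` rescans the full transaction list, which makes visible why the method presupposes that all data remain stored.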

The other basic approach for mining association rules is FP-growth [41], which discovers frequent patterns with only two database scans and eliminates the candidate generation required by Apriori. With the first database scan, the algorithm finds the set of distinct items with their respective support counts (i.e., frequencies) in the database. Then, with the second database scan, the algorithm summarizes the database in the form of a frequency-descending tree (i.e., the FP-tree). The complete set of frequent patterns is then mined from the FP-tree by recursively applying a divide-and-conquer-based pattern growth approach, called the FP-growth algorithm, without any additional database scan. The highly compact FP-tree structure introduced a new wing of research in mining frequent patterns. However, the static nature of the FP-tree and the two database scans still limit its applicability to frequent pattern mining over WSNs data. Recently, several centralized and distributed solutions have been proposed with the aim of maximizing the WSNs' performance or the application-based performance by applying Apriori-like and FP-growth methods over WSNs data.
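The two database scans that build the FP-tree can be sketched as below. The sketch covers tree construction only and omits the recursive pattern-growth mining step; the class `FPNode`, the function `build_fp_tree`, and the alphabetical tie-breaking rule are assumptions made for illustration and are not taken from [41].

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Scan 1: support count of every distinct item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Scan 2: insert each transaction with its frequent items sorted by
    # descending support (ties broken by item name for determinism).
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

Because the item order is fixed after the first scan, the tree cannot follow a stream whose item frequencies drift, which is the static-nature limitation noted above.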

4.1.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Halatchev and Gruenwald [42] proposed a centralized methodology called data stream association rule mining (DSARM) to identify missing sensor readings. It uses an association rule mining algorithm to identify sensors that report the same data a number of times in a sliding window, called related sensors, and then estimates the missing data from a sensor by using the data reported by its related sensors. Due to the stream nature of sensor data, applying an association mining algorithm such as Apriori directly to sensor data is not possible. This situation led the authors to propose the DSARM framework, which adapts the Apriori algorithm to make it applicable to the data stream received from sensor nodes. This technique is evaluated by simulation experiments on real data collected by the Department of Transportation in Austin, TX, USA, to estimate missing values in related data streams. Performance evaluations were conducted to compare DSARM with alternative approaches. The results show that DSARM requires more memory space and takes longer to produce an estimate than the considered alternative approaches, but it achieves better accuracy of the estimated value. However, there exist some limitations in DSARM. First, it is based on association rule mining over frequent itemsets of size two, which means that it can discover relationships only between two sensors and ignores the cases where missing values are related to multiple sensors. Second, it finds those relationships only when both sensors report the same value and ignores the cases where missing values can be estimated from the relationships between sensors that report different values.
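The idea of related sensors can be illustrated with a deliberately simplified sketch. It is not the DSARM algorithm of [42], whose details are not given here: it merely counts, within a sliding window, how often another sensor reported the same value as the target, and then borrows the current reading of the most strongly related sensor. The function names, the dictionary-per-round data layout, and the choice of a single best sensor are all assumptions.

```python
from collections import Counter, deque


def related_sensors(window, target, min_count):
    """Sensors that matched the target's value in at least min_count rounds."""
    agree = Counter()
    for round_ in window:
        if target not in round_:
            continue
        for sensor, value in round_.items():
            if sensor != target and value == round_[target]:
                agree[sensor] += 1
    return {s: c for s, c in agree.items() if c >= min_count}


def estimate_missing(window, current, target, min_count):
    """Estimate the target's missing reading from its most related sensor."""
    related = related_sensors(window, target, min_count)
    # Only related sensors that reported in the current round can help.
    usable = [s for s in related if s in current]
    if not usable:
        return None
    best = max(usable, key=lambda s: (related[s], s))
    return current[best]


# Hypothetical sliding window holding the three most recent complete rounds.
EXAMPLE_WINDOW = deque([
    {"s1": "low", "s2": "low", "s3": "high"},
    {"s1": "high", "s2": "high", "s3": "high"},
    {"s1": "low", "s2": "low", "s3": "high"},
], maxlen=3)
```

The sketch shares both limitations listed above: it relates sensors only pairwise, and only when they report identical values.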

Jiang and Gruenwald [43, 44] proposed a data estimation technique called CARM (closed item-sets-based association rule mining), which can derive the most recent association rules between the sensors in the current sliding window. The technique is based on the closed frequent item-sets mining algorithm for data streams called CFI-stream [45]. It maintains an in-memory data structure called the direct update (DIU) tree to store closed item-sets. When a new transaction arrives, the algorithm checks each item-set in the transaction over a data stream sliding window online and incrementally updates the closed item-sets' supports. If CARM finds missing values in the sensor readings, instead of generating all possible association rules, it generates only the rules that have strong relationships with the current round of sensor readings where one or more readings are missing. Based on these rules and selected closed item-sets, CARM generates the estimated values, which contain item values that are not included in the original readings. Figure 2, redrawn from [43], shows the DIU tree after receiving the first four transactions. It shows that currently there are four closed item-sets, C, AB, CD, and ABC, in the DIU tree, and their associated supports at the upper-right corner are 3, 3, 1, and 2. A basic set of rules is generated from these frequent item-sets. All other rules can be inferred from this basic rule set.

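Figure 2: Lexicographical-ordered direct update tree. [Figure not reproduced. It shows a timeline of four transactions (TID 1: C, D; TID 2: A, B; TID 3: A, B, C; TID 4: A, B, C) and the resulting DIU tree rooted at Φ with the closed item-set nodes C (3), AB (3), CD, and ABC (2).]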
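The closed item-sets reported for Figure 2 can be checked with a brute-force computation. The sketch below is not CFI-stream [45], which updates the DIU tree incrementally; it simply enumerates every item-set that occurs in the four transactions and keeps those for which no proper superset has the same support. The function name and the constant holding the transactions are hypothetical.

```python
from itertools import combinations

# The four transactions listed in Figure 2.
FIGURE_2_TRANSACTIONS = [
    {"C", "D"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "B", "C"},
]


def closed_itemsets(transactions):
    """Return {itemset: support} for all closed item-sets (brute force)."""
    items = sorted({i for t in transactions for i in t})
    support = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count:
                support[s] = count
    # An item-set is closed if no proper superset has the same support.
    return {s: c for s, c in support.items()
            if not any(s < other and c == support[other]
                       for other in support)}
```

Applied to `FIGURE_2_TRANSACTIONS`, the function returns exactly C, AB, CD, and ABC with supports 3, 3, 1, and 2, in agreement with the text. For instance, A alone is not closed because AB has the same support of 3.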

4.1.2. Centralized Approaches Aim to Maximize WSNs' Performance. Loo et al. [46] have proposed online one-pass algorithms for mining large sensor streams. They mine the frequent value sets from sensor stream data by transforming the stream data into interval lists (ILs) under the lossy counting framework [47]. Time is divided into equal-size intervals, and a snapshot of the sensor readings is taken whenever there is an update on a sensor reading. The sensors' values at that snapshot form the value sets stored in the database. An Apriori-based strategy is used to mine the value sets. The analysis of the IL-based representation of stream data showed favorable results using a synthetic data-set. However, while computing the IL of a candidate value set, redundant intersection of ILs is inevitable, which affects the performance in terms of time and computation cost. The proposed technique is evaluated by comparing the performance of the IL-based approach (ILB) against an application of lossy counting (LC) using a weighted transformation method on a synthetic dataset. According to their experiments, ILB outperforms LC significantly for large sensor networks. Moreover, both the processing time and memory consumption of ILB are more stable than those of LC.
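For orientation, the general lossy counting scheme referred to above can be sketched for single items. This is the generic textbook form of the technique, not the interval-list method of [46] and not necessarily the exact formulation of [47]; the class name, method names, and the error parameter `epsilon` are assumptions. The stream is cut into buckets of width ceil(1/epsilon), and at each bucket boundary entries that cannot be frequent are evicted.

```python
from math import ceil


class LossyCounter:
    """Approximate item counts over a stream, undercounting by at most epsilon * n."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = ceil(1 / epsilon)  # bucket width
        self.n = 0                      # stream length so far
        self.entries = {}               # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new item may have been evicted earlier; it can have been
            # missed at most (bucket - 1) times.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: evict entries whose count plus maximum
            # undercount does not exceed the current bucket id.
            self.entries = {i: e for i, e in self.entries.items()
                            if e[0] + e[1] > bucket}

    def frequent(self, min_support):
        """Items whose stored count is at least (min_support - epsilon) * n."""
        threshold = (min_support - self.epsilon) * self.n
        return {i: e[0] for i, e in self.entries.items() if e[0] >= threshold}
```

Each stream element is processed once, and infrequent items are dropped at bucket boundaries, so memory stays small at the price of a bounded counting error.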

Chong et al. [48] proposed a rule-learning model that finds strong rules from sensor readings. The rules are used as triggers to control sensor network operations; for example, they can be used to put sensors to sleep or reduce data transmission to conserve energy. To mine the rules, Apriori is modified to count the number of transactions that are frequent instead of the item-sets within transactions, and transactions are processed in batches $b_1, b_2, \ldots, b_X$. Suppose there is a node $M$ that collects light, temperature, and microphone readings from three other sensor streams $S_0$, $S_1$, and $S_2$. Initially, $M$ is queried to collect all sensory values, which are used to generate a rule of the form "$a_n$ implies $a_{n-1}$"; once the rule is extracted, only $a_n$ is sent to the base station. Upon receiving the reading $a_n$ and utilizing knowledge of the rule, the reading $a_{n-1}$ can be inferred. All extracted rules are stored in a rule repository. The proposed method is validated by simulation, implemented in the C language, on a synthetic dataset. In the experiment, the first correlated data received from the sensors is used to extract rules. In the subsequent phase, these rules are used to infer sensor readings for the next round.
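How a learned rule lets the base station recover a suppressed reading can be illustrated as follows. The sketch does not reproduce the modified Apriori of [48]; it learns simple value-level rules between two streams from a first batch of complete readings, using hypothetical thresholds `min_count` and `min_conf`, and then infers the reading that was not transmitted.

```python
from collections import Counter, defaultdict


def learn_rules(rounds, antecedent, consequent, min_count, min_conf):
    """Learn rules 'antecedent == v implies consequent == w' from one batch."""
    pair_counts = defaultdict(Counter)
    for r in rounds:
        pair_counts[r[antecedent]][r[consequent]] += 1
    rules = {}
    for value, outcomes in pair_counts.items():
        implied, hits = outcomes.most_common(1)[0]
        total = sum(outcomes.values())
        # Keep the rule only if it is both frequent and confident enough.
        if hits >= min_count and hits / total >= min_conf:
            rules[value] = implied
    return rules


def infer(rules, antecedent_value):
    """Base-station side: recover the suppressed reading, or None if no rule applies."""
    return rules.get(antecedent_value)


# Hypothetical first batch of complete readings collected at node M.
EXAMPLE_BATCH = [
    {"light": "dark", "temp": "cold"},
    {"light": "dark", "temp": "cold"},
    {"light": "dark", "temp": "cold"},
    {"light": "bright", "temp": "warm"},
    {"light": "bright", "temp": "cold"},
]
```

With `min_count=2` and `min_conf=0.9`, only the rule for "dark" survives in this batch, so a node observing "dark" would need to send the light reading alone, whereas a "bright" reading would still require the temperature to be transmitted.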

Tanbeer et al. [49] proposed a tree-based data structure called the sensor pattern tree (SP-tree) to generate association rules from WSNs data with one database scan. The main idea of the proposed approach is to obtain the frequency of all event-detecting sensors' data, construct a prefix tree based on that in any canonical order, and then reorganize the tree in frequency-descending order. Through the reorganization, the SP-tree can maintain the frequently event-detecting sensors' nodes at the upper part of the tree, which in turn provides high compactness in the tree structure. Once the SP-tree is constructed, the FP-growth mining technique is applied to find the frequent event-detecting sensor sets. Experiments are performed to verify the improvement in memory consumption and runtime that SP-tree achieves over PLT [50]. The experiments show that SP-tree outperforms PLT in time and memory consumption. The reason for this gain is twofold: first, the PLT construction requires two database scans, while SP-tree constructs the tree by scanning the database only once; second, the mining phase of SP-tree is highly efficient due to the frequency-descending tree structure.

4.1.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. Romer [51] proposed an in-network data mining technique to discover frequent patterns of events with certain spatial and temporal properties. In this approach, the user specifies the upper bounds maxscope and maxhistory (variable to be measured in seconds) for the patterns of interest. The sensor collects these events and applies a mining algorithm to discover the patterns that satisfy the given parameters. Each node in the network collects the events from its neighbors within the maximum scope and keeps a history of their events for the duration of the maximum history. After that, each node applies a mining algorithm to discover the local frequent patterns. The resulting frequent patterns are converted to association rules that describe an event of type $E$ that occurs at node $n$ with support $S$ and confidence $C$. Local patterns are sent to the sink, where secondary mining is performed to compute the global picture of the entire network. The algorithm is implemented on the BT node (Bluetooth radio) platform [52], and the tradeoff between the scope of the query and resource consumption is evaluated on a real dataset. Results show that reducing the scope of the query decreases resource consumption. Major issues in this approach are the memory consumption of itemset discovery algorithms and the communication overhead of event collection.

4.1.4. Distributed Approaches Aim to Maximize WSNs' Performance. Boukerche and Samarah [15] presented a distributed data extraction methodology that aggregates the data on the sensor node, which reduces the number of messages during transmission. The distributed solution sends some parameters, such as support, time-slot size, and historic period, from the sink to all nodes within the network. Each sensor node has its own buffer entry to set the support value. After each time slot, nodes check whether any messages were received during this time slot; if so, the node sets its buffer entry. When the historic period ends, each node traverses its buffer; if the number of set values is greater than or equal to the support value provided initially, then the message is transferred to the sink. To evaluate the validity of the distributed approach, it is compared with the centralized methodology on a real dataset. They conducted two experiments using historical periods of 5 and 10 days, with minimum support values ranging from 10% to 90% and a time-slot size of 30 seconds. All of the reported results show a reduction in the number of messages and the data size as the support value increases. Major issues in this methodology are the increased cost of the node buffer and the delay of crucial messages in the case of a high support value.
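The per-node check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `fill_buffer` and `should_report`, the representation of the buffer as one boolean per time slot, and the expression of support as a minimum count of set slots are all assumptions made for illustration.

```python
def fill_buffer(message_times, slot_size, historic_period):
    """Mark each time slot in which at least one message was received.

    Hypothetical helper: times, slot_size, and historic_period share one unit.
    """
    num_slots = historic_period // slot_size
    slot_flags = [False] * num_slots
    for t in message_times:
        slot = int(t // slot_size)
        if 0 <= slot < num_slots:
            slot_flags[slot] = True
    return slot_flags


def should_report(slot_flags, min_support_slots):
    """Report to the sink only if enough slots were set during the period."""
    return sum(slot_flags) >= min_support_slots


# Messages at t = 1, 35, 36, and 95 s with 30 s slots over a 120 s period.
flags = fill_buffer([1, 35, 36, 95], slot_size=30, historic_period=120)
print(flags, should_report(flags, min_support_slots=3))
```

A higher support threshold suppresses more messages, which is consistent with the reported drop in message count, but it also delays or drops reports from nodes that detect events only occasionally.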

Boukerche and Samarah [50] proposed the positional lexicographic tree (PLT) structure for mining association rules in which the event-detecting sensors are the main objects of the rules, regardless of their values. Similar to the FP-growth approach, PLT follows a pattern-growth mining technique. The mining begins with the sensor having the maximum rank by generating the frequent patterns from its PLT in a recursive way. Computation is required at each recursion to update the PLT involved in the prefix part of a pattern. Therefore, the requirement of two database scans and the additional PLT update operations during mining limit the efficient use of this approach in handling WSN data. The performance evaluation is done by comparing the PLT structure with the FP-growth algorithm. According to their results, the PLT structure outperforms FP-growth in terms of CPU time and memory usage for all of the support values used; the performance improvement of PLT over FP-growth ranges from 30 percent to 50 percent.

4.2. Sequential Pattern Mining (SPM). Frequent pattern mining has been extended to find more complex structures, as in sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database. A sequence database stores a number of records, where all records are sequences of ordered events, with or without concrete notions of time. A large number of real-world domains, such as user profiling, medicine, local weather forecasting, and bioinformatics, show an inherent tendency to be modeled by means of sequences of events/objects related to each other. This great variety of applications makes sequential pattern mining one of the central topics in WSN data mining, as shown by the research efforts produced in recent years. The sequential pattern mining techniques in sensor networks are either based on traditional sequential mining algorithms, such as the Apriori-like algorithm [53], the Apriori-based methods GSP [54] and PSP [55], and the pattern-growth approaches FreeSpan and PrefixSpan [56, 57], or are new algorithms devised specifically to work in the sensor network environment.

4.2.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Esposito et al. [58, 59] presented a multidimensional relational sequence mining framework to identify the hidden frequent temporal correlations between sensor nodes. The algorithm is based on the generic level-wise search method known as APRIORI [60] for discovering correlated sensors. The framework exploits a relational language to describe the temporal evolution of a sensor network along with contextual information, working in two phases. First, an abstraction step segments and labels the real-valued time series into similar subsequences by using a kernel density estimator approach. Then, the knowledge is enriched by adding interval-based operators between the subsequences obtained in the discretization step, and the relational pattern mining algorithm is extended in order to deal with these new operators. By taking into account the interval-based temporal data along with contextual information about events, it discovers interesting and more human-readable patterns. The framework is evaluated on a real dataset collected from a wireless sensor network made up of 54 Mica2Dot [61] sensors deployed in the Intel Berkeley Research Lab [62]. Each sensor collected topology information along with humidity, temperature, light, and voltage values once every 31 seconds. Results show a strong correlation among some measurements, which is useful for anomaly detection.

Cook et al. [21] present the MavHome smart home architecture, which focuses on the creation of an intelligent home that perceives the state of the home through sensors and acts upon the environment through device controllers. An important characteristic of the proposed architecture is the ability to make decisions based on predicted activities. To predict the activities, an algorithm called episode discovery (ED) is proposed, which is based on the work of Srikant and Agrawal [54] for mining sequential patterns from time-ordered transactions. Values that can be predicted include the usage patterns of devices in the home, the movement patterns of the inhabitants, and the typical activities of the inhabitants. They utilize prediction algorithms on action sequences stored in the inhabitant event history to forecast user actions. Actions can then be automated based on the significance of mined patterns as well as the predictive accuracy of the next event. A key disadvantage is that the entire action history must be stored and processed offline, which is not practical for large prediction tasks over a long period of time. Cook et al. demonstrated the effectiveness of MavHome on synthetic smart home data and real data collected by students using X10 controllers in their homes. Experiments show a predictive accuracy as high as 53.4% on the real data and 94.4% on the synthetic data.

Rabatel et al. [22] presented a strategy to detect anomalies in sensor data to improve railway maintenance. They extract sequential patterns from real railway data and identify abnormal behavior. Based on these abnormal findings, alarms are automatically triggered to notify of potential failures. This abnormal behavior depends on environmental (weather conditions, travel characteristics) and structural (route, episode index in the route) changes in the data. The PSP [55] algorithm is used to identify the sequential patterns. To tackle the environmental conditions, a contextual knowledge-based method is proposed, which is able to provide information on the seriousness and possible causes of a deviation. The proposed technique helps in proactive maintenance of trains. However, the real-time context can be improved by providing precise and exact information for anomaly detection.

Figure 3: Example of sequential alarm pattern (alarm events $a$, $q$, and $k$ separated by time differences $T_{aq}$ and $T_{qk}$).

Guralnik and Haigh [23] use sequential pattern mining to learn typical behaviors of humans in their homes. Human behavior is inferred by using motion sensors, pressure pads, door latch sensors, and toilet flush sensors. They installed 10–20 sensors of different types in a home and built models of which sensor firings correspond to which activities, in what order, and at what time. For example, "In 60% of the days, the Kitchen-Motion sensor fires between 18h00 and 18h30, and then the Living-Room-Motion sensor fires between 18h20 and 20h00, and then the Bedroom-Motion sensor fires between 19h45 and 22h00." Their algorithm uses these data, together with domain knowledge, to extract the sequences of rooms in which the person was active. These sequences are then analyzed by a human expert to identify complex behavior models. These models can be used to select the appropriate response plan to the actions of the elderly.

Wu et al. [63] proposed a new algorithm for mining sequential alarm patterns (MSAPs) from the alarm data generated by a GSM system. Sequential events are identified from alarm data by defining a time interval between adjacent events. For example, if the time is set to six hours, then the sequential alarm pattern ($a$, $b$, $c$) indicates that $a$, $b$, and $c$ happen in order and that the time intervals between $a$ and $b$ and between $b$ and $c$ are each less than six hours. An example of a sequential alarm sequence, redrawn from [63], is shown in Figure 3.

The number in each circle represents the error ID, and $T_{aq}$ denotes the time difference between alarm event $a$ and alarm event $q$. The extracted knowledge is useful not only for identifying the relevance between two events but also for predicting the alarm sequence and taking proper steps to prevent the occurrence of the alarms, if at all possible. For example, if the network operator detects alarm $a$ occurring at time $t$, the operator should dissipate this alarm before time $t + T_{aq}$ to alleviate the abnormal situations incurred. The limitation of this technique is that it cannot discover other possible time-interval patterns between the events.
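To make the time-interval constraint on sequential alarm patterns concrete, the following Python sketch checks whether a pattern such as ($a$, $b$, $c$) occurs in an alarm log with every adjacent gap below a fixed bound. It only illustrates the matching condition and is not the mining algorithm of [63]; the function name `matches_alarm_pattern` and the log format (a time-sorted list of `(timestamp_hours, alarm_id)` pairs) are assumptions.

```python
def matches_alarm_pattern(events, pattern, max_gap_hours):
    """Return True if `pattern` occurs in order in `events` with every gap
    between adjacent matched alarms strictly less than `max_gap_hours`.

    events: list of (timestamp_hours, alarm_id), sorted by timestamp.
    """
    def search(start, k, prev_time):
        if k == len(pattern):
            return True
        for i in range(start, len(events)):
            t, alarm = events[i]
            if prev_time is not None and t - prev_time >= max_gap_hours:
                break  # events are sorted, so later ones are even further away
            if alarm == pattern[k] and search(i + 1, k + 1, t):
                return True
        return False

    return search(0, 0, None)


log = [(0, "a"), (4, "a"), (9, "b"), (12, "c")]
print(matches_alarm_pattern(log, ("a", "b", "c"), max_gap_hours=6))
```

The backtracking matters in this log: the first `a` at hour 0 is too far from `b` at hour 9, but the second `a` at hour 4 satisfies the six-hour bound.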

It is observed that none of the centralized solutions aims to maximize the WSNs' performance.

4.2.2. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. Tseng and Lu [64] proposed an object tracking strategy named multilevel object tracking (MLOT) to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the movement log in sensor networks. A multilevel hierarchical structure is adopted by using a clustering mechanism that represents the hierarchical relations among sensor nodes, in order to keep track of moving objects in real time. The movement logs of the moving objects are analyzed by the data mining algorithm movement pattern generation (MPG) to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the fewest sensor nodes. MPG is based on Apriori; it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.

4.2.3. Distributed Approaches Aim to Maximize WSNs' Performance. Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the temporal movement pattern (TMP) logs. The discovered temporal movement rules (TMRs) are used to predict the next location of objects in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of moving objects. The integrated movement log is used as the input to the TMP-mine data mining method, which uses the pattern-growth approach to discover the TMPs. Once the TMPs are discovered, the TMRs are generated for predicting the next location of a moving object. Suppose that the following two rules are discovered by a vehicle tracking system:

Rule 1: (Station A → interval 10 min → Station B → interval 5 min → Station C)

Rule 2: (Station A → interval 20 min → Station B → interval 5 min → Station D)

By dispatching these rules to the corresponding sensor nodes, the tracking can be made energy efficient. For example, if a car moves with the pattern (Station A → interval 10 min → Station B → interval 5 min), which matches Rule 1, then the node in Station B only has to activate the node in Station C rather than that in Station D or those around Station B.
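The rule lookup in this example can be sketched in a few lines of Python. This is only an illustration of how a node might use dispatched rules, not the TMP-mine algorithm itself; the names `RULES` and `predict_next_station`, the tuple encoding of a movement prefix, and the use of exact interval matching (a real system would presumably tolerate some deviation) are assumptions.

```python
# Each rule maps an observed movement prefix (stations interleaved with
# travel intervals in minutes) to the predicted next station.
RULES = {
    ("A", 10, "B", 5): "C",  # Rule 1
    ("A", 20, "B", 5): "D",  # Rule 2
}


def predict_next_station(observed, rules=RULES):
    """Return the station to activate next, or None if no rule matches.

    Assumption: intervals must match exactly.
    """
    return rules.get(tuple(observed))


print(predict_next_station(["A", 10, "B", 5]))  # matches Rule 1
```

When no rule matches, a fallback such as waking all neighbors of the current node would be needed; that fallback is outside what the text describes.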

Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique using sequential patterns (PTSP). This technique helps to predict the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. PTSP is based on the inherited patterns of the objects' movements in the network and the use of sequential patterns to predict which sensor node the moving object will head toward next.

4.3. Clustering. Clustering is unsupervised learning in which given data is categorized into subsets so that each subset represents a cluster with distinctive properties. It has been considered a useful technique, especially for applications that require scalability to a large number of sensor nodes. Clustering also supports aggregation of data in order to summarize the overall transmitted data.

Figure 4: Data clustering for sensor networks (input sensor data, identification of data correlation, grouping data, and resulting clusters, with feedback).

In the current literature, problems related to clustering are addressed by node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67–83]. These clustering techniques vary widely in their objectives, depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering, for example, by considering the data similarity of neighboring nodes, many popular node clustering algorithms that partition the sensor nodes into a number of small groups and elect a cluster head for every group do not use data mining techniques directly. In this study, we focus only on data clustering techniques that enable efficient data mining and find data correlations among the nodes. Figure 4 shows the data clustering commonly used in the data mining process.

The surveyed work adapted the k-means, hierarchical, and data correlation-based methods. The k-means algorithm takes the input parameter $k$ and partitions a set of $n$ objects into $k$ clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects. It works by grouping data objects into a tree of clusters, whereas data correlation-based clustering forms clusters based on spatial and temporal correlations, grouping nodes with similar sensory values within a given threshold; these clusters remain fixed until the sensory values move outside the threshold over time. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider node residual energy. It is observed from the survey that both the centralized and distributed clustering solutions aim to maximize the WSNs' performance.

4.3.1. Centralized Approaches Aim to Maximize WSNs' Performance. Liu et al. [84] proposed a centralized graph-based approach, energy-efficient data collection (EEDC). EEDC is an on-demand clustering algorithm that clusters nodes into groups such that members have similar sensor readings; thus, the protocol clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes with a user-defined dissimilarity measure. EEDC models the cluster creation process as a clique-covering problem by constructing a graph $G$ such that each sensor node is a vertex in the graph. An edge ($u$, $v$) is drawn if the dissimilarity measure between vertex $u$ and vertex $v$ is less than or equal to the given intracluster dissimilarity measure threshold max_dst. A cluster is a clique in the graph, and the clustering problem uses the minimum number of cliques to cover all vertices in the graph. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, due to its centralized processing, it is not suitable for large-scale WSNs.
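The clique-covering formulation can be sketched as follows. Finding a minimum clique cover is NP-hard in general, so this sketch uses a simple greedy heuristic that is not claimed to be the procedure used by EEDC. The function names, the use of a single scalar reading per node, and the absolute difference as the dissimilarity measure are all assumptions for illustration.

```python
def build_similarity_graph(readings, max_dst):
    """Connect two nodes when their dissimilarity is at most max_dst.

    Assumption: dissimilarity is the absolute difference of scalar readings.
    """
    nodes = list(readings)
    adj = {u: set() for u in nodes}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if abs(readings[u] - readings[v]) <= max_dst:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def greedy_clique_cover(adj):
    """Cover all vertices with cliques using a greedy heuristic."""
    unassigned = set(adj)
    clusters = []
    for u in sorted(adj):
        if u not in unassigned:
            continue
        clique = [u]
        unassigned.discard(u)
        for v in sorted(adj[u]):
            # v joins only if it is adjacent to every current clique member
            if v in unassigned and all(v in adj[w] for w in clique):
                clique.append(v)
                unassigned.discard(v)
        clusters.append(clique)
    return clusters


readings = {"n1": 20.0, "n2": 20.3, "n3": 20.6, "n4": 25.0}
print(greedy_clique_cover(build_similarity_graph(readings, max_dst=0.4)))
```

Requiring a clique rather than a connected component guarantees that every pair of nodes in a cluster is within max_dst, which is why n3 is not placed with n1 even though both are similar to n2.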

4.3.2. Distributed Approaches Aim to Maximize WSNs' Performance. Guo et al. [85] proposed H-cluster, a distributed algorithm to cluster sensory data. The input of this algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output of the algorithm is a set of cluster features that summarize the clusters of the input sensory dataset. The Hilbert-Map mapping algorithm is used to map a $d$-dimensional sensory data space into a 2-dimensional area covered by a given WSN. H-cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) $D$ at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-cluster with the C-Corner and C-Center algorithms. During the experiments, four types of environment attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that the H-cluster algorithm is much more efficient in terms of data loss, energy, and the quality of clustered data in small WSNs. The results also show that as the amount of sensory data delivered increases, the amount of data loss also increases, and that energy efficiency decreases as the size of the WSN increases.

Yeo et al. [86] proposed a data correlation-based clustering scheme (DCC) based on the similarity of sensor data, along with a spatial suppression scheme that helps to reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head; during this phase, sensor nodes communicate with each other, and the resulting clusters are organized from sensor nodes that have similar readings. Spatial suppression is performed on the cluster head, which also computes the difference between each sensor reading and the representative value. If a cluster head has redundant data, it removes it, except for the node identification. The experimental results support the hypothesis that clustering based on data correlation has better compression performance than ordinary clustering based on locality of communication; they show that DCC reduces data size by 40% through suppression and prolongs network lifetime by 20–30%. However, for large-scale network applications (nodes > 500), DCC is inefficient because each cluster head needs more energy to collect similar data readings and to communicate with several nodes. Also, when the percentage of similar data readings is low, DCC is ineffective due to the higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing their prediction model. In their approach, the cluster heads execute a prediction model, while gateway nodes at the circumference of the clusters are responsible for the routing task. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] present the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees that the result is within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), where clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, in which only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) that trades lower result precision for significant savings in energy, storage, computation, and communication.

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the $K$-means clustering algorithm is proposed that sends summarized data towards the sink, which reduces the communication transmission time and power consumption of sensor nodes. The sensor network is divided into clusters, and only the cluster head node communicates with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends a summary to the base station, including the count and vector sum of its local sensory data points as well as the sum of the squared distances from each local point to its center. On receiving data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations. Several programs are run to obtain the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors ($N$) and increases linearly with the number of centers. Major issues are the extra memory required at the cluster head and the computation power needed to summarize data before transmitting it to the sink. Also, the algorithm requires multiple rounds of message passing between cluster heads and the base station; this may have a serious effect on communication efficiency when the number of sensors is relatively high.
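One round of the summary exchange described above might look like the following Python sketch. It is an illustration under stated assumptions rather than the protocol of [67]: the function names `local_summary` and `update_centers`, the dictionary layout of a summary, and the choice to keep a center unchanged when no point is assigned to it are all hypothetical.

```python
def local_summary(points, centers):
    """At a cluster head: per-center count, vector sum, and squared error."""
    dim = len(centers[0])
    summary = [{"count": 0, "sum": [0.0] * dim, "sse": 0.0} for _ in centers]
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = dists.index(min(dists))  # index of the nearest center
        summary[j]["count"] += 1
        summary[j]["sum"] = [s + a for s, a in zip(summary[j]["sum"], p)]
        summary[j]["sse"] += dists[j]
    return summary


def update_centers(summaries, centers):
    """At the base station: recompute each center from the summaries."""
    new_centers = []
    for j, old in enumerate(centers):
        count = sum(s[j]["count"] for s in summaries)
        if count == 0:
            new_centers.append(list(old))  # assumption: keep an empty center
            continue
        total = [sum(s[j]["sum"][d] for s in summaries) for d in range(len(old))]
        new_centers.append([x / count for x in total])
    return new_centers


centers = [[0.0], [10.0]]
head1 = local_summary([[1.0], [2.0], [9.0]], centers)
head2 = local_summary([[3.0], [11.0]], centers)
print(update_centers([head1, head2], centers))
```

Each cluster head sends a fixed-size summary per center regardless of how many sensor readings it holds, which is consistent with the reported communication cost being independent of $N$ and linear in the number of centers.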

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes to form clusters. Those nodes that hear the request decide whether they should nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on the remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when there are subregions in the sensor network that are more targeted than others, that is, when inquiries are not uniformly distributed over time and space.

Figure 5: Classification maps input attribute set (X) to class label (Y).

Ma et al. [90] proposed the distributed hierarchical clustering and summarization algorithm (DHCS) for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop-count thresholds, to model node- and distance-based clustering. Initially, each node treats itself as an active cluster. Then, similar adjacent clusters are merged into larger clusters round by round. In each round, each cluster simultaneously tries to combine with its most similar adjacent cluster. Two clusters can be merged only if both consider one another as the most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
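The mutual-merge rule can be sketched in Python as below. This is a simplified, centralized simulation written for illustration and is not the DHCS implementation: the function name `dhcs_merge`, the use of a scalar reading per node, the comparison of cluster means as the similarity measure, and the omission of the hop-count threshold are all assumptions.

```python
def dhcs_merge(node_values, edges, max_diff):
    """Merge adjacent clusters round by round until no merge happens.

    node_values: {node: reading}; edges: list of (node, node) links.
    Assumption: similarity is the difference between cluster mean readings.
    """
    clusters = {n: [n] for n in node_values}  # each node starts as a cluster
    while True:
        owner = {n: cid for cid, members in clusters.items() for n in members}
        centroid = {
            cid: sum(node_values[n] for n in members) / len(members)
            for cid, members in clusters.items()
        }
        neighbors = {cid: set() for cid in clusters}
        for u, v in edges:
            cu, cv = owner[u], owner[v]
            if cu != cv:
                neighbors[cu].add(cv)
                neighbors[cv].add(cu)
        # Each cluster picks its most similar adjacent cluster within max_diff.
        best = {}
        for cid, nbrs in neighbors.items():
            candidates = [
                (abs(centroid[cid] - centroid[other]), other)
                for other in nbrs
                if abs(centroid[cid] - centroid[other]) <= max_diff
            ]
            if candidates:
                best[cid] = min(candidates)[1]
        merged = False
        for cid, other in sorted(best.items()):
            # Merge only when the choice is mutual; handle each pair once.
            if cid < other and best.get(other) == cid:
                clusters[cid] += clusters.pop(other)
                merged = True
        if not merged:
            return sorted(sorted(members) for members in clusters.values())


values = {"a": 20.0, "b": 20.5, "c": 21.0, "d": 30.0}
links = [("a", "b"), ("b", "c"), ("c", "d")]
print(dhcs_merge(values, links, max_diff=1.0))
```

In this run, a and b merge in the first round, the merged cluster absorbs c in the second, and d remains alone because its reading differs by more than the threshold; the two remaining clusters correspond to the steady clusters in the text.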

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and classifies new data into one of the learned classes. Figure 5 shows that classification maps an input attribute set (X) to a class label (Y).

Classification-based approaches have adapted the traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, depending on the type of classification model that they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until reaching a leaf node, where the class label is assigned. The internal nodes are used to partition data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify a dataset based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a request to classify a query vector, the closest training vectors are located according to the distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier groups the dataset into predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where Condition is a conjunction of attribute tests and Y is a class label.

SVM (support vector machine) techniques partition the data belonging to different classes by fitting a hyperplane between them that maximizes the partition. The data is mapped into a higher-dimensional feature space, where it can be easily partitioned by a hyperplane. Furthermore, a kernel function is used to approximate the dot products between the mapped vectors in the feature space to find the hyperplane.

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. In order to identify persons, the proposed approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct the frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied on the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute that is used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The proposed approach is validated by experiments using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated by using the Weka framework, version 3.7.0 [95]. Experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deep analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data by using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, first its distance to every training example is computed. Then, the $k$ closest training examples are selected, where $k$ is a fixed integer and $k \ge 1$; among the $k$ examples, it looks for the label that is most frequent. This label is the prediction for the test example. The algorithm is evaluated by building a classifier from the preprocessed training data generated from NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a correctly classified success rate of 92.3%.
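The nearest neighbor procedure described above is compact enough to sketch directly. The following Python example is a generic $k$-nearest-neighbor majority vote, not the NNTC code of [96]; the function name `knn_predict`, the Euclidean distance, and the toy feature vectors and labels are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training examples.

    training: list of (feature_vector, label) pairs.
    Assumption: Euclidean distance is the distance metric.
    """
    ranked = sorted(training, key=lambda ex: math.dist(ex[0], query))
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


training = [
    ((0, 0), "walk"), ((0, 1), "walk"), ((1, 0), "walk"),
    ((5, 5), "run"), ((6, 5), "run"),
]
print(knn_predict(training, (0.5, 0.5), k=3))
```

Because training only stores the examples, all of the cost is paid at query time, which is the "lazy learner" behavior noted earlier in this section.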

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments to predict the behavior patterns of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model to predict the occupancy of the ambient intelligence environment. The occupancy is predicted by using fuzzy rules, which are modeled by using the past values of time series data. In the learning process, input from the sensors is compared with stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the proposed technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multiuser environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size by changing the algorithm output rate according to the data rate, available memory, algorithm output rate history, and time constraints for filling the available memory with generated knowledge. The algorithm works by searching for the nearest instance stored in main memory when a new element arrives. All instances are already stored in main memory according to a prespecified distance threshold. The threshold here represents the similarity measure acceptable to the algorithm for considering two or more elements as one element according to the elements' attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the instance is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer.
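The weight-update rule of LWClass can be sketched as follows. This is a minimal illustration and not the implementation of [100]: the function name `lwclass_update`, the Euclidean distance, and the dictionary layout are assumptions. In particular, storing a new instance with weight 1 when no stored instance lies within the threshold is an assumption, since the text above does not state what happens in that case, and the AOG rate adaptation is omitted.

```python
import math


def lwclass_update(memory, features, label, threshold):
    """Process one labeled stream element and update the stored instances.

    memory: list of dicts with keys 'features', 'label', and 'weight'.
    """
    nearest, best = None, None
    for entry in memory:
        d = math.dist(entry["features"], features)
        if d <= threshold and (best is None or d < best):
            nearest, best = entry, d
    if nearest is None:
        # Assumption: an element with no close match is stored as new.
        memory.append({"features": tuple(features), "label": label, "weight": 1})
    elif nearest["label"] == label:
        nearest["weight"] += 1
    else:
        nearest["weight"] -= 1
        if nearest["weight"] == 0:
            memory.remove(nearest)  # release the instance from memory
    return memory


memory = []
for x, y in [((1.0, 1.0), "A"), ((1.1, 1.0), "A"), ((1.0, 1.1), "B")]:
    lwclass_update(memory, x, y, threshold=0.5)
print(memory)
```

The weights act as a running vote for each stored instance, so memory holds one representative per neighborhood rather than every element of the stream.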

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure over the whole network is constructed. A distributed voting approach is proposed in which each sensor is a leaf of the tree (DT) and performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work shows how the local model enables sensors to respond to changes in the target by relearning local models. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime as compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node for computing and building the local predictive model and the extra memory required to store it.
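The two voting schemes mentioned above can be sketched as follows. This is only an illustration of simple and weighted voting over the class labels that sensors report to the sink; the function names are hypothetical, and the choice of weights (for example, each sensor's estimated local accuracy) is an assumption, since the weighting used in [103] is not detailed here.

```python
from collections import defaultdict


def simple_vote(local_predictions):
    """Return the class label reported by the largest number of sensors."""
    counts = defaultdict(int)
    for label in local_predictions:
        counts[label] += 1
    # On a tie, the label that was seen first wins (dict insertion order).
    return max(counts, key=counts.get)


def weighted_vote(local_predictions, weights):
    """Return the class label with the largest total weight.

    `weights` holds one non-negative number per sensor; using a sensor's
    local accuracy as its weight is an assumption made for this sketch.
    """
    scores = defaultdict(float)
    for label, weight in zip(local_predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Three sensors report a class; the first sensor is trusted the most.
reports = ["fire", "normal", "normal"]
print(simple_vote(reports))                      # majority label
print(weighted_vote(reports, [0.9, 0.3, 0.4]))   # label with the highest total weight
```

In this small example the two schemes disagree: the majority says "normal", while the single highly weighted sensor tips the weighted vote to "fire".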

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme that generates effective feature vectors of low dimension (FVLD) for wireless audio networks. A distributed cluster-based algorithm for the detection and classification of vehicles has been proposed. Sensors form clusters on demand to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, as compared with all other approaches.
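The difference between combining features and combining decisions at a cluster head can be made concrete with a small Python sketch. It is a simplified illustration rather than the scheme of [104]: a 1-nearest-neighbour rule stands in for the KNN and ML classifiers, feature vectors are combined by averaging, and decisions are combined by majority vote; all function names are hypothetical.

```python
import math
from collections import Counter


def nearest_label(training, vector):
    """1-nearest-neighbour rule: label of the closest training vector."""
    return min(training, key=lambda item: math.dist(item[0], vector))[1]


def feature_fusion(training, sensor_vectors):
    """Average the sensors' feature vectors, then classify the fused vector once."""
    n = len(sensor_vectors)
    fused = [sum(column) / n for column in zip(*sensor_vectors)]
    return nearest_label(training, fused)


def decision_fusion(training, sensor_vectors):
    """Classify each sensor's vector separately, then take the majority decision."""
    decisions = [nearest_label(training, v) for v in sensor_vectors]
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical two-class training set and three sensor observations,
# one of which is a strong outlier.
training = [((0.0, 0.0), "car"), ((10.0, 10.0), "truck")]
observations = [(1.0, 1.0), (2.0, 1.0), (30.0, 30.0)]
print(feature_fusion(training, observations))
print(decision_fusion(training, observations))
```

With these made-up observations the outlier drags the averaged feature vector toward the second class, while the majority of individual decisions still favors the first, which shows why the two fusion strategies can give different results.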

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using a support vector machine (SVM)-based technique in a sensor network. The authors proposed two distributed algorithms, the distributed fix partition SVM (DFP-SVM) and the weighted distributed fix partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is incrementally trained on an example set called the support vectors. The number of support vectors is typically very small compared with the number of all sample values. Moreover, the support vectors (and offset) give a compressed representation of the separating SVM hyperplane. That is why sending only the support vectors, instead of all training samples, to the next cluster head is energy efficient: it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations, and a comparison is made with a centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally as compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius information of the sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected over a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the results for local outliers inaccurate. The technique is evaluated using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. Results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and differing aspects of the data mining techniques for WSNs discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed, and then the comparative tables are presented to compare and differentiate existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogenous [50] or heterogeneous [33, 48]. A homogenous attribute means sensing a single-value attribute, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that the readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node if the reading from a correlated sensor is missing.
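As a small numerical illustration of the two notions, the following Python sketch measures temporal correlation as the lag-1 autocorrelation of one node's readings and spatial correlation as the Pearson correlation between the readings of two neighboring nodes. Using the Pearson coefficient for both is an assumption made here; the surveyed techniques use a variety of measures, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def temporal_correlation(readings, lag=1):
    """Correlation between a node's readings and the same readings `lag` steps earlier."""
    return pearson(readings[:-lag], readings[lag:])


def spatial_correlation(readings_a, readings_b):
    """Correlation between readings taken at the same instants by two nearby nodes."""
    return pearson(readings_a, readings_b)


# Hypothetical temperature readings from two neighboring nodes.
node_a = [20.0, 21.0, 22.0, 23.0, 24.0]
node_b = [19.5, 20.5, 21.5, 22.5, 23.5]
print(temporal_correlation(node_a))
print(spatial_correlation(node_a, node_b))
```

When two nodes are this strongly correlated, a missing reading from one of them can be estimated from the other, or flagged as a sign that the silent node has failed.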

5.2. Processing Architecture. In order to apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models.

Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, the entire raw data collected from WSNs is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis in order to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.

Distributed. Another computation approach uses a distributed model, in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required and partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an "aggregator" is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.
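The communication saving of the distributed model can be illustrated with a rough Python sketch that counts how many values leave the sensor nodes under each model. The count of transmitted values is only a crude proxy for communication cost, and the four-field local model (count, minimum, maximum, mean) is an assumption chosen for illustration; the function names are hypothetical.

```python
def centralized_values_sent(readings_per_node):
    """Centralized model: every raw reading is forwarded to the central server."""
    return sum(len(readings) for readings in readings_per_node.values())


def local_model(readings):
    """Distributed model: a node summarizes its readings before transmitting."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }


def distributed_values_sent(readings_per_node):
    """Each node transmits only the fields of its local model."""
    models = [local_model(readings) for readings in readings_per_node.values()]
    return sum(len(model) for model in models)


# Hypothetical readings: two nodes with 50 samples each.
readings = {
    "n1": [20.0 + 0.1 * i for i in range(50)],
    "n2": [18.0] * 50,
}
print(centralized_values_sent(readings))   # all raw readings
print(distributed_values_sent(readings))   # four summary fields per node
```

The saving grows with the number of readings per node, but the summary also discards detail, which is the accuracy versus overhead trade-off noted above.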

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSN data. Distributed approaches use one-scan algorithms for real-time processing; in order to deal with the high data arrival rate, the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.
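To make the one-scan idea concrete, the sketch below maintains the mean and variance of a stream while looking at each reading exactly once and storing no past readings. It uses Welford's online update, a standard technique chosen here only as an example of a single-pass computation; it is not one of the surveyed mining algorithms, and the class name is hypothetical.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's online update)."""

    def __init__(self):
        self.n = 0       # number of readings seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new reading into the statistics; the reading is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of the readings seen so far (0.0 for fewer than two)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
print(stats.mean, stats.variance)
```

A multiscan method would instead need the whole dataset available at one site, which is why such methods are associated with offline, centralized analysis.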

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data


Table 2: Comparison of data mining techniques for wireless sensor networks.
(Text-valued columns only; the check-mark columns of the original layout, covering processing architecture, attribute type, correlation, connectivity, mobility, node role, evaluation method, and data source, are omitted.)

| Class | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Frequent pattern mining | DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores sensors that report different values |
| Frequent pattern mining | In-network data mining [51] | Event pattern discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication |
| Frequent pattern mining | Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSN performance monitoring | Data size | Increases buffer cost; delays crucial messages |
| Frequent pattern mining | Online algorithm [46] | Interval list representation of WSN data | Lossy counting | Periodical sensing | WSN monitoring | Time and memory | Data redundancy |
| Frequent pattern mining | Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSN operations | Energy | Not validated well on real data |
| Frequent pattern mining | CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data |


Table 3: Comparison of data mining techniques for wireless sensor networks (continued).
(Text-valued columns only; the check-mark columns of the original layout are omitted.)

| Class | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Frequent pattern mining | Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT structure | Aggregation | Monitor WSN quality of service | No. of messages | Increased cost due to multiple DB scans |
| Frequent pattern mining | SP-tree [49] | Discover event patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost |
| Sequential pattern mining | Relational framework [58] | Multidimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming |
| Sequential pattern mining | Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants' behavior prediction | Prediction accuracy | Inefficient for complex activities |
| Sequential pattern mining | MPG [64] | Predict object's future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset |
| Sequential pattern mining | Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction |


Table 4: Comparison of data mining techniques for wireless sensor networks (continued).
(Text-valued columns only; the check-mark columns of the original layout are omitted.)

| Class | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Sequential pattern mining | TMP-mine [65] | Predict object's future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time |
| Sequential pattern mining | Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns |
| Sequential pattern mining | MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Pattern accuracy | Candidate construction is expensive to compute |
| Sequential pattern mining | PTSP [66] | Object's future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient at predicting high-speed objects |
| Clustering | DCC [86] | WSN longevity | Data correlation-based clustering | Data suppression | Generic WSN application | Energy and data size | High clustering rate |
| Clustering | H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate |


Table 5: Comparison of data mining techniques for wireless sensor networks (continued).
(Text-valued columns only; the check-mark columns of the original layout are omitted.)

| Class | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Clustering | Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping |
| Clustering | CAG [88] | WSN bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSN applications | Communication | Sensory data loss |
| Clustering | EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs |
| Clustering | Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs |
| Clustering | Attribute-based clustering [89] | WSN bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost |
| Clustering | DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Node energy is ignored |


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).
(Text-valued columns only; the check-mark columns of the original layout are omitted.)

| Class | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Classification | Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee correctness |
| Classification | Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity |
| Classification | NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset |
| Classification | LWClass [100] | Preserve WSN resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift |
| Classification | FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission |
| Classification | Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios |
| Classification | Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity |
| Classification | One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation |


mining becomes difficult because updates to this structure must be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head (CH) can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. This is a node that acts as a medium to transmit data packets from one node to the others.

Node Task. In the centralized approach, the node's task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and can take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the types of application that benefit from WSN data mining techniques. Here we exemplify some real-world applications, as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central site.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. This refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. Most of the techniques surveyed in this paper use simulation on synthetic datasets to validate their results, owing to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated in terms of the optimization objective that it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, and node mobility. The large variations in the performance metrics make it a difficult task to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among the sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among the sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
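The sensitivity to support and confidence thresholds noted in item (ix) can be seen in a small Python sketch. It computes the standard support and confidence of an association rule over a set of transactions, here taken to be the sets of sensors that reported an event in the same epoch; this encoding, the data, and the function names are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by the support of its antecedent."""
    ante = support(transactions, antecedent)
    if ante == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / ante


def rule_passes(transactions, antecedent, consequent, min_support, min_confidence):
    """Keep a rule only if it meets both user-chosen thresholds."""
    rule_support = support(transactions, set(antecedent) | set(consequent))
    rule_confidence = confidence(transactions, antecedent, consequent)
    return rule_support >= min_support and rule_confidence >= min_confidence


# Hypothetical epochs: which sensors detected an event together.
epochs = [{"s1", "s2"}, {"s1", "s2", "s3"}, {"s1"}, {"s2", "s3"}]
print(support(epochs, {"s1", "s2"}))                    # 2 of 4 epochs
print(confidence(epochs, {"s1"}, {"s2"}))               # 2 of the 3 epochs containing s1
print(rule_passes(epochs, {"s1"}, {"s2"}, 0.5, 0.6))    # rule kept
print(rule_passes(epochs, {"s1"}, {"s2"}, 0.5, 0.7))    # rule dropped
```

Raising the confidence threshold from 0.6 to 0.7 is enough to discard the rule in this example, which illustrates how strongly the mined rule set depends on the chosen thresholds.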

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required and partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient as compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local model and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get the global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.

Figure 6: Proposed hybrid framework for sensor network based applications. The diagram shows a sensor data stream feeding node data processing (data selection, duplicate removal, aggregation, summarization, data fusion/clustering, association analysis), which yields local models; network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring), which yields the network model; and central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis), which yields the global model. In-network processing is single pass and centralized processing is multipass; users send queries to the sink/base station and receive approximate results.

This framework addresses the following shortcomings ofthe existing techniques

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSNs data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time responses and actions can use the network model for decisions and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue in WSNs [111], the proposed framework deals with large-scale data from WSNs by splitting the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits a large task into smaller ones at the node level and the cluster head level, which are distributed over the entire network and executed in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head; it contains compact data from a set of nodes and reduces the size of the data to be transmitted. Network models can be integrated at the sink to obtain a global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to a central server, where global models can be computed for offline predictive analysis. Historical queries from users can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time responses. By distributing the data in this way, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can take the resource constraints of sensor nodes into account by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the quickly changing nature of WSNs data, where characteristics of the monitored process may change over time and render old models outdated. This problem can be addressed using an incremental learning mechanism [39, 112] that helps the model incorporate new information.

(v) The framework can identify spatial-temporal correlation in the local model by using data correlation-based clustering, whereas attribute correlation can be identified in the global model by using multipass data mining algorithms.
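As a small illustration of point (iv), the sketch below updates a model one reading at a time and lets older readings fade, so the estimate follows a process whose characteristics change. It is not the incremental mining or incremental SVM method of [39, 112]; it substitutes a much simpler technique, an exponentially weighted running mean, and the class name `DriftingMean` and the forgetting factor `alpha` are hypothetical.

```python
class DriftingMean:
    """Hypothetical incremental model: exponentially weighted running mean."""

    def __init__(self, alpha: float = 0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # larger alpha forgets old readings faster
        self.value = None   # no estimate until the first reading arrives

    def update(self, reading: float) -> float:
        # Constant memory and constant time per reading; no stored history.
        if self.value is None:
            self.value = reading
        else:
            self.value += self.alpha * (reading - self.value)
        return self.value


# The monitored process shifts from 10 to 20 halfway through the stream.
readings = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
model = DriftingMean(alpha=0.5)
estimates = [model.update(r) for r in readings]
print(estimates)  # [10.0, 10.0, 10.0, 15.0, 17.5, 18.75]
```

A plain average over all six readings would stay at 15.0, whereas the incremental estimate has already moved most of the way to the new level.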

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy and an overall analysis and review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. The set of WSNs applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSNs applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSNs performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining: Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid energy-efficient distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


Table 1: Difference between traditional and sensor data processing.

Characteristic            Traditional data    WSNs data
Processing architecture   Centralized         Distributed
Data type                 Static              Dynamic
Memory usage              Unlimited           Restricted
Processing time           Unlimited           Restricted
Computational power       High                Weak
Energy                    No constraints      Limited
Data flow                 Stationary          Continuous
Data length               Bounded             Unbounded
Response time             Non-real-time       Real time
Update speed              Low                 High
Number of passes          Multipass           Single

detection in multiple domains using data mining as well as statistical, information-theoretic, and spectral techniques.

Since data mining is a broad discipline and can be applied to data from any domain, more general surveys on data mining techniques can be found in [30], where the authors examined machine-learning and data mining techniques for analyzing medical data. Since the classification of data mining techniques in this survey is based on frequent pattern mining, clustering, and classification, there are plenty of surveys available on each of these techniques. For example, frequent pattern mining over data streams is presented in [31, 32]. Surveys on clustering algorithms for WSNs are presented in [33, 34]. The clustering techniques examined in those papers focus exclusively on the architecture and management of the network rather than on information discovery. A survey on classification methods over data streams is given in [35], where the authors examined conventional classification techniques over data streams.

However, none of the above surveys examined data mining techniques that focus on information extraction and analysis from WSNs data. In comparison with the above-mentioned surveys, this paper examines algorithms and approaches specially designed for WSNs data, not only leading to a different classification, evaluation, and discussion on different domains but also presenting different choices of solution. We examine how data mining algorithms can be utilized to make sensor network applications intelligent. The research method consists of a review of data mining techniques for WSNs, such as frequent pattern mining, sequential pattern mining, clustering, and classification. A problem-based taxonomy is presented to classify and compare existing data mining techniques adopted for WSNs. In addition, an evaluation of each technique is presented. Based on the limitations of existing techniques and the special characteristics of WSNs, we propose a new hybrid data mining architecture for WSNs which combines offline learning with distributed and online data processing.

The rest of the paper is organized as follows. After the introduction in Section 1, Section 2 discusses how the traditional data mining process differs from the data mining process in WSNs and the challenges of mining WSNs data. In Section 3, a taxonomy for categorizing the existing data mining techniques for WSNs is presented. In Section 4, we analyze a collection of published studies using the taxonomy framework. The comparison of data mining techniques for WSNs is presented in Section 5. The limitations of this work are given in Section 6, and future research directions are presented in Section 7. Finally, the paper ends with the conclusion in Section 8.

2. Fundamentals of Data Mining in WSNs

2.1. Data Mining Process in WSNs. Data mining in sensor networks is the process of extracting application-oriented models and patterns with acceptable accuracy from a continuous, rapid, and possibly unending flow of data streams from sensor networks. In this case, the whole data cannot be stored and must be processed immediately. A data mining algorithm has to be sufficiently fast to process high-speed arriving data. Conventional data mining algorithms are meant to handle static data and use multistep techniques and multiscan mining algorithms for analyzing static data-sets. Therefore, conventional data mining techniques are not suitable for handling the massive quantity, high dimensionality, and distributed nature of the data generated by WSNs. Table 1 summarizes the differences between the traditional and the WSNs data mining processes.

It can be observed from Table 1 that traditional data mining is centralized, computationally expensive, and focused on disk-resident transactional data. It directly collects data at the central site, which is not bounded by computational resources. In comparison with traditional data-sets, WSNs data flows continuously in systems with varying update rates. Due to the huge amount of data and the high storage cost, it is impossible to store the entire WSNs data or to scan through it multiple times. These characteristics of sensor data and the special design issues of sensor networks make traditional data mining techniques difficult to apply. Hence, it is crucial to develop data mining techniques that can analyze and process WSNs data in a multidimensional, multilevel, single-pass, and online manner.

2.2. Challenges. Applying conventional data mining techniques to sensor data in WSNs is challenging for the following reasons.

(i) Resource Constraint. Sensor nodes are resource constrained in terms of power, memory, communication bandwidth, and computational power. The main challenge faced by data mining techniques for WSNs is to satisfy the mining accuracy requirements while keeping the resource consumption of WSNs to a minimum.

(ii) Fast and Huge Data Arrival. The inherent nature of WSNs data is its high speed. In many domains, data arrives faster than we are able to mine it. Additionally, the spatiotemporal embedding of sensor data plays an important role in WSNs applications. This may cause many classical data processing techniques to perform poorly on spatiotemporal sensor data. The challenge for data mining techniques is how to cope with continuous, rapid, and changing data streams and also how to incorporate user interaction during high-speed data arrival.

(iii) Online Mining. In a WSNs environment, data is geographically distributed, inputs arrive continuously, and newer data items may substantially change results based on older data. Most data mining techniques that analyze data in an offline manner do not meet the requirement of handling distributed stream data. Thus, a challenge for data mining techniques is how to process distributed streaming data online.

(iv) Modeling Changes of Mining Results over Time. When the data-generating phenomenon changes over time, the extracted model should be up-to-date at any time. Due to the continuity of data streams, some researchers have pointed out that capturing the change of mining results is more important in this area than the mining results themselves. The research issue is how to model this change in the results.

(v) Data Transformation. Since sensor nodes are limited in terms of bandwidth, transferring the original data over the network is not feasible. Knowledge structure transformation is therefore an important issue. After models and patterns are extracted locally from WSNs data, the output is transferred to the base station. The challenge for data mining techniques is how to efficiently represent data and discovered patterns for transmission over the network.

(vi) Dynamic Network Topology. Sensor networks are deployed in potentially harsh, uncertain, heterogeneous, and dynamic environments. Moreover, sensor nodes may move among different locations at any point in time. Such dynamicity and heterogeneity increase the complexity of designing an appropriate data mining technique for WSNs.

To address these challenges, researchers have modified conventional data mining techniques and also proposed new data mining algorithms to handle the data generated by sensor networks. In the following section, we provide a taxonomy of these data mining techniques based on the discipline from which they adopt their ideas.

3. Taxonomy of Data Mining Techniques for WSNs

In this section, a classification scheme for existing approaches designed for mining WSNs data is presented. The highest-level classification is based upon the general data mining classes used, such as frequent pattern mining, sequential pattern mining, clustering, and classification. Most of the frequent pattern mining and sequential pattern mining approaches have adapted traditional frequent mining techniques, such as the Apriori and frequent pattern (FP) growth-based algorithms, to find associations among large WSNs data. Cluster-based approaches have adapted K-means, hierarchical, and data correlation-based clustering based upon the distance among the data points, whereas classification-based approaches have adapted traditional classification techniques, such as decision tree, rule-based, nearest neighbor, and support vector machine methods, based on the type of classification model that they use. These algorithms have very different and distinct roles; therefore, in order to choose an algorithm for a WSNs application, one has to decide in terms of these top-level classes.

The second level of classification is based upon each approach's ability to process data in a centralized or distributed manner. Since WSNs nodes are limited in terms of resources such as power, computation, bandwidth, and memory, an approach meant for distributed processing requires one-pass algorithms that complete a part of the data mining locally and then aggregate the results. The objective of distributed approaches is to limit the messages and communication energy of sensor nodes while transferring data to the central server. This also helps to improve the WSNs lifetime and to extract maximum data from the environment. In centralized processing, by contrast, data from the entire network is collected and stored at a central server for analysis. Since the central server is rich in resources, there are no such constraints on choosing an accurate algorithm. However, this approach is generally discouraged because it generates a huge amount of data flow and communication, which can create bottlenecks and waste communication bandwidth. These two data processing/storage architectures have a large impact on the type of data mining algorithm to choose; therefore, one has to decide on the processing/storage architecture before choosing the data mining algorithm for a WSNs application.

The third level of classification is selected according to the attitude towards solving a specific problem. Research in the WSNs area has focused on two separate aspects, namely, WSNs performance issues and application issues. As WSNs nodes are usually constrained in resources such as energy, communication bandwidth, and memory, resource-aware algorithms are needed to maximize WSNs performance. On the other hand, a WSNs application requires data precision and accuracy, fault tolerance, event prediction, scalability, and robustness, and it often needs abundant use of energy, communication, and redundancies. This leads to a resource tradeoff: either one sacrifices the application's performance in favor of network efficiency, or one aims for the best application performance and deals with network resource issues such as energy in some other way (larger batteries or renewable sources on the nodes). For this reason, WSNs performance-oriented or application-specific approaches have been selected as the lowest-level classification criterion. The taxonomy of data mining techniques for WSNs is presented in Figure 1.

4. State of the Art of Data Mining Techniques for WSNs

In this section, data mining techniques designed for WSNs are classified using the taxonomy framework presented in Section 3, and the characteristics and performance analysis of each technique are discussed.

Figure 1: Taxonomy of data mining techniques for sensor networks. [Tree diagram. Root: data mining techniques for WSNs. First level: frequent mining, sequential mining, clustering, and classification. Second level, under each class: distributed and centralized. Third level: WSN performance and application based. Leaves: DSARM, CARM, distributed data aggregation, association rules mining framework, online algorithm, lightweight rule learning, SP-tree, in-network data mining, MPG, PTSP, TMP-mine, relational framework, episode discovery, contextual patterns discovery, pattern learner, MSAP, DCC, prediction model, CAG, clustering sensory data, attribute-based clustering, DHCS, EEDC, H-cluster, prediction framework, FVLD, online learning, person identification algorithms, NNTC, fuzzy predictor model, LWClass, and one-class quarter-sphere SVM.]

4.1. Frequent Pattern Mining. In this section, we review some of the works that have been proposed for mining frequent patterns from WSNs data. Frequent pattern mining is used to find groups of variables that co-occur frequently in the data-set. The aim is to find the most interesting relations between variables. Traditional frequent pattern mining algorithms [36–39] are CPU and I/O intensive, making it very expensive to mine dynamic WSNs data. Unlike the mining of static databases, the dynamic nature of WSNs data has led to the study of online mining of frequent itemsets. As a result, traditional frequent pattern mining algorithms have been modified according to the nature of WSNs data.

The basic frequent pattern mining technique is association rule mining. The first known association rule mining algorithm is Apriori [40]. It is based on a level-wise candidate generation-and-test methodology that makes several scans over the database. In each iteration, the patterns found to be frequent are used to generate the possible frequent patterns (the candidates) to be counted in the next iteration. Therefore, the Apriori technique finds the frequent patterns of length $k$ from the set of candidate patterns generated from the frequent patterns of length $k-1$. In the subsequent step, the association rules are generated by computing the support and confidence of each frequent itemset in a given database $D$, which are defined as follows:

$$\operatorname{Support}(A) = \frac{\operatorname{Sup}(A)}{|D|}, \tag{1}$$

where $\operatorname{Sup}(A)$ is the number of occurrences of $A$ in database $D$. Consider the following:

$$\operatorname{Confidence}(A \rightarrow B) = \frac{\operatorname{Sup}(A \cup B)}{\operatorname{Sup}(A)}. \tag{2}$$
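The definitions in (1) and (2), together with the level-wise search described above, can be illustrated with a minimal sketch. It assumes that transactions are sets of items held in memory and that support is expressed as a fraction of the database; the function names and the four-transaction toy database are hypothetical and are not taken from [40].

```python
from itertools import combinations


def sup(itemset, database):
    """Sup(A): number of transactions that contain every item of `itemset`."""
    return sum(1 for t in database if itemset <= t)


def support(itemset, database):
    """Equation (1): Sup(A) divided by the number of transactions."""
    return sup(itemset, database) / len(database)


def confidence(a, b, database):
    """Equation (2): Sup(A ∪ B) / Sup(A)."""
    return sup(a | b, database) / sup(a, database)


def apriori(database, min_support):
    """Level-wise search: candidates of length k come from frequent (k-1)-itemsets."""
    items = sorted({i for t in database for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), database) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count: one more pass over the database per level.
        level = [c for c in candidates if support(c, database) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent


# Hypothetical toy database of four transactions.
db = [frozenset("CD"), frozenset("AB"), frozenset("ABC"), frozenset("ABC")]
print(support(frozenset("AB"), db))                     # 0.75
print(confidence(frozenset("AB"), frozenset("C"), db))  # 2/3, about 0.667
print(len(apriori(db, min_support=0.5)))                # 7 frequent itemsets
```

Each level of the loop scans the whole database again, which is the cost that stream-oriented variants try to avoid.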

The multiple database scans that Apriori requires are impractical in the context of sensor networks, as they imply that all data has to be stored somewhere. However, recently there has been a growing amount of work on discovering frequent item-sets from a data stream of transactions such that every transaction is considered only once and can be deleted afterwards.

The other basic approach for mining association rules is FP-growth [41], which discovers frequent patterns while reducing the number of database scans to two and eliminating the candidate generation required by Apriori. With the first database scan, the algorithm finds the set of distinct items with their respective support counts (i.e., frequencies) in the database. Then, with the second database scan, the algorithm summarizes the database in the form of a frequency-descending tree (i.e., the FP-tree). The complete set of frequent patterns is then mined from the FP-tree by recursively applying a divide-and-conquer-based pattern growth approach, called the FP-growth algorithm, without additional database scans. The highly compact FP-tree structure introduced a new wing of research in mining frequent patterns. However, the static nature of the FP-tree and the two database scans still limit its applicability to frequent pattern mining over WSNs data. Recently, several centralized and distributed solutions have been proposed with the aim of maximizing either the WSNs' performance or the application-based performance by applying Apriori-like and FP-growth methods over WSNs data.
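The two scans of the FP-tree construction can be sketched as follows. This is a simplified illustration under stated assumptions: only tree construction is shown (the recursive mining step and the header links of the full algorithm are omitted), ties in frequency are broken alphabetically, and the names `Node` and `build_fp_tree` are hypothetical.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, a count, and child nodes keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(database, min_count):
    # Scan 1: support count of every distinct item.
    counts = Counter(item for t in database for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = Node(None)
    # Scan 2: insert each transaction with its frequent items sorted in
    # frequency-descending order, so that common prefixes share nodes.
    for t in database:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


# Hypothetical toy database; item D is infrequent and is dropped.
db = [{"C", "D"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "C"}]
root, frequent = build_fp_tree(db, min_count=2)
print(root.children["A"].count)  # 3: three transactions share the prefix A
```

Because the item order depends on global frequencies, the tree cannot be built until the first scan has finished, which is why the structure is hard to maintain over an unbounded stream.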

4.1.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Halatchev and Gruenwald [42] proposed a centralized methodology called data stream association rule mining (DSARM) to identify missing sensor readings. It uses an association rule mining algorithm to identify sensors that report the same data a number of times in a sliding window, called related sensors, and then estimates the missing data from a sensor by using the data reported by its related sensors. Due to the stream nature of sensor data, applying an association mining algorithm such as Apriori directly to sensor data is not possible. This situation led the authors to propose the DSARM framework, which adapts the Apriori algorithm to make it applicable to the data streams received from sensor nodes. The technique is evaluated by simulation experiments on real data collected by the Department of Transportation in Austin, TX, USA, to estimate missing values in related data streams. Performance evaluations were conducted to compare DSARM with alternative approaches. The results show that, although DSARM requires more memory space and takes longer to produce an estimate than the considered alternative approaches, it achieves better accuracy of the estimated values. However, DSARM has some limitations. First, it is based on association rule mining over frequent 2-itemsets, which means that it can discover relationships only between two sensors and ignores the cases where missing values are related to multiple sensors. Second, it finds those relationships only when both sensors report the same value and ignores the cases where missing values can be estimated from the relationships between sensors that report different values.
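The idea of related sensors can be illustrated with a rough sketch. It is not the DSARM algorithm of [42]: it only assumes that each round in the sliding window is a mapping from sensor to a discretized reading, treats a sensor as related when it agrees with the target in at least a given fraction of rounds, and estimates the missing value by a weighted vote. All names and data are hypothetical.

```python
from collections import Counter


def related_sensors(window, target, min_support):
    """Sensors that reported the same value as `target` in at least a
    `min_support` fraction of the rounds in which `target` reported."""
    agree = Counter()
    rounds = 0
    for reading in window:
        if target not in reading:
            continue
        rounds += 1
        for sensor, value in reading.items():
            if sensor != target and value == reading[target]:
                agree[sensor] += 1
    return {s: n / rounds for s, n in agree.items() if n / rounds >= min_support}


def estimate_missing(window, current, target, min_support=0.6):
    """Estimate the missing reading of `target` from its related sensors."""
    votes = Counter()
    for sensor, weight in related_sensors(window, target, min_support).items():
        if sensor in current:
            votes[current[sensor]] += weight
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical sliding window of four rounds with discretized readings.
window = [
    {"s1": "low", "s2": "low", "s3": "high"},
    {"s1": "high", "s2": "high", "s3": "high"},
    {"s1": "low", "s2": "low", "s3": "low"},
    {"s1": "high", "s2": "high", "s3": "low"},
]
# In the current round the reading of s1 is missing.
print(estimate_missing(window, {"s2": "low", "s3": "high"}, "s1"))  # low
```

The sketch shares both limitations noted above: it relates sensors only pairwise and only when they report identical values.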

Jiang and Gruenwald [43, 44] proposed a data estimation technique called CARM (closed item-sets-based association rule mining), which can derive the most recent association rules between the sensors in the current sliding window. The technique is based on the closed frequent item-sets mining algorithm for data streams called CFI-stream [45]. It maintains an in-memory data structure called the direct update (DIU) tree to store closed item-sets. When a new transaction arrives, the algorithm checks each item-set in the transaction against the sliding window of the data stream online and incrementally updates the closed item-sets' supports. If CARM finds missing values in the sensor readings, instead of generating all possible association rules, it generates the rules that have strong relationships with the current round of sensor readings in which one or more readings are missing. Based on these rules and the selected closed item-sets, CARM generates the estimated values, which contain item values that are not included in the original readings. Figure 2, redrawn from [43], shows the DIU tree after receiving the first four transactions. It shows that there are currently four closed item-sets, C, AB, CD, and ABC, in the DIU tree and that their associated supports, shown at the upper right corner, are 3, 3, 1, and 2. A basic set of rules is generated from these frequent item-sets. All other rules can be inferred from this basic rule set.

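Figure 2: Lexicographical-ordered direct update tree. [Diagram: a timeline of four transactions (TID 1: C, D; TID 2: A, B; TID 3: A, B, C; TID 4: A, B, C) and the resulting DIU tree rooted at Φ, whose nodes are the closed item-sets C (support 3), AB (support 3), CD, and ABC (support 2).]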

4.1.2. Centralized Approaches Aim to Maximize WSNs' Performance. Loo et al. [46] proposed online one-pass algorithms for mining large sensor streams. They mine frequent value sets from sensor stream data by transforming the stream data into interval lists (ILs) under the lossy counting framework [47]. Time is divided into equal-size intervals, and a snapshot of the sensor readings is taken whenever a sensor reading is updated. The sensors' values at that snapshot form the value sets stored in the database. An Apriori-based strategy is used to mine the value sets. The analysis of the IL-based representation of stream data showed favorable results on a synthetic data-set. However, while computing the IL of a candidate value set, redundant intersections of ILs are inevitable, which affects the performance in terms of time and computation cost. The proposed technique is evaluated by comparing the performance of the IL-based approach (ILB) against an application of lossy counting (LC) using a weighted transformation method on a synthetic dataset. According to their experiments, ILB outperforms LC significantly for large sensor networks. Moreover, both the processing time and memory consumption of ILB are more stable than those of LC.

Chong et al. [48] proposed a rule-learning model that finds strong rules from sensor readings. The rules are used as triggers to control sensor network operations; for example, they can be used to put a sensor to sleep or to reduce data transmission in order to conserve energy. To mine the rules, Apriori is modified to count the number of transactions that are frequent instead of the item-sets within transactions, and transactions are processed in batches $b_1, b_2, \ldots, b_X$. Suppose there is a node $M$ that collects light, temperature, and microphone readings from three other sensor streams $S_0$, $S_1$, and $S_2$. Initially, $M$ is queried to collect all sensory values, which are used to generate a rule of the form $a_n$ implies $a_{n-1}$; once the rule is extracted, only $a_n$ is sent to the base station. Upon receiving the reading $a_n$ and utilizing knowledge of the rule, the base station can infer the reading $a_{n-1}$. All extracted rules are stored in a rule repository. The proposed method is validated by simulation implemented in the C language on a synthetic dataset. In the experiment, the first correlated data received from the sensors is used to extract rules. In subsequent phases, these rules are used to infer the sensor readings for the next round.
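The way a learned rule lets the base station reconstruct a suppressed reading can be sketched as follows. This is an illustration only, not the model of [48]: it assumes discretized readings, learns pairwise rules of the form "stream X reports v implies stream Y reports w" from a first batch, and uses hypothetical names throughout.

```python
from collections import Counter, defaultdict


def learn_rules(batch, min_conf=1.0):
    """Return {(src, value): {dst: implied_value}} for rules that hold in
    `batch` with confidence of at least `min_conf`."""
    seen = Counter()
    together = defaultdict(Counter)
    for reading in batch:
        for src, v in reading.items():
            seen[(src, v)] += 1
            for dst, w in reading.items():
                if dst != src:
                    together[(src, v)][(dst, w)] += 1
    rules = defaultdict(dict)
    for antecedent, consequents in together.items():
        for (dst, w), n in consequents.items():
            if n / seen[antecedent] >= min_conf:
                rules[antecedent][dst] = w
    return dict(rules)


def infer(rules, sent):
    """Base-station side: fill in readings that were not transmitted."""
    full = dict(sent)
    for src, v in sent.items():
        for dst, w in rules.get((src, v), {}).items():
            full.setdefault(dst, w)
    return full


# Hypothetical first batch in which all readings are collected.
batch = [
    {"light": "dark", "temp": "cold"},
    {"light": "dark", "temp": "cold"},
    {"light": "bright", "temp": "warm"},
]
rules = learn_rules(batch)
# Later rounds: only the light reading is sent, and temperature is inferred.
print(infer(rules, {"light": "dark"}))  # {'light': 'dark', 'temp': 'cold'}
```

The energy saving comes from the second phase, in which a node transmits one reading instead of two whenever a stored rule covers the other.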

Tanbeer et al. [49] proposed a tree-based data structure called the sensor pattern tree (SP-tree) to generate association rules from WSNs data with one database scan. The main idea of the proposed approach is to obtain the frequency of all event-detecting sensors' data, construct a prefix-tree based on that in any canonical order, and then reorganize the tree in frequency-descending order. Through the reorganization, the SP-tree can maintain the frequently event-detecting sensors' nodes in the upper part of the tree, which in turn provides high compactness in the tree structure. Once the SP-tree is constructed, the FP-growth mining technique is applied to find the frequent event-detecting sensor sets. Experiments are performed to verify the improvement in memory consumption and runtime that the SP-tree achieves over PLT [50]. The experiments show that the SP-tree outperforms PLT in time and memory consumption. The reason for this gain is twofold: first, the PLT construction requires two database scans, while the SP-tree is constructed by scanning the database only once; second, the mining phase of the SP-tree is highly efficient due to the frequency-descending tree structure.

4.1.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. Romer [51] proposed an in-network data mining technique to discover frequent patterns of events with certain spatial and temporal properties. In this approach, the user specifies the upper bounds maxscope and maxhistory (a variable measured in seconds) for the patterns of interest. The sensors collect these events and apply a mining algorithm to discover the patterns that satisfy the given parameters. Each node in the network collects the events from its neighbors within the maximum scope and keeps a history of their events for the duration of the maximum history. After that, each node applies a mining algorithm to discover the local frequent patterns. The resulting frequent patterns are converted to association rules that describe an event of type $E$ that occurs at node $n$ with support $S$ and confidence $C$. Local patterns are sent to the sink, where secondary mining is performed to compute the global picture of the entire network. The algorithm is implemented on the BTnode (Bluetooth radio) platform [52], and the tradeoff between the scope of the query and resource consumption is evaluated on a real dataset. Results show that reducing the scope of the query decreases resource consumption. Major issues in this approach are the memory consumption of itemset discovery algorithms and the communication overhead of event collection.

4.1.4. Distributed Approaches Aim to Maximize WSNs' Performance. Boukerche and Samarah [15] presented a distributed data extraction methodology that aggregates the data on the sensor nodes, which reduces the number of messages during transmission. The distributed solution sends some parameters, such as the support, time-slot size, and historic period, from the sink to all nodes within the network. Each sensor node maintains its own buffer. After each time slot, a node checks whether any messages were received during that slot; if so, it sets its buffer entry for that slot. When the historic period ends, each node traverses its buffer; if the number of set entries is greater than or equal to the support value provided initially, the message is transferred to the sink. To evaluate the validity of the distributed approach, it is compared with the centralized methodology on a real dataset. The authors conducted two experiments using historical periods of 5 and 10 days, with minimum support values ranging from 10% to 90% and a time-slot size of 30 seconds. All of the reported results show a reduction in the number of messages and in the data size as the support value increases. Major issues in this methodology are the increased cost of the node buffer and the delay of crucial messages in the case of a high support value.
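The node-side filtering step described above can be sketched in a few lines. The sketch assumes that the buffer holds one Boolean entry per time slot and that the support threshold is expressed as a number of slots; both are assumptions made for illustration, and the function name is hypothetical.

```python
def node_report(event_times, period, slot_size, min_support):
    """Decide at the end of the historic period whether this node reports
    to the sink. `event_times` are seconds since the start of the period."""
    n_slots = period // slot_size
    buffer = [False] * n_slots  # one entry per time slot
    for t in event_times:
        if 0 <= t < n_slots * slot_size:
            buffer[int(t // slot_size)] = True  # at least one event in the slot
    # Report only if enough slots were active.
    return sum(buffer) >= min_support


# Hypothetical example: a 300-second period with 30-second slots.
events = [5, 12, 40, 95, 250]  # falls into slots 0, 0, 1, 3, and 8
print(node_report(events, period=300, slot_size=30, min_support=3))  # True
print(node_report(events, period=300, slot_size=30, min_support=5))  # False
```

The second call shows the tradeoff noted above: a higher support value suppresses more messages, but a node with few yet possibly crucial events stays silent until the threshold is met.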

Boukerche and Samarah [50] proposed the positional lexicographic tree (PLT) structure for mining association rules in which the event-detecting sensors are the main objects of the rules regardless of their values. Similar to the FP-growth approach, PLT follows a pattern growth mining technique. The mining begins with the sensor having the maximum rank and generates the frequent patterns from its PLT in a recursive way. At each recursion, computation is required to update the PLT involved in the prefix part of a pattern. Therefore, the requirement of two database scans and the additional PLT update operations during mining limit the efficient use of this approach in handling WSNs data. The performance evaluation is done by comparing the PLT structure with the FP-growth algorithm. According to their results, the PLT structure outperforms FP-growth in terms of CPU time and memory usage for all of the support values used; the performance improvement of PLT over FP-growth ranges from 30 percent to 50 percent.

4.2. Sequential Pattern Mining (SPM). Frequent pattern mining has been extended to find more complex structures, as in sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database. A sequence database stores a number of records, where all records are sequences of ordered events with or without concrete notions of time. A large number of real-world domains, such as user profiling, medicine, local weather forecasting, and bioinformatics, show an inherent tendency to be modeled by means of sequences of events/objects related to each other. This great variety of applications makes sequential pattern mining one of the central topics in WSNs data mining, as shown by the research efforts produced in recent years. The sequential pattern mining techniques in sensor networks are either based on traditional sequential mining algorithms, such as the Apriori-like algorithm [53], the Apriori-based methods GSP [54] and PSP [55], and the pattern growth approaches FreeSpan and PrefixSpan [56, 57], or are new algorithms devised specifically to work in a sensor network environment.

4.2.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Esposito et al. [58, 59] presented a multidimensional relational sequence mining framework to identify hidden frequent temporal correlations between sensor nodes. The algorithm is based on the generic level-wise search method APRIORI [60] for discovering correlated sensors. The framework exploits a relational language to describe the temporal evolution of a sensor network along with contextual information, and it works in two phases. First, an abstraction step segments and labels the real-valued time series into similar subsequences by using a kernel density estimator approach. Then, the knowledge is enriched by adding interval-based operators between the subsequences obtained in the discretization step, and the relational pattern mining algorithm is extended in order to deal with these new operators. By taking into account the interval-based temporal data along with contextual information about events, it discovers interesting and more human-readable patterns. The framework is evaluated on a real dataset collected from a wireless sensor network made up of 54 Mica2Dot [61] sensors deployed in the Intel Berkeley Research Lab [62]. Each sensor collected topology information along with humidity, temperature, light, and voltage values once every 31 seconds. Results show a strong correlation among some measurements, which is useful for anomaly detection.

Cook et al. [21] presented the MavHome smart home architecture, which focuses on the creation of an intelligent home that perceives the state of the home through sensors and acts upon the environment through device controllers. An important characteristic of the proposed architecture is the ability to make decisions based on predicted activities. To predict the activities, an algorithm called episode discovery (ED) is proposed, which is based on the work of Srikant and Agrawal [54] on mining sequential patterns from time-ordered transactions. Values that can be predicted include the usage patterns of devices in the home, the movement patterns of the inhabitants, and the typical activities of the inhabitants. Prediction algorithms are applied to the action sequences stored in the inhabitant event history to forecast user actions. Actions can then be automated based on the significance of the mined patterns as well as the predictive accuracy of the next event. A key disadvantage is that the entire action history must be stored and processed offline, which is not practical for large prediction tasks over a long period of time. Cook et al. demonstrated the effectiveness of MavHome on synthetic smart home data and on real data collected by students using X10 controllers in their homes. Experiments show a predictive accuracy as high as 53.4% on the real data and 94.4% on the synthetic data.

Rabatel et al. [22] presented a strategy to detect anomalies in sensor data in order to improve railway maintenance. They extract sequential patterns from real railway data and identify abnormal behavior. Based on these abnormal findings, alarms are automatically triggered to notify of potential failures. This abnormal behavior depends on environmental (weather conditions, travel characteristics) and structural (route, episode index in the route) changes in the data. The PSP [55] algorithm is used to identify the sequential patterns. To handle the environmental conditions, a contextual knowledge-based method is proposed, which is able to provide information on the seriousness and possible causes of a deviation. The proposed technique helps in the proactive maintenance of trains. However, the real-time context could be improved by providing precise and exact information for anomaly detection.

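Figure 3: Example of sequential alarm pattern. [Diagram: circled alarm events $a$, $q$, and $k$ in sequence, labeled with the time differences $T_{aq}$ and $T_{qk}$.]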

Guralnik and Haigh [23] used sequential pattern mining to learn typical behaviors of humans in their homes. Human behavior is inferred by using motion sensors, pressure pads, door latch sensors, and toilet flush sensors. They installed 10–20 sensors of different types in a home and built models of which sensor firings correspond to which activities, in what order, and at what time. For example, "In 60% of the days, the Kitchen-Motion sensor fires between 18h00 and 18h30, and then the Living-Room-Motion sensor fires between 18h20 and 20h00, and then the Bedroom-Motion sensor fires between 19h45 and 22h00." Their algorithm uses these data, together with domain knowledge, to extract the sequences of rooms in which the person was active. These sequences are then analyzed by a human expert to identify complex behavior models. These models can be used to select the appropriate response plan to the actions of the elderly.

Wu et al. [63] proposed a new algorithm for mining sequential alarm patterns (MSAP) from the alarm data generated by a GSM system. Sequential events are identified from alarm data by defining a time interval between adjacent events. For example, if the time interval is set to six hours, then the sequential alarm pattern $(a, b, c)$ indicates that $a$, $b$, and $c$ happen in order and that the time intervals between $a$ and $b$ and between $b$ and $c$ are less than six hours. An example of a sequential alarm sequence, redrawn from [63], is shown in Figure 3.

The number in each circle represents the error ID, and $T_{aq}$ denotes the time difference between alarm event $a$ and alarm event $q$. The extracted knowledge is useful not only for identifying the relevance between two events but also for predicting the alarm sequence and taking proper steps to prevent the occurrence of alarms where possible. For example, if the network operator detects alarm $a$ occurring at time $t$, the operator should resolve this alarm before time $t + T_{aq}$ to alleviate the abnormal situation. The limitation of this technique is that it cannot discover other possible time-interval patterns between the events.
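The time-interval constraint on a sequential alarm pattern can be made concrete with a small sketch. It is not the MSAP mining algorithm of [63]; it only checks whether a given pattern occurs in a time-ordered alarm log with every gap between adjacent matched events below a fixed interval. The function name and the log are hypothetical.

```python
def occurs(pattern, events, max_gap):
    """Return True if `pattern` occurs in `events` in order, with the time
    between each pair of adjacent matched events strictly below `max_gap`.
    `events` is a list of (time, alarm_id) pairs sorted by time."""

    def search(p_idx, start, last_time):
        if p_idx == len(pattern):
            return True
        for i in range(start, len(events)):
            t, alarm = events[i]
            if last_time is not None and t - last_time >= max_gap:
                break  # events are sorted, so later ones are even further away
            if alarm == pattern[p_idx] and search(p_idx + 1, i + 1, t):
                return True
        return False

    return search(0, 0, None)


# Hypothetical alarm log, with times in hours.
log = [(0, "a"), (2, "x"), (5, "b"), (9, "c"), (20, "a"), (27, "b")]
print(occurs(("a", "b", "c"), log, max_gap=6))  # True: gaps of 5 and 4 hours
print(occurs(("a", "c"), log, max_gap=6))       # False: c is 9 hours after a
```

The search backtracks over alternative matches because the earliest occurrence of an alarm is not always the one that satisfies the gap constraint for the next event.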

We found no centralized solutions that aim to maximize the WSNs' performance.

4.2.2. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. Tseng and Lu [64] proposed an object tracking strategy named multilevel object tracking (MLOT) to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the movement log in sensor networks. A multilevel hierarchical structure is adopted by using a clustering mechanism that represents the hierarchical relations among sensor nodes in order to keep track of moving objects in real time. The movement logs of the moving objects are analyzed by a data mining algorithm called movement pattern generation (MPG) to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the fewest sensor nodes. MPG is based on Apriori: it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.
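The principle of letting the most frequent observed continuation drive the prediction can be sketched with a simple first-order model. This is a deliberate simplification and not the MPG algorithm of [64]: it counts only single-step transitions between nodes in the movement logs, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(movement_logs):
    """Count how often each node is followed by each other node."""
    counts = defaultdict(Counter)
    for log in movement_logs:
        for current, nxt in zip(log, log[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, current):
    """Predict the next node as the most frequent successor of `current`."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


# Hypothetical movement logs: each list is the node sequence of one object.
logs = [["n1", "n2", "n3"], ["n1", "n2", "n4"], ["n5", "n2", "n3"]]
counts = build_transitions(logs)
print(predict_next(counts, "n2"))  # n3 (seen twice, versus n4 once)
```

With such a prediction, only the predicted node needs to be woken up, while the remaining neighbors can stay asleep unless the prediction fails.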

4.2.3. Distributed Approaches Aim to Maximize WSNs' Performance. Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining logs of temporal movement patterns (TMPs). The discovered temporal movement rules (TMRs) are used to predict the next location of objects in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of moving objects. The integrated movement log is used as the input to the data mining method, named TMP-mine, which uses the pattern growth approach for discovering the TMPs. From the discovered TMPs, the TMRs are then generated for predicting the next location of a moving object. Suppose that the following two rules are discovered by a vehicle tracking system.

Rule 1: (Station A → interval 10 min → Station B → interval 5 min → Station C).

Rule 2: (Station A → interval 20 min → Station B → interval 5 min → Station D).

By dispatching these rules to the corresponding sensor nodes, tracking can be performed in an energy-efficient way. For example, if a car moves with the pattern (Station A → interval 10 min → Station B → interval 5 min), which matches Rule 1, then the node in Station B has only to activate the node in Station C rather than the node in Station D or those around Station B.
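The rule matching in this example can be sketched in a few lines of Python. The rule encoding, the `next_station` helper, and the `tolerance` parameter are hypothetical and only illustrate how a node could decide which neighbor to wake; they are not taken from [65].

```python
# Each rule: (list of (station, minutes until the next station), predicted next station).
RULES = [
    ([("A", 10), ("B", 5)], "C"),  # Rule 1
    ([("A", 20), ("B", 5)], "D"),  # Rule 2
]

def next_station(observed, rules=RULES, tolerance=0):
    """Return the station predicted by the first rule matching the observed prefix."""
    for prefix, target in rules:
        if len(prefix) == len(observed) and all(
            station == seen_station and abs(minutes - seen_minutes) <= tolerance
            for (station, minutes), (seen_station, seen_minutes) in zip(prefix, observed)
        ):
            return target
    return None  # no rule matches; the caller decides on a fallback

car = [("A", 10), ("B", 5)]
node_to_activate = next_station(car)
```

With the observed pattern above, only the node at Station C is selected; an unmatched pattern returns `None`, leaving the fallback policy to the caller.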

Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique using sequential patterns (PTSP). This technique helps to predict the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. PTSP is based on the inherited patterns of the objects' movements in the network and uses sequential patterns to predict which sensor node the moving object will be heading to next.

4.3. Clustering. Clustering is unsupervised learning in which the given data is categorized into subsets so that each subset represents a cluster with distinctive properties. It has been considered a useful technique, especially for applications that require scalability to a large number of sensor nodes. Clustering also supports aggregation of data in order to summarize the overall transmitted data.

Figure 4: Data clustering for sensor networks. (Diagram: input sensor data → identification of data correlation → grouping data → clusters, with feedback.)

In the current literature, problems related to clustering are addressed by node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67–83]. These clustering techniques vary widely in their objectives, depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering (for example, by considering the data similarity of neighboring nodes), many popular node clustering algorithms that partition the sensor nodes into a number of small groups and elect a cluster head for every group do not use data mining techniques directly. In this study, we focus only on data clustering techniques that support efficient data mining and find data correlations among the nodes. Figure 4 shows the data clustering commonly used in the data mining process.

The surveyed approaches adapt the k-means, hierarchical, and data correlation-based methods. The k-means algorithm takes the input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects. It works by grouping data objects into a tree of clusters. Data correlation-based clustering, in contrast, forms clusters of nodes whose sensory values are similar within a given threshold, based on spatial and temporal correlations, and these clusters remain fixed until the sensory value threshold changes over time. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider node residual energy. It is observed from the survey that both the centralized and the distributed clustering solutions aim to maximize WSN performance.
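The k-means step described above can be sketched as follows. This is a minimal, generic k-means in plain Python, not any of the surveyed WSN variants; seeding the centers with the first k points and the (temperature, humidity) sample readings are assumptions chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; the first k points seed the centers."""
    centers = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, clusters

# Hypothetical (temperature, humidity) readings from four nodes.
readings = [(20.0, 40.0), (30.0, 70.0), (20.5, 41.0), (29.5, 69.0)]
centers, clusters = kmeans(readings, k=2)
```

Each center ends up at the mean of the readings assigned to it, which is the similarity criterion named in the text.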

4.3.1. Centralized Approaches Aim to Maximize WSNs' Performance. Liu et al. [84] proposed a centralized graph-based energy-efficient data collection (EEDC) scheme. EEDC is an on-demand clustering algorithm that clusters nodes into groups such that members have similar sensor readings; the protocol thus clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes with a user-defined dissimilarity measure. EEDC models the cluster creation process as a clique-covering problem by constructing a graph G such that each sensor node is a vertex in the graph. An edge (u, v) is drawn if the dissimilarity measure between vertex u and vertex v is less than or equal to the given intracluster dissimilarity threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to use the minimum number of cliques to cover all vertices in the graph. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, due to its centralized processing, it is not suitable for large-scale WSNs.
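The clique-covering view can be illustrated with a small greedy heuristic. Finding a minimum clique cover is NP-hard in general, so the sketch below does not claim to reproduce the EEDC procedure of [84]; the absolute difference used as the dissimilarity measure, the node names, and the readings are assumptions.

```python
def greedy_clique_cover(readings, max_dst):
    """Greedily partition nodes into cliques of the similarity graph.

    readings: dict node_id -> value; an edge joins u and v when
    abs(readings[u] - readings[v]) <= max_dst.
    """
    def similar(u, v):
        return abs(readings[u] - readings[v]) <= max_dst

    clusters = []
    for node in sorted(readings):
        for cluster in clusters:
            # A node may join a cluster only if it is adjacent to every member.
            if all(similar(node, member) for member in cluster):
                cluster.append(node)
                break
        else:
            clusters.append([node])  # no suitable clique: start a new one
    return clusters

temps = {"n1": 20.0, "n2": 20.4, "n3": 25.0, "n4": 25.3, "n5": 20.8}
clusters = greedy_clique_cover(temps, max_dst=0.5)
```

Node n5 is within the threshold of n2 but not of n1, so it cannot join their cluster: a clique requires every pair of members to be similar.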

4.3.2. Distributed Approaches Aim to Maximize WSNs' Performance. Guo et al. [85] proposed H-cluster, a distributed algorithm for clustering sensory data. The input of this algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output of the algorithm is a set of cluster features that summarize the clusters of the input sensory dataset. The Hilbert-Map mapping algorithm is used to map a d-dimensional sensory data space into a 2-dimensional area covered by a given WSN. H-cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-cluster with the C-Corner and C-Center algorithms. During the experiments, four types of environmental attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that the H-cluster algorithm performs considerably better in terms of data loss, energy, and the quality of clustered data in small WSNs. The results also show that as the amount of sensory data delivered increases, the amount of data loss also increases, and that energy efficiency decreases as the size of the WSN increases.

Yeo et al. [86] proposed a data correlation-based clustering (DCC) scheme based on the similarity of sensor data, together with a spatial suppression scheme that helps to reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head. During this phase, sensor nodes communicate with each other, and the resulting clusters are formed from sensor nodes that have similar readings. Spatial suppression is performed at the cluster head, which also computes the difference between each sensor reading and a representative value. If a cluster head holds redundant data, it removes the data and keeps only the node identification. The experimental results support the hypothesis that clustering based on data correlation has better compression performance than ordinary clustering based on locality of communication: DCC reduces the data size by 40% through suppression and prolongs network lifetime by 20–30%. However, for large-scale network applications (nodes > 500), DCC is inefficient because each cluster head needs more energy to collect similar data readings and to communicate with several nodes. Also, when the percentage of similar data readings is low, DCC is ineffective due to the higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing a prediction model. In their approach, the cluster heads execute the prediction model, while gateway nodes at the circumference of the clusters are responsible for the routing task. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] presented the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees the result to be within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), during which clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, in which only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) that trades lower result precision for significant savings in energy, storage, computation, and communication.
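The query and response phases can be approximated by the following sketch. It is a simplified reading of the description above, not the CAG protocol of [88]: the absolute (rather than relative) threshold, the tree encoding, and the function names are assumptions.

```python
def form_clusters(parent, readings, threshold):
    """Assign every node to a cluster head while the query spreads down the tree.

    parent: dict node -> parent node (None for the root), listed root-first.
    A node joins its parent's cluster when its reading is within `threshold`
    of that cluster head's reading; otherwise it becomes a new cluster head.
    """
    head = {}
    for node, par in parent.items():
        if par is not None and abs(readings[node] - readings[head[par]]) <= threshold:
            head[node] = head[par]
        else:
            head[node] = node
    return head

def response(head, readings):
    """Response phase: only cluster heads report, one reading per cluster."""
    return {h: readings[h] for h in set(head.values())}

tree = {"a": None, "b": "a", "c": "b", "d": "b"}
values = {"a": 21.0, "b": 21.4, "c": 27.0, "d": 21.8}
heads = form_clusters(tree, values, threshold=1.0)
reported = response(heads, values)
```

Four nodes produce two reported values, which shows both the saving in transmissions and the loss of the individual readings of b and d.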

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the k-means clustering algorithm is proposed that sends summarized data towards the sink, which reduces the communication transmission time and power consumption of sensor nodes. The sensor network is divided into clusters, and only the cluster head nodes communicate with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends to the base station the count and vector sum of its local sensory data points, as well as the sum of the squared distances from each local point to its center. On receiving the data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations. Several programs are run to obtain the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors (N) and increases linearly with the number of centers. The major issues are the extra memory needed at the cluster heads and the computation power needed to summarize data before transmitting it to the sink. Also, the algorithm requires multiple rounds of message passing between the cluster heads and the base station, which may seriously affect communication efficiency when the number of sensors is relatively high.
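One round of this summary exchange might look as follows. The sketch assumes the message layout (count, vector sum, squared-distance sum per center) described above; the function names, the dictionary keys, and the sample points are hypothetical.

```python
def local_summary(points, centers):
    """Cluster head side: per-center count, vector sum, and squared-distance sum."""
    dim = len(centers[0])
    summary = [{"count": 0, "vec_sum": [0.0] * dim, "sq_dist": 0.0} for _ in centers]
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = dists.index(min(dists))  # index of the nearest center
        summary[j]["count"] += 1
        summary[j]["vec_sum"] = [s + a for s, a in zip(summary[j]["vec_sum"], p)]
        summary[j]["sq_dist"] += dists[j]
    return summary

def update_centers(summaries, centers):
    """Base station side: merge the summaries and recompute each center."""
    new_centers = []
    for j, old in enumerate(centers):
        count = sum(s[j]["count"] for s in summaries)
        if count == 0:
            new_centers.append(old)  # keep a center that attracted no points
            continue
        total = [sum(s[j]["vec_sum"][d] for s in summaries) for d in range(len(old))]
        new_centers.append(tuple(t / count for t in total))
    return new_centers

centers = [(0.0, 0.0), (10.0, 10.0)]
head_a = local_summary([(1.0, 1.0), (9.0, 9.0)], centers)
head_b = local_summary([(3.0, 1.0), (11.0, 11.0)], centers)
centers = update_centers([head_a, head_b], centers)
```

The size of each summary depends on the number of centers and the data dimension, not on how many points a cluster head holds, which is consistent with the reported communication cost.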

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes to form clusters. The nodes that hear the request decide whether they should nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on their remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when some subregions in the sensor network are targeted more than others, that is, when inquiries are not uniformly distributed over time and space.

Figure 5: Classification maps input attribute set (X) to class label (Y). (Diagram: input, attribute set (X) → classification model → output, class label (Y).)

Ma et al. [90] proposed the distributed hierarchical clustering and summarization (DHCS) algorithm for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop count thresholds, to model node and distance-based clustering. Initially, each node treats itself as an active cluster. Then similar adjacent clusters are merged into larger clusters round by round. In each round, every cluster simultaneously tries to combine with its most similar adjacent cluster. Two clusters can be merged only if each considers the other its most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
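The round-by-round mutual merging can be sketched as below. This is a centralized simulation under simplifying assumptions, not the distributed DHCS protocol of [90]: cluster similarity is taken as the difference of cluster means, only the difference threshold is modeled (the hop count threshold is omitted), and all names are hypothetical.

```python
def mutual_merge_clustering(values, adjacency, max_diff):
    """values: node -> reading; adjacency: node -> set of neighbouring nodes."""
    clusters = {n: {n} for n in values}  # every node starts as its own cluster

    def mean(c):
        return sum(values[n] for n in clusters[c]) / len(clusters[c])

    def neighbours(c):
        return {d for d in clusters if d != c
                and any(m in adjacency[n] for n in clusters[c] for m in clusters[d])}

    while True:
        choice = {}
        for c in clusters:
            near = [d for d in neighbours(c) if abs(mean(c) - mean(d)) <= max_diff]
            choice[c] = min(near, key=lambda d: (abs(mean(c) - mean(d)), d), default=None)
        # Merge only pairs that chose each other as most similar neighbour.
        pairs = [(c, d) for c, d in choice.items()
                 if d is not None and c < d and choice[d] == c]
        if not pairs:
            return sorted(sorted(members) for members in clusters.values())
        for c, d in pairs:
            clusters[c] |= clusters.pop(d)

readings = {"a": 20.0, "b": 20.2, "c": 25.0, "d": 25.1}
links = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
steady = mutual_merge_clustering(readings, links, max_diff=1.0)
```

The loop stops in the first round in which no mutual pair exists, and the clusters returned at that point correspond to the steady clusters of the text.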

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and classifies new data into one of the learned classes. Figure 5 shows that classification maps an input attribute set (X) to a class label (Y).

Classification-based approaches have adapted traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, according to the type of classification model they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until reaching a leaf node, where a class label is assigned. The internal nodes are used to partition data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a request to classify a query vector, the closest training vectors are located according to the distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier assigns the data to predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where the condition is a conjunction of attribute tests and Y is a class label.

SVM (support vector machine) techniques partition the data belonging to different classes by fitting a hyperplane between them that maximizes the separation. The data is mapped into a higher-dimensional feature space where it can be easily partitioned by a hyperplane. Furthermore, a kernel function is used to compute the dot products between the mapped vectors in the feature space in order to find the hyperplane.

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. In order to identify persons, the proposed approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The proposed approach is validated by experiments using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated by using the Weka framework, version 3.7.0 [95]. The experimental results show good classification performance. However, using frequent episodes alone, without temporal constraints and deep analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data by using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then the k closest training examples are retained, where k is a fixed integer and k ≥ 1; among these k examples, the algorithm looks for the label that is most frequent. This label is the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated from NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a correctly classified success rate of 92.3%.
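The prediction step described above is the standard k-nearest-neighbor vote, which can be sketched in a few lines. The Euclidean distance, the two-dimensional feature vectors, and the labels are assumptions for illustration and do not come from [96].

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """training: list of (feature_vector, label); returns the majority label of the k nearest."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical trajectory features with class labels.
train = [
    ((0.0, 0.0), "walk"), ((0.1, 0.2), "walk"), ((0.2, 0.1), "walk"),
    ((1.0, 1.0), "drive"), ((0.9, 1.1), "drive"),
]
label = knn_predict(train, (0.15, 0.15), k=3)
```

Training costs nothing beyond storing the examples, while every prediction scans the whole training set, which is the usual trade-off of a lazy learner.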

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments to predict the behavior patterns of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model that predicts the occupancy of the ambient intelligence environment. The occupancy is predicted by using fuzzy rules, which are modeled by using the past values of the time series data. In the learning process, input from the sensors is compared with the stored rules to take the appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the proposed technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multiuser environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to preserve the limited memory and to change the algorithm output rate according to the data rate, available memory, algorithm output rate history, and time constraints, so as to fill the available memory with generated knowledge. The algorithm works by searching for the nearest instance stored in main memory when a new element arrives. All instances are already stored in main memory according to a prespecified distance threshold. The threshold here represents the similarity measure that the algorithm accepts for considering two or more elements as one element, according to the elements' attribute values. If the algorithm finds this element, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise it decrements the weight by one. If the weight becomes zero, the element is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer.
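The weight update rule can be sketched as follows. The Euclidean distance, the record layout, and the choice to store an element as a new instance when nothing lies within the threshold are assumptions; the AOG-based adaptation of the output rate is not modeled.

```python
import math

def lwclass_update(memory, features, label, threshold):
    """Update the stored instances with one labelled stream element.

    memory: list of dicts {"x": tuple, "label": str, "weight": int}
    """
    near = [m for m in memory if math.dist(m["x"], features) <= threshold]
    if not near:
        # Assumed behaviour: an element with no close instance is stored as new.
        memory.append({"x": features, "label": label, "weight": 1})
        return
    nearest = min(near, key=lambda m: math.dist(m["x"], features))
    nearest["weight"] += 1 if nearest["label"] == label else -1
    if nearest["weight"] == 0:
        memory.remove(nearest)  # release the instance from memory

memory = []
stream = [((1.0, 1.0), "hot"), ((1.1, 1.0), "hot"), ((1.0, 1.1), "cold")]
for x, y in stream:
    lwclass_update(memory, x, y, threshold=0.5)
```

After the three elements above, a single instance remains with weight 1; one more conflicting element nearby would drive its weight to zero and free the slot.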

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure is constructed over the whole network. A distributed voting approach is proposed in which each sensor is a leaf of a tree (DT) and performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work shows how the local models enable sensors to respond to changes in the target by relearning local models. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime as compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node for computing and building the local predictive model and the extra memory required to store it.
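Simple and weighted voting at the sink can be illustrated as follows. The sketch assumes that each sensor sends only its predicted class; using a per-sensor weight such as local accuracy is an assumption, as are the function name and sample values.

```python
from collections import defaultdict

def vote(predictions, weights=None):
    """Combine local class predictions at the sink.

    predictions: dict sensor_id -> predicted class
    weights: optional dict sensor_id -> weight; every sensor counts 1.0 if omitted
    """
    tally = defaultdict(float)
    for sensor, label in predictions.items():
        tally[label] += 1.0 if weights is None else weights[sensor]
    # Sorting first makes ties resolve alphabetically, hence deterministically.
    return max(sorted(tally), key=tally.get)

local = {"s1": "fire", "s2": "normal", "s3": "normal"}
simple = vote(local)
weighted = vote(local, {"s1": 0.9, "s2": 0.4, "s3": 0.4})
```

With these values the simple vote follows the majority, whereas the weighted vote follows the single sensor with the highest weight.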

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme that generates effective feature vectors of low dimension (FVLD) for a wireless audio network. A distributed cluster-based algorithm for the detection and classification of vehicles is proposed. Sensors form clusters on demand in order to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, of all the approaches compared.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using a support vector machine (SVM)-based technique in a sensor network. The authors proposed two distributed algorithms, the distributed fix partition SVM (DFP-SVM) and the weighted distributed fix partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is incrementally trained on a set of examples called support vectors. A key property of SVMs is that the number of support vectors is very small compared with the number of all sample values. Moreover, the support vectors (and offset) give a compressed representation of the separating SVM hyperplane. Sending only the support vectors, instead of all training samples, to the next cluster head is therefore very energy efficient because it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations, and a comparison is made with a centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally as compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius information of the sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected after a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the results for local outliers inaccurate. The technique is evaluated by using real sensor measurements collected from a deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to evaluate the computational strategy and various kernel functions. The results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and distinguishing aspects of the data mining techniques designed specifically for WSNs that were discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed, and then the comparative tables are presented to compare and differentiate the existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogeneous [50] or heterogeneous [33, 48]. A homogeneous attribute means sensing a single-valued attribute, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node when the reading from a correlated sensor is missing.
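A simple way to quantify both kinds of correlation is the Pearson coefficient, computed between two neighboring nodes (spatial) or between a node's readings and the same readings shifted by one time step (temporal). The readings below are made-up values, and the choice of Pearson correlation is an assumption; the surveyed techniques may use other measures.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long reading sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

node_a = [20.0, 21.0, 22.0, 23.0, 24.0]
node_b = [19.5, 20.7, 21.4, 22.6, 23.5]      # hypothetical nearby node
spatial = pearson(node_a, node_b)            # correlation between neighbours
temporal = pearson(node_a[:-1], node_a[1:])  # lag-1 correlation at one node
```

A coefficient close to 1 for a pair of neighbors suggests that a missing reading from one node could be estimated from the other.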

5.2. Processing Architecture. In order to apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models, as follows.

Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, the entire raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis in order to find interesting patterns in the aggregated data. As the size of the WSN increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it is not scalable to a very large number of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an "aggregator" is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSN data. Distributed approaches use one-scan algorithms for real-time processing in order to deal with the high data arrival rate; the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by the communication distance. Multihop communication, which is more complex, uses some nodes as relays when transmitting data packets from the source to the sink.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data


Table 2: Comparison of data mining techniques for wireless sensor networks.

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
|---|---|---|---|---|---|---|---|
| Frequent pattern mining | DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores the sensor that reports different values |
| Frequent pattern mining | In-network data mining [51] | Event pattern discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication |
| Frequent pattern mining | Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSN performance monitoring | Data size | Increases buffer cost, delayed crucial messages |
| Frequent pattern mining | Online algorithm [46] | Interval list representation of WSN data | Lossy counting | Periodical sensing | WSN monitoring | Time and memory | Data redundancy |
| Frequent pattern mining | Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSN operations | Energy | Not validated well on real data |
| Frequent pattern mining | CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data |

Check-mark columns of the table are not reproduced here: processing architecture (distributed, central); sensor data attributes (homogeneous, heterogeneous) and correlation (attribute, spatial, temporal); node connectivity (single hop, multihop), mobility (static, mobile), and role (cluster head, sensor, relay); evaluation method (simulation, analytical model); data source (real, synthetic).


Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
|---|---|---|---|---|---|---|---|
| Frequent pattern mining | Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT structure | Aggregation | Monitor WSN quality of service | No. of messages | Increased cost due to multiple DB scans |
| Frequent pattern mining | SP-tree [49] | Discover event patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost |
| Sequential pattern mining | Relational framework [58] | Multidimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming |
| Sequential pattern mining | Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants' behavior prediction | Prediction accuracy | Inefficient for complex activities |
| Sequential pattern mining | MPG [64] | Predict object's future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset |
| Sequential pattern mining | Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction |

Check-mark columns of the table are not reproduced here: processing architecture (distributed, central); sensor data attributes (homogeneous, heterogeneous) and correlation (attribute, spatial, temporal); node connectivity (single hop, multihop), mobility (static, mobile), and role (cluster head, sensor, relay); evaluation method (simulation, analytical model); data source (real, synthetic).

Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

Text columns: Approach; Objective; DM method; Node task; Application area; Optimization objective; Limitations. Check-mark columns (entries omitted): Processing architecture (distributed, central); Sensor data attributes (homogenous, heterogeneous); Correlation (attribute, spatial, temporal); Connectivity (single hop, multihops); Mobility (static, mobile); Node role (cluster head, sensor, relay); Evaluation method (simulation, analytical model); Data source (real, synthetic).

Sequential pattern mining

TMP-mine [65]. Objective: predict object's future movement. DM method: pattern growth using TMP-tree construction. Node task: rule-based node activation. Application area: real-time object tracking. Optimization objective: energy. Limitations: high missing rate and time.

Pattern learner [23]. Objective: behavior recognition. DM method: tree projection. Node task: sense and send. Application area: behavior monitoring. Optimization objective: number of patterns learned. Limitations: complex and redundant patterns.

MSAP [63]. Objective: fault prediction. DM method: candidate construction. Node task: sense and send. Application area: telecommunication. Optimization objective: patterns accuracy. Limitations: candidate construction is expensive to compute.

PTSP [66]. Objective: object's future movement prediction. DM method: sequential pattern generation. Node task: rule-based node activation. Application area: object tracking. Optimization objective: energy. Limitations: inefficient to predict high-speed objects.

Clustering

DCC [86]. Objective: WSNs longevity. DM method: data correlation-based clustering. Node task: data suppression. Application area: generic WSNs application. Optimization objective: energy and data size. Limitations: high clustering rate.

H-cluster [85]. Objective: in-network communication. DM method: data correlation-based clustering. Node task: data summarization. Application area: real-time monitoring. Optimization objective: communication. Limitations: high data loss rate.

Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

Text columns: Approach; Objective; DM method; Node task; Application area; Optimization objective; Limitations. Check-mark columns (entries omitted): Processing architecture (distributed, central); Sensor data attributes (homogenous, heterogeneous); Correlation (attribute, spatial, temporal); Connectivity (single hop, multihops); Mobility (static, mobile); Role (cluster head, sensor, relay); Evaluation method (simulation, analytical model); Data source (real, synthetic).

Clustering

Prediction model [87]. Objective: prediction-based monitoring. DM method: heuristic scheme. Node task: local prediction model. Application area: environmental monitoring. Optimization objective: communication. Limitations: cluster overlapping.

CAG [88]. Objective: WSNs bandwidth gain. DM method: data correlation-based clustering. Node task: data aggregation. Application area: generic WSNs applications. Optimization objective: communication. Limitations: sensory data loss.

EEDC [84]. Objective: on-demand clustering. DM method: data correlation-based clustering. Node task: sense and send. Application area: surveillance data analysis. Optimization objective: energy. Limitations: inefficient for large WSNs.

Clustering sensory data [67]. Objective: communication efficiency. DM method: K-means. Node task: data summarization. Application area: data analysis. Optimization objective: communication. Limitations: inefficient for large WSNs.

Attribute based clustering [89]. Objective: WSNs bandwidth gain. DM method: hierarchical clustering. Node task: data clustering. Application area: monitoring and tracking. Optimization objective: communication. Limitations: high computation cost.

DHCS [90]. Objective: uniform data distribution. DM method: hierarchical clustering. Node task: data clustering and summarization. Application area: interactive data analysis. Optimization objective: message reduction. Limitations: nodes' energy is ignored.

Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

Text columns: Approach; Objective; DM method; Node task; Application area; Optimization objective; Limitations. Check-mark columns (entries omitted): Processing architecture (distributed, central); Sensor data attributes (homogenous, heterogeneous); Correlation (attribute, spatial, temporal); Connectivity (single hop, multihops); Mobility (static, mobile); Role (cluster head, sensor, relay); Evaluation method (simulation, analytical model); Data source (real, synthetic).

Classification

Person identification algorithms [109]. Objective: identify human behavior. DM method: decision tree. Node task: sense and send. Application area: health care. Optimization objective: classification accuracy. Limitations: does not guarantee the correctness.

Prediction framework [103]. Objective: distributed prediction. DM method: decision tree. Node task: local prediction. Application area: generic. Optimization objective: prediction accuracy. Limitations: computational complexity.

NNTC [96]. Objective: real-time classification. DM method: nearest neighbor. Node task: sense and send. Application area: generic. Optimization objective: classification accuracy. Limitations: not evaluated on real dataset.

LWClass [100]. Objective: preserve WSNs resources. DM method: KNN. Node task: sense and send. Application area: ubiquitous environments. Optimization objective: resource awareness. Limitations: no adaptation to concept drift.

FVLD [104]. Objective: low-dimension feature vector generation. DM method: KNN, ML. Node task: classification. Application area: vehicle classification. Optimization objective: energy. Limitations: high cost of feature vector transmission.

Fuzzy predictor model [99]. Objective: occupancy prediction. DM method: fuzzy rules. Node task: sense and send. Application area: health care. Optimization objective: prediction accuracy. Limitations: inefficient for complex scenarios.

Online learning [105]. Objective: incremental classification. DM method: SVM. Node task: classification. Application area: environmental monitoring. Optimization objective: energy. Limitations: computational complexity.

One-class quarter-sphere SVM [108]. Objective: anomaly detection. DM method: SVM. Node task: local anomaly detection. Application area: habitat monitoring. Optimization objective: energy. Limitations: ignores spatial correlation.

mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of roles [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is a node that acts as a medium to transmit data packets from one node to others.

Node Task. In centralized approaches, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and take action based on the detected phenomena or target.
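The difference between the two node tasks can be illustrated with a minimal Python sketch. It is not taken from any surveyed technique: the function names, the window size, and the alarm threshold are hypothetical, and a "message" is simply modeled as one list element.

```python
from statistics import mean

def sense_and_send(readings):
    """Centralized node task: forward every raw reading to the base station."""
    return list(readings)  # one message per reading

def local_summary(readings, window=4, alarm_threshold=50.0):
    """Distributed node task: aggregate each window locally and send one
    summary per window, flagging windows whose mean exceeds the
    (hypothetical) alarm threshold."""
    messages = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        avg = mean(chunk)
        messages.append({"mean": avg, "alarm": avg > alarm_threshold})
    return messages

readings = [21.0, 22.0, 21.5, 22.5, 60.0, 61.0, 59.0, 60.0]
raw = sense_and_send(readings)
summaries = local_summary(readings)
print(len(raw), len(summaries))
```

Under these assumptions the node sends one summary per window instead of one message per reading, and the decision to raise an alarm is taken on the node itself.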

5.5. Application Area. We also evaluated the types of application that benefit from WSNs data mining techniques. Here we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of target locations and make tracking more efficient and accurate.

(iv) Fourth is WSNs performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments, and sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy through in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSNs optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment because appropriate hardware is unavailable owing to technical and design limitations. Usually, a real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment yields the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. We observe that most of the techniques are validated by simulation on synthetic datasets, and that most of the studies resort to simulation because of the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.

6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This, in turn, increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand their associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) For the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster-width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
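The sensitivity to support and confidence thresholds mentioned in item (ix) can be made concrete with a small Python sketch. It uses the standard textbook definitions of support and confidence; the sensor identifiers, the toy "epochs" (sets of sensors that reported an event in the same time slot), and the helper names are assumptions made for illustration only.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, itemset):
    """Fraction of transactions containing the itemset."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) from co-occurrence counts."""
    base = count(transactions, antecedent)
    joint = count(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0

def rule_holds(transactions, antecedent, consequent, min_sup, min_conf):
    """A rule is kept only if it passes both user-chosen thresholds."""
    sup = support(transactions, set(antecedent) | set(consequent))
    conf = confidence(transactions, antecedent, consequent)
    return sup >= min_sup and conf >= min_conf

# Hypothetical epochs: which sensors fired together in each time slot.
epochs = [frozenset(e) for e in (
    {"s1", "s2"}, {"s1", "s2", "s3"}, {"s1"}, {"s2", "s3"}, {"s1", "s2"},
)]

print(rule_holds(epochs, {"s1"}, {"s2"}, min_sup=0.5, min_conf=0.7))
print(rule_holds(epochs, {"s1"}, {"s2"}, min_sup=0.7, min_conf=0.7))
```

In this toy data the rule s1 → s2 has support 3/5 and confidence 3/4, so it is accepted with a minimum support of 0.5 and rejected with 0.7, which is the kind of threshold dependence the item describes.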

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required and partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data is continuously arriving and not available for a second scan.
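As an illustration of single-pass processing, the following Python sketch keeps a constant-size summary of a stream using Welford's online algorithm for the mean and variance. It is only one possible summary a node could maintain, not the framework's prescribed method; the class name `RunningSummary` and the sample stream are hypothetical.

```python
class RunningSummary:
    """Constant-memory summary of a stream; each reading is seen once."""

    def __init__(self):
        self.n = 0        # number of readings seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new reading into the summary (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance of the readings seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

summary = RunningSummary()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    summary.update(reading)  # the raw reading can be discarded afterwards

print(summary.n, summary.mean, summary.variance())
```

Only three numbers are stored regardless of the stream length, so a node could transmit this summary instead of the raw readings.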

Figure 6: Proposed hybrid framework for sensor network based applications. Labels in the figure: sensor data stream; node data processing (data selection, remove duplication, aggregation, summarization, data fusion, clustering, association analysis, ...); local model; network data processing (local model integration, network analysis, real time decisions, network maintenance, network pattern identification, monitoring); network model; in-network processing (single pass); sink/base station; central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); centralized processing (multi pass); global model; query; approximate results; users.

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient as compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get the global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
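The local-to-network-to-global integration described above can be sketched in Python under strong simplifying assumptions: a "model" is taken to be a plain frequency table of event labels, integration is taken to be summation of counts, and the node names and events are invented for the example. The proposed framework does not specify these details.

```python
from collections import Counter

def local_model(events):
    """Node level: compact pattern counts instead of raw event data."""
    return Counter(events)

def merge_models(models):
    """Cluster head or sink level: integrate models by summing counts."""
    merged = Counter()
    for model in models:
        merged.update(model)
    return merged

# Two hypothetical clusters, A and B, with their member nodes.
node_a1 = local_model(["hot", "hot", "dry"])
node_a2 = local_model(["hot", "smoke"])
node_b1 = local_model(["dry", "dry", "hot"])

network_a = merge_models([node_a1, node_a2])          # network model, cluster A
network_b = merge_models([node_b1])                   # network model, cluster B
global_model = merge_models([network_a, network_b])   # integrated at the sink

# A user query such as "most frequent event" is answered from the global model.
print(global_model.most_common(1))
```

Only the count tables travel up the hierarchy, which is the communication saving the paragraph refers to. A real deployment would typically use a lossy summary so that answers are approximate, whereas this toy merge happens to be exact.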

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSNs data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decision and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue in WSNs [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to a central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this way of distributing data, the proposed framework is able to deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSNs data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
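One simple way to picture the data correlation-based clustering mentioned in item (v) is to group nodes whose recent readings are strongly correlated. The Python sketch below is an assumption-laden illustration, not the algorithm of any cited work: it uses the Pearson coefficient over a fixed window, a greedy assignment to the first sufficiently correlated group, and a hypothetical threshold of 0.9.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long reading windows."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant window carries no correlation information
    return cov / sqrt(vx * vy)

def correlated_groups(windows, threshold=0.9):
    """Greedy grouping: a node joins the first group whose representative
    (its first member) is correlated with it at or above the threshold."""
    groups = []
    for node, series in windows.items():
        for group in groups:
            if pearson(windows[group[0]], series) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])  # no similar group found: start a new one
    return groups

# Hypothetical temperature windows from three neighboring nodes.
windows = {
    "n1": [20.0, 21.0, 22.0, 23.0],
    "n2": [30.0, 31.1, 31.9, 33.0],
    "n3": [23.0, 22.0, 21.0, 20.0],
}
print(correlated_groups(windows))
```

Here n1 and n2 follow the same rising trend and end up in one group, while n3 falls and forms its own. Both the window length and the threshold are tuning choices, which echoes the window-size limitation noted in Section 6.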

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to the appropriate WSNs type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of the past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. Surely, the set of WSNs applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSNs applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSNs performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A Rozyyev H Hasbullah and F Subhan ldquoIndoor child track-ing in wireless sensor network using fuzzy logic techniquerdquoResearch Journal of Information Technology vol 3 no 2 pp 81ndash92 2011

[2] R Szewczyk E Osterweil J Polastre M Hamilton A Main-waring and D Estrin ldquoHabitat monitoring with sensor net-worksrdquo Communications of the ACM vol 47 no 6 pp 34ndash402004

[3] S H Chauhdary A K Bashir S C Shah and M S ParkldquoEOATR energy efficient object tracking by auto adjustingtransmission range in wireless sensor networkrdquo Journal ofApplied Sciences vol 9 no 24 pp 4247ndash4252 2009

[4] P K Biswas and S Phoha ldquoSelf-organizing sensor networks forintegrated target surveillancerdquo IEEETransactions onComputersvol 55 no 8 pp 1033ndash1047 2006

[5] L T Lee and C W Chen ldquoSynchronizing sensor networkswith pulse coupled and cluster based approachesrdquo InformationTechnology Journal vol 7 no 5 pp 737ndash745 2008

[6] N Sabri S A Aljunid B Ahmad A Yahya R KamaruddinandM S Salim ldquoWireless sensor actor network based on fuzzyinference system for greenhouse climate controlrdquo Journal ofApplied Sciences vol 11 no 17 pp 3104ndash3116 2011

[7] D Kumar ldquoMonitoring forest cover changes using remotesensing and GIS a global prospectiverdquo Research Journal ofEnvironmental Sciences vol 5 pp 105ndash123 2011

[8] J Yick B Mukherjee and D Ghosal ldquoWireless sensor networksurveyrdquoComputerNetworks vol 52 no 12 pp 2292ndash2330 2008

[9] T Arampatzis J Lygeros and S Manesis ldquoA survey of appli-cations of wireless sensors and wireless sensor networksrdquoin Proceedings of the 20th IEEE International Symposium onIntelligent Control (ISIC rsquo05) pp 719ndash724 June 2005

[10] Y-C Tseng M-S Pan and Y-Y Tsai ldquoWireless sensor net-works for emergency navigationrdquo Computer vol 39 no 7 pp55ndash62 2006

[11] T Yairi Y Kato and K Hori ldquoFault detection by miningassociation rules fromhouse-keeping datardquo inProceedings of the6th International Symposium on Artificial Intelligence Roboticsand Automation in Space pp 18ndash21 2001

[12] O Horovitz S Krishnaswamy and M M Gaber ldquoA fuzzyapproach for interpretation of ubiquitous data stream clusteringand its application in road safetyrdquo Intelligent Data Analysis vol11 no 1 pp 89ndash108 2007

[13] J Gama P P Rodrigues and L Lopes ldquoClustering distributedsensor data streams using local processing and reduced com-municationrdquo Intelligent Data Analysis vol 15 no 1 pp 3ndash282011

[14] Z A Aghbari I Kamel and T Awad ldquoOn clustering largenumber of data streamsrdquo Intelligent Data Analysis vol 16 no1 pp 69ndash91 2012

[15] A Boukerche and S Samarah ldquoAn efficient data extractionmechanism for mining association rules from wireless sensornetworksrdquo in Proceedings of the IEEE International Conferenceon Communications (ICC rsquo07) pp 3936ndash3941 June 2007

[16] Y Chi H Wang P S Yu and R R Muntz ldquoMomentmaintaining closed frequent itemsets over a stream slidingwindowrdquo inProceedings of the 4th IEEE International Conferenceon Data Mining (ICDM rsquo04) pp 59ndash66 November 2004

[17] M Deypir and M H Sadreddini ldquoEclatDS an efficient slid-ing window based frequent pattern mining method for data

22 International Journal of Distributed Sensor Networks

streamsrdquo Intelligent Data Analysis vol 15 no 4 pp 571ndash5872011

[18] J Gama A Ganguly O Omitaomu R Vatsavai and M GaberldquoKnowledge discovery from data streamsrdquo Intelligent DataAnalysis vol 13 no 3 pp 403ndash404 2009

[19] B George J M Kang and S Shekhar ldquoSpatio-temporal sensorgraphs (STSG) a data model for the discovery of spatio-temporal patternsrdquo Intelligent Data Analysis vol 13 no 3 pp457ndash475 2009

[20] A Mahmood K Shi and S Khatoon ldquoMining data generatedby sensor networks a surveyrdquo Information Technology Journalvol 11 pp 1534ndash1543 2012

[21] D J Cook M Youngblood E O Heierman III et alldquoMavHome an agent-based smart homerdquo in Proceedings of the1st IEEE International Conference on Pervasive Computing andCommunications (PerCom rsquo03) pp 521ndash524 March 2003

[22] J Rabatel S Bringay and P Poncelet ldquoSO MAD sensorminingfor anomaly detection in railway datardquo in Advances in DataMining Applications andTheoretical Aspects pp 191ndash205 2009

[23] V Guralnik and K Z Haigh ldquoLearning models of humanbehaviour with sequential patternsrdquo in Proceedings of the AAAI-02 Workshop on Automation as Caregiver pp 24ndash30 2002

[24] S Huang and Y Dong ldquoAn active learning system for miningtime-changing data streamsrdquo Intelligent Data Analysis vol 11no 4 pp 401ndash419 2007

[25] J Beringer and E Hullermeier ldquoEfficient instance-based learn-ing on data streamsrdquo Intelligent Data Analysis vol 11 no 6 pp627ndash650 2007

[26] E J Spinosaa A PD L F deCarvalhoa and J Gamab ldquoNoveltydetection with application to data streamsrdquo Intelligent DataAnalysis vol 13 no 3 pp 405ndash422 2009

[27] M Xie S Han B Tian and S Parvin ldquoAnomaly detectionin wireless sensor networks a surveyrdquo Journal of Network andComputer Applications vol 34 no 4 pp 1302ndash1325 2011

[28] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[29] V Chandola A Banerjee and V Kumar ldquoAnomaly detection asurveyrdquo ACM Computing Surveys vol 41 no 3 article 15 2009

[30] VMaojo and J Sanandre ldquoA survey of data mining techniquesrdquoMedical Data Analysis Lecture Notes in Computer Science vol1933 pp 17ndash22 2000

[31] W Jinlong X Congfu C Weidong and P Yunhe ldquoSurveyof the study on frequent pattern mining in data streamsrdquo inProceedings of the IEEE International Conference on SystemsMan and Cybernetics (SMC rsquo04) pp 5917ndash5922 October 2004

[32] J Cheng Y Ke and W Ng ldquoA survey on algorithms formining frequent itemsets over data streamsrdquo Knowledge andInformation Systems vol 16 no 1 pp 1ndash27 2008

[33] A A Abbasi andM Younis ldquoA survey on clustering algorithmsfor wireless sensor networksrdquo Computer Communications vol30 no 14-15 pp 2826ndash2841 2007

[34] O Boyinbode H Le and M Takizawa ldquoA survey on clusteringalgorithms for wireless sensor networksrdquo International Journalof Space-Based and SituatedComputing vol 1 no 2 pp 130ndash1362007

[35] M M Gaber A Zaslavsky and S Krishnaswamy ldquoA survey ofclassificationmethods in data streamsrdquo inData Streams pp 39ndash59 Springer 2007

[36] R Agrawal and R Srikant ldquoFast algorithms for mining associ-ation rulesrdquo in Proceedings of the 20th International ConferenceVery Large Data Bases (VLDB rsquo94) pp 487ndash499 Citeseer 1994

[37] R J Bayardo Jr ldquoEfficiently mining long patterns fromdatabasesrdquo SIGMOD Record vol 27 no 2 pp 85ndash93 1998

[38] S Brin RMotwani andC Silverstein ldquoBeyondmarket basketsgeneralizing association rules to correlationsrdquo SIGMODRecordvol 26 no 2 pp 265ndash276 1997

[39] W Cheung and O R Zaiane ldquoIncremental mining of frequentpatterns without candidate generation or support constraintrdquoin Proceedings of 7th International Database Engineering andApplications Symposium pp 111ndash116 2003

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


continuous, rapid, and changing data streams and also how to incorporate user interaction during high-speed data arrival.

(iii) Online Mining. In a WSNs environment, data is geographically distributed, inputs arrive continuously, and newer data items may substantially change the results based on older data. Most data mining techniques that analyze data in an offline manner do not meet the requirement of handling distributed stream data. Thus, a challenge for data mining techniques is how to process distributed streaming data online.

(iv) Modeling Changes of Mining Results Over Time. When the data-generating phenomenon is changing over time, the extracted model at any time should be up-to-date. Due to the continuity of data streams, some researchers have pointed out that capturing the change of mining results is more important in this area than the mining results themselves. The research issue is how to model this change in the results.

(v) Data Transformation. Since sensor nodes are limited in terms of bandwidth, transferring the original data over the network is not feasible. Knowledge structure transformation is therefore an important issue. After extracting models and patterns locally from WSNs data, the output is transferred to the base station. The challenge for a data mining technique is how to efficiently represent data and discovered patterns for transmission over the network.

(vi) Dynamic Network Topology. Sensor networks are deployed in potentially harsh, uncertain, heterogeneous, and dynamic environments. Moreover, sensor nodes may move among different locations at any point over time. Such dynamicity and heterogeneity increase the complexity of designing an appropriate data mining technique for WSNs.

To address these challenges, researchers have modified the conventional data mining techniques and also proposed new data mining algorithms to handle the data generated from sensor networks. In the following section, we provide the taxonomy of these data mining techniques based on the discipline from which they adopt their ideas.

3. Taxonomy of Data Mining Techniques for WSNs

In this section, a classification scheme for existing approaches designed for mining WSNs data is presented. The highest-level classification is based upon the general data mining classes used, such as frequent pattern mining, sequential pattern mining, clustering, and classification. Most of the frequent pattern mining and sequential pattern mining approaches have adapted the traditional frequent mining techniques, such as the Apriori and frequent pattern (FP) growth-based algorithms, to find the associations among large WSNs data. Cluster-based approaches have adapted the K-mean, hierarchical, and data correlation-based clustering, based upon the distance among the data points, whereas classification-based approaches have adapted the traditional classification techniques, such as decision tree, rule-based, nearest neighbor, and support vector machine methods, based on the type of classification model that they use. These algorithms have very different and distinct roles; therefore, in order to choose the algorithm for a WSNs application, one has to decide in terms of these top-level classes.

The second level of classification is based upon each approach’s ability to process data in a centralized or distributed manner. Since WSNs nodes are limited in resources such as power, computation, bandwidth, and memory, an approach meant for distributed processing requires one-pass algorithms that complete a part of the data mining locally and then aggregate the results. The objective of the distributed approaches is to limit the number of messages and the communication energy that sensor nodes spend while transferring data to the central server. This also helps to extend the WSNs lifetime and to extract the maximum amount of data from the environment. In centralized processing, by contrast, data from the entire network is collected and stored at a central server for analysis. Since the central server is rich in resources, there are no such constraints on choosing an accurate algorithm. Researchers nevertheless discourage this approach because it generates a huge amount of dataflow and communication, which can create bottlenecks and waste communication bandwidth. These two data processing/storage architectures have a large impact on the type of data mining algorithm to choose; therefore, one has to decide on the processing/storage architecture when choosing the data mining algorithm for a WSNs application.
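As a rough illustration of the distributed option, each node can make one pass over its own readings, keep only per-value counts, and send that summary instead of the raw stream; the base station then merges the summaries. The function names `local_summary` and `merge_summaries` and the sample readings are hypothetical and are not taken from any of the surveyed systems.

```python
from collections import Counter

def local_summary(readings):
    """Node side: one pass over the readings, returns value -> count."""
    counts = Counter()
    for value in readings:
        counts[value] += 1
    return counts

def merge_summaries(summaries):
    """Base-station side: add the per-node counters together."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total

# Hypothetical discretized readings from three nodes.
node_readings = {
    "n1": ["hot", "hot", "warm", "hot"],
    "n2": ["warm", "warm", "hot"],
    "n3": ["cold", "warm", "warm", "warm", "hot"],
}
summaries = [local_summary(r) for r in node_readings.values()]
network_counts = merge_summaries(summaries)

# Number of items that would cross the network in each architecture.
raw_items_sent = sum(len(r) for r in node_readings.values())
summary_items_sent = sum(len(s) for s in summaries)
```

In this toy setting the summaries carry fewer items than the raw readings; how large the saving is in practice depends on how repetitive the readings are.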

The third level of classification is selected according to the specific problem that an approach sets out to solve. Research in the WSNs area has focused on two separate aspects, namely, WSNs performance issues and application issues. As WSNs nodes are usually constrained in resources such as energy, communication bandwidth, and memory, resource-aware algorithms are needed to maximize the WSNs performance. On the other hand, a WSNs application requires data precision and accuracy, fault tolerance, event prediction, scalability, and robustness, and it often needs abundant use of energy, communication, and redundancies. This leads to a resource tradeoff: either one sacrifices the application’s performance in favor of network efficiency, or one pursues the best application performance and deals with the network resource issues, such as energy, in some other way (larger batteries or renewable sources with the nodes). For this reason, WSNs performance-oriented versus application-specific-oriented approaches have been selected as the lowest-level classification criterion. The taxonomy of data mining techniques for WSNs is presented in Figure 1.

4. State of the Art of Data Mining Techniques for WSNs

In this section, data mining techniques designed for WSNs are classified using the taxonomy framework presented in Section 3, and the characteristics and performance analysis of each technique are discussed.


[Figure 1 is a tree diagram. The root, “Data mining techniques for WSNs”, branches into frequent mining, sequential mining, clustering, and classification. Each class splits into distributed and centralized approaches, which are further divided into WSN performance and application-based approaches. The leaves name individual techniques, among them DSARM, CARM, SP-tree, CAG, DHCS, EEDC, NNTC, LWClass, H-cluster, TMP-mine, and the one-class quarter-sphere SVM.]

Figure 1: Taxonomy of data mining techniques for sensor networks.

4.1. Frequent Pattern Mining. In this section, we review some of the works that have been proposed for mining frequent patterns from WSNs data. Frequent pattern mining is used to find groups of variables that co-occur frequently in the dataset. The aim is to find the most interesting relations between variables. Traditional frequent pattern mining algorithms [36–39] are CPU and I/O intensive, which makes them very expensive to apply to dynamic WSN data. Unlike the mining of a static database, the dynamic nature of WSNs data has led to the study of online mining of frequent itemsets. As a result, traditional frequent pattern mining algorithms are modified according to the nature of WSNs data.

The basic frequent pattern mining technique is the association rule mining technique. The first known association rule mining algorithm is Apriori [40]. It is based on a level-wise candidate generation and test methodology that makes several scans over the database. In each iteration, the patterns found to be frequent are used to generate possible frequent patterns (the candidates) to be counted in the next iteration. Therefore, the Apriori technique finds the frequent patterns of length $k$ from the set of already generated candidate patterns of length $k - 1$. In the subsequent step, the association rules are generated by computing the support and confidence of each frequent item in the given database $D$, which are defined as follows:

$$\mathrm{Support}(A) = \frac{\mathrm{Sup}(A)}{|D|}, \tag{1}$$

where $\mathrm{Sup}(A)$ is the number of occurrences of $A$ in database $D$. Consider the following:

$$\mathrm{Confidence}(A \rightarrow B) = \frac{\mathrm{Sup}(A \cup B)}{\mathrm{Sup}(A)}. \tag{2}$$

Making several scans over the database is impractical in the context of sensor networks, as it implies that all data has to be stored somewhere. Recently, however, there has been a growing amount of work on discovering frequent item-sets from a data stream of transactions such that every transaction is considered only once and can be deleted afterwards.
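The support and confidence definitions and the level-wise search can be sketched in a few lines. This is a minimal in-memory sketch, assuming the whole database fits in a Python list; the helper names (`sup`, `support`, `confidence`, `apriori`) and the toy database `D` of sensor identifiers are illustrative assumptions, and the code rescans the database for every candidate, which is exactly the cost that makes the method impractical on sensor streams.

```python
from itertools import combinations

def sup(itemset, database):
    """Sup(A): number of transactions that contain every item of A."""
    a = set(itemset)
    return sum(1 for t in database if a <= set(t))

def support(itemset, database):
    """Equation (1): fraction of transactions that contain A."""
    return sup(itemset, database) / len(database)

def confidence(antecedent, consequent, database):
    """Equation (2); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return sup(both, database) / sup(antecedent, database)

def apriori(database, min_sup):
    """Level-wise search: candidates of size k come from frequent (k-1)-sets."""
    items = sorted({i for t in database for i in t})
    level = [frozenset([i]) for i in items if sup([i], database) >= min_sup]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = sup(s, database)
        k += 1
        # Join step: unions of two frequent (k-1)-sets that have size k.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(x) in frequent
                             for x in combinations(c, k - 1))}
        level = [c for c in candidates if sup(c, database) >= min_sup]
    return frequent

# Toy database: each transaction lists the sensors that reported together.
D = [{"s1", "s2"}, {"s1", "s2", "s3"}, {"s1", "s3"}, {"s2", "s3"}]
frequent_sets = apriori(D, min_sup=2)
```

With `min_sup=2`, all single sensors and all pairs are frequent in this toy database, while the triple `{s1, s2, s3}` is generated as a candidate but rejected after counting.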

The other basic approach for mining association rules is FP-growth [41], which discovers frequent patterns while reducing the number of database scans to two and eliminating the candidate generation that Apriori requires. With the first database scan, the algorithm finds the set of distinct items with their respective support counts (i.e., frequencies) in the database. Then, with the second database scan, the algorithm summarizes the database in the form of a frequency-descending tree (i.e., the FP-tree). The complete set of frequent patterns is then mined from the FP-tree by recursively applying a divide-and-conquer-based pattern growth approach, called the FP-growth algorithm, without any additional database scan. The highly compact FP-tree structure introduced a new wing of research in mining frequent patterns. However, the static nature of the FP-tree and the two database scans still limit its applicability to frequent pattern mining over WSNs data. Recently, several centralized and distributed solutions have been proposed with the aim of maximizing the WSNs’ performance and the application-based performance by applying Apriori-like and FP-growth methods over WSNs data.
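The two scans behind the FP-tree can be illustrated as follows. This is a simplified sketch that only builds the tree; it omits the header table and the recursive FP-growth mining step, and the class name `FPNode`, the helper `paths`, and the toy database are assumptions made for illustration.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(database, min_sup):
    # Scan 1: support count of every distinct item.
    counts = Counter(item for t in database for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_sup}
    # Scan 2: insert each transaction, items in frequency-descending order
    # (ties broken alphabetically so the result is deterministic).
    root = FPNode(None)
    for t in database:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

def paths(node, prefix=()):
    """List (path, count) for every node below `node`, for inspection."""
    out = []
    for child in node.children.values():
        p = prefix + (child.item,)
        out.append((p, child.count))
        out.extend(paths(child, p))
    return out

database = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"],
            ["a", "b", "d"], ["b", "c"]]
root, item_counts = build_fp_tree(database, min_sup=2)
tree_paths = dict(paths(root))
```

Because the most frequent item `b` sits at the top, the five transactions (twelve item occurrences) share prefixes and are stored in six nodes.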

4.1.1. Centralized Approaches Aim to Solve WSNs’ Application-Based Issues. Halatchev and Gruenwald [42] proposed a centralized methodology called data stream association rule mining (DSARM) to identify missing sensor readings. It uses an association rule mining algorithm to identify sensors that report the same data a number of times in a sliding window, called related sensors, and then estimates the missing data from a sensor by using the data reported by its related sensors. Due to the stream nature of sensor data, applying an association mining algorithm such as Apriori directly to sensor data is not possible. This situation led the authors to propose the DSARM framework, which adapts the Apriori algorithm to make it applicable to the data stream received from sensor nodes. The technique is evaluated by simulation experiments on real data collected by the Department of Transportation in Austin, TX, USA, to estimate missing values in related data streams. Performance evaluations were conducted to compare DSARM with alternative approaches. The results show that although DSARM requires more memory space and takes longer to produce an estimation than the considered alternatives, it achieves better accuracy of the estimated value. However, DSARM has some limitations. First, it is based on association rule mining over frequent itemsets of size two, which means that it can discover relationships only between two sensors and ignores the cases where missing values are related to multiple sensors. Second, it finds those relationships only when both sensors report the same value and ignores the cases where missing values can be estimated from relationships between sensors that report different values.
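The idea of related sensors can be made concrete with a small sketch. It is not the DSARM algorithm of [42]; it only mimics the behavior described above (pairs of sensors that report the same value within a sliding window) under assumed names (`RelatedSensorEstimator`, `min_agreement`) and with made-up readings.

```python
from collections import deque

class RelatedSensorEstimator:
    def __init__(self, window_size, min_agreement):
        self.window = deque(maxlen=window_size)  # most recent rounds only
        self.min_agreement = min_agreement

    def add_round(self, round_):
        """Store one round of readings (sensor id -> value or None)."""
        self.window.append(dict(round_))

    def agreement(self, a, b):
        """Fraction of windowed rounds in which a and b reported the same value."""
        both = [(r[a], r[b]) for r in self.window
                if r.get(a) is not None and r.get(b) is not None]
        if not both:
            return 0.0
        return sum(1 for x, y in both if x == y) / len(both)

    def estimate(self, round_, missing):
        """Copy the value of the sensor most strongly related to `missing`."""
        best, best_score = None, self.min_agreement
        for other, value in round_.items():
            if other == missing or value is None:
                continue
            score = self.agreement(missing, other)
            if score >= best_score:
                best, best_score = other, score
        return None if best is None else round_[best]

est = RelatedSensorEstimator(window_size=4, min_agreement=0.75)
for r in [{"s1": 1, "s2": 1, "s3": 0},
          {"s1": 0, "s2": 0, "s3": 0},
          {"s1": 1, "s2": 1, "s3": 0},
          {"s1": 1, "s2": 1, "s3": 1}]:
    est.add_round(r)

current = {"s1": None, "s2": 1, "s3": 0}   # s1 failed to report
estimated_s1 = est.estimate(current, "s1")
```

Here `s2` agreed with `s1` in every windowed round and `s3` in only half of them, so the missing reading is taken from `s2`. The sketch shares the two limitations noted above: it looks at sensor pairs only and at equal values only.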

Jiang and Gruenwald [43, 44] proposed a data estimation technique called CARM (closed item-sets-based association rule mining), which can derive the most recent association rules between the sensors in the current sliding window. The technique is based on the closed frequent item-sets mining algorithm for data streams called CFI-stream [45]. It maintains an in-memory data structure called the direct update (DIU) tree to store closed item-sets. When a new transaction arrives, the algorithm checks each item-set in the transaction against the sliding window over the data stream online and incrementally updates the closed item-sets’ support. If CARM finds missing values in the sensor readings, then instead of generating all possible association rules, it generates only the rules that have strong relationships with the current round of sensor readings in which one or more readings are missing. Based on these rules and the selected closed item-sets, CARM generates the estimated values, which contain item values that are not included in the original readings. Figure 2, redrawn from [43], shows the DIU tree after receiving the first four transactions. It shows that currently there are four closed item-sets, C, AB, CD, and ABC, in the DIU tree, and their associated supports, shown at the upper-right corner, are 3, 3, 1, and 2. A basic set of rules is generated from these frequent item-sets. All other rules can be inferred from this basic rule set.

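[Figure 2 shows a timeline of four transactions (TID 1: C D; TID 2: A B; TID 3: A B C; TID 4: A B C) next to the resulting DIU tree, which is rooted at Φ and holds the closed item-set nodes CD, AB (support 3), C (support 3), and ABC (support 2).]

Figure 2: Lexicographical-ordered direct update tree.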
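The closed item-sets quoted for Figure 2 can be checked with a brute-force computation over the four transactions. This sketch is not CFI-stream and does not maintain a DIU tree incrementally; it enumerates all item-sets, which is only feasible for a tiny window, and the function name `closed_itemsets` is an assumption.

```python
from itertools import combinations

def closed_itemsets(window):
    """Brute-force closed item-sets of a small window: item-set -> support."""
    items = sorted({i for t in window for i in t})
    support = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in window if s <= t)
            if count:
                support[s] = count
    # An item-set is closed if no proper superset has the same support.
    return {s: c for s, c in support.items()
            if not any(s < other and c == support[other] for other in support)}

# The four transactions of Figure 2.
window = [frozenset("CD"), frozenset("AB"), frozenset("ABC"), frozenset("ABC")]
closed = closed_itemsets(window)
labels = {"".join(sorted(s)): c for s, c in closed.items()}
```

The result contains exactly C, AB, CD, and ABC with supports 3, 3, 1, and 2. Item-sets such as A or AC are absent because a superset (AB or ABC) has the same support.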

4.1.2. Centralized Approaches Aim to Maximize WSNs’ Performance. Loo et al. [46] have proposed online one-pass algorithms for mining large sensor streams. They mine the frequent value sets from sensor stream data by transforming the stream data into an interval list (IL) under the lossy counting framework [47]. Time is divided into equal-size intervals, and a snapshot of the sensor readings is taken whenever a sensor reading is updated. The sensors’ values at that snapshot form the value sets stored in the database. An Apriori-based strategy is used to mine the value sets. The analysis of the IL-based representation of stream data showed favorable results on a synthetic dataset. However, while computing the IL of a candidate value set, redundant intersection of ILs is inevitable, which affects the performance in terms of time and computation cost. The proposed technique is evaluated by comparing the performance of the IL-based approach (ILB) against an application of lossy counting (LC) using a weighted transformation method on a synthetic dataset. According to their experiments, ILB outperforms LC significantly for large sensor networks. Moreover, both the processing time and memory consumption of ILB are more stable than those of LC.
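For orientation, the lossy counting scheme [47] that serves as the framework here can be sketched for single items as follows. The sketch does not implement interval lists or the value-set mining of [46]; the class name `LossyCounter`, its attribute names, and the sample stream are assumptions.

```python
import math

class LossyCounter:
    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been dropped before: at most bucket - 1 times.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            self.entries = {i: e for i, e in self.entries.items()
                            if e[0] + e[1] > bucket}

    def frequent(self, s):
        """Items whose true frequency may be at least s * n."""
        threshold = (s - self.epsilon) * self.n
        return {i: e[0] for i, e in self.entries.items() if e[0] >= threshold}

lc = LossyCounter(epsilon=0.25)
for reading in "abacadaeabaf":
    lc.add(reading)
heavy = lc.frequent(0.5)
```

Each item is seen once and then discarded, and rare items are pruned at every bucket boundary, so memory stays bounded at the price of counts that may undershoot the true frequency by at most `epsilon * n`.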

Chong et al. [48] proposed a rule-learning model that finds strong rules from sensor readings. The rules are used as triggers to control sensor network operations; for example, they can be used to put sensors to sleep or to reduce data transmission in order to conserve energy. To mine the rules, Apriori is modified to count the number of transactions that are frequent instead of the item-sets within transactions, and transactions are processed in batches $b_1, b_2, \ldots, b_X$. Suppose there is a node $M$ that collects light, temperature, and microphone readings from three other sensor streams $S_0$, $S_1$, and $S_2$. Initially, $M$ is queried to collect all sensory values, which are used to generate a rule of the form "$a_n$ implies $a_{n-1}$". Once the rule is extracted, only $a_n$ is sent to the base station. Upon receiving the reading $a_n$ and utilizing knowledge of the rule, the base station can infer the reading $a_{n-1}$. All extracted rules are stored in a rule repository. The proposed method is validated by a simulation implemented in C on a synthetic dataset. In the experiment, the first correlated data received from the sensors are used to extract rules. In subsequent phases, these rules are used to infer sensor readings for the next round.
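The idea of sending only $a_n$ and inferring $a_{n-1}$ at the base station can be illustrated with a minimal sketch. It is not the modified Apriori of [48]: it assumes discretized readings, uses a simple confidence threshold, and the names `learn_rules`, `infer`, and `min_conf` are hypothetical.

```python
from collections import Counter, defaultdict

def learn_rules(batch, antecedent, consequent, min_conf=0.9):
    """Map each antecedent value to the consequent value it strongly implies."""
    pair_counts = defaultdict(Counter)
    for reading in batch:
        pair_counts[reading[antecedent]][reading[consequent]] += 1
    rules = {}
    for a_value, outcomes in pair_counts.items():
        c_value, hits = outcomes.most_common(1)[0]
        # Keep the rule only if its confidence reaches the threshold.
        if hits / sum(outcomes.values()) >= min_conf:
            rules[a_value] = c_value
    return rules

def infer(rules, a_value):
    """Base-station side: recover the suppressed reading, or None if no rule applies."""
    return rules.get(a_value)

# First batch of fully reported (discretized) readings; the values are illustrative.
first_batch = [
    {"light": "high", "temp": "warm"},
    {"light": "high", "temp": "warm"},
    {"light": "low", "temp": "cool"},
    {"light": "low", "temp": "cool"},
    {"light": "low", "temp": "warm"},
]
rules = learn_rules(first_batch, "light", "temp", min_conf=0.9)
```

When `infer` returns `None`, the node would still have to transmit both readings, so the energy saving depends on how many strong rules the first batch yields.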

Tanbeer et al. [49] proposed a tree-based data structure called the sensor pattern tree (SP-tree) to generate association rules from WSN data with one database scan. The main idea of the proposed approach is to obtain the frequency of all event-detecting sensors' data, construct a prefix tree based on that in any canonical order, and then reorganize the tree in frequency-descending order. Through the reorganization, the SP-tree can maintain the frequently event-detecting sensors' nodes at the upper part of the tree, which in turn provides high compactness in the tree structure. Once the SP-tree is constructed, the FP-growth mining technique is applied to find the frequent event-detecting sensor sets. Experiments are performed to verify the improvement in memory consumption and runtime that SP-tree achieves over PLT [50]. The experiments show that SP-tree outperforms PLT in time and memory consumption. The reason for this gain is twofold: first, the PLT construction requires two database scans, while SP-tree constructs the tree by scanning the database only once; second, the mining phase of SP-tree is highly efficient due to the frequency-descending tree structure.

4.1.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. Römer [51] proposed an in-network data mining technique to discover frequent patterns of events with certain spatial and temporal properties. In this approach, the user specifies the upper bounds maxscope and maxhistory (a variable measured in seconds) for the patterns of interest. The sensors collect these events and apply a mining algorithm to discover the patterns that satisfy the given parameters. Each node in the network collects the events from its neighbors within the maximum scope and keeps a history of their events for the duration of the maximum history. After that, each node applies a mining algorithm to discover the local frequent patterns. The resulting frequent patterns are converted to association rules that describe an event of type $E$ that occurs at node $n$ with support $S$ and confidence $C$. Local patterns are sent to the sink, where secondary mining is performed to compute the global picture of the entire network. The algorithm is implemented on the BTnode (Bluetooth radio) platform [52], and the tradeoff between the scope of the query and resource consumption is evaluated on a real dataset. Results show that reducing the scope of the query decreases resource consumption. Major issues in this approach are the memory consumption of the item-set discovery algorithms and the communication overhead of event collection.

4.1.4. Distributed Approaches Aim to Maximize WSNs' Performance. Boukerche and Samarah [15] presented a distributed data extraction methodology that aggregates the data on the sensor nodes, which reduces the number of messages during transmission. The distributed solution sends parameters such as the support, the time-slot size, and the historic period from the sink to all nodes within the network. Each sensor node has its own buffer of entries used to evaluate the support value. After each time slot, a node checks whether messages were received during that time slot; if so, the node sets its buffer entry. When the historic period ends, each node traverses its buffer; if the number of set entries is greater than or equal to the support value provided initially, then the message is transferred to the sink. To evaluate the validity of the distributed approach, it is compared with the centralized methodology on a real dataset. They conducted two experiments using historical periods of 5 and 10 days, with minimum support values ranging from 10 to 90 and a time-slot size equal to 30 seconds. All of the reported results show a reduction in the number of messages and the data size as the support values increase. Major issues in this methodology are the increased cost of the node buffer and the delay of crucial messages in the case of a high support value.
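The node-side buffer check described above can be sketched as follows. This is only an illustration under stated assumptions: the support is treated as an absolute count of active time slots, events are given as timestamps in seconds, and the function name `should_report` is hypothetical.

```python
def should_report(event_times, slot_size, historic_period, min_support):
    """Node-side check: report to the sink only if enough time slots were active."""
    num_slots = historic_period // slot_size
    buffer = [False] * num_slots          # one entry per time slot
    for t in event_times:
        slot = int(t // slot_size)
        if 0 <= slot < num_slots:
            buffer[slot] = True           # set the entry for an active slot
    active = sum(buffer)                  # traverse the buffer at the end of the period
    return active >= min_support, active

# Six detections over a 300-second period split into 30-second slots.
decision, active_slots = should_report(
    [5, 12, 40, 95, 100, 250], slot_size=30, historic_period=300, min_support=4
)
```

Raising `min_support` suppresses more nodes and therefore more messages, which is consistent with the reported reduction in traffic at higher support values, but it also delays or drops reports from nodes that detect events only occasionally.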

Boukerche and Samarah [50] proposed the positional lexicographic tree (PLT) structure for mining association rules in which the event-detecting sensors are the main objects of the rules, regardless of their values. Similar to the FP-growth approach, PLT follows a pattern growth mining technique. The mining begins with the sensor having the maximum rank by generating the frequent patterns from its PLT in a recursive way. Computation is required at each recursion to update the PLT involved in the prefix part of a pattern. Therefore, the requirement of two database scans and the additional PLT update operations during mining limit the efficient use of this approach in handling WSN data. The performance evaluation is done by comparing the PLT structure with the FP-growth algorithm. According to their results, the PLT structure outperforms FP-growth in terms of CPU time and memory usage for all of the support values used; the performance improvement of PLT over FP-growth ranges from 30 percent to 50 percent.

4.2. Sequential Pattern Mining (SPM). Frequent pattern mining has been extended to find more complex structures, as in sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database. A sequence database stores a number of records, where all records are sequences of ordered events, with or without concrete notions of time. A large number of real-world domains, such as user profiling, medicine, local weather forecasting, and bioinformatics, show an inherent tendency to be modeled by means of sequences of events/objects related to each other. This great variety of applications makes sequential pattern mining one of the central topics in WSN data mining, as shown by the research efforts produced in recent years. The sequential pattern mining techniques in sensor networks are either based on traditional sequential mining algorithms, such as the Apriori-like algorithm [53], the Apriori-based methods GSP [54] and PSP [55], and the pattern growth approaches FreeSpan and PrefixSpan [56, 57], or are new algorithms devised specifically to work within the sensor network environment.

4.2.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Esposito et al. [58, 59] presented a multidimensional relational sequence mining framework to identify the hidden frequent temporal correlations between sensor nodes. The algorithm is based on a generic level-wise search method called APRIORI [60] for discovering correlated sensors. The framework exploits a relational language to describe the temporal evolution of a sensor network along with contextual information, working in two phases. First, an abstraction step segments and labels the real-valued time series into similar subsequences by using a kernel density estimator approach. Then, the knowledge is enriched by adding interval-based operators between the subsequences obtained in the discretization step, and the relational pattern mining algorithm is extended to deal with these new operators. By taking into account the interval-based temporal data along with contextual information about events, it discovers interesting and more human-readable patterns. The framework is evaluated on a real dataset collected from a wireless sensor network made up of 54 Mica2Dot [61] sensors deployed in the Intel Berkeley Research Lab [62]. Each sensor collected topology information along with humidity, temperature, light, and voltage values once every 31 seconds. Results show a strong correlation among some measurements, which is useful for anomaly detection.

Cook et al. [21] present the MavHome smart home architecture, which focuses on the creation of an intelligent home that perceives the state of the home through sensors and acts upon the environment through device controllers. An important characteristic of the proposed architecture is the ability to make decisions based on predicted activities. To predict the activities, an algorithm called episode discovery (ED) is proposed, which is based on the work of Srikant and Agrawal [54] for mining sequential patterns from time-ordered transactions. Values that can be predicted include the usage patterns of devices in the home, the movement patterns of the inhabitants, and the typical activities of the inhabitants. They apply prediction algorithms to action sequences stored in the inhabitant event history to forecast user actions. Actions can then be automated based on the significance of mined patterns as well as the predictive accuracy of the next event. A key disadvantage is that the entire action history must be stored and processed offline, which is not practical for large prediction tasks over a long period of time. Cook et al. demonstrated the effectiveness of MavHome on synthetic smart home data and on real data collected by students using X10 controllers in their homes. Experiments show a predictive accuracy as high as 53.4% on the real data and 94.4% on the synthetic data.

Rabatel et al. [22] presented a strategy to detect anomalies in sensor data in order to improve railway maintenance. They extract sequential patterns from real railway data and identify abnormal behavior. Based on these abnormal findings, alarms are automatically triggered to warn of potential failures. This abnormal behavior depends on environmental (weather conditions, travel characteristics) and structural (route, episode index in the route) changes in the data. The PSP [55] algorithm is used to identify the sequential patterns. To handle the environmental conditions, a contextual knowledge-based method is proposed, which is able to provide information on the seriousness and possible causes of a deviation. The proposed technique helps in the proactive maintenance of trains. However, the real-time context could be improved by providing more precise information for anomaly detection.

Figure 3: Example of sequential alarm pattern (alarm events $a$, $q$, and $k$, separated by the time intervals $T_{aq}$ and $T_{qk}$).

Guralnik and Haigh [23] use sequential pattern mining to learn typical behaviors of humans in their homes. Human behavior is inferred by using motion sensors, pressure pads, door latch sensors, and toilet flush sensors. They installed 10–20 sensors of different types in a home and built models of which sensor firings correspond to which activities, in what order, and at what time. For example, "In 60% of the days, the Kitchen-Motion sensor fires between 18h00 and 18h30, and then the Living-Room-Motion sensor fires between 18h20 and 20h00, and then the Bedroom-Motion sensor fires between 19h45 and 22h00." Their algorithm uses these data, together with domain knowledge, to extract the sequences of rooms in which the person was active. These sequences are then analyzed by a human expert to identify complex behavior models. These models can be used to select the appropriate response plan to the actions of the elderly.

Wu et al. [63] proposed a new algorithm for mining sequential alarm patterns (MSAPs) from the alarm data generated by a GSM system. Sequential events are identified from alarm data by defining a time interval between adjacent events. For example, if the time interval is set to six hours, then the sequential alarm pattern $(a, b, c)$ indicates that $a$, $b$, and $c$ happen in order and that the time intervals between $a$ and $b$ and between $b$ and $c$ are less than six hours. An example of a sequential alarm pattern, redrawn from [63], is shown in Figure 3.

The number in each circle represents the error ID, and $T_{aq}$ denotes the time difference between alarm event $a$ and alarm event $q$. The extracted knowledge is useful not only for identifying the relevance between two events but also for predicting the alarm sequence and taking proper steps to prevent the occurrence of the alarms, if at all possible. For example, if the network operator detects alarm $a$ occurring at time $t$, the operator should dissipate this alarm before time $t + T_{aq}$ to alleviate the abnormal situations incurred. The limitation of this technique is that it cannot discover other possible time-interval patterns between the events.
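The time-interval constraint on a sequential alarm pattern can be made concrete with a short sketch. It only checks whether one given pattern occurs in a time-sorted alarm log; it is not the mining algorithm of [63], and the function name `occurs`, the log format, and the sample data are assumptions.

```python
def occurs(log, pattern, max_gap):
    """True if the alarms in `pattern` occur in order with each adjacent gap < max_gap.

    `log` is a list of (time, alarm_id) pairs sorted by time.
    """
    def search(start_index, prev_time, depth):
        if depth == len(pattern):
            return True
        for i in range(start_index, len(log)):
            time, alarm = log[i]
            if prev_time is not None and time - prev_time >= max_gap:
                break  # the log is time-sorted, so later events are too far away
            if alarm == pattern[depth] and search(i + 1, time, depth + 1):
                return True
        return False

    return search(0, None, 0)

# Times are in hours; alarm identifiers are illustrative.
log = [(0, "a"), (2, "b"), (5, "q"), (7, "c"), (20, "a"), (29, "b")]
found = occurs(log, ("a", "b", "c"), max_gap=6)
```

With a six-hour interval the pattern $(a, b, c)$ is matched by the events at hours 0, 2, and 7; tightening the interval to five hours rejects it because the gap between $b$ and $c$ is exactly five hours.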

It is observed that there are no centralized solutions that aim to maximize WSNs' performance.

4.2.2. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. Tseng and Lu [64] proposed an object tracking strategy named multilevel object tracking (MLOT) to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the movement log in sensor networks. A multilevel hierarchical structure is adopted by using a clustering mechanism that represents the hierarchical relations among sensor nodes, in order to keep track of moving objects in real time. The movement logs of the moving objects are analyzed by a data mining algorithm called movement pattern generation (MPG) to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the fewest sensor nodes. MPG is based on Apriori; it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.

4.2.3. Distributed Approaches Aim to Maximize WSNs' Performance. Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining logs of temporal movement patterns (TMPs). The discovered temporal movement rules (TMRs) are used to predict the next location of objects in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of moving objects. The integrated movement log is used as the input to the data mining method named TMP-mine, which uses the pattern growth approach for discovering the TMPs. By applying the TMP-mine algorithm, the TMPs are discovered, and then the TMRs are generated for predicting the next location of a moving object. Suppose that the following two rules are discovered by a vehicle tracking system:

Rule 1: (Station A → interval 10 min → Station B → interval 5 min → Station C)

Rule 2: (Station A → interval 20 min → Station B → interval 5 min → Station D)

By dispatching these rules to the corresponding sensor nodes, the tracking can be done in an energy-efficient way. For example, if a car moves with the pattern (Station A → interval 10 min → Station B → interval 5 min), which matches Rule 1, then the node in Station B only has to activate the node in Station C rather than the node in Station D or those around Station B.
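The way a node could use the two rules to decide which neighbor to wake can be sketched as a prefix match. This is a simplification, not the TMP-mine implementation: intervals are matched exactly rather than within a tolerance, and the rule encoding and the name `predict_next` are hypothetical.

```python
# Each rule: (sequence of stations, intervals in minutes between consecutive stations).
RULES = [
    (("A", "B", "C"), (10, 5)),   # Rule 1
    (("A", "B", "D"), (20, 5)),   # Rule 2
]

def predict_next(stations, intervals, rules=RULES):
    """Return the next station predicted by the first rule whose prefix matches."""
    k = len(stations)
    for rule_stations, rule_intervals in rules:
        if (k < len(rule_stations)
                and rule_stations[:k] == tuple(stations)
                and rule_intervals[:k] == tuple(intervals)):
            return rule_stations[k]
    return None   # no rule applies; fall back to activating the neighbors

# A car observed at Station A, then 10 min later at Station B, staying 5 min.
next_station = predict_next(("A", "B"), (10, 5))
```

Only the predicted station needs to be activated; when no rule matches, a real deployment would need a fallback such as waking all neighboring nodes.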

Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique using sequential patterns (PTSP). This technique helps to predict the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. PTSP is based on the inherited patterns of the objects' movements in the network and on the use of sequential patterns to predict which sensor node the moving object will be heading to next.

4.3. Clustering. Clustering is unsupervised learning in which the given data is categorized into subsets so that each subset represents a cluster with distinctive properties. It has been considered a useful technique, especially for applications that require scalability to a large number of sensor nodes. Clustering also supports aggregation of data in order to summarize the overall transmitted data.

Figure 4: Data clustering for sensor networks (input sensor data, identification of data correlation, grouping data, and clusters, with a feedback loop).

In the current literature, problems related to clustering are addressed by node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67–83]. These clustering techniques vary widely in their objectives, depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering, for example, by considering the data similarity of neighboring nodes, many popular node clustering algorithms that partition the sensor nodes into a number of small groups and elect a cluster head for every group do not use data mining techniques directly. In this study, we focus only on data clustering techniques that enable efficient data mining and find data correlations among the nodes. Figure 4 shows the data clustering commonly used in the data mining process.

The existing work has adapted the k-means, hierarchical, and data correlation-based methods. The k-means algorithm takes the input parameter $k$ and partitions a set of $n$ objects into $k$ clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects. It works by grouping data objects into a tree of clusters. Data correlation-based clustering, in contrast, forms clusters based on spatial and temporal correlations, grouping nodes with similar sensory values within a given threshold; these clusters remain fixed until the sensory value threshold has changed over time. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider node residual energy. It is observed from the survey that both the centralized and the distributed clustering solutions aim to maximize WSNs' performance.

4.3.1. Centralized Approaches Aim to Maximize WSNs' Performance. Liu et al. [84] proposed a centralized graph-based energy-efficient data collection (EEDC) scheme. EEDC is an on-demand clustering algorithm that clusters nodes into groups such that members have similar sensor readings; thus, the protocol clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes using a user-defined dissimilarity measure. EEDC models the cluster creation process as a clique-covering problem by constructing a graph $G$ such that each sensor node is a vertex in the graph. An edge $(u, v)$ is drawn if the dissimilarity measure between vertex $u$ and vertex $v$ is less than or equal to the given intracluster dissimilarity measure threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to use the minimum number of cliques to cover all vertices in the graph. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, due to its centralized processing, it is not suitable for large-scale WSNs.
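The clique-covering view of clustering can be illustrated with a greedy sketch. Finding a minimum clique cover is NP-hard in general, so this heuristic does not guarantee the minimum number of clusters and should not be read as the algorithm of [84]. It also assumes a single scalar reading per node with absolute difference as the dissimilarity measure; the name `greedy_clique_cover` is hypothetical.

```python
def greedy_clique_cover(readings, max_dst):
    """Greedily group nodes so every pair in a group differs by at most max_dst."""
    def similar(u, v):
        # Edge (u, v) exists when the dissimilarity is within the threshold.
        return abs(readings[u] - readings[v]) <= max_dst

    clusters = []
    for node in sorted(readings, key=readings.get):
        for cluster in clusters:
            if all(similar(node, member) for member in cluster):
                cluster.append(node)      # adjacent to every member: still a clique
                break
        else:
            clusters.append([node])       # no compatible clique: open a new cluster
    return clusters

# Illustrative temperature readings collected at the sink.
readings = {"n1": 20.1, "n2": 20.4, "n3": 25.0, "n4": 25.3, "n5": 20.9}
clusters = greedy_clique_cover(readings, max_dst=1.0)
```

Because every member of a clique is within `max_dst` of every other member, any single node can represent its cluster, which is what allows the remaining members to stay silent.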

4.3.2. Distributed Approaches Aim to Maximize WSNs' Performance. Guo et al. [85] proposed H-cluster, a distributed algorithm to cluster sensory data. The input of this algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output of the algorithm is a set of cluster features that summarize the clusters of the input sensory dataset. The Hilbert-Map mapping algorithm is used to map a $d$-dimensional sensory data space into a 2-dimensional area covered by a given WSN. H-cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-cluster with the C-Corner and C-Center algorithms. During the experiments, four types of environment attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that the H-cluster algorithm is much more efficient in terms of data loss, energy, and the quality of cluster data in a small WSN. The results also show that, as the amount of sensory data delivered increases, the amount of data loss also increases, and that energy efficiency decreases as the size of the WSN increases.

Yeo et al. [86] proposed a data correlation-based clustering (DCC) scheme based on the similarity of sensor data, along with a spatial suppression scheme that helps to reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head. During this phase, sensor nodes communicate with each other, and the resulting clusters are formed by sensor nodes that have similar readings. Spatial suppression is performed on the cluster head, which also computes the difference between each sensor reading and a representative value. If a cluster head has redundant data, it removes the data except for the node identification. The experimental results support the hypothesis that clustering based on data correlation has better compression performance than ordinary clustering based on locality of communication; they show that DCC reduces the data size by 40% through suppression and prolongs network lifetime by 20–30%. However, for large-scale network applications (nodes > 500), DCC is inefficient because each cluster head needs more energy to collect similar data readings and to communicate with several nodes. Also, when the percentage of similar data readings is low, DCC is ineffective due to the higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing a prediction model. In their approach, the cluster heads execute the prediction model, while gateway nodes at the circumference of the clusters are responsible for the routing task. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] present the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees that the result is within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), where clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, wherein only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) that trades lower result precision for significant savings in energy, storage, computation, and communication.
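A rough sketch of threshold-based cluster formation during query dissemination is given below. It is a simplification rather than the CAG protocol itself: it assumes an absolute difference threshold, treats the query root as the first cluster head, and simulates the whole tree in one process; the names `cag_clusters`, `tree`, and `readings` are hypothetical.

```python
def cag_clusters(tree, readings, root, threshold):
    """Form clusters top-down along the query tree; return {cluster_head: [members]}."""
    clusters = {root: [root]}
    stack = [(root, root)]                 # (node, its cluster head)
    while stack:
        node, head = stack.pop()
        for child in tree.get(node, []):
            if abs(readings[child] - readings[head]) <= threshold:
                clusters[head].append(child)      # similar value: join the same cluster
                stack.append((child, head))
            else:
                clusters[child] = [child]         # dissimilar value: start a new cluster
                stack.append((child, child))
    return clusters

# Query tree (parent -> children) and illustrative readings.
tree = {"s": ["a", "b"], "a": ["c"], "b": ["d"]}
readings = {"s": 20.0, "a": 20.5, "b": 24.0, "c": 21.0, "d": 24.4}
clusters = cag_clusters(tree, readings, "s", threshold=1.0)

# Response phase: only one value per cluster travels up the tree.
reported = {head: readings[head] for head in clusters}
```

Here five readings collapse to two reported values, and every suppressed reading is within the threshold of the value reported for its cluster, which reflects the error-tolerance guarantee mentioned above.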

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the $K$-means clustering algorithm is proposed that sends summarized data towards the sink, which reduces the communication transmission time and power consumption of sensor nodes. The sensor network is divided into clusters, and only the cluster head nodes communicate with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends to the base station the count and vector sum of its local sensory data points, as well as the sum of the squared distances from each local point to its center. On receiving the data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations. Several programs are run to get the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors ($N$) and increases linearly with the number of centers. Major issues are the extra memory required at the cluster head and the computation power required to summarize the data before transmitting it to the sink. Also, the algorithm requires multiple rounds of message passing between the cluster heads and the base station; this may have a serious effect on communication efficiency when the number of sensors is relatively high.
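The exchange of counts, vector sums, and squared-distance sums can be sketched in a single process as follows. This is an illustration of the summary-based update only, under the assumptions that each cluster head holds its points as tuples and that convergence is declared when the total squared distance stops decreasing; all function names are hypothetical.

```python
def local_summary(points, centers):
    """Cluster-head side: per-center count, vector sum, and squared-distance sum."""
    dim = len(centers[0])
    summary = [{"count": 0, "sum": [0.0] * dim, "sq_dist": 0.0} for _ in centers]
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = dists.index(min(dists))                    # nearest center
        summary[j]["count"] += 1
        summary[j]["sum"] = [s + a for s, a in zip(summary[j]["sum"], p)]
        summary[j]["sq_dist"] += dists[j]
    return summary

def update_centers(summaries, centers):
    """Base-station side: merge the summaries and recompute each center."""
    new_centers, cost = [], 0.0
    for j, old in enumerate(centers):
        count = sum(s[j]["count"] for s in summaries)
        cost += sum(s[j]["sq_dist"] for s in summaries)
        if count == 0:
            new_centers.append(old)                    # keep an empty center in place
            continue
        total = [sum(s[j]["sum"][d] for s in summaries) for d in range(len(old))]
        new_centers.append(tuple(x / count for x in total))
    return new_centers, cost

def distributed_kmeans(cluster_heads, centers, tol=1e-9, max_rounds=100):
    """Repeat summary rounds until the total squared distance stops decreasing."""
    cost = float("inf")
    for _ in range(max_rounds):
        summaries = [local_summary(points, centers) for points in cluster_heads]
        centers, new_cost = update_centers(summaries, centers)
        if cost - new_cost <= tol:
            break
        cost = new_cost
    return centers

# Two cluster heads, each holding the points of its own sensor nodes.
cluster_heads = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 9.0), (9.0, 11.0)]]
centers = distributed_kmeans(cluster_heads, [(0.0, 0.0), (10.0, 10.0)])
```

Each round sends one fixed-size summary per center from every cluster head, so the message size depends on the number of centers and the data dimension rather than on the number of sensors, in line with the communication-cost result reported above.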

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes to form clusters. Those nodes that hear the request decide whether they should nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on their remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when there are subregions in the sensor network that are more targeted than others, that is, when the inquiries are not uniformly distributed over time and space.

Figure 5: Classification maps input attribute set (X) to class label (Y).

Ma et al. [90] proposed the distributed hierarchical clustering and summarization (DHCS) algorithm for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop count thresholds, to model node- and distance-based clustering. Initially, each node treats itself as an active cluster. Then, similar adjacent clusters are merged into larger clusters round by round. In each round, each cluster tries to combine with its most similar adjacent cluster simultaneously. Two clusters can be merged only if each considers the other its most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned using a set of training data and classifies new data into one of the learned classes. Figure 5 shows that classification maps the input attribute set (X) to a class label (Y).

Classification-based approaches have adapted traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, depending on the type of classification model that they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until reaching a leaf node, where the class label is assigned. The internal nodes are used to partition data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify a dataset based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a request to classify a query vector, the closest training vectors are located according to the distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier groups the dataset into predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where Condition is a conjunction of attribute tests and Y is a class label.

SVM (support vector machine) techniques partition the data belonging to different classes by fitting a hyperplane between them that maximizes the separation margin. The data is mapped into a higher-dimensional feature space, where it can be easily partitioned by a hyperplane. Furthermore, a kernel function is used to compute the dot products between the mapped vectors in the feature space in order to find the hyperplane.

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. To identify persons, the proposed approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM). Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The proposed approach is validated by experiments using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated using the Weka framework, version 3.7.0 [95]. Experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deep analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data by using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then, the $k$ closest training examples are kept, where $k$ is a fixed integer and $k \ge 1$; among these $k$ examples, the algorithm looks for the label that is most frequent. This label is the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated with NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a correct classification rate of 92.3%.
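The prediction step described above is the standard $k$-nearest-neighbor majority vote, which can be sketched in a few lines. The Euclidean distance, the two-dimensional feature vectors, and the labels below are assumptions for illustration, not the trajectory features used in [96].

```python
from collections import Counter
import math

def knn_predict(training, query, k=3):
    """Label `query` with the most frequent label among its k closest training examples."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Training is lazy: examples are only stored, and all work happens at query time.
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Stored (feature vector, label) pairs; values and labels are illustrative.
training = [
    ((0.0, 0.0), "static"),
    ((0.2, 0.1), "static"),
    ((1.0, 1.1), "moving"),
    ((0.9, 1.0), "moving"),
    ((1.2, 0.9), "moving"),
]
prediction = knn_predict(training, (0.1, 0.1), k=3)
```

Every query scans the whole training set, so memory and per-query cost grow with the number of stored examples, which matters on resource-constrained nodes.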

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments to predict the behavior pattern of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model to predict the occupancy of the ambient intelligence environment. The occupancy is predicted using fuzzy rules, which are modeled using the past values of time series data. In the learning process, input from the sensors is compared with stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the proposed technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multi-user environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size: the algorithm output rate is changed according to the data rate, available memory, algorithm output rate history, and time constraints, so that the available memory is filled with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. All instances are already stored in main memory according to a prespecified distance threshold. The threshold represents the similarity measure the algorithm accepts for treating two or more elements as one element, according to the elements' attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the instance is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer.
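The weight-update rule described above can be sketched as follows. This is a simplified illustration rather than the published LWClass algorithm. The class name `LWClassSketch`, the `max_entries` memory budget, the starting weight of 1 for a new entry, and the fixed threshold are assumptions, and the AOG adaptation of the threshold is not modeled.

```python
from math import dist


class LWClassSketch:
    """One-pass store of weighted instances, loosely following the LWClass description."""

    def __init__(self, threshold, max_entries=100):
        self.threshold = threshold      # max distance to treat two elements as one
        self.max_entries = max_entries  # stand-in for the memory budget (assumption)
        self.entries = []               # each entry: [features, label, weight]

    def _nearest(self, features):
        return min(self.entries, key=lambda e: dist(e[0], features), default=None)

    def update(self, features, label):
        nearest = self._nearest(features)
        if nearest is not None and dist(nearest[0], features) <= self.threshold:
            if nearest[1] == label:
                nearest[2] += 1         # same class: reinforce the stored instance
            else:
                nearest[2] -= 1         # conflicting class: weaken it
                if nearest[2] == 0:     # weight exhausted: release from memory
                    self.entries = [e for e in self.entries if e is not nearest]
        elif len(self.entries) < self.max_entries:
            # Assumption: an element with no close match is stored with weight 1.
            self.entries.append([features, label, 1])

    def classify(self, features):
        """Label of the nearest stored instance, or None if nothing is stored."""
        nearest = self._nearest(features)
        return None if nearest is None else nearest[1]


model = LWClassSketch(threshold=0.5)
stream = [((1.0, 1.0), "hot"), ((1.1, 1.0), "hot"), ((1.0, 1.1), "cold"), ((4.0, 4.0), "cold")]
for features, label in stream:          # each element is seen exactly once
    model.update(features, label)
```

Each element is processed once and then discarded, so memory use is bounded by the number of stored instances rather than by the length of the stream.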

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure over the whole network is constructed. A distributed voting approach is proposed in which each sensor is a leaf of a tree (DT) and performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work shows how the local models enable sensors to respond to changes in the target by relearning local models. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node for computing and building the local predictive model, and the extra memory required to store it.
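The simple and weighted voting schemes mentioned above can be sketched as follows. This is an illustration of the general idea only. The function name `combine_votes`, the sensor identifiers, and the weights are hypothetical, and [103] may derive the weights differently (for example, from local model accuracy).

```python
from collections import defaultdict


def combine_votes(local_predictions, weights=None):
    """Combine class labels reported by sensors into one global prediction.

    `local_predictions` maps sensor id -> predicted class.
    `weights` maps sensor id -> vote weight; sensors without a weight count as 1.0,
    so passing no weights gives simple (unweighted) voting.
    """
    weights = weights or {}
    tally = defaultdict(float)
    for sensor_id, label in local_predictions.items():
        tally[label] += weights.get(sensor_id, 1.0)
    return max(tally, key=tally.get)


# Each sensor sends only its predicted class to the sink, not its raw readings.
votes = {"s1": "intruder", "s2": "clear", "s3": "clear"}
simple = combine_votes(votes)
weighted = combine_votes(votes, weights={"s1": 0.9, "s2": 0.3, "s3": 0.4})
```

With equal weights the majority class wins. With weights, a single trusted sensor can outvote several less reliable ones, as happens for `s1` in this toy example.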

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme that generates effective feature vectors of low dimension (FVLD) for a wireless audio network. A distributed cluster-based algorithm for detection and classification of vehicles is proposed. Sensors form clusters on demand to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, compared with all other approaches.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using a support vector machine (SVM-) based technique in a sensor network. The authors proposed two distributed algorithms, the distributed fix partition SVM (DFP-SVM) and the weighted distributed fix partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is trained incrementally on a set of examples called support vectors. The number of support vectors is typically very small compared with the number of all sample values. Moreover, the support vectors (and offset) give a compressed representation of the separating SVM hyperplane. Sending only the support vectors, instead of all training samples, to the next cluster head is therefore energy efficient because it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations, and a comparison is made with the centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius information of the sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected after a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the results for local outliers inaccurate. The technique is evaluated using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. Results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.
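The radius-based outlier test described above can be illustrated with a deliberately simplified stand-in. The sketch below assumes a linear kernel on mean-centered data, where the quarter-sphere reduces to a distance threshold around the window mean. The radius is chosen so that roughly a fraction `nu` of the window falls outside it. The names `fit_radius`, `is_outlier`, and `nu`, and the sample readings, are assumptions. The method in [108] solves a linear program in kernel feature space instead.

```python
from math import sqrt


def distance(x, center):
    """Euclidean distance between a measurement and the window mean."""
    return sqrt(sum((a - c) ** 2 for a, c in zip(x, center)))


def fit_radius(window, nu=0.1):
    """Return (mean, radius) so that about a fraction `nu` of the window lies outside."""
    dims = len(window[0])
    mean = [sum(x[d] for x in window) / len(window) for d in range(dims)]
    norms = sorted(distance(x, mean) for x in window)
    allowed_outside = int(nu * len(norms))          # points tolerated beyond the radius
    keep = max(1, len(norms) - allowed_outside)
    return mean, norms[keep - 1]


def is_outlier(x, mean, radius):
    """A measurement is an outlier if it falls outside the fitted radius."""
    return distance(x, mean) > radius


# Hypothetical (temperature, humidity) readings collected over one time window.
window = [(20.0, 50.0), (20.5, 49.0), (19.5, 51.0), (20.2, 50.5), (35.0, 20.0)]
mean, radius = fit_radius(window, nu=0.2)
flags = [is_outlier(x, mean, radius) for x in window]
```

Only the scalar `radius` would need to be sent to the parent node, which is where the communication saving reported above comes from. Because the radius is fitted on a whole window, detection happens after the window closes rather than in real time.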

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and differing aspects of the data mining techniques specially designed for WSNs discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed; then the comparative tables are presented to compare and differentiate existing data mining techniques for WSNs data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogenous [50] or heterogeneous [33, 48]. A homogenous attribute means sensing a single-value attribute, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that the readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node when a reading from a correlated sensor is missing.
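One common way to quantify the temporal and spatial correlation described above is the Pearson coefficient. The sketch below is an illustration only. The choice of Pearson correlation, the lag-1 default, the function names, and the sample readings are assumptions rather than a measure prescribed by the surveyed techniques.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # undefined (division by zero) for constant series


def temporal_correlation(readings, lag=1):
    """Correlation between a node's readings and the same readings `lag` steps earlier."""
    return pearson(readings[lag:], readings[:-lag])


def spatial_correlation(node_a, node_b):
    """Correlation between time-aligned readings of two neighboring nodes."""
    return pearson(node_a, node_b)


# Hypothetical temperature readings from two nearby nodes at the same time instants.
node_a = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]
node_b = [19.8, 20.4, 20.9, 21.6, 21.9, 22.6]
spatial = spatial_correlation(node_a, node_b)
temporal = temporal_correlation(node_a)
```

A high spatial coefficient between two neighbors suggests that one node's reading could be estimated from the other's when a reading is missing.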

5.2. Processing Architecture. To apply a data mining technique to sensor data, the model of computation must be determined. There are two general models, as follows.

Centralized. The simplest way to analyze WSNs data is to use a centralized model. In this approach, the entire raw data collected from WSNs is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an "aggregator" is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.
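The contrast between the two models can be made concrete with a small sketch. Here each node's local model is assumed to be a four-number summary (count, sum, minimum, maximum) that an aggregator merges without seeing raw readings. The `LocalModel` type, the choice of summary statistics, and the readings are illustrative assumptions, since the surveyed techniques use richer local models such as event patterns.

```python
from dataclasses import dataclass


@dataclass
class LocalModel:
    """Compact summary a node sends instead of its raw readings."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self):
        return self.total / self.count


def summarize(readings):
    """Build a node's local model from its raw readings."""
    return LocalModel(len(readings), sum(readings), min(readings), max(readings))


def merge(models):
    """Aggregator step: combine local models without access to raw data."""
    return LocalModel(
        count=sum(m.count for m in models),
        total=sum(m.total for m in models),
        minimum=min(m.minimum for m in models),
        maximum=max(m.maximum for m in models),
    )


# Hypothetical raw readings buffered at three nodes.
node_readings = {
    "n1": [20 + i % 3 for i in range(60)],
    "n2": [18 + i % 2 for i in range(40)],
    "n3": [25 + i % 4 for i in range(80)],
}
network_model = merge([summarize(r) for r in node_readings.values()])
raw_values_sent = sum(len(r) for r in node_readings.values())  # centralized: every reading
summary_values_sent = 4 * len(node_readings)                   # distributed: 4 numbers per node
```

The summary keeps the transmitted volume constant per node regardless of how many readings were taken, at the price of losing any pattern that the summary does not capture. This is the accuracy versus overhead trade-off noted above.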

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSNs data. Distributed approaches use one-scan algorithms for real-time processing to deal with the high data arrival rate; the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data


Table 2: Comparison of data mining techniques for wireless sensor networks.

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Frequent pattern mining
DSARM [42] | Missing data estimation | Apriori like | Sense and send | Traffic monitoring | Data accuracy | Ignores the sensor that reports different values
In-network data mining [51] | Events patterns discovery | Apriori like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication
Distributed data aggregation [15] | Improve WSN performance | Apriori like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost, delayed crucial messages
Online algorithm [46] | Interval list of representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy
Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data
CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data

[The check-mark columns of the original table are not reproduced: processing architecture (distributed, central); attributes (homogenous, heterogeneous); correlation (attribute, spatial, temporal); connectivity (single hop, multihop); mobility (static, mobile); node role (cluster head, sensor, relay); evaluation method (simulation, analytical model, real); data source (real, synthetic).]


Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Frequent pattern mining
Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increased cost due to multiple DB scans
SP-tree [49] | Discover events patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost
Sequential pattern mining
Relational framework [58] | Multi-dimensional correlation discovery | Apriori like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming
Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants behavior prediction | Prediction accuracy | Inefficient for complex activities
MPG [64] | Predict object's future movement | Apriori like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset
Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction

[The check-mark columns of the original table are not reproduced: processing architecture (distributed, central); attributes (homogenous, heterogeneous); correlation (attribute, spatial, temporal); connectivity (single hop, multihop); mobility (static, mobile); node role (cluster head, sensor, relay); evaluation method (simulation, analytical model, real); data source (real, synthetic).]


Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Sequential pattern mining
TMP-mine [65] | Predict object's future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time
Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns
MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute
PTSP [66] | Object's future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects
Clustering
DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate
H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate

[The check-mark columns of the original table are not reproduced: processing architecture (distributed, central); attributes (homogenous, heterogeneous); correlation (attribute, spatial, temporal); connectivity (single hop, multihop); mobility (static, mobile); node role (cluster head, sensor, relay); evaluation method (simulation, analytical model, real); data source (real, synthetic).]


Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Clustering
Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping
CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss
EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs
Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs
Attribute based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost
DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes' energy is ignored

[The check-mark columns of the original table are not reproduced: processing architecture (distributed, central); attributes (homogenous, heterogeneous); correlation (attribute, spatial, temporal); connectivity (single hop, multihop); mobility (static, mobile); role (cluster head, sensor, relay); evaluation method (simulation, analytical model, real); data source (real, synthetic).]


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Classification
Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness
Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity
NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset
LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift
FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission
Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios
Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity
One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation

[The check-mark columns of the original table are not reproduced: processing architecture (distributed, central); attributes (homogenous, heterogeneous); correlation (attribute, spatial, temporal); connectivity (single hop, multihop); mobility (static, mobile); role (cluster head, sensor, relay); evaluation method (simulation, analytical model, real); data source (real, synthetic).]


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they are used to sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. This is a node that acts as a medium to transmit data packets from one node to the others.

Node Task. In the centralized approach, the node's task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the types of application that benefit from WSNs data mining techniques. Here we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and to make tracking more efficient and accurate.

(iv) Fourth is WSNs performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central site.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSNs optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment, owing to the unavailability of appropriate hardware in terms of technical and design limitations. Real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. This refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, synthetic and real. Most of the techniques surveyed in this paper use simulation on synthetic datasets to validate their results, owing to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, and node mobility. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) Techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101] in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) With a few exceptions, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. To analyze a protocol more effectively, it is important to know the different available tools and understand their associated benefits and limitations. Because performance requirements differ according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) With a few exceptions [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.

7 Future Research Directions

It is observed from the analysis of existing data mining workon sensor network-based application there are still shortcom-ings in existing techniques By seeing these shortcomingsand special characteristics of WSNs there is a need for datamining technique designed for WSNs The technique shouldbe based on the following requirements

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.
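The single-pass constraint can be illustrated with a minimal sketch. The paper does not specify what a local model contains beyond "partially processed data", so the `LocalModel` structure below (count, mean, variance, minimum, and maximum, maintained with Welford's online update) is a hypothetical stand-in chosen only because it can be computed in one scan with constant memory.

```python
from dataclasses import dataclass
import math


@dataclass
class LocalModel:
    """Constant-size summary of one node's readings (hypothetical structure)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, reading: float) -> None:
        """Fold one reading into the summary; the reading is not stored."""
        self.count += 1
        delta = reading - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reading - self.mean)
        self.minimum = min(self.minimum, reading)
        self.maximum = max(self.maximum, reading)

    @property
    def variance(self) -> float:
        """Population variance of the readings seen so far."""
        return self.m2 / self.count if self.count else 0.0


def summarize(stream) -> LocalModel:
    """Scan the stream exactly once and return the resulting local model."""
    model = LocalModel()
    for reading in stream:
        model.update(reading)
    return model


# Illustrative readings (made-up values, e.g. temperatures in degrees Celsius).
local_model = summarize([20.0, 22.0, 24.0])
print(local_model.count, local_model.mean, round(local_model.variance, 3))
# prints: 3 22.0 2.667
```

Only the fixed-size summary would need to be transmitted, which is the point of sending local models instead of raw readings.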

Figure 6: Proposed hybrid framework for sensor network based applications. [Diagram not reproduced. Recoverable labels: sensor data stream; node data processing (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis); local model; network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring); network model; sink/base station; central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); global model; query; users; approximate results. In-network processing is marked single pass and centralized processing is marked multipass.]

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that has more resources than the other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in a multihop fashion. The network models are then integrated at the base station/sink to obtain a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
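The two-stage integration described above (local models into a network model, network models into a global model) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the paper describes local models as compact event patterns, whereas the hypothetical `Summary` tuple below holds only a count, mean, and sum of squared deviations, because such summaries can be merged exactly without access to the raw readings (the standard pairwise update for combining means and variances).

```python
from functools import reduce
from typing import NamedTuple


class Summary(NamedTuple):
    """Mergeable summary standing in for a local, network, or global model."""
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean


def summarize(readings) -> Summary:
    """Build a local model from one node's readings."""
    n = len(readings)
    if n == 0:
        return Summary(0, 0.0, 0.0)
    mean = sum(readings) / n
    return Summary(n, mean, sum((x - mean) ** 2 for x in readings))


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries as if their underlying readings were pooled."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Summary(n, mean, m2)


def integrate(models) -> Summary:
    """Integrate any number of models into a more abstract one."""
    return reduce(merge, models, Summary(0, 0.0, 0.0))


# Hypothetical deployment: two clusters, each with its own cluster head.
cluster_a = [summarize([20.0, 21.0]), summarize([22.0])]  # local models
cluster_b = [summarize([30.0, 32.0, 34.0])]

network_models = [integrate(cluster_a), integrate(cluster_b)]  # at cluster heads
global_model = integrate(network_models)                       # at the sink
print(global_model.count, global_model.mean)
```

Because `merge` is associative, the sink obtains the same global summary regardless of how nodes are grouped into clusters, and each hop forwards a fixed-size tuple rather than the accumulated readings.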

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time responses and actions can use the network model for decisions and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSN data [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits a large task into smaller ones at the node level and at the cluster head; these are distributed over the entire network and executed in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head; it contains compact data from a set of nodes and reduces the size of the data to be transmitted. Network models can be integrated at the sink to obtain a global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to a central server, where global models can be computed for predictive offline analysis. Historical queries from users can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time responses. By distributing data in this way, the proposed framework can feasibly deal with the large amounts of data obtained from WSNs.

(iii) It can take the resource constraints of sensor nodes into account by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the quickly changing nature of WSN data, where the characteristics of the monitored process may change over time and render old models outdated. This problem can be addressed using an incremental learning mechanism [39, 112] that helps the model incorporate new information.

(v) The framework can identify spatial-temporal correlation in the local model by using data correlation-based clustering, whereas attribute correlation can be identified in the global model by using multipass data mining algorithms.
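The incremental learning idea in item (iv) can be illustrated with a small sketch. It is not the method of [39] or [112], which concern incremental frequent pattern mining and incremental support vector machines; instead it uses a deliberately simple nearest-centroid classifier whose class centroids are nudged toward each new labeled reading, so that an outdated model adapts without being retrained from scratch. The class name, the `learning_rate` parameter, and the sample values are all hypothetical.

```python
class IncrementalCentroidClassifier:
    """Nearest-centroid classifier updated one labeled sample at a time."""

    def __init__(self, learning_rate: float = 0.1):
        # Fraction of the gap to a new sample that a centroid moves per update.
        self.learning_rate = learning_rate
        self.centroids = {}  # label -> list of feature values

    def partial_fit(self, features, label) -> None:
        """Incorporate one labeled sample without revisiting old data."""
        centroid = self.centroids.get(label)
        if centroid is None:
            self.centroids[label] = list(features)
            return
        for i, value in enumerate(features):
            centroid[i] += self.learning_rate * (value - centroid[i])

    def predict(self, features):
        """Return the label whose centroid is closest to the sample."""
        if not self.centroids:
            raise ValueError("no samples seen yet")

        def squared_distance(label):
            centroid = self.centroids[label]
            return sum((f - c) ** 2 for f, c in zip(features, centroid))

        return min(self.centroids, key=squared_distance)


clf = IncrementalCentroidClassifier(learning_rate=0.5)
clf.partial_fit([20.0], "normal")
clf.partial_fit([40.0], "hot")
before = clf.predict([34.0])  # closer to the "hot" centroid at 40.0

# The monitored process drifts: readings around 35.0 are now labeled normal.
for _ in range(3):
    clf.partial_fit([35.0], "normal")
after = clf.predict([34.0])   # the "normal" centroid has moved to 33.125
print(before, after)
```

A larger `learning_rate` forgets old behaviour faster but is more sensitive to noisy readings, which is the usual trade-off when models must track a changing process.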

Currently, we are working on the implementation of this hybrid framework, which will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework that can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies that need to be addressed. The set of WSN applications presented here is neither complete nor exhaustive but merely a sample that demonstrates the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in conjunction with ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO ’09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


Figure 1: Taxonomy of data mining techniques for sensor networks. (Figure not reproduced. Data mining techniques for WSNs are divided into frequent pattern mining, sequential pattern mining, clustering, and classification; each class is split into centralized and distributed approaches, which are further grouped into those targeting WSN performance and those that are application based. Techniques named in the figure include DSARM, CARM, distributed data aggregation, the association rules mining framework, the online algorithm, lightweight rule learning, SP-tree, in-network data mining, MPG, PTSP, TMP-mine, the relational framework, episode discovery, contextual patterns discovery, the pattern learner, MSAP, DCC, the prediction model, CAG, clustering sensory data, attribute-based clustering, DHCS, EEDC, H-cluster, the prediction framework, FVLD, online learning, person identification algorithms, NNTC, the fuzzy predictor model, LWClass, and the one-class quarter-sphere SVM.)

4.1. Frequent Pattern Mining. In this section, we review some of the works that have been proposed for mining frequent patterns from WSN data. Frequent pattern mining is used to find groups of variables that co-occur frequently in the dataset; the aim is to find the most interesting relations between variables. Traditional frequent pattern mining algorithms [36–39] are CPU and I/O intensive, which makes them very expensive to apply to dynamic WSN data. Unlike mining a static database, the dynamic nature of WSN data has led to the study of online mining of frequent itemsets. As a result, traditional frequent pattern mining algorithms have been modified to suit the nature of WSN data.

The basic frequent pattern mining technique is association rule mining. The first known association rule mining algorithm is Apriori [40]. It is based on a level-wise candidate generation-and-test methodology that makes several scans over the database. In each iteration, the patterns found to be frequent are used to generate the possible frequent patterns (the candidates) to be counted in the next iteration. Therefore, the Apriori technique finds the frequent patterns of length $k$ among the candidates generated from the frequent patterns of length $k - 1$. In the subsequent step, the association rules are generated by computing the support and confidence of each frequent itemset in the given database $D$, which are defined as follows:

$$\mathrm{Support}(A) = \frac{\mathrm{Sup}(A)}{|D|}, \tag{1}$$

where $\mathrm{Sup}(A)$ is the number of occurrences of $A$ in database $D$. Consider the following:

$$\mathrm{Confidence}(A \rightarrow B) = \frac{\mathrm{Sup}(A \cup B)}{\mathrm{Sup}(A)}. \tag{2}$$

Making several scans over the database is impractical in the context of sensor networks, as it implies that all data have to be stored somewhere. However, recently there has been a growing amount of work on discovering frequent item-sets from a data stream of transactions such that every transaction is considered only once and can be deleted afterwards.
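As an illustration of the support and confidence measures in (1) and (2) and of the level-wise search, the following minimal Python sketch may help. It is not taken from [40]; the function names `support`, `confidence`, and `apriori` and the in-memory list of transactions are assumptions made for this example. It also shows why the approach needs repeated passes over stored data.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset` (Eq. 1)."""
    itemset = frozenset(itemset)
    return sum(itemset <= frozenset(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Sup(A u B) / Sup(A) (Eq. 2); the |D| factors cancel in the ratio."""
    a = frozenset(antecedent)
    return support(a | frozenset(consequent), transactions) / support(a, transactions)

def apriori(transactions, min_support):
    """Level-wise search: candidates of length k come from frequent (k-1)-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Generate: join pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Test: one more pass over the stored data for the surviving candidates
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent
```

Every call to `support` walks the whole transaction list, which is the storage and scanning cost that stream-oriented methods try to avoid.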

The other basic approach for mining association rules is FP-growth [41], which discovers frequent patterns while reducing the number of database scans to two and eliminating the candidate generation required by Apriori. With the first database scan, the algorithm finds the set of distinct items with their respective support counts (i.e., frequencies) in the database. Then, with the second database scan, the algorithm summarizes the database in the form of a frequency-descending tree (i.e., the FP-tree). The complete set of frequent patterns is then mined from the FP-tree by recursively applying a divide-and-conquer-based pattern growth approach, called the FP-growth algorithm, without any additional database scan. The highly compact FP-tree structure introduced a new wing of research in mining frequent patterns. However, the static nature of the FP-tree and the two database scans still limit its applicability to frequent pattern mining over WSN data. Recently, several centralized and distributed solutions have been proposed that apply Apriori-like and FP-growth methods to WSN data, with the aim of maximizing either the WSNs' performance or the application-based performance.
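The two-scan construction of the FP-tree described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of [41]: the `FPNode` class and `build_fp_tree` function are hypothetical names, ties in frequency are broken alphabetically, and the header table and the recursive mining step are omitted.

```python
from collections import Counter

class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Scan 1: support count of every distinct item
    counts = Counter(i for t in transactions for i in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None)
    # Scan 2: insert each transaction with its items in frequency-descending order
    for t in transactions:
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1   # shared prefixes are counted, not duplicated
    return root, frequent
```

Because both scans need the complete transaction set, the sketch also makes visible why the structure is hard to maintain over an unbounded sensor stream.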

4.1.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Halatchev and Gruenwald [42] proposed a centralized methodology called data stream association rule mining (DSARM) to estimate missing sensor readings. It uses an association rule mining algorithm to identify sensors that report the same data a number of times within a sliding window, called related sensors, and then estimates the missing data from a sensor by using the data reported by its related sensors. Because of the stream nature of sensor data, an association mining algorithm such as Apriori cannot be applied to it directly. This led the authors to propose the DSARM framework, which adapts the Apriori algorithm to the data stream received from sensor nodes. The technique is evaluated by simulation experiments on real data collected by the Department of Transportation in Austin, TX, USA, to estimate missing values in related data streams. Performance evaluations were conducted to compare DSARM with alternative approaches. The results show that, although DSARM requires more memory space and takes longer to produce an estimate than the alternatives considered, it achieves better accuracy of the estimated value. However, DSARM has some limitations. First, it is based on association rule mining over frequent 2-itemsets, which means that it can discover relationships only between two sensors and ignores the cases where missing values are related to multiple sensors. Second, it finds those relationships only when both sensors report the same value and ignores the cases where missing values could be estimated from relationships between sensors that report different values.
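The idea of related sensors within a sliding window can be illustrated with a short Python sketch. It is only a toy model of the principle, not the DSARM algorithm of [42]: the class name `RelatedSensorEstimator`, the `min_matches` parameter, and the majority vote used for the estimate are assumptions introduced here.

```python
from collections import Counter, deque

class RelatedSensorEstimator:
    """Toy sliding-window estimator based on pairwise 'same value' counts."""

    def __init__(self, window_size, min_matches):
        self.window = deque(maxlen=window_size)  # keeps only the latest rounds
        self.min_matches = min_matches

    def add_round(self, readings):
        """`readings` maps sensor id -> value; a missing sensor is simply absent."""
        self.window.append(dict(readings))

    def related(self, sensor):
        """Sensors that reported the same value as `sensor` in enough rounds."""
        matches = Counter()
        for r in self.window:
            if sensor in r:
                for other, value in r.items():
                    if other != sensor and value == r[sensor]:
                        matches[other] += 1
        return {s for s, n in matches.items() if n >= self.min_matches}

    def estimate(self, sensor, current):
        """Most common current value among the related sensors, or None."""
        votes = Counter(current[s] for s in self.related(sensor) if s in current)
        return votes.most_common(1)[0][0] if votes else None
```

Like the approach it imitates, the sketch only captures pairwise relationships between sensors that report identical values, which is exactly the limitation noted above.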

Jiang and Gruenwald [43, 44] proposed a data estimation technique called CARM (closed item-sets-based association rule mining), which derives the most recent association rules between the sensors in the current sliding window. The technique is based on CFI-stream [45], an algorithm for mining closed frequent item-sets over data streams. It maintains an in-memory data structure called the direct update (DIU) tree to store closed item-sets. When a new transaction arrives, the algorithm checks each item-set in the transaction online over the sliding window of the data stream and incrementally updates the supports of the closed item-sets. If CARM finds missing values in the sensor readings, then, instead of generating all possible association rules, it generates only the rules that have strong relationships with the current round of sensor readings in which one or more readings are missing. Based on these rules and the selected closed item-sets, CARM generates the estimated values, which contain item values that are not included in the original readings. Figure 2, redrawn from [43], shows the DIU tree after receiving the first four transactions. It shows that there are currently four closed item-sets, C, AB, CD, and ABC, in the DIU tree, and their associated supports, shown at the upper right corner, are 3, 3, 1, and 2. A basic set of rules is generated from these frequent item-sets; all other rules can be inferred from this basic rule set.

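Figure 2: Lexicographical-ordered direct update tree. (Figure not reproduced. The timeline lists four transactions: TID 1 = {C, D}, TID 2 = {A, B}, TID 3 = {A, B, C}, and TID 4 = {A, B, C}. Below the root Φ, the tree holds the closed item-sets C with support 3, AB with support 3, CD, and ABC with support 2.)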
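The closed item-sets quoted for the four transactions of Figure 2 can be checked with a small brute-force computation. The sketch below is only a verification aid under the usual definition (an item-set is closed when no proper superset has the same support); it does not reproduce the incremental DIU tree update of CFI-stream, and the function name `closed_itemsets` is an assumption.

```python
from itertools import combinations

def closed_itemsets(transactions):
    """Return {itemset: support count} for every closed item-set.

    Brute force over all item-sets, so suitable for a handful of items only.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    counts = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = sum(s <= t for t in transactions)
            if c > 0:
                counts[s] = c
    # Keep an item-set only if no proper superset has the same support count
    return {s: c for s, c in counts.items()
            if not any(s < other and c == oc for other, oc in counts.items())}

window = [{"C", "D"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "C"}]
closed = closed_itemsets(window)
# closed holds C: 3, AB: 3, CD: 1, ABC: 2
```

The result agrees with the supports 3, 3, 1, and 2 reported for C, AB, CD, and ABC.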

4.1.2. Centralized Approaches Aim to Maximize WSNs' Performance. Loo et al. [46] proposed online one-pass algorithms for mining large sensor streams. They mine the frequent value sets from sensor stream data by transforming the stream data into an interval list (IL) under the lossy counting framework [47]. Time is divided into equal-sized intervals, and a snapshot of the sensor readings is taken whenever a sensor reading is updated. The sensors' values at that snapshot form the value sets stored in the database. An Apriori-based strategy is used to mine the value sets. The analysis of the IL-based representation of stream data showed favorable results on a synthetic dataset. However, redundant intersection of ILs is inevitable when computing the IL of a candidate value set, which affects performance in terms of time and computation cost. The proposed technique is evaluated by comparing the performance of the IL-based approach (ILB) against an application of lossy counting (LC) using a weighted transformation method on a synthetic dataset. According to their experiments, ILB outperforms LC significantly for large sensor networks. Moreover, both the processing time and the memory consumption of ILB are more stable than those of LC.

Chong et al. [48] proposed a rule-learning model that finds strong rules from sensor readings. The rules are used as triggers to control sensor network operations; for example, they can be used to put sensors to sleep or to reduce data transmission in order to conserve energy. To mine the rules, Apriori is modified to count the number of transactions that are frequent instead of the item-sets within transactions, and transactions are processed in batches $b_1, b_2, \ldots, b_X$. Suppose there is a node $M$ that collects light, temperature, and microphone readings from three other sensor streams $S_0$, $S_1$, and $S_2$. Initially, $M$ is queried to collect all sensory values, which are used to generate a rule of the form "$a_n$ implies $a_{n-1}$"; once the rule is extracted, only $a_n$ is sent to the base station. Upon receiving the reading $a_n$ and utilizing knowledge of the rule, the base station can infer the reading $a_{n-1}$. All extracted rules are stored in a rule repository. The proposed method is validated by a simulation implemented in the C language on a synthetic dataset. In the experiment, the first correlated data received from the sensors are used to extract rules; in subsequent phases, these rules are used to infer the sensor readings for the next round.

Tanbeer et al. [49] proposed a tree-based data structure called the sensor pattern tree (SP-tree) to generate association rules from WSN data with one database scan. The main idea of the approach is to obtain the frequency of all event-detecting sensors' data, construct a prefix tree based on it in any canonical order, and then reorganize the tree in frequency-descending order. Through the reorganization, the SP-tree keeps the frequently event-detecting sensor nodes in the upper part of the tree, which in turn makes the tree structure highly compact. Once the SP-tree is constructed, the FP-growth mining technique is applied to find the frequent event-detecting sensor sets. Experiments are performed to verify the improvement in memory consumption and runtime that SP-tree achieves over PLT [50]. The experiments show that SP-tree outperforms PLT in both time and memory consumption. The reason for this gain is twofold: first, PLT construction requires two database scans, whereas the SP-tree is constructed by scanning the database only once; second, the mining phase of SP-tree is highly efficient because of the frequency-descending tree structure.

4.1.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. Romer [51] proposed an in-network data mining technique to discover frequent patterns of events with certain spatial and temporal properties. In this approach, the user specifies the upper bounds maxscope and maxhistory (a variable measured in seconds) for the patterns of interest. The sensor collects these events and applies a mining algorithm to discover the patterns that satisfy the given parameters. Each node in the network collects the events from its neighbors within the maximum scope and keeps a history of their events for the duration of the maximum history. After that, each node applies a mining algorithm to discover the local frequent patterns. The resulting frequent patterns are converted to association rules that describe an event of type $E$ occurring at node $n$ with support $S$ and confidence $C$. Local patterns are sent to the sink, where secondary mining is performed to compute the global picture of the entire network. The algorithm is implemented on the BTnode (Bluetooth radio) platform [52], and the tradeoff between the scope of the query and resource consumption is evaluated on a real dataset. The results show that reducing the scope of the query decreases resource consumption. The major issues in this approach are the memory consumption of the itemset discovery algorithms and the communication overhead of event collection.

4.1.4. Distributed Approaches Aim to Maximize WSNs' Performance. Boukerche and Samarah [15] presented a distributed data extraction methodology that aggregates the data on the sensor nodes, which reduces the number of messages transmitted. The distributed solution sends parameters such as the support, the time-slot size, and the historic period from the sink to all nodes within the network. Each sensor node has its own buffer whose entries are checked against the support value. After each time slot, a node checks whether any messages were received during that slot; if so, the node sets its buffer entry. When the historic period ends, each node traverses its buffer; if the number of set entries is greater than or equal to the support value provided initially, then the message is transferred to the sink. To evaluate the validity of the distributed approach, it is compared with the centralized methodology on a real dataset. The authors conducted two experiments using historical periods of 5 and 10 days, with minimum support values ranging from 10% to 90% and a time-slot size of 30 seconds. All of the reported results show a reduction in the number of messages and in the data size as the support value increases. The major issues in this methodology are the increased cost of the node buffer and the delay of crucial messages when the support value is high.
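The node-side buffering scheme can be sketched in a few lines of Python. The sketch is a hedged reading of the description above rather than the protocol of [15]: the class `SensorNode`, its method names, the message format, and the interpretation of the support as a fraction of time slots are all assumptions.

```python
class SensorNode:
    """Hypothetical node-side logic: one buffer entry per time slot."""

    def __init__(self, node_id, slots_per_period, min_support):
        self.node_id = node_id
        self.min_support = min_support            # assumed: fraction of slots, 0..1
        self.buffer = [False] * slots_per_period  # parameters come from the sink

    def end_of_slot(self, slot_index, received_message):
        # Set the entry if at least one message arrived during this slot
        if received_message:
            self.buffer[slot_index] = True

    def end_of_period(self):
        """Return a message for the sink, or None if the node stays silent."""
        active = sum(self.buffer)
        meets_support = active / len(self.buffer) >= self.min_support
        message = (self.node_id, active) if meets_support else None
        self.buffer = [False] * len(self.buffer)  # reset for the next period
        return message
```

A higher `min_support` silences more nodes, which matches the reported reduction in messages, while the report is only produced at the end of the period, which is the source of the delay for crucial messages.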

Boukerche and Samarah [50] proposed the positional lexicographic tree (PLT) structure for mining association rules in which the event-detecting sensors are the main objects of the rules, regardless of their values. Similar to the FP-growth approach, PLT follows a pattern growth mining technique. The mining begins with the sensor having the maximum rank and generates the frequent patterns from its PLT recursively. Computation is required at each recursion to update the PLT involved in the prefix part of a pattern. Therefore, the requirement of two database scans and the additional PLT update operations during mining limit the efficient use of this approach in handling WSN data. The performance evaluation compares the PLT structure with the FP-growth algorithm. According to their results, the PLT structure outperforms FP-growth in terms of CPU time and memory usage for all of the support values used; the performance gain of PLT over FP-growth ranges from 30 percent to 50 percent.

4.2. Sequential Pattern Mining (SPM). Frequent pattern mining has been extended to find more complex structures, as in sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database. A sequence database stores a number of records, where all records are sequences of ordered events, with or without concrete notions of time. A large number of real-world domains, such as user profiling, medicine, local weather forecasting, and bioinformatics, show an inherent tendency to be modeled by means of sequences of events/objects related to each other. This great variety of applications makes sequential pattern mining one of the central topics in WSN data mining, as shown by the research efforts produced in recent years. Sequential pattern mining techniques in sensor networks are based either on traditional sequential mining algorithms, such as Apriori-like algorithms [53], the Apriori-based methods GSP [54] and PSP [55], and the pattern growth approaches FreeSpan and PrefixSpan [56, 57], or on new algorithms devised specifically to work in the sensor network environment.

4.2.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Esposito et al. [58, 59] presented a multidimensional relational sequence mining framework to identify hidden frequent temporal correlations between sensor nodes. The algorithm is based on the generic level-wise search method APRIORI [60] for discovering correlated sensors. The framework exploits a relational language to describe the temporal evolution of a sensor network along with contextual information, and it works in two phases. First, an abstraction step segments and labels the real-valued time series into similar subsequences by using a kernel density estimator approach. Then, the knowledge is enriched by adding interval-based operators between the subsequences obtained in the discretization step, and the relational pattern mining algorithm is extended to deal with these new operators. By taking into account the interval-based temporal data along with contextual information about events, it discovers interesting and more human-readable patterns. The framework is evaluated on a real dataset collected from a wireless sensor network made up of 54 Mica2Dot [61] sensors deployed in the Intel Berkeley Research Lab [62]. Each sensor collected topology information along with humidity, temperature, light, and voltage values once every 31 seconds. The results show strong correlations among some measurements, which is useful for anomaly detection.

Cook et al. [21] presented the MavHome smart home architecture, which focuses on the creation of an intelligent home that perceives the state of the home through sensors and acts upon the environment through device controllers. An important characteristic of the proposed architecture is the ability to make decisions based on predicted activities. To predict the activities, an algorithm called episode discovery (ED) is proposed, which is based on the work of Srikant and Agrawal [54] on mining sequential patterns from time-ordered transactions. Values that can be predicted include the usage patterns of devices in the home, the movement patterns of the inhabitants, and the typical activities of the inhabitants. Prediction algorithms are applied to the action sequences stored in the inhabitant event history to forecast user actions. Actions can then be automated based on the significance of the mined patterns as well as the predictive accuracy of the next event. A key disadvantage is that the entire action history must be stored and processed offline, which is not practical for large prediction tasks over a long period of time. Cook et al. demonstrated the effectiveness of MavHome on synthetic smart home data and on real data collected by students using X10 controllers in their homes. Experiments show a predictive accuracy as high as 53.4% on the real data and 94.4% on the synthetic data.

Rabatel et al. [22] presented a strategy to detect anomalies in sensor data in order to improve railway maintenance. They extract sequential patterns from real railway data and identify abnormal behavior. Based on these abnormal findings, alarms are automatically triggered to signal potential failures. What counts as abnormal behavior depends on environmental (weather conditions, travel characteristics) and structural (route, episode index in the route) changes in the data. The PSP [55] algorithm is used to identify the sequential patterns. To handle the environmental conditions, a contextual knowledge-based method is proposed, which is able to provide information on the seriousness and possible causes of a deviation. The proposed technique helps in the proactive maintenance of trains. However, the real-time context could be improved by providing more precise information for anomaly detection.

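Figure 3: Example of sequential alarm pattern. (Figure not reproduced. It shows the alarm events a, q, and k as circles, together with the time differences $T_{aq}$ and $T_{qk}$.)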

Guralnik and Haigh [23] use sequential pattern mining to learn typical behaviors of humans in their homes. Human behavior is inferred by using motion sensors, pressure pads, door latch sensors, and toilet flush sensors. They installed 10–20 sensors of different types in a home and built models of which sensor firings correspond to which activities, in what order, and at what time. For example, "In 60% of the days, the Kitchen-Motion sensor fires between 18h00 and 18h30, and then the Living-Room-Motion sensor fires between 18h20 and 20h00, and then the Bedroom-Motion sensor fires between 19h45 and 22h00." Their algorithm uses these data, together with domain knowledge, to extract the sequences of rooms in which the person was active. These sequences are then analyzed by a human expert to identify complex behavior models. These models can be used to select the appropriate response plan to the actions of the elderly.

Wu et al. [63] proposed a new algorithm for mining sequential alarm patterns (MSAP) from the alarm data generated by a GSM system. Sequential events are identified from the alarm data by defining a time interval between adjacent events. For example, if the time interval is set to six hours, then the sequential alarm pattern $(a, b, c)$ indicates that $a$, $b$, and $c$ happen in order and that the time intervals between $a$ and $b$ and between $b$ and $c$ are each less than six hours. An example of a sequential alarm sequence, redrawn from [63], is shown in Figure 3.

The number in each circle represents the error ID, and $T_{aq}$ denotes the time difference between alarm event $a$ and alarm event $q$. The extracted knowledge is useful not only for identifying the relevance between two events but also for predicting the alarm sequence and taking proper steps to prevent the occurrence of alarms where possible. For example, if the network operator detects alarm $a$ occurring at time $t$, the operator should dissipate this alarm before time $t + T_{aq}$ to alleviate the abnormal situation incurred. The limitation of this technique is that it cannot discover other possible time-interval patterns between the events.
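To make the time-interval constraint on a sequential alarm pattern concrete, the following Python sketch tests whether a pattern such as (a, b, c) occurs in a time-sorted alarm log with every adjacent gap below a limit. It illustrates the pattern definition only and is not the MSAP mining algorithm of [63]; the function name `occurs` and the log format (time in hours, alarm ID) are assumptions.

```python
def occurs(pattern, events, max_gap_hours):
    """True if `pattern` occurs in order with each adjacent gap < max_gap_hours.

    `events` is a list of (time_in_hours, alarm_id) tuples sorted by time.
    """
    def search(idx, start, last_time):
        if idx == len(pattern):
            return True
        for i in range(start, len(events)):
            t, alarm = events[i]
            if last_time is not None and t - last_time >= max_gap_hours:
                break  # the log is time-sorted, so later events are even further away
            if alarm == pattern[idx] and search(idx + 1, i + 1, t):
                return True
        return False

    return search(0, 0, None)
```

With a six-hour limit, a log in which b follows a after five hours and c follows b after five hours contains the pattern (a, b, c), whereas a seven-hour gap between a and b does not satisfy it.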

We observe that none of the centralized solutions aims to maximize the WSNs' performance.

4.2.2. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. Tseng and Lu [64] proposed an object tracking strategy named multilevel object tracking (MLOT) to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the movement logs in sensor networks. A multilevel hierarchical structure is adopted by using a clustering mechanism that represents the hierarchical relations among sensor nodes, in order to keep track of moving objects in real time. The movement logs of the moving objects are analyzed by a data mining algorithm, movement pattern generation (MPG), to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the fewest sensor nodes. MPG is based on Apriori; it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.

4.2.3. Distributed Approaches Aim to Maximize WSNs' Performance. Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining temporal movement pattern (TMP) logs. The discovered temporal movement rules (TMRs) are used to predict the next location of an object in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of the moving objects. The integrated movement log is used as the input to the data mining method named TMP-mine, which uses the pattern growth approach to discover the TMPs. From the discovered TMPs, the TMRs are then generated for predicting the next location of a moving object. Suppose that the following two rules are discovered by a vehicle tracking system:

Rule 1: (Station A → interval 10 min → Station B → interval 5 min → Station C)

Rule 2: (Station A → interval 20 min → Station B → interval 5 min → Station D)

By dispatching these rules to the corresponding sensor nodes, tracking can be performed in an energy-efficient way. For example, if a car moves with the pattern (Station A → interval 10 min → Station B → interval 5 min), which matches Rule 1, then the node in Station B only has to activate the node in Station C rather than the node in Station D or those around Station B.
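The way a node could use the two rules to decide which neighbor to wake can be sketched as a simple lookup. The encoding of a rule as a tuple of (station, interval in minutes) pairs plus a predicted station, and the names `RULES` and `predict_next`, are assumptions made for this illustration; exact matching of intervals is a simplification.

```python
# Each rule: a prefix of (station, minutes until the next station) pairs,
# followed by the station predicted to be visited next.
RULES = [
    ((("A", 10), ("B", 5)), "C"),   # Rule 1
    ((("A", 20), ("B", 5)), "D"),   # Rule 2
]

def predict_next(observed, rules=RULES):
    """Return the station predicted by the first rule whose prefix equals `observed`."""
    for prefix, next_station in rules:
        if tuple(observed) == prefix:
            return next_station
    return None  # no rule matches: the node would fall back to waking all neighbors

# A car seen at Station A, then 10 min later at Station B, then 5 min elapsed:
# predict_next([("A", 10), ("B", 5)]) returns "C"
```

Only the predicted node is activated when a rule matches, which is where the energy saving comes from.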

Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique that uses sequential patterns (PTSP). This technique helps to predict the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. PTSP is based on the inherited patterns of the objects' movements in the network and on the use of sequential patterns to predict which sensor node the moving object will head to next.

4.3. Clustering. Clustering is unsupervised learning in which the given data are categorized into subsets so that each subset represents a cluster with distinctive properties. It has been considered a useful technique, especially for applications that require scalability to a large number of sensor nodes. Clustering also supports aggregation of data in order to summarize the overall transmitted data.

Figure 4: Data clustering for sensor networks. (Figure not reproduced. Its blocks are input sensor data, identification of data correlation, grouping data, and clusters, with a feedback path.)

In the current literature, problems related to clustering are addressed by node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67–83]. These clustering techniques vary widely in their objectives depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering, for example, by considering the data similarity of neighboring nodes, many popular node clustering algorithms, which partition the sensor nodes into a number of small groups and elect a cluster head for every group, do not use data mining techniques directly. In this study, we focus only on data clustering techniques that support efficient data mining and find data correlations among the nodes. Figure 4 shows the data clustering process commonly used in data mining.

These works adapt the k-means, hierarchical, and data correlation-based methods. The k-means algorithm takes the input parameter $k$ and partitions a set of $n$ objects into $k$ clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects by grouping them into a tree of clusters. Data correlation-based clustering, in contrast, forms clusters of nodes whose sensory values are similar within a given threshold, based on spatial and temporal correlations, and these clusters remain fixed until the sensory values move outside the threshold over time. When that happens, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider the residual energy of nodes. It is observed from the survey that both the centralized and the distributed clustering solutions aim to maximize the WSNs' performance.

4.3.1. Centralized Approaches Aim to Maximize WSNs' Performance. Liu et al. [84] proposed a centralized, graph-based approach called energy-efficient data collection (EEDC). EEDC is an on-demand clustering algorithm that clusters nodes into groups whose members have similar sensor readings; thus the protocol clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes using a user-defined dissimilarity measure. EEDC models the cluster creation process as a clique-covering problem by constructing a graph $G$ in which each sensor node is a vertex. An edge $(u, v)$ is drawn if the dissimilarity measure between vertex $u$ and vertex $v$ is less than or equal to the given intracluster dissimilarity threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to cover all vertices in the graph with the minimum number of cliques. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, because of its centralized processing, it is not suitable for large-scale WSNs.
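The clique-covering formulation can be illustrated with a greedy heuristic in Python. Finding a minimum clique cover is NP-hard in general, so the sketch below is a stand-in chosen for brevity and is not the procedure used by EEDC in [84]; the function name `greedy_clique_cover` and the use of scalar readings with an absolute-difference dissimilarity in the usage note are assumptions.

```python
def greedy_clique_cover(readings, dissimilarity, max_dst):
    """Partition nodes into cliques of the similarity graph (greedy heuristic).

    `readings` maps node id -> reading; an edge (u, v) exists when
    dissimilarity(readings[u], readings[v]) <= max_dst.
    """
    def similar(u, v):
        return dissimilarity(readings[u], readings[v]) <= max_dst

    clusters = []
    for u in sorted(readings):
        for cluster in clusters:
            # u may join only if it is adjacent to every member (clique property)
            if all(similar(u, v) for v in cluster):
                cluster.append(u)
                break
        else:
            clusters.append([u])   # no existing clique fits: open a new cluster
    return clusters
```

For example, with temperature readings and `dissimilarity=lambda a, b: abs(a - b)`, nodes whose readings all lie within `max_dst` of one another end up in the same cluster, while the greedy order may use more clusters than the true minimum.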

4.3.2. Distributed Approaches Aim to Maximize WSNs' Performance. Guo et al. [85] proposed H-cluster, a distributed algorithm for clustering sensory data. The input of the algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output of the algorithm is a set of cluster features that summarize the clusters of the input sensory dataset. The Hilbert-Map mapping algorithm is used to map a $d$-dimensional sensory data space into the 2-dimensional area covered by a given WSN. H-cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-cluster with the C-Corner and C-Center algorithms. During the experiments, four types of environmental attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that the H-cluster algorithm is much more efficient in terms of data loss, energy, and the quality of the clustered data in small WSNs. The results also show that, as the size of the WSN grows and the amount of sensory data delivered increases, the amount of data loss increases and energy efficiency decreases.

Yeo et al. [86] proposed a data correlation-based clustering scheme (DCC) based on the similarity of sensor data, along with a spatial suppression scheme that helps to reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head. During this phase, sensor nodes communicate with each other, and the resulting clusters are formed of sensor nodes that have similar readings. Spatial suppression is performed at the cluster head, which also computes the difference between each sensor reading and the representative value. If a cluster head holds redundant data, it removes them, keeping only the node identification. The experimental results support the hypothesis that clustering based on data correlation achieves better compression than ordinary clustering based on locality of communication; they show that DCC reduces the data size by 40% through suppression and prolongs network lifetime by 20–30%. However, for large-scale network applications (more than 500 nodes), DCC is inefficient because each cluster head needs more energy to collect similar data readings and to communicate with several nodes. DCC is also ineffective when the percentage of similar data readings is low, because of the higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing a prediction model. In their approach, the cluster heads execute the prediction model, while gateway nodes at the circumference of the clusters are responsible for the routing task. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] presented the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When readings drift outside the threshold, the affected sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees that the result stays within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), where clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, wherein only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported), which trades lower result precision for significant savings in energy, storage, computation, and communication.
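The grouping idea behind CAG can be illustrated with a minimal sketch. It assumes an absolute threshold `tau` and ignores the radio topology and the aggregation tree; the function name `cag_clusters` and the sample readings are hypothetical and are not taken from [88].

```python
def cag_clusters(readings, tau):
    """Greedily group node readings (illustrative only).

    readings: dict node_id -> sensed value.
    Each cluster is represented by the value of its first member (the
    cluster head); a node joins a cluster if its reading is within tau
    of the head's value, so every member is within tau of the value
    that is actually reported.
    """
    clusters = []  # list of (head_value, [node_ids])
    for node, value in readings.items():
        for head_value, members in clusters:
            if abs(value - head_value) <= tau:
                members.append(node)
                break
        else:
            # No existing cluster is close enough: the node becomes a head.
            clusters.append((value, [node]))
    return clusters


readings = {"n1": 20.0, "n2": 20.4, "n3": 25.0, "n4": 24.8}
clusters = cag_clusters(readings, tau=0.5)
# Only one value per cluster would be sent up the aggregation tree.
reported = [head_value for head_value, _ in clusters]
```

With these sample readings, four nodes produce two reported values, which is the source of the communication saving and also of the lossiness noted above.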

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the K-means clustering algorithm is proposed that sends summarized data towards the sink, which reduces the communication transmission time and power consumption of sensor nodes. The sensor network is divided into clusters, and only the cluster head (CH) node communicates with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends the base station a summary comprising the count and vector sum of its local sensory data points, as well as the sum of the squared distances from each local point to its center. On receiving data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations. Several programs are run to get the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors (N) and increases linearly with the number of centers. Major issues are the extra memory required at the cluster head and the computation power needed to summarize data before transmitting it to the sink. Also, the algorithm requires multiple rounds of message passing between cluster heads and the base station; this may have a serious effect on communication efficiency when the number of sensors is relatively high.
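One round of this exchange can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol of [67]: the function names `local_summary` and `update_centers` are hypothetical, points are plain Python lists, and an empty cluster simply keeps its previous center.

```python
def local_summary(points, centers):
    """Cluster-head side: per-center count, vector sum, and squared error."""
    k, dim = len(centers), len(centers[0])
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    sse = [0.0] * k
    for p in points:
        # Squared Euclidean distance from p to every current center.
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = d2.index(min(d2))
        counts[j] += 1
        sums[j] = [s + a for s, a in zip(sums[j], p)]
        sse[j] += d2[j]
    return counts, sums, sse


def update_centers(summaries, centers):
    """Base-station side: merge the summaries and recompute each mean."""
    new_centers = []
    for j, old in enumerate(centers):
        n = sum(s[0][j] for s in summaries)
        if n == 0:
            new_centers.append(list(old))  # assumption: keep an empty cluster's center
            continue
        total = [sum(s[1][j][d] for s in summaries) for d in range(len(old))]
        new_centers.append([t / n for t in total])
    return new_centers


centers = [[0.0, 0.0], [10.0, 10.0]]
ch1 = local_summary([[1.0, 1.0], [9.0, 9.0]], centers)
ch2 = local_summary([[3.0, 1.0], [11.0, 11.0]], centers)
new_centers = update_centers([ch1, ch2], centers)
```

Each cluster head sends a message whose size depends on the number of centers and the data dimension, not on how many readings it holds, which matches the reported independence of communication cost from N.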

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes


[Figure 5: Classification maps the input attribute set (X) to a class label (Y). Diagram: input attribute set (X) → classification model → output class label (Y).]

to form clusters. Those nodes that hear the request decide whether they should nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on their remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when there are subregions in the sensor network that are more targeted than others, that is, when inquiries are not uniformly distributed over time and space.

Ma et al. [90] proposed the distributed hierarchical clustering and summarization algorithm (DHCS) for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop count thresholds, to model node- and distance-based clustering. Initially, each node treats itself as an active cluster. Then similar adjacent clusters are merged into larger clusters round by round. In each round, each cluster tries to combine with its most similar adjacent cluster simultaneously. Two clusters can be merged only if each considers the other its most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
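The mutual-agreement rule of a single merge round can be sketched as below. This is a simplified illustration, assuming scalar cluster summaries, a single difference threshold, and an explicit adjacency map in place of the hop count threshold; `dhcs_round` is a hypothetical name, not a function from [90].

```python
def dhcs_round(values, adjacent, threshold):
    """Return the cluster pairs that would merge in one round.

    values:   cluster_id -> summary value (for example, mean reading)
    adjacent: cluster_id -> set of neighbouring cluster ids
    A cluster proposes its most similar neighbour whose difference is
    within the threshold; a pair merges only if both propose each other.
    """
    best = {}
    for c, neighbours in adjacent.items():
        candidates = [
            (abs(values[c] - values[n]), n)
            for n in neighbours
            if abs(values[c] - values[n]) <= threshold
        ]
        if candidates:
            best[c] = min(candidates)[1]
    # Keep each mutual pair once, ordered by cluster id.
    return sorted((c, n) for c, n in best.items() if c < n and best.get(n) == c)


values = {1: 20.0, 2: 20.1, 3: 20.5, 4: 30.0}
adjacent = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
merges = dhcs_round(values, adjacent, threshold=1.0)
```

In this example cluster 3 proposes cluster 2, but cluster 2 prefers cluster 1, so only clusters 1 and 2 merge in this round; repeating the round until it returns no pairs corresponds to reaching steady clusters.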

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and classifies new data into one of the learned classes. Figure 5 shows that classification maps an input attribute set (X) to a class label (Y).

Classification-based approaches have adapted traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, depending on the type of classification model they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until reaching a leaf node, where a class label is assigned. The internal nodes partition the data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples.

The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a request to classify a query vector, the closest training vectors are located according to the distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier groups the dataset into predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where Condition is a conjunction of attribute tests and Y is a class label.
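A rule of this form can be sketched in a few lines. The rules, attribute names, and thresholds below are hypothetical and chosen only for illustration; rules are tried in order and the first one whose whole conjunction holds assigns the label.

```python
# Each rule: (conjunction of attribute tests, class label Y). Hypothetical rules.
RULES = [
    ({"temperature": lambda t: t > 50, "smoke": lambda s: bool(s)}, "fire"),
    ({"temperature": lambda t: t > 35}, "hot"),
]


def classify(reading, rules=RULES, default="normal"):
    """Return the label of the first rule whose condition fully holds."""
    for condition, label in rules:
        if all(attr in reading and test(reading[attr])
               for attr, test in condition.items()):
            return label
    return default  # no rule fired
```

For instance, `classify({"temperature": 60, "smoke": True})` matches the first rule, while a reading of 40 degrees without smoke falls through to the second.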

SVM (support vector machine) techniques separate the data belonging to different classes by fitting a hyperplane between them that maximizes the margin of separation. The data is mapped into a higher-dimensional feature space where it can be easily separated by a hyperplane. Furthermore, a kernel function is used to evaluate the dot products between the mapped vectors in the feature space in order to find the hyperplane.
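For reference, the resulting decision rule can be written in standard textbook notation; the symbols below (multipliers \alpha_i, labels y_i \in \{-1, +1\}, offset b, feature map \phi, and support vector index set SV) are the usual ones and are not taken from the surveyed papers.

```latex
f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i \in SV} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}) \rangle
```

Only the support vectors contribute to the sum, so the support vectors together with the offset b are sufficient to represent the separating hyperplane.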

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. In order to identify persons, the proposed approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The proposed approach is validated by experiments using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated using the Weka framework version 3.7.0 [95]. The experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deeper analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then the k closest training examples are kept, where k is a fixed integer and k ≥ 1; among these k examples, the algorithm looks for the most frequent label. This label is the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated with NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a correctly classified success rate of 92.3%.
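The prediction step described above is the generic k-nearest-neighbor vote, which can be sketched as follows. The sketch assumes Euclidean distance and fixed-length feature vectors; how trajectories are turned into such vectors in [96] is not covered here, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (vector, label) pairs stored during 'training'.

    Computes the distance from the query to every stored example,
    keeps the k closest, and returns their most frequent label.
    """
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
```

Because every stored example is scanned for each query, the cost of a prediction grows with the size of the training set, which is the usual price of a lazy learner.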

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments to predict the behavior pattern


of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model of occupancy in the ambient intelligence environment. The occupancy is predicted using fuzzy rules, which are modeled on the past values of the time series data. In the learning process, input from the sensors is compared with stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the proposed technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multi-user environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size and to change the algorithm output rate according to the data rate, available memory, algorithm output rate history, and time constraints for filling the available memory with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. Instances are stored in main memory according to a prespecified distance threshold. The threshold represents the similarity measure the algorithm accepts in order to consider two or more elements as one element according to their attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the instance is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer.
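The weight update described above can be sketched as follows. This is a minimal illustration, not the implementation of [100]: it assumes Euclidean distance, omits the AOG rate adaptation entirely, and assumes that an element with no stored instance within the threshold is kept as a new instance with weight 1. The name `lwclass_update` is hypothetical.

```python
import math


def lwclass_update(store, x, label, threshold):
    """Process one labelled stream element in a single pass.

    store: list of dicts {"x": vector, "label": str, "weight": int}
    """
    nearest = min(store, key=lambda e: math.dist(e["x"], x), default=None)
    if nearest is None or math.dist(nearest["x"], x) > threshold:
        # Assumption: an unmatched element is stored as a new instance.
        store.append({"x": x, "label": label, "weight": 1})
        return
    # Same label reinforces the stored instance; a different label weakens it.
    nearest["weight"] += 1 if nearest["label"] == label else -1
    if nearest["weight"] == 0:
        # Release the instance from memory.
        store[:] = [e for e in store if e is not nearest]


store = []
lwclass_update(store, (1.0, 1.0), "a", threshold=0.5)   # stored, weight 1
lwclass_update(store, (1.1, 1.0), "a", threshold=0.5)   # same label, weight 2
lwclass_update(store, (1.0, 1.2), "b", threshold=0.5)   # other label, weight 1
```

The memory footprint is bounded by the number of stored instances rather than by the length of the stream, which is what makes the approach suitable for on-board mining.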

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure over the whole network is constructed. A distributed voting approach is proposed in which each sensor, as a leaf of the tree, uses a decision tree (DT) to perform local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The authors show how the local models enable sensors to respond to changes in the target by relearning local models. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential for an increase in accuracy combined with a reduction in model size and runtime as compared with a centralized approach. Major issues in this framework are the need for an expensive CPU on each sensor node for computing and building the local predictive model and the extra memory required to store it.
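Combining local predictions at the sink by simple or weighted voting can be sketched as follows. The function name `vote` is hypothetical, and using each node's local accuracy as its weight is an assumption made for illustration; the source does not specify how weights are chosen.

```python
from collections import defaultdict


def vote(predictions, weights=None):
    """Combine the class labels sent by the sensors.

    predictions: node_id -> predicted class label
    weights:     optional node_id -> vote weight (assumed here to be a
                 per-node confidence such as local accuracy)
    """
    tally = defaultdict(float)
    for node, label in predictions.items():
        tally[label] += 1.0 if weights is None else weights[node]
    return max(tally, key=tally.get)


predictions = {"s1": "A", "s2": "B", "s3": "B"}
simple = vote(predictions)
weighted = vote(predictions, {"s1": 0.9, "s2": 0.4, "s3": 0.4})
```

In this example the simple vote follows the majority, while the weighted vote lets one highly weighted sensor outvote two weakly weighted ones. Each sensor transmits only a label (and possibly a weight) rather than its raw data.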

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme to generate effective feature vectors of low dimension (FVLD) for a wireless audio network. A distributed cluster-based algorithm for the detection and classification of vehicles has been proposed. Sensors form clusters on demand for the sake of running a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and makes a decision on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features, and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, as compared with all other approaches.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using a support vector machine (SVM)-based technique in a sensor network. The authors proposed two distributed algorithms, the distributed fixed-partition SVM (DFP-SVM) and the weighted distributed fixed-partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is trained incrementally on a subset of the examples called support vectors. With an SVM, the number of support vectors is typically very small compared with the number of all sample values. Moreover, the support vectors (and offset) give a compressed representation of the separating SVM hyperplane. That is why sending only the support vectors, instead of all training samples, to the next cluster head is very energy efficient: it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations, and a comparison is made with a centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally as compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers


at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius information of the sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected over a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the results for local outliers inaccurate. The technique is evaluated using real sensor measurements collected from a deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. The results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and differing aspects of the data mining techniques specially designed for WSNs discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed, and then the comparative tables are presented to compare and differentiate existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogeneous [50] or heterogeneous [33, 48]. A homogeneous attribute means that a single attribute is sensed, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that the readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node if the reading from a correlated sensor is missing.

5.2. Processing Architecture. In order to apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models, as follows.

Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, all the raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis in order to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an "aggregator" is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSN data. Distributed approaches use one-scan algorithms for real-time processing in order to deal with the high data arrival rate; the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication, which is more complex, uses some nodes as relays when transmitting data packets from the source to the sink.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data


Table 2: Comparison of data mining techniques for wireless sensor networks.

Column groups: Processing architecture (distributed, central); Sensor data: attributes (homogeneous, heterogeneous) and correlation (attribute, spatial, temporal); Node properties: connectivity (single hop, multihop), mobility (static, mobile), node role (cluster head, sensor, relay), node task; Implementation: application area, evaluation method (simulation, analytical model), data source (real, synthetic), optimization objective; Limitations. The check-mark columns are not reproduced here.

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Frequent pattern mining
DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores the sensor that reports different values
In-network data mining [51] | Event pattern discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication
Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSN performance monitoring | Data size | Increases buffer cost, delays crucial messages
Online algorithm [46] | Interval list representation of WSN data | Lossy counting | Periodical sensing | WSN monitoring | Time and memory | Data redundancy
Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSN operations | Energy | Not validated well on real data
CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data


Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

Column groups: Processing architecture (distributed, central); Sensor data: attributes (homogeneous, heterogeneous) and correlation (attribute, spatial, temporal); Node properties: connectivity (single hop, multihop), mobility (static, mobile), node role (cluster head, sensor, relay), node task; Implementation: application area, evaluation method (simulation, analytical model), data source (real, synthetic), optimization objective; Limitations. The check-mark columns are not reproduced here.

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Frequent pattern mining
Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT structure | Aggregation | Monitor WSN quality of service | No. of messages | Increased cost due to multiple DB scans
SP-tree [49] | Discover event patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost

Sequential pattern mining
Relational framework [58] | Multidimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming
Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants' behavior prediction | Prediction accuracy | Inefficient for complex activities
MPG [64] | Predict object's future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset
Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction


Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

Column groups: Processing architecture (distributed, central); Sensor data: attributes (homogeneous, heterogeneous) and correlation (attribute, spatial, temporal); Node properties: connectivity (single hop, multihop), mobility (static, mobile), node role (cluster head, sensor, relay), node task; Implementation: application area, evaluation method (simulation, analytical model), data source (real, synthetic), optimization objective; Limitations. The check-mark columns are not reproduced here.

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Sequential pattern mining
TMP-mine [65] | Predict object's future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time
Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns
MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Pattern accuracy | Candidate construction is expensive to compute
PTSP [66] | Object's future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient for predicting high-speed objects

Clustering
DCC [86] | WSN longevity | Data correlation-based clustering | Data suppression | Generic WSN application | Energy and data size | High clustering rate
H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate


Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

Column groups: Processing architecture (distributed, central); Sensor data: attributes (homogeneous, heterogeneous) and correlation (attribute, spatial, temporal); Node properties: connectivity (single hop, multihop), mobility (static, mobile), role (cluster head, sensor, relay), node task; Implementation: application area, evaluation method (simulation, analytical model), data source (real, synthetic), optimization objective; Limitations. The check-mark columns are not reproduced here.

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Clustering
Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping
CAG [88] | WSN bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSN applications | Communication | Sensory data loss
EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs
Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs
Attribute-based clustering [89] | WSN bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost
DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Node energy is ignored


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

Column groups: Processing architecture (distributed, central); Sensor data: attributes (homogeneous, heterogeneous) and correlation (attribute, spatial, temporal); Node properties: connectivity (single hop, multihop), mobility (static, mobile), role (cluster head, sensor, relay), node task; Implementation: application area, evaluation method (simulation, analytical model), data source (real, synthetic), optimization objective; Limitations. The check-mark columns are not reproduced here.

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Classification
Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee correctness
Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity
NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset
LWClass [100] | Preserve WSN resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift
FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission
Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios
Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity
One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are the nodes with limited resources; they are used to sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is the node that acts as a medium to transmit data packets from one node to the others.

Node Task In centralized approach node task is to sense thephenomena being monitored and send the sensed data to thebase station In distributed approaches node can performcomputation and can take action based on the detectedphenomena or target
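The contrast between the two node tasks can be made concrete with a small sketch. It is not taken from any surveyed technique: the cluster layout, the node names, and the choice of the mean as the aggregate are all assumptions made for illustration. It counts only the messages that reach the base station and ignores intra-cluster traffic.

```python
from statistics import mean

# Hypothetical layout: cluster head -> member node -> readings in one round.
CLUSTERS = {
    "CH1": {"n1": [20.0, 21.0], "n2": [22.0, 23.0]},
    "CH2": {"n3": [30.0, 30.0], "n4": [32.0, 32.0]},
}


def centralized_messages(clusters):
    """Centralized task: every raw reading is forwarded to the base station."""
    return sum(len(readings)
               for members in clusters.values()
               for readings in members.values())


def distributed_summaries(clusters):
    """Distributed task: each cluster head forwards one aggregate per round."""
    return {head: mean(r for readings in members.values() for r in readings)
            for head, members in clusters.items()}


RAW_COUNT = centralized_messages(CLUSTERS)   # one message per raw reading
SUMMARIES = distributed_summaries(CLUSTERS)  # one message per cluster head
```

Under these assumptions, the base station receives one message per cluster head instead of one per reading, at the cost of seeing only the aggregate rather than the individual values.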

5.5. Application Area. We also evaluated the types of application that benefit from WSN data mining techniques. Here we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their bodies to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used methods to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular approach and, in terms of cost and time, an effective way to design and test a proposed scheme; it also provides a higher level of detail than real implementation. However, selecting a simulation framework appropriate to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment, because appropriate hardware is often unavailable owing to technical and design limitations. Real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating a technique proposed for WSNs through real deployment yields the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. Most of the techniques surveyed in this paper validate their results by simulation on synthetic datasets, largely because of the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that it achieves. Most of the techniques consider the resource constraints and the different design philosophies of the network. None of them works efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, and node mobility. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.

6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that combining different attributes can improve mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand their associated benefits and limitations. Because performance requirements differ according to the specific application, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) For the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster-width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
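The sensitivity to support and confidence thresholds noted in item (ix) can be illustrated with a minimal sketch. The transactions below are invented (each set lists the hypothetical sensors s1, s2, s3 that reported an event in one time slot), and the function names are illustrative rather than drawn from any surveyed algorithm.

```python
# Each transaction: the set of sensors that fired in one time slot (made up).
TRANSACTIONS = [
    {"s1", "s2"},
    {"s1", "s2", "s3"},
    {"s1"},
    {"s2", "s3"},
    {"s3"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def rule_holds(transactions, antecedent, consequent,
               min_support, min_confidence):
    """True if the rule antecedent -> consequent passes both thresholds."""
    joint = support(transactions, set(antecedent) | set(consequent))
    conf = confidence(transactions, antecedent, consequent)
    return joint >= min_support and conf >= min_confidence


# The same rule s1 -> s2 is accepted or rejected depending on the threshold.
ACCEPTED = rule_holds(TRANSACTIONS, {"s1"}, {"s2"}, 0.4, 0.6)
REJECTED = rule_holds(TRANSACTIONS, {"s1"}, {"s2"}, 0.5, 0.6)
```

With this toy data the rule s1 -> s2 has support 2/5 and confidence 2/3, so a small change in the minimum support decides whether the pattern is reported at all.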

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required and partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.
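As an illustration of what single-pass processing at a node might look like, the sketch below keeps a running mean and variance using Welford's online algorithm. This is a generic streaming technique chosen for illustration, not the method of the proposed framework; the class name `RunningSummary` and the sample readings are assumptions.

```python
class RunningSummary:
    """Single-pass summary of a stream: each reading is seen exactly once."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, reading):
        """Fold one reading into the summary without storing it."""
        self.count += 1
        delta = reading - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (reading - self.mean)

    @property
    def variance(self):
        """Population variance of the readings seen so far."""
        return self._m2 / self.count if self.count else 0.0


summary = RunningSummary()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # made-up readings
    summary.update(value)
```

The node's memory use stays constant regardless of stream length, and the three stored numbers could serve as a very small local model in the sense used above.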

Figure 6: Proposed hybrid framework for sensor network-based applications. Diagram labels: sensor data stream; node data processing (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis); local model; network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring); network model; central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); global model; sink/base station; users, queries, and approximate results. Processing modes: in-network processing (single pass) and centralized processing (multipass).

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that has more resources than the other sensor nodes. As a result, a network model is computed that is more abstract than a local model and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to obtain a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
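The local model, network model, and global model hierarchy can be sketched with summaries that merge exactly. The `Model` type, its four fields, and the readings are hypothetical; the framework itself does not prescribe what a model contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A compact, mergeable summary (hypothetical model content)."""
    count: int
    total: float
    low: float
    high: float

    @property
    def mean(self):
        return self.total / self.count


def local_model(readings):
    """Built at a sensor node from its own raw readings."""
    return Model(len(readings), sum(readings), min(readings), max(readings))


def merge(models):
    """Integrate several models into one without access to raw readings."""
    models = list(models)
    return Model(
        sum(m.count for m in models),
        sum(m.total for m in models),
        min(m.low for m in models),
        max(m.high for m in models),
    )


# Two clusters of two nodes each (made-up readings).
network_a = merge([local_model([20.0, 22.0]), local_model([24.0])])
network_b = merge([local_model([30.0, 34.0]), local_model([32.0, 32.0])])
global_model = merge([network_a, network_b])  # integrated at the sink
```

In this toy case the global model equals the summary that would be computed from all raw readings at once, although only six small tuples were ever transmitted. Richer models, such as frequent patterns or cluster descriptions, generally merge only approximately.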

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decision making and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue in WSNs [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits a large task into smaller ones at the node level and the cluster head level, which are distributed over the entire network and executed in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head; it contains compact data from a set of nodes and reduces the size of the data to be transmitted. Network models can be integrated at the sink to get a global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time response. With data distributed in this way, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
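The idea in item (iv), that a model should absorb new readings so that it does not go stale, can be sketched with an exponentially weighted moving average. This is a deliberately simple stand-in and not the incremental mining mechanism of [39, 112]; the function names, the forgetting factor `alpha`, and the readings are assumptions.

```python
def incremental_update(estimate, reading, alpha=0.5):
    """Blend one new reading into the current estimate (one EWMA step)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return (1.0 - alpha) * estimate + alpha * reading


def track(initial, readings, alpha=0.5):
    """Apply incremental updates in arrival order; no reading is stored."""
    estimate = initial
    for reading in readings:
        estimate = incremental_update(estimate, reading, alpha)
    return estimate


OLD_MODEL = 20.0                      # level learned before the change
DRIFTED = [30.0, 30.0, 30.0, 30.0]    # the monitored process has shifted
UPDATED_MODEL = track(OLD_MODEL, DRIFTED)
```

After four drifted readings the estimate has moved from 20.0 to 29.375, whereas a model trained once and never updated would still report 20.0. A larger `alpha` adapts faster but is more sensitive to noise.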

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. The set of WSN applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies for emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining: Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid energy-efficient distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


distributed solutions have been proposed with the aim of maximizing the WSNs’ performance and the application-based performance by applying Apriori-like and FP-growth methods over WSN data.

4.1.1. Centralized Approaches Aim to Solve WSNs’ Application-Based Issues. Halatchev and Gruenwald [42] proposed a centralized methodology called data stream association rule mining (DSARM) to estimate missing sensor readings. It uses an association rule mining algorithm to identify sensors that report the same data a number of times within a sliding window, called related sensors, and then estimates the missing data of a sensor from the data reported by its related sensors. Because sensor data arrive as a stream, an association mining algorithm such as Apriori cannot be applied to them directly. This led the authors to propose the DSARM framework, which adapts the Apriori algorithm to the data stream received from sensor nodes. The technique is evaluated by simulation experiments on real data collected by the Department of Transportation in Austin, TX, USA, to estimate missing values in related data streams. Performance evaluations compared DSARM with alternative approaches. The results show that, although DSARM requires more memory space and takes longer to produce an estimate than the considered alternatives, it estimates values more accurately than they do. However, DSARM has some limitations. First, it is based on association rule mining over frequent 2-itemsets, which means that it can discover relationships only between two sensors and ignores the cases where missing values are related to multiple sensors. Second, it finds those relationships only when both sensors report the same value and ignores the cases where missing values could be estimated from relationships between sensors that report different values.
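The related-sensor idea can be illustrated with a minimal sketch. It is not the DSARM implementation: the function names `related_sensors` and `estimate_missing`, the dictionary-per-round window layout, and the majority vote among related sensors are all assumptions made for illustration.

```python
from collections import Counter

def related_sensors(window, target, min_support):
    """Return sensors that reported the same value as `target` in at least
    `min_support` rounds of the sliding window (hypothetical helper)."""
    agree = Counter()
    for readings in window:              # one dict (sensor -> value) per round
        if target not in readings:
            continue
        for sensor, value in readings.items():
            if sensor != target and value == readings[target]:
                agree[sensor] += 1
    return [s for s, n in agree.items() if n >= min_support]

def estimate_missing(window, current, target, min_support):
    """Estimate the missing reading of `target` from its related sensors.
    The majority vote is an assumption, not the rule used in [42]."""
    candidates = [current[s]
                  for s in related_sensors(window, target, min_support)
                  if s in current]
    if not candidates:
        return None
    return Counter(candidates).most_common(1)[0][0]

window = [
    {"s1": "high", "s2": "high", "s3": "low"},
    {"s1": "low",  "s2": "low",  "s3": "low"},
    {"s1": "high", "s2": "high", "s3": "low"},
]
current = {"s2": "high", "s3": "low"}    # the reading of s1 is missing
estimate = estimate_missing(window, current, "s1", min_support=3)
```

In this toy window only `s2` agrees with `s1` in all three rounds, so the estimate is taken from `s2`. The sketch shares the limitation noted above: it only uses pairs of sensors that report identical values.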

Jiang and Gruenwald [43, 44] proposed a data estimation technique called CARM (closed item-sets-based association rule mining), which can derive the most recent association rules between the sensors in the current sliding window. The technique is based on a closed frequent item-set mining algorithm for data streams called CFI-stream [45]. It maintains an in-memory data structure called the direct update (DIU) tree to store closed item-sets. When a new transaction arrives, the algorithm checks each item-set in the transaction against the sliding window of the data stream online and incrementally updates the supports of the closed item-sets. If CARM finds missing values in the sensor readings, then, instead of generating all possible association rules, it generates only the rules that have strong relationships with the current round of sensor readings in which one or more readings are missing. Based on these rules and the selected closed item-sets, CARM generates the estimated values, which contain item values that are not included in the original readings. Figure 2, redrawn from [43], shows the DIU tree after receiving the first four transactions. There are currently four closed item-sets in the DIU tree, C, AB, CD, and ABC, and their associated supports, shown at the upper-right corner of each node, are 3, 3, 1, and 2. A basic set of rules is generated from these frequent item-sets; all other rules can be inferred from this basic rule set.

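[Figure 2 content: the transactions, in timeline order, are TID 1 = {C, D}, TID 2 = {A, B}, TID 3 = {A, B, C}, and TID 4 = {A, B, C}; the DIU tree rooted at Φ holds the closed item-sets C (support 3), AB (support 3), CD (support 1), and ABC (support 2).]

Figure 2: Lexicographical-ordered direct update tree.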
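The closed item-sets reported for the four transactions of Figure 2 can be checked with a brute-force sketch. It enumerates every subset of every transaction, which is only feasible for tiny examples, and it does not reproduce the incremental DIU tree update of CFI-stream; the function name `closed_itemsets` is a hypothetical helper.

```python
from itertools import combinations

def closed_itemsets(transactions):
    """Brute-force closed item-set computation (illustrative only).
    An item-set is closed if no proper superset has the same support."""
    support = {}
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                support[key] = support.get(key, 0) + 1
    closed = {}
    for itemset, count in support.items():
        has_equal_superset = any(
            itemset < other and support[other] == count for other in support
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed

# The four transactions listed in Figure 2.
window = [{"C", "D"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "C"}]
result = closed_itemsets(window)
```

For this window the sketch yields C, AB, CD, and ABC with supports 3, 3, 1, and 2, in agreement with the text.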

4.1.2. Centralized Approaches Aim to Maximize WSNs’ Performance. Loo et al. [46] proposed online one-pass algorithms for mining large sensor streams. They mine the frequent value sets from sensor stream data by transforming the stream data into an interval list (IL) under the lossy counting framework [47]. Time is divided into equal-size intervals, and a snapshot of the sensor readings is taken whenever a sensor reading is updated. The sensors’ values at that snapshot form the value sets stored in the database. An Apriori-based strategy is used to mine the value sets. The analysis of the IL-based representation of stream data showed favorable results on a synthetic dataset. However, while computing the IL of a candidate value set, redundant intersections of ILs are inevitable, which affects performance in terms of time and computation cost. The proposed technique is evaluated by comparing the performance of the IL-based approach (ILB) against an application of lossy counting (LC) using a weighted transformation method on a synthetic dataset. According to their experiments, ILB outperforms LC significantly for large sensor networks. Moreover, both the processing time and the memory consumption of ILB are more stable than those of LC.
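For readers unfamiliar with the lossy counting framework [47] that serves as the baseline here, the following is a minimal single-item sketch. It counts individual values rather than value sets and does not implement the interval-list representation of [46]; the function name `lossy_count` and the list-based counter layout are assumptions.

```python
import math

def lossy_count(stream, epsilon):
    """Single-item lossy counting sketch.
    Returns (counts, n) where counts maps item -> [frequency, max_error]."""
    width = math.ceil(1 / epsilon)       # bucket width
    counts = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)    # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new item may have been missed in up to bucket - 1 earlier buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:               # bucket boundary: prune rare items
            for key in [k for k, (f, d) in counts.items() if f + d <= bucket]:
                del counts[key]
    return counts, n

stream = ["a"] * 6 + ["b", "c", "d", "e"]
counts, n = lossy_count(stream, epsilon=0.2)
```

With `epsilon = 0.2` the bucket width is 5, so the items that appear only once are pruned at the second bucket boundary and only the frequent value `a` survives. Memory stays bounded because rare items are discarded, at the price of an undercount of at most `epsilon * n` per item.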

Chong et al. [48] proposed a rule-learning model that finds strong rules from sensor readings. The rules are used as triggers to control sensor network operations; for example, they can be used to put sensors to sleep or to reduce data transmission in order to conserve energy. To mine the rules, Apriori is modified to count the number of transactions that are frequent instead of the item-sets within transactions, and transactions are processed in batches $b_1, b_2, \ldots, b_X$. Suppose there is a node $M$ that collects light, temperature, and microphone readings from three other sensor streams $S_0$, $S_1$, and $S_2$. Initially, $M$ is queried to collect all sensory values, which are used to generate a rule of the form “$a_n$ implies $a_{n-1}$”; once the rule is extracted, only $a_n$ is sent to the base station. Upon receiving the reading $a_n$ and utilizing knowledge of the rule, the reading $a_{n-1}$ can be inferred. All extracted rules are stored in a rule repository. The proposed method is validated by simulation, implemented in the C language, on a synthetic dataset. In the experiment, the first correlated data received from the sensors are used to extract rules; in subsequent phases, these rules are used to infer the sensor readings for the next round.

Tanbeer et al. [49] proposed a tree-based data structure called the sensor pattern tree (SP-tree) to generate association rules from WSN data with one database scan. The main idea of the approach is to obtain the frequency of all event-detecting sensors’ data, construct a prefix tree based on that in any canonical order, and then reorganize the tree in frequency-descending order. Through this reorganization, the SP-tree keeps the nodes of the frequently event-detecting sensors in the upper part of the tree, which in turn makes the tree structure highly compact. Once the SP-tree is constructed, the FP-growth mining technique is applied to find the frequent event-detecting sensor sets. Experiments are performed to verify the improvement in memory consumption and runtime that the SP-tree achieves over PLT [50]. The experiments show that the SP-tree outperforms PLT in both time and memory consumption. The reason for this gain is twofold: first, PLT construction requires two database scans, while the SP-tree is constructed by scanning the database only once; second, the mining phase of the SP-tree is highly efficient due to the frequency-descending tree structure.
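The compactness gained from frequency-descending order can be seen in a small sketch that builds a plain prefix tree twice, once in canonical (alphabetical) order and once in frequency-descending order, and counts the nodes. This illustrates only the effect of the ordering; it does not implement the single-scan construction and restructuring of the SP-tree, and the helper names and the toy epochs are hypothetical.

```python
from collections import Counter

def build_prefix_tree(epochs, order_key):
    """Insert each epoch (set of event-detecting sensors) as a sorted path."""
    root = {}
    for epoch in epochs:
        node = root
        for sensor in sorted(epoch, key=order_key):
            node = node.setdefault(sensor, {})
    return root

def count_nodes(tree):
    """Number of nodes below the root of a nested-dict prefix tree."""
    return sum(1 + count_nodes(child) for child in tree.values())

epochs = [{"s1", "s2"}, {"s2", "s3"}, {"s2", "s4"}, {"s2"}]
freq = Counter(sensor for epoch in epochs for sensor in epoch)

canonical = build_prefix_tree(epochs, order_key=lambda s: s)
by_frequency = build_prefix_tree(epochs, order_key=lambda s: (-freq[s], s))

canonical_size = count_nodes(canonical)        # 5 nodes
by_frequency_size = count_nodes(by_frequency)  # 4 nodes
```

Because the frequent sensor `s2` is placed at the top in the second tree, all four epochs share it as a common prefix, which is the source of the compactness described above.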

4.1.3. Distributed Approaches Aim to Solve WSNs’ Application-Based Issues. Romer [51] proposed an in-network data mining technique to discover frequent patterns of events with certain spatial and temporal properties. In this approach, the user specifies the upper bounds maxscope and maxhistory (a variable measured in seconds) for the patterns of interest. The sensor collects these events and applies a mining algorithm to discover the patterns that satisfy the given parameters. Each node in the network collects the events from its neighbors within the maximum scope and keeps a history of their events for the duration of the maximum history. After that, each node applies a mining algorithm to discover the local frequent patterns. The resulting frequent patterns are converted to association rules that describe an event of type $E$ that occurs at node $n$ with support $S$ and confidence $C$. Local patterns are sent to the sink, where secondary mining is performed to compute the global picture of the entire network. The algorithm is implemented on the BT node (Bluetooth radio) platform [52], and the tradeoff between the scope of the query and resource consumption is evaluated on a real dataset. Results show that reducing the scope of the query decreases resource consumption. The major issues in this approach are the memory consumption of the itemset discovery algorithms and the communication overhead of event collection.

4.1.4. Distributed Approaches Aim to Maximize WSNs’ Performance. Boukerche and Samarah [15] presented a distributed data extraction methodology that aggregates data on the sensor nodes, which reduces the number of messages transmitted. The distributed solution sends parameters such as the support, time-slot size, and historic period from the sink to all nodes within the network. Each sensor node has its own buffer whose entries are checked against the support value. After each time slot, a node checks whether any messages were received during that time slot; if so, the node sets its buffer entry. When the historic period ends, each node traverses its buffer; if the number of set entries is greater than or equal to the support value provided initially, the message is transferred to the sink. To evaluate the validity of the distributed approach, it is compared with the centralized methodology on a real dataset. The authors conducted two experiments using historical periods of 5 and 10 days, with minimum support values ranging from 10% to 90% and a time-slot size of 30 seconds. All of the reported results show a reduction in the number of messages and in the data size as the support value increases. The major issues in this methodology are the increased cost of the node buffer and the delay of crucial messages when the support value is high.
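The per-node buffering step can be sketched as follows. This is a simplified reading of the description above, not the protocol of [15]: the class name `NodeBuffer`, the one-flag-per-slot layout, and the use of an absolute slot count as the support value are assumptions.

```python
class NodeBuffer:
    """Hypothetical per-node buffer: one flag per time slot of the historic period."""

    def __init__(self, slots_per_period, min_support):
        self.flags = [False] * slots_per_period
        self.min_support = min_support   # assumed to be a number of slots
        self.slot = 0

    def end_slot(self, received_message):
        """Called at the end of each time slot; sets the entry if a message arrived."""
        self.flags[self.slot] = received_message
        self.slot += 1

    def end_period(self):
        """Called when the historic period ends; True means 'report to the sink'."""
        report = sum(self.flags) >= self.min_support
        self.flags = [False] * len(self.flags)   # start a new historic period
        self.slot = 0
        return report

node = NodeBuffer(slots_per_period=4, min_support=3)
for activity in [True, False, True, True]:
    node.end_slot(activity)
first_period = node.end_period()

for activity in [True, False, False, False]:
    node.end_slot(activity)
second_period = node.end_period()
```

Only the first period reaches the support value, so only that period would generate a message to the sink. The sketch also makes the reported drawback visible: with a high support value, a node stays silent until the end of the period, even when one of the buffered events is urgent.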

Boukerche and Samarah [50] proposed the positional lexicographic tree (PLT) structure for mining association rules in which the event-detecting sensors are the main objects of the rules, regardless of their values. Similar to the FP-growth approach, PLT follows a pattern-growth mining technique. The mining begins with the sensor having the maximum rank and generates the frequent patterns from its PLT recursively. At each recursion, computation is required to update the PLT involved in the prefix part of a pattern. Therefore, the requirement of two database scans and the additional PLT update operations during mining limit the efficient use of this approach in handling WSN data. The performance evaluation compares the PLT structure with the FP-growth algorithm. According to the authors’ results, the PLT structure outperforms FP-growth in terms of CPU time and memory usage for all of the support values used; the performance improvement of PLT over FP-growth ranges from 30 percent to 50 percent.

4.2. Sequential Pattern Mining (SPM). Frequent pattern mining has been extended to find more complex structures, as in sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database. A sequence database stores a number of records, where all records are sequences of ordered events, with or without concrete notions of time. A large number of real-world domains, such as user profiling, medicine, local weather forecasting, and bioinformatics, show an inherent tendency to be modeled by means of sequences of events/objects related to each other. This great variety of applications makes sequential pattern mining one of the central topics in WSN data mining, as shown by the research efforts of recent years. The sequential pattern mining techniques for sensor networks are either based on traditional sequential mining algorithms, such as the Apriori-like algorithm [53], the Apriori-based methods GSP [54] and PSP [55], and the pattern-growth approaches FreeSpan and PrefixSpan [56, 57], or are new algorithms devised specifically to work in the sensor network environment.

4.2.1. Centralized Approaches Aim to Solve WSNs’ Application-Based Issues. Esposito et al. [58, 59] presented a multi-dimensional relational sequence mining framework to identify hidden frequent temporal correlations between sensor nodes. The algorithm is based on the generic level-wise search method known as APRIORI [60] for discovering correlated sensors. The framework exploits a relational language to describe the temporal evolution of a sensor network along with contextual information, and it works in two phases. First, an abstraction step segments and labels the real-valued time series into similar subsequences by using a kernel density estimator approach. Then, the knowledge is enriched by adding interval-based operators between the subsequences obtained in the discretization step, and the relational pattern mining algorithm is extended to deal with these new operators. By taking into account the interval-based temporal data along with contextual information about events, it discovers interesting and more human-readable patterns. The framework is evaluated on a real dataset collected from a wireless sensor network made up of 54 Mica2Dot [61] sensors deployed in the Intel Berkeley Research Lab [62]. Each sensor collected topology information along with humidity, temperature, light, and voltage values once every 31 seconds. Results show a strong correlation among some measurements, which is useful for anomaly detection.

Cook et al. [21] presented the MavHome smart home architecture, which focuses on the creation of an intelligent home that perceives the state of the home through sensors and acts upon the environment through device controllers. An important characteristic of the proposed architecture is the ability to make decisions based on predicted activities. To predict the activities, an algorithm called episode discovery (ED) is proposed, which is based on the work of Srikant and Agrawal [54] for mining sequential patterns from time-ordered transactions. Values that can be predicted include the usage patterns of devices in the home, the movement patterns of the inhabitants, and the typical activities of the inhabitants. Prediction algorithms are applied to the action sequences stored in the inhabitant event history to forecast user actions. Actions can then be automated based on the significance of the mined patterns as well as the predictive accuracy of the next event. A key disadvantage is that the entire action history must be stored and processed offline, which is not practical for large prediction tasks over a long period of time. Cook et al. demonstrated the effectiveness of MavHome on synthetic smart home data and on real data collected by students using X10 controllers in their homes. Experiments show a predictive accuracy as high as 53.4% on the real data and 94.4% on the synthetic data.

Rabatel et al. [22] presented a strategy to detect anomalies in sensor data in order to improve railway maintenance. They extract sequential patterns from real railway data and identify abnormal behavior. Based on these abnormal findings, alarms are automatically triggered to give notice of potential failures. Whether behavior is abnormal depends on environmental (weather conditions, travel characteristics) and structural (route, episode index in the route) changes in the data. The PSP [55] algorithm is used to identify the sequential patterns. To handle the environmental conditions, a contextual knowledge-based method is proposed, which is able to provide information on the seriousness and possible causes of a deviation. The proposed technique helps in the proactive maintenance of trains. However, the real-time context could be improved by providing precise and exact information for anomaly detection.

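[Figure 3 content: alarm events $a$, $q$, and $k$ drawn as circles, labeled with the time differences $T_{aq}$ and $T_{qk}$.]

Figure 3: Example of sequential alarm pattern.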

Guralnik and Haigh [23] used sequential pattern mining to learn typical behaviors of humans in their homes. Human behavior is inferred by using motion sensors, pressure pads, door latch sensors, and toilet flush sensors. They installed 10–20 sensors of different types in a home and built models of what sensor firings correspond to what activities, in what order, and at what time. For example, “In 60% of the days, the Kitchen-Motion sensor fires between 18h00 and 18h30, and then the Living-Room-Motion sensor fires between 18h20 and 20h00, and then the Bedroom-Motion sensor fires between 19h45 and 22h00.” Their algorithm uses these data, together with domain knowledge, to extract the sequences of rooms in which the person was acting. These sequences are then analyzed by a human expert to identify complex behavior models. These models can be used to select the appropriate response plan to the actions of the elderly.

Wu et al. [63] proposed a new algorithm for mining sequential alarm patterns (MSAPs) from the alarm data generated by a GSM system. Sequential events are identified from alarm data by defining a time interval between adjacent events. For example, if the time interval is set to six hours, then the sequential alarm pattern $(a, b, c)$ indicates that $a$, $b$, and $c$ happen in order and that the time intervals between $a$ and $b$ and between $b$ and $c$ are each less than six hours. An example of a sequential alarm pattern, redrawn from [63], is shown in Figure 3.

The number in each circle represents the error ID, and $T_{aq}$ denotes the time difference between alarm event $a$ and alarm event $q$. The extracted knowledge is useful not only for identifying the relevance between two events but also for predicting the alarm sequence and taking proper steps to prevent the occurrence of alarms where possible. For example, if the network operator detects alarm $a$ occurring at time $t$, the operator should dissipate this alarm before time $t + T_{aq}$ to alleviate the abnormal situation incurred. The limitation of this technique is that it cannot discover other possible time-interval patterns between the events.
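The time-interval constraint on a sequential alarm pattern can be made concrete with a short sketch that tests whether a given pattern occurs in a time-ordered alarm log. It only checks one pattern and does not mine patterns as the algorithm of [63] does; the function name `matches_pattern` and the `(time, alarm_id)` log format are assumptions.

```python
def matches_pattern(events, pattern, max_gap_hours):
    """Return True if the alarms in `pattern` occur in order in `events`,
    with less than `max_gap_hours` between consecutive matched alarms.
    `events` is a list of (time_in_hours, alarm_id) sorted by time."""

    def search(start_index, pattern_index, last_time):
        if pattern_index == len(pattern):
            return True
        for i in range(start_index, len(events)):
            time, alarm = events[i]
            if last_time is not None and time - last_time >= max_gap_hours:
                break  # events are sorted, so later ones are even further away
            if alarm == pattern[pattern_index]:
                if search(i + 1, pattern_index + 1, time):
                    return True
        return False

    return search(0, 0, None)

log = [(0, "a"), (4, "a"), (9, "b"), (12, "c")]
found = matches_pattern(log, ("a", "b", "c"), max_gap_hours=6)
too_slow = matches_pattern([(0, "a"), (7, "b"), (8, "c")], ("a", "b", "c"), 6)
```

The search backtracks over candidate occurrences: in `log`, the first `a` is too far from `b`, but the second `a` at hour 4 starts a valid match, so `found` is true. In the second log the gap between `a` and `b` is seven hours, so the pattern is not matched.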

No centralized solutions that aim to maximize the WSNs’ performance were observed.

4.2.2. Distributed Approaches Aim to Solve WSNs’ Application-Based Issues. Tseng and Lu [64] proposed an object tracking strategy named multilevel object tracking (MLOT) to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the movement log in sensor networks. A multilevel hierarchical structure is adapted by using a clustering mechanism that represents the hierarchical relations among sensor nodes, with the goal of keeping track of moving objects in real time. The movement logs of the moving objects are analyzed by a data mining algorithm called movement pattern generation (MPG) to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the minimum number of sensor nodes. MPG is based on Apriori; it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.

4.2.3. Distributed Approaches Aim to Maximize WSNs’ Performance. Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining logs for temporal movement patterns (TMPs). The discovered temporal movement rules (TMRs) are used to predict the next location of objects in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of moving objects. The integrated movement log is used as the input to the TMP-mine data mining method, which uses the pattern-growth approach to discover the TMPs. From the discovered TMPs, the TMRs are then generated for predicting the next location of a moving object. Suppose that the following two rules are discovered by a vehicle tracking system:

Rule 1: (Station A → interval: 10 min → Station B → interval: 5 min → Station C).

Rule 2: (Station A → interval: 20 min → Station B → interval: 5 min → Station D).

By dispatching these rules to the corresponding sensor nodes, the tracking can be made energy efficient. For example, if a car moves with the pattern (Station A → interval: 10 min → Station B → interval: 5 min), which matches Rule 1, then the node in Station B has only to activate the node in Station C rather than that in Station D or those around Station B.
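The use of the two example rules for prediction can be sketched as a simple prefix match. This is only an illustration of how a rule selects the next node to activate: the rule encoding, the function name `predict_next`, and the exact matching of intervals are assumptions, and a practical system would presumably tolerate some deviation in the observed intervals.

```python
# Each rule is a list of (station, minutes until departure); the last station
# has no interval. This encoding of Rules 1 and 2 is hypothetical.
RULES = [
    [("A", 10), ("B", 5), ("C", None)],
    [("A", 20), ("B", 5), ("D", None)],
]

def predict_next(observed, rules):
    """Return the next station of the first rule whose prefix equals the
    observed movement, or None if no rule matches."""
    for rule in rules:
        if len(observed) < len(rule) and rule[:len(observed)] == observed:
            return rule[len(observed)][0]
    return None

next_for_rule1 = predict_next([("A", 10), ("B", 5)], RULES)
next_for_rule2 = predict_next([("A", 20), ("B", 5)], RULES)
no_match = predict_next([("A", 15)], RULES)
```

A car that stays 10 minutes at Station A and 5 minutes at Station B matches Rule 1, so only the node at Station C needs to be activated; with a 20-minute stay at Station A, Rule 2 applies and Station D is selected instead.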

Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique using sequential patterns (PTSP). This technique predicts the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. PTSP is based on the inherited patterns of the objects’ movements in the network and on the use of sequential patterns to predict which sensor node the moving object will head to next.

4.3. Clustering. Clustering is unsupervised learning in which given data are categorized into subsets so that each subset represents a cluster with distinctive properties. It has been considered a useful technique, especially for applications that require scalability to a large number of sensor nodes. Clustering also supports aggregation of data in order to summarize the overall transmitted data.

[Figure 4 labels: input sensor data, identification of data correlation, grouping data, clusters, feedback.]

Figure 4: Data clustering for sensor networks.

In the current literature, problems related to clustering are addressed by node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67–83]. These clustering techniques vary widely in their objectives, depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering, for example, by considering the data similarity of neighboring nodes, many popular node clustering algorithms that partition the sensor nodes into a number of small groups and elect a cluster head for every group do not use data mining techniques directly. In this study, we focus only on data clustering techniques that support efficient data mining and find data correlations among the nodes. Figure 4 shows the data clustering commonly used in the data mining process.

The surveyed work adapted the k-means, hierarchical, and data correlation-based methods. The k-means algorithm takes the input parameter $k$ and partitions a set of $n$ objects into $k$ clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects; it works by grouping data objects into a tree of clusters. Data correlation-based clustering, in contrast, forms clusters based on spatial and temporal correlations, grouping nodes with similar sensory values within a given threshold, and these clusters remain fixed until the sensory value threshold changes over time. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider node residual energy. It is observed from the survey that both the centralized and the distributed clustering solutions aim to maximize the WSNs’ performance.
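As a reminder of how the k-means step operates, the following is a minimal sketch on hypothetical (temperature, humidity) readings. It uses squared Euclidean distance and takes the first $k$ points as initial centroids, which is an assumption made to keep the example deterministic; it is not one of the surveyed WSN algorithms.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on tuples of floats.
    Initial centroids are simply the first k points (illustrative choice)."""
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical (temperature, humidity) readings from six nodes.
readings = [(20.0, 40.0), (30.0, 70.0), (20.5, 41.0),
            (30.5, 69.0), (19.5, 39.0), (29.5, 71.0)]
centroids, clusters = kmeans(readings, k=2)
```

On these readings the two clusters separate the cool, dry nodes from the warm, humid ones, and each centroid is the mean reading of its cluster.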

4.3.1. Centralized Approaches Aim to Maximize WSNs' Performance. Liu et al. [84] proposed a centralized, graph-based energy-efficient data collection (EEDC) scheme. EEDC is an on-demand clustering algorithm that clusters nodes into groups whose members have similar sensor readings; the protocol thus clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes using a user-defined dissimilarity measure. EEDC models the cluster creation process as a clique-covering problem by constructing a graph G in which each sensor node is a vertex. An edge (u, v) is drawn if the dissimilarity measure between vertex u and vertex v is less than or equal to the given intracluster dissimilarity threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to cover all vertices in the graph with the minimum number of cliques. This minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, because processing is centralized, it is not suitable for large-scale WSNs.
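The clique-covering formulation can be sketched as follows. This is only an illustration: the mean absolute difference used as the dissimilarity measure, the greedy covering heuristic (which does not guarantee a minimum cover), and all function and node names are assumptions, not the procedure of [84].

```python
def dissimilarity_graph(readings, max_dst):
    """Link two nodes when the mean absolute difference of their series is <= max_dst."""
    nodes = list(readings)
    adj = {u: set() for u in nodes}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            pairs = list(zip(readings[u], readings[v]))
            diff = sum(abs(a - b) for a, b in pairs) / len(pairs)
            if diff <= max_dst:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def greedy_clique_cover(adj):
    """Cover every vertex with a clique; greedy, so the cover may not be minimal."""
    uncovered = set(adj)
    cliques = []
    while uncovered:
        # Seed with the uncovered vertex that has the most uncovered neighbours.
        seed = max(sorted(uncovered), key=lambda u: len(adj[u] & uncovered))
        clique = [seed]
        for v in sorted(adj[seed] & uncovered):
            # Keep v only if it is adjacent to every current clique member.
            if all(v in adj[m] for m in clique):
                clique.append(v)
        cliques.append(sorted(clique))
        uncovered -= set(clique)
    return cliques


readings = {
    "n1": [20.0, 20.5], "n2": [20.2, 20.4], "n3": [20.1, 20.6],
    "n4": [25.0, 25.3], "n5": [25.1, 25.2],
}
clusters = greedy_clique_cover(dissimilarity_graph(readings, max_dst=0.5))
```

Each returned clique is one cluster in which every pair of members satisfies the dissimilarity threshold.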

4.3.2. Distributed Approaches Aim to Maximize WSNs' Performance. Guo et al. [85] proposed H-cluster, a distributed algorithm for clustering sensory data. The input of the algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output is a set of cluster features that summarize the clusters of the input sensory dataset. A Hilbert-map mapping algorithm is used to map a d-dimensional sensory data space onto the 2-dimensional area covered by a given WSN. H-cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-cluster with the C-Corner and C-Center algorithms. During the experiments, four types of environmental attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that the H-cluster algorithm performs well in terms of data loss, energy, and the quality of the clustered data in small WSNs. The results also show that, as the amount of sensory data delivered increases, the amount of data loss also increases, and that energy efficiency decreases as the size of the WSN grows.

Yeo et al. [86] proposed a data correlation-based clustering scheme (DCC) based on the similarity of sensor data, along with a spatial suppression scheme that helps to reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head. During this phase, sensor nodes communicate with each other, and the resulting clusters are organized by sensor nodes that have similar readings. Spatial suppression is performed at the cluster head, which also computes the difference between each sensor reading and a representative value. If a cluster head holds redundant data, it removes it, keeping only the node identification. The experimental results support the claim that clustering based on data correlation has better compression performance than ordinary clustering based on locality of communication; they show that DCC reduces data size by 40% through suppression and prolongs network lifetime by 20–30%. However, for large-scale network applications (nodes > 500), DCC is inefficient because each cluster head needs more energy to collect similar data readings and to communicate with several nodes. DCC is also ineffective when the percentage of similar data readings is low, because of the higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing a prediction model. In their approach, the cluster heads execute the prediction model, while gateway nodes at the circumference of the clusters are responsible for routing. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] presented the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When sensor values move outside the threshold, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees that the result is within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (the query phase), during which clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, in which only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) that trades lower result precision for significant savings in energy, storage, computation, and communication.
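The two phases can be illustrated with a small sketch. It assumes an absolute threshold `tau`, a routing tree given as a child-to-parent mapping, and a rule in which a node joins its parent's cluster when its reading is within `tau` of that cluster head's reading; these details and all names are illustrative assumptions rather than the exact protocol of [88].

```python
def cag_query_phase(parent, reading, tau):
    """Assign each node a cluster head while the query travels down the tree.

    `parent` maps node -> parent (None for the root) and must list every
    parent before its children (dicts keep insertion order in Python 3.7+).
    """
    head = {}
    for node, par in parent.items():
        if par is not None and abs(reading[node] - reading[head[par]]) <= tau:
            head[node] = head[par]   # similar enough: join the parent's cluster
        else:
            head[node] = node        # otherwise become a new cluster head
    return head


def cag_response_phase(head, reading):
    """Only cluster heads report; each value stands in for the whole cluster."""
    return {h: reading[h] for h in set(head.values())}


parent = {"root": None, "a": "root", "b": "a", "c": "b", "d": "root"}
reading = {"root": 20.0, "a": 20.3, "b": 20.8, "c": 24.0, "d": 19.6}
head = cag_query_phase(parent, reading, tau=1.0)
reports = cag_response_phase(head, reading)
```

In this example five nodes produce two reports, and every unreported reading lies within `tau` of the value reported for its cluster, which is the source of both the savings and the loss of precision.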

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. They propose a distributed version of the K-means clustering algorithm that sends summarized data toward the sink, which reduces the communication transmission time and the power consumption of sensor nodes. The sensor network is divided into clusters, and only the cluster head nodes communicate with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends the base station the count and vector sum of its local sensory data points, as well as the sum of the squared distances from each local point to its center. On receiving data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations; several runs are used to obtain the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors (N) and increases linearly with the number of centers. The major issues are the extra memory needed at the cluster head and the computation power needed to summarize the data before transmitting it to the sink. In addition, the algorithm requires multiple rounds of message passing between the cluster heads and the base station, which may seriously affect communication efficiency when the number of sensors is relatively high.
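The exchange of summaries can be sketched as below. Each cluster head sends, per center, a count, a vector sum, and a sum of squared distances, so the message size depends on the number of centers and the data dimension but not on the number of sensors. The function names, the stopping rule based on the change in total squared distance, and the sample data are assumptions made for illustration.

```python
def local_summary(points, centers):
    """Cluster-head side: per center, [count, vector sum, sum of squared distances]."""
    dim = len(centers[0])
    summary = [[0, [0.0] * dim, 0.0] for _ in centers]
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = d2.index(min(d2))                      # nearest center
        summary[j][0] += 1
        summary[j][1] = [s + a for s, a in zip(summary[j][1], p)]
        summary[j][2] += d2[j]
    return summary


def update_centers(summaries, centers):
    """Base-station side: merge the summaries and recompute every center."""
    new_centers, cost = [], 0.0
    for j, old in enumerate(centers):
        count = sum(s[j][0] for s in summaries)
        total = [sum(s[j][1][d] for s in summaries) for d in range(len(old))]
        cost += sum(s[j][2] for s in summaries)
        new_centers.append([t / count for t in total] if count else list(old))
    return new_centers, cost


def distributed_k_means(head_data, centers, tol=1e-9, max_rounds=50):
    """Repeat broadcast, summarize, and update until the cost stops improving."""
    prev_cost = float("inf")
    for _ in range(max_rounds):
        summaries = [local_summary(pts, centers) for pts in head_data]
        centers, cost = update_centers(summaries, centers)
        if prev_cost - cost <= tol:
            break
        prev_cost = cost
    return centers


# Three cluster heads, each holding the readings of its own sensors.
head_data = [
    [[1.0, 1.0], [2.0, 2.0]],
    [[9.0, 9.0], [10.0, 10.0]],
    [[1.5, 1.5], [9.5, 9.5]],
]
centers = distributed_k_means(head_data, [[0.0, 0.0], [5.0, 5.0]])
```

Because the global mean of a cluster is the merged vector sum divided by the merged count, the base station obtains the same centers it would compute from the raw points.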

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes to form clusters. Nodes that hear the request decide whether to nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes that intend to become CHs wait for a random time period that is based on their remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical comparison with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when some subregions of the sensor network are targeted more than others, that is, when inquiries are not uniformly distributed over time and space.

Figure 5: Classification maps input attribute set (X) to class label (Y). [Diagram: input attribute set (X) → classification model → output class label (Y).]

Ma et al. [90] proposed the distributed hierarchical clustering and summarization algorithm (DHCS) for online data analysis and mining in sensor networks. The method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop-count thresholds, to model node- and distance-based clustering. Initially, each node treats itself as an active cluster. Then, similar adjacent clusters are merged into larger clusters round by round. In each round, every cluster simultaneously tries to combine with its most similar adjacent cluster. Two clusters can be merged only if each considers the other its most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
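The round-by-round mutual merging can be sketched as follows. This simplified illustration uses the cluster mean as the summary and a single difference threshold, omits the hop-count threshold, and runs centrally rather than in the network; these simplifications and all names are assumptions, not the algorithm of [90].

```python
def dhcs_merge(values, neighbours, diff_threshold):
    """Merge adjacent clusters when each is the other's most similar neighbour."""
    clusters = {n: {n} for n in values}        # cluster id -> member nodes

    def mean(cid):
        return sum(values[n] for n in clusters[cid]) / len(clusters[cid])

    def adjacent(cid):
        owner = {n: c for c, members in clusters.items() for n in members}
        return {owner[m] for n in clusters[cid] for m in neighbours[n]} - {cid}

    while True:
        best = {}
        for cid in clusters:
            cands = [(abs(mean(cid) - mean(o)), o) for o in adjacent(cid)]
            cands = [c for c in cands if c[0] <= diff_threshold]
            if cands:
                best[cid] = min(cands)[1]      # most similar adjacent cluster
        # A merge needs mutual agreement, so the chosen pairs never overlap.
        pairs = {tuple(sorted((a, b))) for a, b in best.items() if best.get(b) == a}
        if not pairs:                          # steady clusters reached
            return sorted(sorted(m) for m in clusters.values())
        for a, b in pairs:
            clusters[a] |= clusters.pop(b)


values = {"n1": 20.0, "n2": 20.2, "n3": 20.9, "n4": 25.0, "n5": 25.1}
neighbours = {"n1": ["n2"], "n2": ["n1", "n3"], "n3": ["n2", "n4"],
              "n4": ["n3", "n5"], "n5": ["n4"]}
steady = dhcs_merge(values, neighbours, diff_threshold=1.0)
```

For this chain of five nodes, the first round merges n1 with n2 and n4 with n5, the second round adds n3 to the first cluster, and the third round finds no further merge.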

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and then classifies new data into one of the learned classes. Figure 5 shows that classification maps an input attribute set (X) to a class label (Y).

Classification-based approaches have adapted traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, depending on the type of classification model they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until it reaches a leaf node, where a class label is assigned. The internal nodes partition the data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a classification request, a query vector is used to locate the closest training vectors according to the distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier assigns the data to predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where Condition is a conjunction of attribute tests and Y is a class label.
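A rule of this form can be represented directly in code. In the sketch below, each rule pairs a conjunction of attribute tests with a class label, and the first matching rule wins; the attribute names, thresholds, labels, and the first-match policy are all hypothetical choices made for illustration.

```python
def classify(record, rules, default="unknown"):
    """Return the label Y of the first rule whose attribute tests all hold."""
    for condition, label in rules:
        # The condition is a conjunction: every attribute test must pass.
        if all(test(record[attr]) for attr, test in condition.items()):
            return label
    return default


rules = [
    ({"temperature": lambda t: t > 50, "smoke": lambda s: s > 0.3}, "fire"),
    ({"temperature": lambda t: t > 35}, "hot"),
]

label = classify({"temperature": 60, "smoke": 0.5}, rules)
```

Ordering the rules from most to least specific keeps the more general "hot" rule from shadowing the "fire" rule.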

SVM (support vector machine) techniques separate the data belonging to different classes by fitting a hyperplane between them that maximizes the margin of separation. The data is mapped into a higher-dimensional feature space where it can be separated more easily by a hyperplane. Furthermore, a kernel function is used to compute the dot products between the mapped vectors in the feature space when finding the hyperplane, without forming the mapped vectors explicitly.
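The role of the kernel function can be checked numerically on a small case. The sketch below uses the homogeneous polynomial kernel of degree 2 on two-dimensional inputs, chosen only as an illustration, and shows that evaluating the kernel in the input space gives the same value as a dot product of explicitly mapped vectors; the function names are assumptions.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z) ** 2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x, z = (1.0, 2.0), (3.0, 0.5)
explicit = dot(phi(x), phi(z))   # dot product in the 3-D feature space
implicit = poly_kernel(x, z)     # same quantity, computed in the 2-D input space
```

Both values equal 16 up to floating-point rounding, while the kernel never constructs the three-dimensional vectors.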

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. To identify persons, the approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The approach is validated experimentally using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated using the Weka framework, version 3.7.0 [95]. The experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deeper analysis, does not guarantee good identification.
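The divide-and-conquer construction can be sketched with an ID3-style learner on categorical attributes. This is an assumption-laden illustration, not the Weka implementation used in [92]: the entropy-based split criterion, the attribute names, and the toy records are hypothetical, and attribute values unseen during training are not handled.

```python
from collections import Counter
import math


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Recursively split on the attribute with the lowest weighted entropy."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: a class label

    def split_entropy(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(attributes, key=split_entropy)
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in subset],
                                     [l for _, l in subset], remaining)
    return (best, branches)                           # internal node


def predict(tree, row):
    """Walk from the root to a leaf, following the row's attribute values."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


rows = [
    {"kitchen": "yes", "evening": "no"},
    {"kitchen": "yes", "evening": "yes"},
    {"kitchen": "no", "evening": "yes"},
    {"kitchen": "no", "evening": "no"},
]
labels = ["alice", "alice", "bob", "bob"]
tree = build_tree(rows, labels, ["kitchen", "evening"])
```

Here the learner splits on "kitchen" because that attribute alone yields leaves with uniform class membership.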

Sharma et al. [96] proposed a methodology for classifying sensor data using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then the k closest training examples are kept, where k is a fixed integer and k ≥ 1; among these k examples, the algorithm looks for the most frequent label. This label is the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated with NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a correctly classified success rate of 92.3%.
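The prediction step described above corresponds to a standard k-nearest-neighbor vote, sketched below. The Euclidean distance, the function name `knn_predict`, and the toy feature vectors and labels are assumptions for illustration; the trajectory features of [96] are not reproduced.

```python
from collections import Counter
import math


def knn_predict(training, query, k=3):
    """Return the most frequent label among the k training vectors closest to query."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Training only stores examples; all work happens at query time.
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [
    ((0.0, 0.0), "idle"), ((0.1, 0.2), "idle"), ((0.2, 0.1), "idle"),
    ((5.0, 5.0), "moving"), ((5.2, 4.9), "moving"),
]
prediction = knn_predict(training, (0.3, 0.3), k=3)
```

Note that `math.dist` requires Python 3.8 or later.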

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments that predicts the behavior patterns of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model that predicts the occupancy of the ambient intelligence environment. Occupancy is predicted using fuzzy rules, which are modeled using the past values of the time series data. In the learning process, input from the sensors is compared with the stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multiuser environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size, changing the algorithm output rate according to the data rate, the available memory, the algorithm output rate history, and the time constraints for filling the available memory with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. Instances are stored in main memory according to a prespecified distance threshold. The threshold represents the similarity measure the algorithm accepts for treating two or more elements as one element, according to the elements' attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the instance is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer.
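The weight update rule can be sketched as follows. The sketch covers only the nearest-instance search and the increment, decrement, and release steps; the AOG adaptation of the threshold to available memory is omitted, and the Euclidean distance, the rule that an element with no close stored instance becomes a new entry, and all names are assumptions.

```python
import math


def lwclass_update(store, features, label, threshold):
    """One-pass update of the in-memory instance list with a labelled element."""
    if store:
        nearest = min(store, key=lambda e: math.dist(e["features"], features))
        if math.dist(nearest["features"], features) <= threshold:
            if nearest["label"] == label:
                nearest["weight"] += 1          # same class: reinforce
            else:
                nearest["weight"] -= 1          # different class: weaken
                if nearest["weight"] == 0:      # release the entry from memory
                    store[:] = [e for e in store if e is not nearest]
            return
    # Assumption: an element with no close stored instance becomes a new entry.
    store.append({"features": tuple(features), "label": label, "weight": 1})


store = []
for features, label in [((20.0, 50.0), "normal"), ((20.2, 50.1), "normal"),
                        ((20.1, 50.0), "alarm")]:
    lwclass_update(store, features, label, threshold=1.0)
```

After these three elements the store holds a single "normal" entry with weight 1: two agreeing elements raised the weight to 2, and the conflicting one lowered it again.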

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure is constructed over the whole network. A distributed voting approach is proposed in which each sensor is a leaf of a tree (DT) and performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work shows how the local models enable sensors to respond to changes in the target by relearning local models. The framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node to build the local predictive model and the extra memory required to store it.
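Simple and weighted voting at the sink can be sketched in a few lines. The sensor names, class labels, and the use of an assumed local accuracy as the vote weight are illustrative assumptions; the weighting actually used in [103] may differ.

```python
from collections import defaultdict


def vote(predictions, weights=None):
    """Combine local class predictions; every vote counts 1 unless weights are given."""
    tally = defaultdict(float)
    for sensor, label in predictions.items():
        tally[label] += 1.0 if weights is None else weights[sensor]
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(tally), key=tally.get)


predictions = {"s1": "intruder", "s2": "clear", "s3": "clear"}
weights = {"s1": 0.9, "s2": 0.3, "s3": 0.4}   # assumed local accuracies

simple = vote(predictions)
weighted = vote(predictions, weights)
```

Only one class label per sensor reaches the sink. In this example the simple vote returns "clear", while the weighted vote returns "intruder" because the single dissenting sensor is trusted more.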

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme that generates effective feature vectors of low dimension (FVLD) for wireless audio networks. A distributed cluster-based algorithm for the detection and classification of vehicles is proposed. Sensors form clusters on demand to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features, and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of the data and decision fusion schemes. The technique produced the best classification accuracy, 89.46%, compared with all other approaches.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using a support vector machine (SVM)-based technique in a sensor network. The authors proposed two distributed algorithms, the distributed fixed-partition SVM (DFP-SVM) and the weighted distributed fixed-partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is trained incrementally on a set of examples called support vectors. The number of support vectors is typically very small compared with the number of all sample values. In addition, the support vectors (and offset) provide a compressed representation of the separating SVM hyperplane. Sending only the support vectors, rather than all training samples, to the next cluster head is therefore energy efficient because it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the approach is evaluated by running a number of simulations, and a comparison is made with the centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. The technique uses a one-class quarter-sphere SVM to identify local outliers at each node while minimizing computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius of the sphere to its parent for outlier classification. The technique identifies outliers from data measurements collected over a long time window and is not performed in real time. It also ignores the spatial correlation of neighboring nodes, which makes the local outlier results inaccurate. The technique is evaluated using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. The results reveal that the technique achieves significant energy savings by reducing communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and distinguishing aspects of the data mining techniques designed for WSNs that are discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects are discussed; then the comparative tables are presented to compare and differentiate existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogeneous [50] or heterogeneous [33, 48]. A homogeneous attribute means that a single-valued attribute is sensed, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is correlation in time and space, that is, temporal and spatial correlation. Temporal correlation indicates that readings from different sensor nodes are observed at the same time instants and that the readings observed at one time instant are related to those observed at the previous time instant, whereas spatial correlation indicates that readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node when the reading from a correlated sensor is missing.
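One simple way to quantify both kinds of correlation is the Pearson coefficient: computed between the series of two neighboring nodes it measures spatial correlation, and computed between a series and the same series shifted by one step it measures temporal correlation. The choice of Pearson correlation, the function names, and the sample readings are assumptions for illustration; the surveyed techniques use a variety of measures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (undefined if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lag1_autocorrelation(xs):
    """Temporal correlation: each reading against the previous reading."""
    return pearson(xs[:-1], xs[1:])


node_a = [20.0, 20.5, 21.0, 21.5, 22.0]
node_b = [19.8, 20.4, 20.9, 21.6, 22.1]    # a geographically close node

spatial = pearson(node_a, node_b)
temporal = lag1_autocorrelation(node_a)
```

A coefficient close to 1 for a pair of neighbors suggests that one node's reading could stand in for the other's if a reading goes missing.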

5.2. Processing Architecture. To apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models.

Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, all raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis to find interesting patterns in the aggregated data. As the size of the WSN increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to very large numbers of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an "aggregator" collects and aggregates the data from different sensors. Since sensor nodes are resource constrained, the challenge for this approach is to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSN data. Distributed approaches use one-scan algorithms for real-time processing to deal with the high data arrival rate, and the mining results are expected to be available within short response times, whereas centralized approaches collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.
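To make the one-scan idea concrete, the sketch below maintains a running mean and variance with Welford's method, so that each reading is processed once and then discarded. It is a generic illustration of single-pass processing in constant memory, not an algorithm taken from the surveyed papers; the class name and sample stream are assumptions.

```python
class RunningStats:
    """One-scan mean and sample variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one reading into the summary; the reading is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
```

A multiscan technique would instead keep all readings and pass over them repeatedly, which suits a central server better than a memory-constrained node.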

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but is limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few consider node mobility. When nodes are mobile, maintaining a certain structure for data


Table 2: Comparison of data mining techniques for wireless sensor networks.

Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations
Frequent pattern mining | DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores the sensor that reports different values
Frequent pattern mining | In-network data mining [51] | Events patterns discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication
Frequent pattern mining | Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost, delayed crucial messages
Frequent pattern mining | Online algorithm [46] | Interval list representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy
Frequent pattern mining | Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data
Frequent pattern mining | CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data

The original table additionally marks, for each approach, the processing architecture (distributed, central), sensor data attributes (homogeneous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihop), mobility (static, mobile), node role (cluster head, sensor, relay), evaluation method (simulation, analytical model), and data source (real, synthetic); the positions of those check marks are not recoverable here.


Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations
Frequent pattern mining | Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increased cost due to multiple DB scans
Frequent pattern mining | SP-tree [49] | Discover events patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost
Sequential pattern mining | Relational framework [58] | Multidimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming
Sequential pattern mining | Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants behavior prediction | Prediction accuracy | Inefficient for complex activities
Sequential pattern mining | MPG [64] | Predict object's future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset
Sequential pattern mining | Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction


Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations
Sequential pattern mining | TMP-mine [65] | Predict object's future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time
Sequential pattern mining | Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns
Sequential pattern mining | MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute
Sequential pattern mining | PTSP [66] | Object's future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects
Clustering | DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate
Clustering | H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate


Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations
Clustering | Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping
Clustering | CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss
Clustering | EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs
Clustering | Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs
Clustering | Attribute-based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost
Clustering | DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes energy is ignored


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

| Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Classification | Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness |
| Classification | Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity |
| Classification | NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset |
| Classification | LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift |
| Classification | FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission |
| Classification | Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios |
| Classification | Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity |
| Classification | One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation |


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can take one of three roles [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is a node that acts as a medium to transmit data packets from one node to another.

Node Task. In centralized approaches, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and can take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the types of application that benefit from WSN data mining techniques. Here, we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.
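The energy argument in item (iv) rests on sending fewer messages. The following minimal sketch, which is an illustration and not taken from any of the surveyed techniques, contrasts forwarding every raw reading to the sink with sending one summary per cluster head. The cluster names, the readings, and the choice of summary fields (count, mean, minimum, maximum) are all assumptions made for the example.

```python
from statistics import mean


def centralized_message_count(clusters):
    """Centralized approach: every raw reading is forwarded to the sink."""
    return sum(len(readings) for readings in clusters.values())


def in_network_summaries(clusters):
    """In-network approach: each cluster head sends one summary tuple
    (count, mean, min, max) instead of its members' raw readings."""
    return {
        head: (len(readings), mean(readings), min(readings), max(readings))
        for head, readings in clusters.items()
        if readings  # a cluster with no readings sends nothing
    }


# Hypothetical temperature readings grouped by cluster head.
clusters = {
    "ch1": [21.0, 21.5, 22.0],
    "ch2": [30.0, 31.0],
}

raw_count = centralized_message_count(clusters)  # one message per reading
summaries = in_network_summaries(clusters)       # one message per cluster
print(raw_count, len(summaries))
```

In this toy setting the number of transmissions drops from one per reading to one per cluster, at the cost of losing the individual readings at the central side.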

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used methods to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this survey that most of the techniques use simulation on synthetic datasets to validate their results, and that simulation is preferred because of the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that it achieves. Most of the techniques consider the resource constraints and the different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it a difficult task to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. It is observed from the comparative analysis that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among the sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This, in turn, increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among the sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand their associated benefits and limitations. Due to the different performance requirements of specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] rely on certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables in the deployed WSN is large.
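The sensitivity to the support threshold noted in item (ix) can be seen on a toy example. The sketch below is a brute-force illustration, not Apriori or any algorithm from the cited works: it counts itemsets of up to two sensors over a handful of hypothetical epochs and reports those whose support meets a given threshold. The sensor names, the epochs, and the function name `frequent_itemsets` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(epochs, min_support, max_size=2):
    """Return {itemset: support} for itemsets with support >= min_support.

    Support is the fraction of epochs that contain the whole itemset.
    Itemsets are enumerated exhaustively, so this only suits tiny inputs.
    """
    counts = Counter()
    for epoch in epochs:
        items = sorted(set(epoch))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(epochs)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Each epoch lists the (hypothetical) sensors that reported an event.
epochs = [
    {"s1", "s2", "s3"},
    {"s1", "s2"},
    {"s1", "s3"},
    {"s2"},
]

loose = frequent_itemsets(epochs, min_support=0.5)
strict = frequent_itemsets(epochs, min_support=0.75)
print(len(loose), len(strict))
```

Raising the threshold from 0.5 to 0.75 removes every sensor pair from the result on this data, which is the kind of abrupt change that makes a fixed threshold hard to choose in advance.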

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed specifically for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and the dependencies among spatial, temporal, and attribute correlations which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.

[Figure 6 diagram. Sensor data streams feed node data processing (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis, ...), which yields local models. Network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring) yields the network model. Central data processing at the sink/base station (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis) yields the global model, which returns approximate results to user queries. In-network processing is labeled single pass; centralized processing is labeled multipass.]

Figure 6: Proposed hybrid framework for sensor network based applications.

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient compared with the other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to obtain a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
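The local, network, and global models described above can be illustrated with a mergeable summary. The sketch below is one possible reading of the idea, not the authors' implementation: each node builds its local model in a single pass (Welford's running mean and variance), and summaries are combined upward with the standard pairwise merge formula, so raw readings never leave the node. The class name `Summary`, the cluster layout, and the readings are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact model of a stream: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x):
        """Single-pass (Welford) update with one new reading."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine two summaries without access to the raw readings."""
        if self.n == 0:
            return Summary(other.n, other.mean, other.m2)
        if other.n == 0:
            return Summary(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self):
        """Population variance of everything summarized so far."""
        return self.m2 / self.n if self.n else 0.0


def local_model(readings):
    """Node level: scan the stream once and keep only the summary."""
    summary = Summary()
    for x in readings:
        summary.update(x)
    return summary


def integrate(models):
    """Aggregation level: merge lower-level models into one."""
    merged = Summary()
    for model in models:
        merged = merged.merge(model)
    return merged


# Hypothetical readings: two clusters, three nodes in total.
node_streams = {
    "cluster1": {"a": [20.0, 22.0], "b": [24.0, 26.0]},
    "cluster2": {"c": [30.0]},
}

network_models = {
    cluster: integrate(local_model(r) for r in nodes.values())
    for cluster, nodes in node_streams.items()
}
global_model = integrate(network_models.values())
print(global_model.n, global_model.mean, global_model.variance)
```

Mean and variance are used here only because they merge exactly; richer local models such as cluster centroids or pattern counts would need their own merge rule.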

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. The applications that require real-time response and actions can use the network model for decision and knowledge extraction. The applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSN data [111], the proposed framework deals with large-scale data from WSNs by splitting the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head level, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas a cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this distribution of data, the proposed framework is able to deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSN data, where the characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using an incremental learning mechanism [39, 112] that helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
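The incremental learning requirement in item (iv) can be illustrated with the simplest possible model that forgets old data. The sketch below is a stand-in chosen for brevity, not the mechanism of [39, 112]: it keeps an exponentially weighted mean that is updated one reading at a time, so a shift in the monitored process pulls the model toward the new regime without storing or rescanning past readings. The class name `IncrementalMean`, the smoothing factor `alpha`, and the readings are assumptions made for the example.

```python
class IncrementalMean:
    """Exponentially weighted mean updated one reading at a time."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to each new reading
        self.value = None   # no model until the first reading arrives

    def update(self, x):
        """Fold one new reading into the model and return the new estimate."""
        if self.value is None:
            self.value = float(x)
        else:
            self.value += self.alpha * (x - self.value)
        return self.value


# Hypothetical stream whose level shifts from 10 to 20 halfway through.
model = IncrementalMean(alpha=0.5)
history = [model.update(x) for x in [10.0, 10.0, 20.0, 20.0]]
print(history)
```

A larger `alpha` adapts faster to change but is noisier; choosing it is a tuning decision of the same kind as the window size discussed in Section 6.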

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of the past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. Surely, the set of WSN applications presented here is neither complete nor exhaustive, but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies for emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceedings of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowé, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Römer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


rules from WSNs data with one database scan. The main idea of the proposed approach is to obtain the frequency of all event-detecting sensors’ data, construct a prefix tree based on that in any canonical order, and then reorganize the tree in frequency-descending order. Through the reorganization, the SP-tree keeps the nodes of frequently event-detecting sensors in the upper part of the tree, which in turn makes the tree structure highly compact. Once the SP-tree is constructed, the FP-growth mining technique is applied to find the frequent event-detecting sensor sets. Experiments are performed to verify the improvement in memory consumption and runtime that SP-tree achieves over PLT [50]. The experiments show that SP-tree outperforms PLT in both time and memory consumption. The reason for this gain is twofold: first, the PLT construction requires two database scans, while SP-tree constructs the tree by scanning the database only once; second, the mining phase of SP-tree is highly efficient due to the frequency-descending tree structure.
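The compaction effect of frequency-descending ordering can be illustrated with a small sketch. The Python code below is not the SP-tree implementation of [49]; the function names and the toy epochs are hypothetical, and for brevity it counts frequencies first and then inserts the sorted sensor sets, rather than restructuring an already built tree in place. It only shows why placing frequent sensors near the root tends to produce more shared prefixes.

```python
from collections import Counter


def build_prefix_tree(transactions, order_key):
    """Insert each transaction, sorted by order_key, into a nested-dict prefix tree."""
    root = {}
    for items in transactions:
        node = root
        for item in sorted(set(items), key=order_key):
            node = node.setdefault(item, {})
    return root


def count_nodes(tree):
    """Count all nodes below the root."""
    return sum(1 + count_nodes(child) for child in tree.values())


# Hypothetical epochs: each set lists the sensors that detected an event.
epochs = [
    {"s1", "s4"},
    {"s2", "s4"},
    {"s3", "s4"},
    {"s1", "s2", "s4"},
]

freq = Counter(sensor for epoch in epochs for sensor in epoch)

# Canonical order: plain lexicographic order of sensor names.
canonical = build_prefix_tree(epochs, order_key=lambda s: s)
# Frequency-descending order, breaking ties by sensor name.
by_frequency = build_prefix_tree(epochs, order_key=lambda s: (-freq[s], s))

print(count_nodes(canonical), count_nodes(by_frequency))
```

On this toy input the lexicographic tree has 8 nodes and the frequency-ordered tree has 5, because the most frequent sensor `s4` becomes a single shared prefix.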

4.1.3. Distributed Approaches Aim to Solve WSNs’ Application-Based Issues. Römer [51] proposed an in-network data mining technique to discover frequent patterns of events with certain spatial and temporal properties. In this approach, the user specifies the upper bounds max scope and max history (a variable measured in seconds) for the patterns of interest. Each node in the network collects the events from its neighbors within the maximum scope and keeps a history of their events for the duration of the maximum history. Each node then applies a mining algorithm to discover the local frequent patterns that satisfy the given parameters. The resulting frequent patterns are converted to association rules that describe an event of type $E$ that occurs at node $n$ with support $S$ and confidence $C$. Local patterns are sent to the sink, where secondary mining is performed to compute the global picture of the entire network. The algorithm is implemented on the BTnode (Bluetooth radio) platform [52], and the tradeoff between the scope of the query and resource consumption is evaluated on a real dataset. Results show that reducing the scope of the query decreases resource consumption. The major issues in this approach are the memory consumption of itemset discovery algorithms and the communication overhead of event collection.
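The support and confidence attached to such a rule can be computed as in the following sketch. It is a generic illustration of the two measures, not the algorithm of [51]; the window representation, the event labels, and the function name `rule_metrics` are assumptions made for the example.

```python
def rule_metrics(windows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    windows: list of sets, each holding the events a node observed within
    one scope/history window (hypothetical representation).
    """
    antecedent = set(antecedent)
    both = antecedent | {consequent}
    n_antecedent = sum(1 for w in windows if antecedent <= w)
    n_both = sum(1 for w in windows if both <= w)
    support = n_both / len(windows) if windows else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical local history of one node: event type @ neighbor id.
history = [
    {"motion@n2", "door@n1"},
    {"motion@n2"},
    {"motion@n2", "door@n1"},
    {"light@n3"},
]

support, confidence = rule_metrics(history, {"motion@n2"}, "door@n1")
print(support, confidence)
```

Here the rule holds in 2 of 4 windows (support 0.5) and in 2 of the 3 windows that contain the antecedent (confidence 2/3).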

4.1.4. Distributed Approaches Aim to Maximize WSNs’ Performance. Boukerche and Samarah [15] presented a distributed data extraction methodology that aggregates the data on the sensor nodes, which reduces the number of messages during transmission. The distributed solution sends some parameters, such as support, time-slot size, and historic period, from the sink to all nodes within the network. Each sensor node has its own buffer entry to set the support value. After each time slot, a node checks whether any messages were received during this time slot; if so, the node sets its buffer entry. When the historic period ends, each node traverses its buffer; if the number of set values is greater than or equal to the support value provided initially, the message is transferred to the sink. To evaluate the validity of the distributed approach, it is compared with the centralized methodology on a real dataset. Two experiments were conducted using historical periods of 5 and 10 days, with minimum support values ranging from 10 to 90 and a time-slot size equal to 30 seconds. All of the reported results show a reduction in the number of messages and in the data size as the support value increases. The major issues in this methodology are the increased cost of the node buffer and the delay of crucial messages when the support value is high.
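The per-node buffering logic described above can be sketched as follows. This is a simplified illustration rather than the implementation of [15]: the class and function names are hypothetical, and the support value is treated as an absolute count of set buffer entries, which is an assumption.

```python
class SensorNode:
    """Keeps one buffer entry per time slot of the historic period."""

    def __init__(self, node_id, num_slots, min_support):
        self.node_id = node_id
        self.min_support = min_support
        self.buffer = [False] * num_slots

    def end_of_slot(self, slot_index, received_message):
        # Set the buffer entry if any message arrived during this slot.
        if received_message:
            self.buffer[slot_index] = True

    def end_of_period(self):
        """Return a report for the sink, or None if support is not reached."""
        count = sum(self.buffer)
        if count >= self.min_support:
            return (self.node_id, count)
        return None


def simulate(activity_by_node, num_slots, min_support):
    """Run one historic period and return the reports sent to the sink."""
    reports = []
    for node_id, activity in activity_by_node.items():
        node = SensorNode(node_id, num_slots, min_support)
        for slot, received in enumerate(activity):
            node.end_of_slot(slot, received)
        report = node.end_of_period()
        if report is not None:
            reports.append(report)
    return reports


# Hypothetical activity: True means a message was received in that slot.
activity = {
    "n1": [True, True, False, True],
    "n2": [False, True, False, False],
}
reports = simulate(activity, num_slots=4, min_support=3)
print(reports)
```

With a support of 3, only `n1` reports to the sink; lowering the support to 1 makes both nodes report, which mirrors the trend that higher support values reduce the number of messages.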

Boukerche and Samarah [50] proposed the positional lexicographic tree (PLT) structure for mining association rules in which the event-detecting sensors are the main objects of the rules, regardless of their values. Similar to the FP-growth approach, PLT follows a pattern growth mining technique. The mining begins with the sensor having the maximum rank and generates the frequent patterns from its PLT recursively. Computation is required at each recursion to update the PLT involved in the prefix part of a pattern. Therefore, the requirement of two database scans and the additional PLT update operations during mining limit the efficient use of this approach in handling WSNs data. The performance evaluation is done by comparing the PLT structure with the FP-growth algorithm. According to the reported results, the PLT structure outperforms FP-growth in terms of CPU time and memory usage for all of the support values used; the performance improvement of PLT over FP-growth ranges from 30 percent to 50 percent.

4.2. Sequential Pattern Mining (SPM). Frequent pattern mining has been extended to more complex structures, as in sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database. A sequence database stores a number of records, where all records are sequences of ordered events, with or without concrete notions of time. A large number of real-world domains, such as user profiling, medicine, local weather forecasting, and bioinformatics, lend themselves naturally to being modeled as sequences of events or objects related to each other. This great variety of applications makes sequential pattern mining one of the central topics in WSNs data mining, as shown by the research efforts produced in recent years. Sequential pattern mining techniques for sensor networks are either based on traditional sequential mining algorithms, such as the Apriori-like algorithm [53], the Apriori-based methods GSP [54] and PSP [55], and the pattern growth approaches FreeSpan and PrefixSpan [56, 57], or are new algorithms devised specifically for the sensor network environment.
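As a minimal illustration of what "frequent subsequence" means here, the sketch below computes the support of a candidate pattern in a small sequence database. The database contents and function names are hypothetical; the code only covers support counting and is not an implementation of GSP, PSP, FreeSpan, or PrefixSpan.

```python
def is_subsequence(pattern, sequence):
    """True if the pattern's events occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # "event in remaining" advances the iterator, so order is enforced.
    return all(event in remaining for event in pattern)


def support(pattern, database):
    """Fraction of sequences in the database that contain the pattern."""
    if not database:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


# Hypothetical sequence database: one ordered event list per record.
database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["c", "a"],
]

print(support(["a", "c"], database))
```

The pattern (a, c) is contained in three of the four sequences, so it would be reported as frequent under any minimum support of at most 0.75.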

4.2.1. Centralized Approaches Aim to Solve WSNs’ Application-Based Issues. Esposito et al. [58, 59] presented a multidimensional relational sequence mining framework to identify the hidden frequent temporal correlations between sensor nodes. The algorithm is based on the generic level-wise search method APRIORI [60] for discovering correlated sensors. The framework exploits a relational language to describe the temporal evolution of a sensor network along with contextual information, and it works in two phases. First, an abstraction step segments and labels the real-valued time series into similar subsequences by using a kernel density estimator approach. Then, the knowledge is enriched by adding interval-based operators between the subsequences obtained in the discretization step, and the relational pattern mining algorithm has been extended to deal with these new operators. By taking into account the interval-based temporal data along with contextual information about events, it discovers interesting and more human-readable patterns. The framework is evaluated on a real dataset collected from a wireless sensor network made up of 54 Mica2Dot [61] sensors deployed in the Intel Berkeley Research Lab [62]. Each sensor collected topology information along with humidity, temperature, light, and voltage values once every 31 seconds. Results show a strong correlation among some measurements, which is useful for anomaly detection.

Cook et al. [21] present the MavHome smart home architecture, which focuses on the creation of an intelligent home that perceives the state of the home through sensors and acts upon the environment through device controllers. An important characteristic of the proposed architecture is the ability to make decisions based on predicted activities. To predict the activities, an algorithm called episode discovery (ED) is proposed, which is based on the work of Srikant and Agrawal [54] for mining sequential patterns from time-ordered transactions. Values that can be predicted include the usage pattern of devices in the home, the movement patterns of the inhabitants, and the typical activities of the inhabitants. Prediction algorithms are applied to the action sequences stored in the inhabitant event history to forecast user actions. Actions can then be automated based on the significance of the mined patterns as well as the predictive accuracy of the next event. A key disadvantage is that the entire action history must be stored and processed offline, which is not practical for large prediction tasks over a long period of time. Cook et al. demonstrated the effectiveness of MavHome on synthetic smart home data and on real data collected by students using X10 controllers in their homes. Experiments show a predictive accuracy as high as 53.4% on the real data and 94.4% on the synthetic data.

Rabatel et al. [22] presented a strategy to detect anomalies from sensor data in order to improve railway maintenance. They extract sequential patterns from real railway data and identify abnormal behavior. Based on these abnormal findings, alarms are automatically triggered to notify potential failures. This abnormal behavior depends on environmental (weather conditions, travel characteristics) and structural (route, episode index in the route) changes in the data. The PSP [55] algorithm has been used to identify the sequential patterns. To handle the environmental conditions, a contextual knowledge-based method is proposed, which is able to provide information on the seriousness and possible causes of a deviation. The proposed technique helps in the proactive maintenance of trains. However, the real-time context can be improved by providing precise and exact information for anomaly detection.

Figure 3: Example of sequential alarm pattern.

Guralnik and Haigh [23] use sequential pattern mining to learn typical behaviors of humans in their homes. Human behavior is inferred by using motion sensors, pressure pads, door latch sensors, and toilet flush sensors. They installed 10–20 sensors of different types in a home and built models of which sensor firings correspond to which activities, in what order, and at what time. For example, “In 60% of the days, the Kitchen-Motion sensor fires between 18h00 and 18h30, and then the Living-Room-Motion sensor fires between 18h20 and 20h00, and then the Bedroom-Motion sensor fires between 19h45 and 22h00.” Their algorithm uses these data, together with domain knowledge, to extract the sequences of rooms in which the person was active. These sequences are then analyzed by a human expert to identify complex behavior models. These models can be used to select the appropriate response plan to the actions of elderly people.

Wu et al. [63] proposed a new algorithm for mining sequential alarm patterns (MSAPs) from the alarm data generated by a GSM system. Sequential events are identified from alarm data by defining a time interval between adjacent events. For example, if the time interval is set to six hours, then the sequential alarm pattern $(a, b, c)$ indicates that $a$, $b$, and $c$ happen in order and that the time intervals between $a$ and $b$ and between $b$ and $c$ are each less than six hours. An example of a sequential alarm sequence, redrawn from [63], is shown in Figure 3.

The number in each circle represents the error ID, and $T_{aq}$ denotes the time difference between alarm event $a$ and alarm event $q$. The extracted knowledge is useful not only for identifying the relevance between two events but also for predicting the alarm sequence, so that proper steps can be taken to prevent the occurrence of the alarms if at all possible. For example, if the network operator detects alarm $a$ occurring at time $t$, the operator should dissipate this alarm before time $t + T_{aq}$ to alleviate the abnormal situations incurred. The limitation of this technique is that it cannot discover other possible time-interval patterns between the events.
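The time-interval constraint on a sequential alarm pattern can be checked as in the sketch below. It is only an illustration of the constraint described above (events in order, each adjacent gap smaller than the chosen interval) and not the mining algorithm of [63]; the function name, the event log, and the use of hours as the time unit are assumptions.

```python
def occurs_with_gap(events, pattern, max_gap):
    """Check whether the pattern occurs in order with every adjacent gap < max_gap.

    events: list of (timestamp, alarm_id) pairs; pattern: list of alarm ids.
    """
    if not pattern:
        return True
    events = sorted(events)
    # Timestamps at which the pattern prefix matched so far can end.
    ends = [t for t, alarm in events if alarm == pattern[0]]
    for wanted in pattern[1:]:
        ends = [
            t
            for t, alarm in events
            if alarm == wanted and any(0 < t - prev < max_gap for prev in ends)
        ]
    return bool(ends)


# Hypothetical alarm log with timestamps in hours.
log = [(0, "a"), (4, "a"), (9, "b"), (14, "c")]

print(occurs_with_gap(log, ["a", "b", "c"], max_gap=6))
print(occurs_with_gap(log, ["a", "b", "c"], max_gap=5))
```

Tracking every possible end time of the matched prefix avoids the mistake of committing to the first occurrence of $a$ (at hour 0), which would be too far from $b$; the second occurrence (at hour 4) satisfies a six-hour interval but not a five-hour one.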

It is observed that none of the centralized solutions aims to maximize the WSNs’ performance.

4.2.2. Distributed Approaches Aim to Solve WSNs’ Application-Based Issues. Tseng and Lu [64] proposed an object tracking strategy named multilevel object tracking (MLOT) to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the movement log in sensor networks. A multilevel hierarchical structure is adopted by using a clustering mechanism that represents the hierarchical relations among sensor nodes, in order to keep track of moving objects in real time. The movement logs of the moving objects are analyzed by a data mining algorithm, movement pattern generation (MPG), to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the fewest sensor nodes. MPG is based on Apriori: it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.
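The idea of predicting the next position from the most frequent movement pattern can be sketched as follows. This is a deliberately simplified, first-order illustration and not the MPG algorithm of [64]: it only counts which node follows which in the movement logs, and all names and log contents are hypothetical.

```python
from collections import Counter, defaultdict


def learn_transitions(movement_logs):
    """Count how often each node is directly followed by each next node."""
    counts = defaultdict(Counter)
    for path in movement_logs:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, current):
    """Return (next_node, confidence) for the most frequent successor, or None."""
    successors = counts.get(current)
    if not successors:
        return None
    nxt, freq = successors.most_common(1)[0]
    return nxt, freq / sum(successors.values())


# Hypothetical movement logs: sequences of visited sensor nodes.
logs = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "C"],
]

counts = learn_transitions(logs)
print(predict_next(counts, "B"))
```

For an object currently at node B, the sketch predicts node C with confidence 2/3, so only C would need to be activated.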

4.2.3. Distributed Approaches Aim to Maximize WSNs’ Performance. Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the logs for temporal movement patterns (TMPs). The discovered temporal movement rules (TMRs) are used to predict the next location of objects in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of the moving objects. The integrated movement log is used as the input to the TMP-mine data mining method, which uses the pattern growth approach for discovering the TMPs. Once the TMPs are discovered, the TMRs are generated for predicting the next location of a moving object. Suppose that the following two rules are discovered by a vehicle tracking system.

Rule 1: (Station A → interval 10 min → Station B → interval 5 min → Station C).

Rule 2: (Station A → interval 20 min → Station B → interval 5 min → Station D).

By dispatching these rules to the corresponding sensor nodes, the tracking can be done in an energy-efficient way. For example, if a car moves with the pattern (Station A → interval 10 min → Station B → interval 5 min), which matches Rule 1, then the node in Station B only has to activate the node in Station C rather than that in Station D or those around Station B.
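The rule-matching step in this example can be sketched as follows. The two rules are those given in the text; the data layout, the function names, and the optional `tolerance` on intervals are assumptions made for the illustration and are not taken from [65].

```python
# Each antecedent is a sequence of (station, interval in minutes until the
# next station); the consequent is the station whose node should be activated.
RULES = [
    ([("A", 10), ("B", 5)], "C"),  # Rule 1
    ([("A", 20), ("B", 5)], "D"),  # Rule 2
]


def matches(observed, antecedent, tolerance=0):
    """True if stations agree and each interval differs by at most tolerance."""
    if len(observed) != len(antecedent):
        return False
    return all(
        station == rule_station and abs(interval - rule_interval) <= tolerance
        for (station, interval), (rule_station, rule_interval)
        in zip(observed, antecedent)
    )


def node_to_activate(observed, rules=RULES, tolerance=0):
    """Return the predicted next station, or None if no rule matches."""
    for antecedent, consequent in rules:
        if matches(observed, antecedent, tolerance):
            return consequent
    return None


print(node_to_activate([("A", 10), ("B", 5)]))
```

A car observed on (Station A, 10 min, Station B, 5 min) matches Rule 1, so only Station C is activated; if no rule matches, a real system would need a fallback such as waking the neighbors of the current node.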

Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique using sequential patterns (PTSP). This technique helps to predict the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. PTSP is based on the inherited patterns of the objects’ movements in the network and on the use of sequential patterns to predict which sensor node the moving object will be heading to next.

4.3. Clustering. Clustering is unsupervised learning in which the given data is categorized into subsets so that each subset represents a cluster with distinctive properties. It has been considered a useful technique, especially for applications that require scalability to a large number of sensor nodes. Clustering also supports aggregation of data in order to summarize the overall transmitted data.

Figure 4: Data clustering for sensor networks. (Diagram labels: input sensor data, identification of data correlation, grouping data, clusters, feedback.)

In the current literature, problems related to clustering are addressed by node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67–83]. These clustering techniques vary widely in their objectives, depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering, for example, by considering the data similarity of neighboring nodes, many popular node clustering algorithms that partition the sensor nodes into a number of small groups and elect a cluster head for every group do not use data mining techniques directly. In this study, we focus only on data clustering techniques that support efficient data mining and find data correlations among the nodes. Figure 4 shows the data clustering commonly used in the data mining process.

The surveyed work adapts the k-means, hierarchical, and data correlation-based methods. The k-means algorithm takes the input parameter $k$ and partitions a set of $n$ objects into $k$ clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects; it works by grouping data objects into a tree of clusters. Data correlation-based clustering, in contrast, forms clusters based on spatial and temporal correlations among nodes whose sensory values are similar within a given threshold, and these clusters remain fixed until the sensory value threshold changes over time. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider node residual energy. It is observed from the survey that both the centralized and the distributed clustering solutions aim to maximize the WSNs’ performance.
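A minimal sketch of the k-means procedure described above is given below for one-dimensional sensor readings. It is a plain version of Lloyd's iteration with caller-supplied initial centroids, not a WSN-specific variant from the surveyed papers; the readings and the function name are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D readings with given initial centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each reading joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical temperature readings from six nodes, with k = 2.
readings = [20.1, 20.4, 19.8, 30.2, 29.9, 30.5]
centroids, clusters = kmeans(readings, centroids=[19.0, 31.0])
print(centroids, clusters)
```

The readings separate into one cluster around 20 and one around 30; the cluster means are the quantities against which cluster similarity is measured.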

4.3.1. Centralized Approaches Aim to Maximize WSNs’ Performance. Liu et al. [84] proposed a centralized, graph-based energy-efficient data collection (EEDC) scheme. EEDC is an on-demand clustering algorithm that clusters nodes into groups such that members have similar sensor readings; thus, the protocol clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes with a user-defined dissimilarity measure. EEDC models the cluster creation process as a clique-covering problem by constructing a graph $G$ such that each sensor node is a vertex in the graph. An edge $(u, v)$ is drawn if the dissimilarity measure between vertex $u$ and vertex $v$ is less than or equal to the given intracluster dissimilarity measure threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to cover all vertices in the graph with the minimum number of cliques. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, due to its centralized processing, it is not suitable for large-scale WSNs.
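The clique-covering view of clustering can be illustrated with a simple greedy heuristic. The sketch below is not the EEDC algorithm of [84] and does not guarantee a minimum cover; it assumes that the dissimilarity between two nodes is the absolute difference of a single reading, and the function name and data are hypothetical.

```python
def greedy_clique_cover(readings, max_dst):
    """Group nodes so that every pair inside a cluster has dissimilarity <= max_dst.

    readings: dict mapping node id -> latest reading (assumed scalar).
    Each returned cluster is a clique in the dissimilarity graph.
    """

    def dissimilarity(u, v):
        return abs(readings[u] - readings[v])

    clusters = []
    for node in sorted(readings):
        for cluster in clusters:
            # The node may join only if it is adjacent to every member.
            if all(dissimilarity(node, member) <= max_dst for member in cluster):
                cluster.append(node)
                break
        else:
            clusters.append([node])
    return clusters


# Hypothetical readings collected at the sink.
readings = {"n1": 20.0, "n2": 20.5, "n3": 21.2, "n4": 25.0, "n5": 25.3}
print(greedy_clique_cover(readings, max_dst=1.0))
```

With a threshold of 1.0, the five nodes are covered by three cliques, and one representative per clique could report on behalf of its members.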

4.3.2. Distributed Approaches Aim to Maximize WSNs' Performance. Guo et al. [85] proposed H-cluster, a distributed algorithm for clustering sensory data. The input of the algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output is a set of cluster features that summarize the clusters of the input sensory dataset. A Hilbert-map mapping algorithm is used to map a d-dimensional sensory data space into a 2-dimensional area covered by a given WSN. H-cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data compare H-cluster with the C-Corner and C-Center algorithms. During the experiments, the sensors sense four types of environmental attributes: temperature, humidity, light, and voltage. The results show that H-cluster is much more efficient in terms of data loss, energy, and the quality of clustered data in small WSNs. The results also show that, as the size of the WSN and the amount of sensory data delivered increase, data loss increases and energy efficiency decreases.

Yeo et al. [86] proposed a data correlation-based clustering scheme (DCC) based on the similarity of sensor data, together with a spatial suppression scheme that helps reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head. During this phase, sensor nodes communicate with each other, and the resulting clusters are formed from sensor nodes that have similar readings. Spatial suppression is performed at the cluster head, which also computes the difference between each sensor reading and a representative value. If a cluster head holds redundant data, it removes the data and keeps only the node identification. The experimental results support the hypothesis that clustering based on data correlation compresses better than ordinary clustering based on locality of communication: DCC reduces the data size by 40% through suppression and prolongs network lifetime by 20–30%. However, for large-scale network applications (more than 500 nodes), DCC is inefficient because each cluster head needs more energy to collect similar data readings and to communicate with several nodes. DCC is also ineffective when the percentage of similar data readings is low, because of the higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing a prediction model. In their approach, the cluster heads execute the prediction model, while gateway nodes at the circumference of the clusters are responsible for routing. The prediction model is used to select a suitable node of the cluster to activate. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] presented the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When the values move outside the threshold, the affected sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees that the result is within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), during which clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, in which only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) that trades lower result precision for significant savings in energy, storage, computation, and communication.
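The query and response phases can be sketched as follows. This is a simplified illustration under stated assumptions: the routing tree is given as a hypothetical `parent` map, a node compares its reading with the value of the cluster head upstream of it, and the threshold is an absolute difference. None of these details are taken from the CAG paper.

```python
def cag_clusters(parent, readings, root, threshold):
    """Query phase: assign each node to a cluster head while the query
    travels down the tree. parent: child id -> parent id (assumed topology)."""
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    head_of = {root: root}               # the root starts the first cluster
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            head = head_of[node]
            # join the upstream cluster if the reading is close to its head's value
            if abs(readings[child] - readings[head]) <= threshold:
                head_of[child] = head
            else:
                head_of[child] = child   # otherwise become a new cluster head
            stack.append(child)
    return head_of

def cag_response(head_of, readings):
    """Response phase: only cluster heads report, one value per cluster."""
    return {head: readings[head] for head in set(head_of.values())}
```

Every suppressed reading lies within `threshold` of the value reported by its cluster head, which is how the sketch mirrors the user-specified error tolerance.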

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the K-means clustering algorithm sends summarized data toward the sink, which reduces the communication transmission time and power consumption of sensor nodes. The sensor network is divided into clusters, and only the cluster head (CH) nodes communicate with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends the base station a summary: the count and vector sum of its local sensory data points, as well as the sum of the squared distances from each local point to its center. On receiving the data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations: several programs are run to obtain the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors (N) and increases linearly with the number of centers. The major issues are the extra memory required at the cluster head and the computation power needed to summarize data before transmitting it to the sink. The algorithm also requires multiple rounds of message passing between the cluster heads and the base station, which may seriously affect communication efficiency when the number of sensors is relatively high.
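One round of this summary exchange might look like the sketch below. The function names `local_summary` and `update_centers` and the dictionary layout of the summaries are assumptions for illustration; only the three summarized quantities (count, vector sum, and sum of squared distances) come from the description above.

```python
def local_summary(points, centers):
    """Cluster-head side: per-center count, vector sum, and squared-distance sum."""
    dim = len(centers[0])
    stats = [{"n": 0, "sum": [0.0] * dim, "sse": 0.0} for _ in centers]
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = d2.index(min(d2))                    # nearest current center
        stats[j]["n"] += 1
        stats[j]["sum"] = [s + a for s, a in zip(stats[j]["sum"], p)]
        stats[j]["sse"] += d2[j]
    return stats

def update_centers(summaries, centers):
    """Base-station side: merge the summaries and recompute each mean."""
    new_centers, total_sse = [], 0.0
    for j, old in enumerate(centers):
        n = sum(s[j]["n"] for s in summaries)
        total = [sum(s[j]["sum"][d] for s in summaries) for d in range(len(old))]
        total_sse += sum(s[j]["sse"] for s in summaries)
        # a center that received no points keeps its previous location
        new_centers.append(tuple(t / n for t in total) if n else tuple(old))
    return new_centers, total_sse
```

Each cluster head sends one fixed-size record per center, so the message size in this sketch depends on the number of centers rather than on the number of sensors, in line with the reported results.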

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes to form clusters. Nodes that hear the request decide, based on their energy, whether to nominate themselves as CHs. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on their remaining battery supply. A node that nominates itself broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that attribute-based clustering yields gains over flooding-based schemes when some subregions of the sensor network are targeted more than others, that is, when inquiries are not uniformly distributed over time and space.

Figure 5: Classification maps input attribute set (X) to class label (Y).

Ma et al. [90] proposed the distributed hierarchical clustering and summarization algorithm (DHCS) for online data analysis and mining in sensor networks. The method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop-count thresholds, to model node- and distance-based clustering. Initially, each node treats itself as an active cluster. Similar adjacent clusters are then merged into larger clusters round by round. In each round, every cluster simultaneously tries to combine with its most similar adjacent cluster. Two clusters can be merged only if each considers the other its most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
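The mutual-agreement merging rule can be sketched as below. This is a simplified model, not the DHCS implementation: clusters are summarized by a hypothetical `(mean, size)` pair, adjacency between clusters stands in for the hop-count threshold, and `max_diff` plays the role of the difference threshold.

```python
def most_similar(c, clusters, adjacent, max_diff):
    """Most similar adjacent cluster of c within the difference threshold, or None."""
    cands = [(abs(clusters[c][0] - clusters[n][0]), n) for n in adjacent[c]]
    cands = [pair for pair in cands if pair[0] <= max_diff]
    return min(cands)[1] if cands else None

def dhcs(clusters, adjacent, max_diff):
    """clusters: id -> (mean, size); adjacent: id -> set of neighboring cluster ids."""
    clusters = dict(clusters)
    adjacent = {c: set(ns) for c, ns in adjacent.items()}
    while True:
        choice = {c: most_similar(c, clusters, adjacent, max_diff) for c in clusters}
        # merge only pairs that pick each other as most similar neighbor
        pairs = [(a, b) for a, b in choice.items()
                 if b is not None and choice[b] == a and a < b]
        if not pairs:
            return clusters                      # steady clusters
        for a, b in pairs:
            (ma, na), (mb, nb) = clusters[a], clusters.pop(b)
            clusters[a] = ((ma * na + mb * nb) / (na + nb), na + nb)
            adjacent[a] = (adjacent[a] | adjacent.pop(b)) - {a, b}
            for ns in adjacent.values():
                if b in ns:                      # neighbors of b now border a
                    ns.discard(b)
                    ns.add(a)
```

On a four-node line with readings 20.0, 20.2, 25.0, and 25.1 and `max_diff = 1.0`, the first round merges the two low nodes and the two high nodes, and the second round finds nothing left to merge.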

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and then classifies new data into one of the learned classes. Figure 5 shows that classification maps an input attribute set (X) to a class label (Y).

Classification-based approaches have adapted traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, depending on the type of classification model they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root and moving through the tree until it reaches a leaf node, where the class label is assigned. The internal nodes partition the data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To classify a query vector, the classifier locates the closest training vectors according to the distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier groups the dataset into predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where Condition is a conjunction of attribute tests and Y is a class label.
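A rule of this form can be sketched as a list of (condition, label) pairs that are tried in order. The attribute names, thresholds, and class labels below are invented purely for illustration, as is the default label returned when no rule fires.

```python
# Each rule is (condition, class label); a condition maps attribute -> test.
# All attribute names, thresholds, and labels here are illustrative only.
RULES = [
    ({"temperature": lambda t: t > 50, "humidity": lambda h: h < 20}, "fire_risk"),
    ({"temperature": lambda t: t < 0}, "frost"),
]

def classify(record, rules=RULES, default="normal"):
    """Return the label of the first rule whose whole conjunction holds."""
    for condition, label in rules:
        # a condition is a conjunction: every attribute test must be satisfied
        if all(attr in record and test(record[attr])
               for attr, test in condition.items()):
            return label
    return default      # assumed fallback class when no rule matches
```

For instance, a record with temperature 60 and humidity 10 matches the first rule, while the same temperature with humidity 50 falls through to the default.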

SVM (support vector machine) techniques separate the data belonging to different classes by fitting a hyperplane between them that maximizes the separation margin. The data is mapped into a higher-dimensional feature space where it can be easily separated by a hyperplane. Furthermore, a kernel function is used to evaluate the dot products between the mapped vectors in the feature space when finding the hyperplane.
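For reference, the hyperplane and kernel described above can be written in the standard textbook hard-margin form. This formulation is general background rather than something taken from the surveyed papers, and all symbols are introduced here for illustration: x_i are the n training vectors with labels y_i in {-1, +1}, \phi is the feature map, w and b define the hyperplane, \alpha_i are the dual coefficients, and K is the kernel.

```latex
\begin{aligned}
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad y_i\bigl(w\cdot\phi(x_i)+b\bigr)\ge 1,\qquad i=1,\dots,n,\\
&K(x_i,x_j)=\phi(x_i)\cdot\phi(x_j),\\
&f(x)=\operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i\,y_i\,K(x_i,x)+b\Bigr).
\end{aligned}
```

The margin between the classes is 2/\lVert w\rVert, so minimizing \lVert w\rVert maximizes it. Only the training vectors with \alpha_i > 0 (the support vectors) contribute to f(x).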

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. To identify persons, the approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The approach is validated experimentally using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated using the Weka framework, version 3.7.0 [95]. The experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deeper analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then the k closest training examples are kept, where k is a fixed integer and k ≥ 1. Among these k examples, the most frequent label is selected; this label is the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated with NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a correct classification rate of 92.3%.
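The prediction step described above is ordinary k-nearest-neighbor voting and can be sketched as follows. The function name `knn_predict`, the Euclidean distance, and the toy data are assumptions for illustration; the trajectory preprocessing used in NNTC is not modeled.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (vector, label) pairs stored during the training phase.

    Returns the most frequent label among the k training examples closest
    to `query` (Euclidean distance is an assumption of this sketch).
    """
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because training only stores the examples, all of the cost in this sketch is paid at prediction time, when the distance to every stored example is computed.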

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments that predicts the behavior patterns of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model that predicts the occupancy of the ambient intelligence environment. Occupancy is predicted using fuzzy rules, which are modeled from the past values of time series data. In the learning process, input from the sensors is compared with stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multiuser environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size: the algorithm output rate is adjusted according to the data rate, available memory, algorithm output rate history, and time constraints, so that the available memory is filled with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. All instances are stored in main memory according to a prespecified distance threshold. The threshold represents the similarity measure the algorithm accepts for treating two or more elements as one element according to their attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of the instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the element is released from memory. The algorithm is empirically validated using synthetic streaming data in the resource-constrained environment of a common handheld computer.
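The weight update described above can be sketched for a single stream element. This is an illustrative reading of the description, not the LWClass implementation: the name `lwclass_update`, the list-based store, and the Euclidean distance are assumptions, storing an unmatched element as a new entry with weight 1 is also an assumption, and the AOG rate adaptation is not modeled.

```python
import math

def lwclass_update(store, x, label, threshold):
    """Process one labelled stream element.

    store: list of [vector, label, weight] entries kept in main memory.
    """
    near = [e for e in store if math.dist(e[0], x) <= threshold]
    if not near:
        store.append([x, label, 1])     # assumption: unmatched element starts a new entry
        return
    entry = min(near, key=lambda e: math.dist(e[0], x))   # nearest stored instance
    entry[2] += 1 if entry[1] == label else -1            # agree: +1, disagree: -1
    if entry[2] == 0:
        store.remove(entry)             # release memory held by a contradicted entry
```

Feeding the sketch two agreeing elements followed by two disagreeing ones in the same neighborhood raises the entry's weight to 2, lowers it back to 0, and then removes the entry.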

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure is constructed over the whole network. A distributed voting approach is proposed in which each sensor is a leaf of a tree (DT) and performs local prediction. Instead of sending raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work shows how local models enable sensors to respond to changes in the target by relearning. The framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model offers a potential increase in accuracy combined with a reduction in model size and runtime compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node to compute and build the local predictive model and the extra memory required to store it.
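Combining local predictions at the sink by simple or weighted voting can be sketched as below. The function name `vote` and the idea of using a per-sensor weight such as local model accuracy are assumptions for illustration; the source does not specify how the weights are chosen.

```python
from collections import defaultdict

def vote(local_predictions, weights=None):
    """Combine the classes predicted by the sensors into one global decision.

    local_predictions: sensor id -> predicted class.
    weights: optional sensor id -> float (assumed, e.g. local model accuracy);
             when omitted, every sensor counts equally (simple voting).
    """
    tally = defaultdict(float)
    for sensor, label in local_predictions.items():
        tally[label] += 1.0 if weights is None else weights[sensor]
    return max(tally, key=tally.get)
```

With predictions `{"s1": "event", "s2": "normal", "s3": "normal"}`, simple voting returns "normal", whereas weights of 0.9, 0.3, and 0.4 let the single confident sensor win. Only a class label per sensor crosses the network in either case.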

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme that generates effective feature vectors of low dimension (FVLD) for a wireless audio network. A distributed cluster-based algorithm for the detection and classification of vehicles is proposed. Sensors form clusters on demand to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features, and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique achieved the best classification accuracy, 89.46%, among all the approaches compared.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules in a sensor network using a support vector machine (SVM)-based technique. The authors proposed two distributed algorithms for training an SVM for classification in a WSN: the distributed fixed-partition SVM (DFP-SVM) and the weighted distributed fixed-partition SVM (WDFP-SVM). The SVM is trained incrementally on a set of examples called support vectors. The number of support vectors is very small compared with the number of all sample values; moreover, the support vectors (and offset) give a compressed representation of the separating SVM hyperplane. Sending only the support vectors, instead of all training samples, to the next cluster head is therefore very energy efficient because it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the approach is evaluated by running a number of simulations and comparing it with a centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally, compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. The technique uses a one-class quarter-sphere SVM to identify local outliers at each node while minimizing computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius of the sphere to its parent for outlier classification. The technique identifies outliers from data measurements collected over a long time window and does not operate in real time. It also ignores the spatial correlation of neighboring nodes, which makes the local outlier results inaccurate. The technique is evaluated using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to assess the computational strategy and various kernel functions. The results reveal that the technique achieves significant energy savings in terms of communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and distinguishing aspects of the data mining techniques for WSNs discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. The evaluation aspects are discussed first, and the comparative tables then compare and differentiate the existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze it. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify associations between data attributes. Attributes can be homogeneous [50] or heterogeneous [33, 48]. A homogeneous attribute means that a single attribute is sensed, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first is attribute correlation, that is, dependency among data attributes. The second is correlation in time and space, that is, temporal and spatial correlation. Temporal correlation indicates that readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node when a reading from a correlated sensor is missing.

5.2. Processing Architecture. To apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models.

Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, all raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis to find interesting patterns in the aggregated data. As the size of the WSN increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to very large numbers of sensors.

Distributed. Another computation approach uses a distributed model, in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called a local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an "aggregator" collects and aggregates the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSN data. Distributed approaches use one-scan algorithms for real-time processing: to deal with the high data arrival rate, the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few consider node mobility. When nodes are mobile, maintaining a certain structure for data

Table 2: Comparison of data mining techniques for wireless sensor networks.

| Technique | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Frequent pattern mining | DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores the sensor that reports different values |
| Frequent pattern mining | In-network data mining [51] | Events patterns discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication |
| Frequent pattern mining | Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost; delayed crucial messages |
| Frequent pattern mining | Online algorithm [46] | Interval list of representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy |
| Frequent pattern mining | Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data |
| Frequent pattern mining | CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data |

Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

| Technique | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Frequent pattern mining | Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increased cost due to multiple DB scans |
| Frequent pattern mining | SP-tree [49] | Discover events patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost |
| Sequential pattern mining | Relational framework [58] | Multidimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming |
| Sequential pattern mining | Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants behavior prediction | Prediction accuracy | Inefficient for complex activities |
| Sequential pattern mining | MPG [64] | Predict object's future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset |
| Sequential pattern mining | Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction |

Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

| Technique | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Sequential pattern mining | TMP-mine [65] | Predict object's future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time |
| Sequential pattern mining | Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns |
| Sequential pattern mining | MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute |
| Sequential pattern mining | PTSP [66] | Object's future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects |
| Clustering | DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate |
| Clustering | H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate |

Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

| Technique | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Clustering | Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping |
| Clustering | CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss |
| Clustering | EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs |
| Clustering | Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs |
| Clustering | Attribute-based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost |
| Clustering | DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes' energy is ignored |


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

| Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Classification | Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness |
| Classification | Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity |
| Classification | NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset |
| Classification | LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | Non adaption to concept drift |
| Classification | FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission |
| Classification | Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios |
| Classification | Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity |
| Classification | One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation |


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are the nodes with limited resources; they are used to sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is the node that acts as a medium to transmit data packets from one node to others.

Node Task. In the centralized approach, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and can take action based on the detected phenomena or target.
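The difference between the two node tasks can be illustrated with a small sketch. It is not taken from any of the surveyed techniques: the function name `suppress_readings` and the tolerance `delta` are hypothetical. It assumes a node that computes locally and transmits a reading only when it differs from the last transmitted value by more than `delta`, whereas a pure sense-and-send node would transmit every reading.

```python
from typing import Iterable, List, Optional, Tuple


def suppress_readings(readings: Iterable[float], delta: float) -> List[Tuple[int, float]]:
    """Return the (index, value) pairs that a node would actually transmit.

    A reading is sent only if it differs from the last *transmitted* value
    by more than `delta` (a hypothetical, application-chosen tolerance).
    """
    sent: List[Tuple[int, float]] = []
    last_sent: Optional[float] = None
    for index, value in enumerate(readings):
        if last_sent is None or abs(value - last_sent) > delta:
            sent.append((index, value))
            last_sent = value
    return sent


# Illustrative temperature readings (made-up values).
temperatures = [20.0, 20.1, 20.2, 21.0, 21.1, 25.0, 25.2]
transmitted = suppress_readings(temperatures, delta=0.5)
print(f"sense-and-send would transmit {len(temperatures)} readings")
print(f"local suppression transmits {len(transmitted)}: {transmitted}")
```

The choice of `delta` trades accuracy at the base station against the number of transmissions; a larger tolerance saves more messages but hides smaller changes.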

5.5. Application Area. We also evaluated the types of application that benefit from WSNs data mining techniques. Here we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSNs performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSNs optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, a real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this paper that most of the techniques use simulation on synthetic datasets to validate their results, owing to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated in terms of the optimization objective that has been achieved. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, like network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it a difficult task to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. It is observed from the comparative analysis that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] use certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques use synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining approaches [15–20] suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. Moreover, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
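To make the threshold sensitivity mentioned in item (ix) concrete, the following minimal sketch computes support and confidence for one association rule between sensors. The data layout (one set of event-reporting sensor IDs per epoch), the sensor names, and the threshold values are assumptions made only for illustration; they do not come from any surveyed algorithm.

```python
from typing import FrozenSet, List, Set


def count(epochs: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Number of epochs in which every sensor in `itemset` reported an event."""
    return sum(1 for epoch in epochs if itemset <= epoch)


def support(epochs: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of epochs that contain the whole itemset."""
    return count(epochs, itemset) / len(epochs) if epochs else 0.0


def confidence(epochs: List[Set[str]], antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Estimated P(consequent | antecedent) over the observed epochs."""
    base = count(epochs, antecedent)
    return count(epochs, antecedent | consequent) / base if base else 0.0


# Hypothetical epochs: the sensors that detected an event in each time slot.
epochs = [{"s1", "s2"}, {"s1", "s2", "s3"}, {"s1"}, {"s2", "s3"}, {"s1", "s2"}]
rule_support = support(epochs, frozenset({"s1", "s2"}))
rule_confidence = confidence(epochs, frozenset({"s1"}), frozenset({"s2"}))

# The same rule s1 -> s2 is kept or discarded depending on the chosen threshold.
for min_conf in (0.7, 0.8):
    verdict = "kept" if rule_confidence >= min_conf else "discarded"
    print(f"min_conf={min_conf}: rule s1 -> s2 is {verdict}")
```

A small change in the confidence threshold flips the outcome for this rule, which is the practical difficulty behind choosing "proper and flexible" thresholds.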

7. Future Research Directions

It is observed from the analysis of existing data mining work on sensor network-based applications that there are still shortcomings in the existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of the WSN and its special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among the spatial, temporal, and attribute correlations which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data is continuously arriving and is not available for a second scan.
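As an illustration of what a single-pass computation looks like, the sketch below maintains a running count, mean, and variance with Welford's online algorithm, so that each reading is seen once and then discarded. Welford's method is used here only as a familiar example of a one-scan summary; the class name `RunningSummary` is hypothetical and the framework itself does not prescribe this particular statistic.

```python
class RunningSummary:
    """Single-pass summary of a stream: count, mean, and variance.

    Uses Welford's online update, so no past readings need to be stored.
    """

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new reading into the summary."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the readings seen so far."""
        return self._m2 / self.count if self.count else 0.0


# Illustrative stream of readings (made-up values).
summary = RunningSummary()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    summary.update(reading)  # the reading itself is not kept afterwards

print(summary.count, summary.mean, summary.variance)
```

The memory used is constant regardless of how long the stream runs, which is the property that matters on a resource-constrained node.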

Local models contain compact event patterns rather than raw data, which addresses the issue of communication


[Figure 6 components: sensor data stream; node data processing (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis; single pass) producing local models; network data processing (local model integration, network analysis, real-time decisions, network maintenance; network pattern identification and monitoring) producing the network model; central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis; multipass) producing the global model; sink/base station between in-network processing and centralized processing; users issue queries and receive approximate results.]

Figure 6: Proposed hybrid framework for sensor network based applications.

overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient compared with the other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get the global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
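The local, network, and global levels described above can be sketched with a summary that is mergeable, meaning that combining two summaries gives the same result as summarizing the pooled raw data. The example below is an illustration under stated assumptions, not the authors' implementation: the `Summary` type, the cluster names, and the readings are hypothetical, and the model is reduced to count, mean, and sum of squared deviations, merged with the standard pairwise update for these statistics.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Compact model of a set of readings (hypothetical stand-in for a local model)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean


def summarize(readings: Iterable[float]) -> Summary:
    """Build a local model from one node's raw readings."""
    values = list(readings)
    if not values:
        return Summary()
    mean = sum(values) / len(values)
    return Summary(len(values), mean, sum((v - mean) ** 2 for v in values))


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries without access to the raw readings."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    total = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / total
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / total
    return Summary(total, mean, m2)


def integrate(models: Iterable[Summary]) -> Summary:
    """Integrate several models into one (used at cluster heads and at the sink)."""
    return reduce(merge, models, Summary())


# Hypothetical deployment: two cluster heads, each with the readings of its nodes.
clusters = {
    "ch_a": [[20.0, 21.0, 19.0], [22.0, 22.0]],
    "ch_b": [[30.0, 31.0], [29.0]],
}
network_models = [integrate(summarize(node) for node in nodes)
                  for nodes in clusters.values()]
global_model = integrate(network_models)
print(global_model)
```

Only three numbers per model travel up the hierarchy, whatever the number of raw readings, which is the communication saving the paragraph refers to.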

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSNs data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. The applications that require real-time response and actions can use the network model for decision and knowledge extraction. The applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSNs data [111], in order to deal with large-scale data from WSNs the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and at the cluster head, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this way of distributing data, the proposed framework is feasible for dealing with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSNs data, where the characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning


mechanism [39, 112] that helps the model to update itself with new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
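The incremental learning idea in item (iv) can be illustrated with a deliberately simple online classifier. This is not the mechanism of [39, 112]; it is a hypothetical nearest-centroid model, assumed here only for illustration, in which each class centroid is nudged toward every new labelled reading with a fixed step `alpha`, so that older readings gradually lose influence when the monitored process drifts.

```python
from typing import Dict, List, Sequence


class IncrementalCentroidClassifier:
    """Nearest-centroid classifier that is updated one reading at a time."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha  # step size: larger values forget old data faster
        self.centroids: Dict[str, List[float]] = {}

    def learn_one(self, features: Sequence[float], label: str) -> None:
        """Move the centroid of `label` a fraction `alpha` toward `features`."""
        if label not in self.centroids:
            self.centroids[label] = list(features)
            return
        centroid = self.centroids[label]
        for i, value in enumerate(features):
            centroid[i] += self.alpha * (value - centroid[i])

    def predict(self, features: Sequence[float]) -> str:
        """Return the label whose centroid is closest (squared Euclidean distance)."""
        if not self.centroids:
            raise ValueError("the model has not seen any labelled reading yet")
        return min(
            self.centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(features, self.centroids[label])
            ),
        )


# Made-up (temperature, humidity) readings with labels.
model = IncrementalCentroidClassifier(alpha=0.2)
for features, label in [((20, 50), "normal"), ((21, 52), "normal"),
                        ((19, 49), "normal"), ((40, 20), "alarm"),
                        ((42, 18), "alarm")]:
    model.learn_one(features, label)

before_drift = model.predict((31, 33))

# The monitored process drifts: "normal" conditions are now warmer and drier.
for _ in range(10):
    model.learn_one((30, 35), "normal")

after_drift = model.predict((31, 33))
print(before_drift, after_drift)
```

The model is never retrained from scratch; the same reading is classified differently once the centroid has followed the drift, which is the behaviour an outdated static model would miss.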

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to the appropriate WSNs type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of the past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. Surely, the set of WSNs applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSNs applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSNs performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in conjunction with ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


network along with contextual information by working in two phases. Firstly, an abstraction step segments and labels the real-valued time series into similar subsequences by using a kernel density estimator approach. Then, the knowledge is enriched by adding interval-based operators between the subsequences obtained in the discretization step, and the relational pattern mining algorithm has been extended in order to deal with these new operators. By taking into account the interval-based temporal data along with contextual information about events, it discovers interesting and more human-readable patterns. The framework is evaluated on a real dataset collected from a wireless sensor network made up of 54 Mica2Dot [61] sensors deployed in the Intel Berkeley Research Lab [62]. Each sensor collected topology information along with humidity, temperature, light, and voltage values once every 31 seconds. Results show a strong correlation among some measurements, which is useful for anomaly detection.

Cook et al. [21] present the MavHome smart home architecture, which focuses on the creation of an intelligent home that perceives the state of the home through sensors and acts upon the environment through device controllers. An important characteristic of the proposed architecture is the ability to make decisions based on predicted activities. To predict the activities, an algorithm called episode discovery (ED) is proposed, which is based on the work of Srikant and Agrawal [54] for mining sequential patterns from time-ordered transactions. Values that can be predicted include the usage pattern of devices in the home, the movement patterns of the inhabitants, and the typical activities of the inhabitants. They apply prediction algorithms to action sequences stored in the inhabitant event history to forecast user actions. Actions can then be automated based on the significance of mined patterns as well as the predictive accuracy of the next event. A key disadvantage is that the entire action history must be stored and processed offline, which is not practical for large prediction tasks over a long period of time. Cook et al. demonstrated the effectiveness of MavHome on synthetic smart home data and on real data collected by students using X10 controllers in their homes. Experiments show a predictive accuracy as high as 53.4% on the real data and 94.4% on the synthetic data.

Rabatel et al. [22] presented a strategy to detect anomalies in sensor data in order to improve railway maintenance. They extract sequential patterns from real railway data and identify abnormal behavior. Based on these abnormal findings, alarms are automatically triggered to notify of potential failures. This abnormal behavior depends on environmental (weather conditions, travel characteristics) and structural (route, episode index in the route) changes in the data. The PSP [55] algorithm has been used to identify the sequential patterns. To handle the environmental conditions, a contextual knowledge-based method is proposed, which is able to provide information on the seriousness and possible causes of a deviation. The proposed technique helps in the proactive maintenance of trains. However, the real-time context could be improved by providing more precise information for anomaly detection.

Figure 3: Example of sequential alarm pattern (alarm events $a$, $q$, and $k$, separated by the time differences $T_{aq}$ and $T_{qk}$).

Guralnik and Haigh [23] use sequential pattern mining to learn typical behaviors of humans in their homes. Human behavior is inferred by using motion sensors, pressure pads, door latch sensors, and toilet flush sensors. They installed 10–20 sensors of different types in a home and built models of which sensor firings correspond to which activities, in what order, and at what time. For example, “In 60% of the days, the Kitchen-Motion sensor fires between 18h00 and 18h30, and then the Living-Room-Motion sensor fires between 18h20 and 20h00, and then the Bedroom-Motion sensor fires between 19h45 and 22h00.” Their algorithm uses these data, together with domain knowledge, to extract the sequences of rooms in which the person was active. These sequences are then analyzed by a human expert to identify complex behavior models. These models can be used to select the appropriate response plan to the actions of elderly people.

Wu et al. [63] proposed a new algorithm for mining sequential alarm patterns (MSAPs) from the alarm data generated by a GSM system. Sequential events are identified from alarm data by defining a time interval between adjacent events. For example, if the time interval is set to six hours, then the sequential alarm pattern $(a, b, c)$ indicates that $a$, $b$, and $c$ happen in order and that the time intervals between $a$ and $b$ and between $b$ and $c$ are each less than six hours. An example of a sequential alarm sequence, redrawn from [63], is shown in Figure 3.

The number in each circle represents the error ID, and $T_{aq}$ denotes the time difference between alarm event $a$ and alarm event $q$. The knowledge extracted is not only useful for identifying the relevance between two events; it can also be used to predict the alarm sequence and to take proper steps to prevent the occurrence of the alarms, if at all possible. For example, if the network operator detects alarm $a$ occurring at time $t$, the operator should dissipate this alarm before time $t + T_{aq}$ to alleviate the abnormal situations incurred. The limitation of this technique is that it cannot discover other possible time-interval patterns between the events.
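The time-interval constraint on sequential alarm patterns can be illustrated with a small check. This is only a sketch under stated assumptions, not the MSAP algorithm of [63]: the function name `occurs_within`, the `(timestamp, alarm ID)` event layout, and the use of hours as the time unit are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical event layout: (timestamp in hours, alarm ID).
Event = Tuple[float, str]


def occurs_within(events: Sequence[Event], pattern: Sequence[str], max_gap: float) -> bool:
    """Return True if `pattern` occurs in order in `events` and every gap between
    adjacent matched alarms is strictly less than `max_gap`."""
    if not pattern:
        return True
    ordered = sorted(events)  # sort by timestamp

    def extend(start: int, last_time: float, k: int) -> bool:
        # Try to match pattern[k:] using events from index `start` onwards.
        if k == len(pattern):
            return True
        for i in range(start, len(ordered)):
            t, alarm = ordered[i]
            if t - last_time >= max_gap:
                break  # events are sorted, so all later ones are too far away
            if alarm == pattern[k] and extend(i + 1, t, k + 1):
                return True
        return False

    return any(
        alarm == pattern[0] and extend(i + 1, t, 1)
        for i, (t, alarm) in enumerate(ordered)
    )


alarm_log = [(0.0, "a"), (2.0, "x"), (5.0, "b"), (10.5, "c")]
print(occurs_within(alarm_log, ["a", "b", "c"], max_gap=6.0))
```

With a six-hour interval, the log above contains the pattern $(a, b, c)$ because both gaps (5 h and 5.5 h) stay under the limit; tightening the interval to 5.5 h rejects it.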

It is observed that none of the centralized solutions aims to maximize the WSNs’ performance.

4.2.2. Distributed Approaches Aim to Solve WSNs’ Application-Based Issues. Tseng and Lu [64] proposed an object tracking strategy named multilevel object tracking (MLOT) to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the movement log in sensor networks. A multilevel hierarchical structure, built by a clustering mechanism that represents the hierarchical relations among sensor nodes, is adopted to keep track of moving objects in real time. The movement logs of the moving objects are analyzed by a data mining algorithm, movement pattern generation (MPG), to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the fewest possible sensor nodes. MPG is based on Apriori: it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.
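The idea of letting the most frequent pattern drive the prediction can be sketched with first-order transition counts. This is a deliberately simplified stand-in, not the MPG algorithm of [64]: it counts only pairs of consecutive nodes rather than Apriori-style longer patterns, and the names `build_successor_counts` and `predict_next_node` as well as the node IDs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def build_successor_counts(logs: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each sensor node is followed by each other node in the logs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for path in logs:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next_node(counts: Dict[str, Counter], current: str) -> Optional[Tuple[str, float]]:
    """Return the most frequent successor of `current` and its confidence,
    i.e. its share of all observed moves out of `current`."""
    if current not in counts:
        return None
    nxt, freq = counts[current].most_common(1)[0]
    return nxt, freq / sum(counts[current].values())


movement_logs = [["s1", "s2", "s3"], ["s1", "s2", "s4"], ["s0", "s2", "s3"]]
counts = build_successor_counts(movement_logs)
print(predict_next_node(counts, "s2"))
```

Only the predicted node would need to be woken up; if the object is not found there, a real tracker would fall back to a wider search, which this sketch does not model.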

4.2.3. Distributed Approaches Aim to Maximize WSNs’ Performance. Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the logs of temporal movement patterns (TMPs). The discovered temporal movement rules (TMRs) are used to predict the next location of objects in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of moving objects. The integrated movement log is used as the input to the TMP-mine data mining method, which uses the pattern growth approach to discover the TMPs. From the discovered TMPs, the temporal movement rules (TMRs) are then generated for predicting the next location of a moving object. Suppose that the following two rules are discovered by a vehicle tracking system:

Rule 1: (Station A → interval 10 min → Station B → interval 5 min → Station C)

Rule 2: (Station A → interval 20 min → Station B → interval 5 min → Station D)

By dispatching these rules to the corresponding sensor nodes, the tracking can be done in an energy-efficient way. For example, if a car moves with the pattern (Station A → interval 10 min → Station B → interval 5 min), which matches Rule 1, then the node in Station B only has to activate the node in Station C rather than the one in Station D or those around Station B.
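The rule lookup in this example can be written down directly. The sketch below is an illustration only: the rule encoding, the name `predict_next_station`, and the optional `tolerance` parameter (in minutes) are assumptions introduced here, not part of TMP-mine as described in [65].

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of a temporal movement rule:
# a list of (station, minutes until leaving for the next station) and the predicted station.
Step = Tuple[str, int]
Rule = Tuple[List[Step], str]

RULES: List[Rule] = [
    ([("A", 10), ("B", 5)], "C"),  # Rule 1
    ([("A", 20), ("B", 5)], "D"),  # Rule 2
]


def predict_next_station(observed: List[Step], rules: List[Rule], tolerance: int = 0) -> Optional[str]:
    """Return the next station of the first rule whose stations match the observed
    movement and whose intervals differ by at most `tolerance` minutes."""
    for steps, next_station in rules:
        if len(steps) != len(observed):
            continue
        if all(
            rule_station == seen_station and abs(rule_gap - seen_gap) <= tolerance
            for (rule_station, rule_gap), (seen_station, seen_gap) in zip(steps, observed)
        ):
            return next_station
    return None


# A car seen at Station A, then 10 min later at Station B, leaving after 5 min.
print(predict_next_station([("A", 10), ("B", 5)], RULES))
```

In this toy setting the node at Station B would wake only the node at the returned station; a movement that matches no rule returns `None`, in which case a real system would need some fallback such as activating all neighbors.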

Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique that uses sequential patterns (PTSP). This technique helps to predict the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. PTSP is based on the inherent patterns of the objects’ movements in the network and on the use of sequential patterns to predict which sensor node the moving object will head to next.

4.3. Clustering. Clustering is unsupervised learning in which the given data is categorized into subsets so that each subset represents a cluster with distinctive properties. It has been considered a useful technique, especially for applications that require scalability to a large number of sensor nodes. Clustering also supports aggregation of data in order to summarize the overall transmitted data.

Figure 4: Data clustering for sensor networks (input sensor data → identification of data correlation → grouping data → clusters, with feedback).

In the current literature, problems related to clustering are addressed by node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67–83]. These clustering techniques vary widely in their objectives depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering, for example, by considering the data similarity of neighboring nodes, many popular node clustering algorithms that partition the sensor nodes into a number of small groups and elect a cluster head for every group do not use data mining techniques directly. In this study, we focus only on data clustering techniques for efficient data mining and for finding data correlations among the nodes. Figure 4 shows the data clustering commonly used in the data mining process.

These works adapted the K-means, hierarchical, and data correlation-based methods. The K-means algorithm takes the input parameter $k$ and partitions a set of $n$ objects into $k$ clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects; it works by grouping data objects into a tree of clusters. Data correlation-based clustering, in contrast, forms clusters based on spatial and temporal correlations, grouping nodes with similar sensory values within a given threshold, and these clusters remain fixed until the sensory values exceed the threshold over time. When the threshold is exceeded, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider node residual energy. It is observed from the survey that both the centralized and the distributed clustering solutions aim to maximize the WSNs’ performance.
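The K-means step described above can be sketched in a few lines. This is a plain, centralized version of the standard assign-then-update loop, written for illustration; the function names and the choice to pass the initial centers explicitly (which keeps the run deterministic) are assumptions made here.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def k_means(points: List[Point], centers: List[Point], max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Alternate between assigning each point to its nearest center and
    moving each center to the mean of its assigned points."""
    current = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(current)), key=lambda j: sq_dist(p, current[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move again
        labels = new_labels
        for j in range(len(current)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                current[j] = centroid(members)
    return current, labels


readings = [(1.0,), (2.0,), (9.0,), (10.0,)]
print(k_means(readings, centers=[(0.0,), (5.0,)]))
```

On the four one-dimensional readings above, the two centers settle at the means of the low and high groups after a single update.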

4.3.1. Centralized Approaches Aim to Maximize WSNs’ Performance. Liu et al. [84] proposed a centralized, graph-based energy-efficient data collection (EEDC) scheme. EEDC is an on-demand clustering algorithm that clusters nodes into groups whose members have similar sensor readings; the protocol thus clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes using a user-defined dissimilarity measure. EEDC models the cluster creation process as a clique-covering problem by constructing a graph $G$ in which each sensor node is a vertex. An edge $(u, v)$ is drawn if the dissimilarity measure between vertex $u$ and vertex $v$ is less than or equal to the given intracluster dissimilarity threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to cover all vertices in the graph with the minimum number of cliques. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, due to its centralized processing, it is not suitable for large-scale WSNs.
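The clique-covering view can be illustrated with a simple greedy heuristic. Note the swap: finding a minimum clique cover is NP-hard in general, so the sketch below does not guarantee the minimum number of clusters and should not be read as the procedure used in [84]. The absolute difference of scalar readings stands in for the user-defined dissimilarity measure, and the function name and node IDs are hypothetical.

```python
from typing import Dict, List


def greedy_clique_cover(readings: Dict[str, float], max_dst: float) -> List[List[str]]:
    """Greedily place each node into the first cluster in which it is within
    `max_dst` of every existing member, so that each cluster is a clique of
    the similarity graph. Opens a new cluster when no existing one fits."""

    def similar(u: str, v: str) -> bool:
        # An edge (u, v) exists when the dissimilarity is at most max_dst.
        return abs(readings[u] - readings[v]) <= max_dst

    clusters: List[List[str]] = []
    for node in sorted(readings):
        for clique in clusters:
            if all(similar(node, member) for member in clique):
                clique.append(node)
                break
        else:
            clusters.append([node])
    return clusters


temperatures = {"n1": 20.0, "n2": 20.4, "n3": 25.0, "n4": 25.3, "n5": 20.9}
print(greedy_clique_cover(temperatures, max_dst=0.5))
```

In the example, `n5` is close to `n2` but not to `n1`, so it cannot join their clique and ends up in a cluster of its own, which shows why pairwise similarity to every member is required.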

4.3.2. Distributed Approaches Aim to Maximize WSNs’ Performance. Guo et al. [85] proposed H-Cluster, a distributed algorithm to cluster sensory data. The input of this algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output of the algorithm is a set of cluster features that summarize the clusters of the input sensory dataset. The Hilbert-Map mapping algorithm is used to map a $d$-dimensional sensory data space into a 2-dimensional area covered by a given WSN. H-Cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-Cluster with the C-Corner and C-Center algorithms. During the experiments, four types of environment attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that the H-Cluster algorithm is much more efficient in terms of data loss, energy, and the quality of cluster data in a small WSN. The results also show that, as the amount of sensory data delivered increases, the amount of data loss also increases, and that energy efficiency decreases as the size of the WSN increases.

Yeo et al. [86] proposed a data correlation-based clustering scheme (DCC) based on the similarity of sensor data, along with a spatial suppression scheme that helps to reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head. During this phase, sensor nodes communicate with each other, and the resulting clusters are organized from sensor nodes that have similar readings. Spatial suppression is performed on the cluster head, which also computes the difference between each sensor reading and the representative value. If a cluster head has redundant data, it removes it, except for the node identification. The experimental results support the hypothesis that clustering based on data correlation has better compression performance than ordinary clustering based on locality of communication; they show that DCC reduces the data size by 40% through suppression and prolongs the network lifetime by 20–30%. However, for large-scale network applications (nodes > 500), DCC is inefficient because each cluster head needs more energy to collect similar data readings and also to communicate with several nodes. Also, when the percentage of similar data readings is low, DCC is ineffective due to the higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing their prediction model. In their approach, the cluster heads execute a prediction model, while gateway nodes at the circumference of the clusters are responsible for the routing task. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] present the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When the threshold is exceeded, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees that the result is within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), where clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, in which only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) that trades a lower result precision for significant savings in energy, storage, computation, and communication.
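The query-phase grouping and the one-reading-per-cluster response can be sketched as follows. This is a simplified reading of the description above rather than the CAG protocol of [88]: it assumes a fixed aggregation tree given as a hypothetical `parent` map, a visiting order in which parents precede children, scalar readings, and an absolute-difference threshold.

```python
from typing import Dict, List, Optional


def form_clusters(
    readings: Dict[str, float],
    parent: Dict[str, Optional[str]],
    order: List[str],
    threshold: float,
) -> Dict[str, str]:
    """Map every node to its cluster head. A node joins its parent's cluster if its
    reading is within `threshold` of that cluster head's reading; otherwise it
    becomes the head of a new cluster."""
    head_of: Dict[str, str] = {}
    for node in order:  # parents must appear before their children
        p = parent.get(node)
        if p is not None and abs(readings[node] - readings[head_of[p]]) <= threshold:
            head_of[node] = head_of[p]
        else:
            head_of[node] = node
    return head_of


def response(readings: Dict[str, float], head_of: Dict[str, str]) -> Dict[str, float]:
    """Only cluster heads report, so one reading per cluster travels up the tree."""
    return {head: readings[head] for head in set(head_of.values())}


readings = {"s": 20.0, "a": 20.5, "b": 23.0, "c": 23.4, "d": 20.8}
parent = {"s": None, "a": "s", "b": "a", "c": "b", "d": "a"}
heads = form_clusters(readings, parent, order=["s", "a", "b", "d", "c"], threshold=1.0)
print(response(readings, heads))
```

Here five nodes produce two reported values, and every suppressed reading is within the threshold of the value reported for its cluster, which is the sense in which the result stays inside the error tolerance.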

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the K-means clustering algorithm is proposed that sends summarized data towards the sink, which reduces the transmission time and the power consumption of the sensor nodes. The sensor network is divided into clusters, and only the cluster head nodes communicate with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends a summary to the base station, including the count and vector sum of its local sensory data points as well as the sum of the squared distances from each local point to its center. On receiving data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations. Several programs are run to get the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors ($N$) and increases linearly with the number of centers. The major issues are the extra memory required at the cluster heads and the computation power needed to summarize the data before transmitting it to the sink. Also, the algorithm requires multiple rounds of message passing between the cluster heads and the base station; this may have a serious effect on communication efficiency when the number of sensors is relatively high.
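The exchange of summaries between cluster heads and the base station can be sketched as below. This is an illustrative reconstruction from the description above, not the protocol of [67]: the function names, the data layout, and the stopping rule (stop when the total squared distance no longer decreases by more than `tol`) are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Summary = Tuple[List[int], List[List[float]], float]  # counts, vector sums, squared error


def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def local_summary(points: List[Point], centers: List[List[float]]) -> Summary:
    """Run at a cluster head: per center, the count and vector sum of the local
    points assigned to it, plus the total squared distance to the assigned centers."""
    k, dim = len(centers), len(centers[0])
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    error = 0.0
    for p in points:
        j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
        error += sq_dist(p, centers[j])
    return counts, sums, error


def update_centers(summaries: List[Summary], centers: List[List[float]]) -> Tuple[List[List[float]], float]:
    """Run at the base station: combine the summaries into new cluster means."""
    new_centers = []
    for j, old in enumerate(centers):
        n = sum(s[0][j] for s in summaries)
        if n == 0:
            new_centers.append(list(old))  # no points assigned, keep the old center
        else:
            new_centers.append([sum(s[1][j][d] for s in summaries) / n for d in range(len(old))])
    return new_centers, sum(s[2] for s in summaries)


def distributed_k_means(head_data: List[List[Point]], centers: List[Point],
                        tol: float = 1e-9, max_rounds: int = 100) -> List[List[float]]:
    current = [list(c) for c in centers]
    previous_error = float("inf")
    for _ in range(max_rounds):
        summaries = [local_summary(points, current) for points in head_data]
        current, error = update_centers(summaries, current)
        if previous_error - error <= tol:
            break
        previous_error = error
    return current


# Two cluster heads, each holding the readings of its own member nodes.
print(distributed_k_means([[(1.0,), (9.0,)], [(2.0,), (10.0,)]], centers=[(0.0,), (5.0,)]))
```

Each round sends one fixed-size summary per cluster head regardless of how many readings it holds, which is consistent with the reported communication cost depending on the number of centers rather than on $N$.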

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes to form clusters. Those nodes that hear the request decide whether they should nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on their remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when there are subregions in the sensor network that are more targeted than others, that is, when inquiries are not uniformly distributed over time and space.

Figure 5: Classification maps input attribute set (X) to class label (Y) (input: attribute set (X) → classification model → output: class label (Y)).

Ma et al. [90] proposed the distributed hierarchical clustering and summarization algorithm (DHCS) for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop count thresholds, to model node and distance-based clustering. Initially, each node treats itself as an active cluster. Then similar adjacent clusters are merged into larger clusters round by round. In each round, each cluster tries to combine with its most similar adjacent cluster simultaneously. Two clusters can be merged only if both consider one another as the most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
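The round-by-round merging rule can be sketched as follows. This is a simplified illustration rather than the DHCS implementation of [90]: it assumes scalar readings, measures similarity as the absolute difference between cluster means, and omits the hop count threshold; the function name and arguments are hypothetical.

```python
def dhcs_sketch(readings, edges, diff_threshold):
    """readings: {node: value}; edges: (u, v) pairs of neighboring nodes."""
    clusters = {n: {n} for n in readings}  # every node starts as its own cluster

    def mean(cid):
        members = clusters[cid]
        return sum(readings[m] for m in members) / len(members)

    def owner(node):
        return next(cid for cid, ms in clusters.items() if node in ms)

    while True:
        # Two clusters are adjacent if any edge links their members.
        adjacent = {cid: set() for cid in clusters}
        for u, v in edges:
            cu, cv = owner(u), owner(v)
            if cu != cv:
                adjacent[cu].add(cv)
                adjacent[cv].add(cu)
        # Each cluster picks its most similar adjacent cluster within the threshold.
        best = {}
        for cid, nbrs in adjacent.items():
            candidates = [(abs(mean(cid) - mean(o)), o) for o in nbrs]
            candidates = [c for c in candidates if c[0] <= diff_threshold]
            if candidates:
                best[cid] = min(candidates)[1]
        # Merge only pairs that chose each other.
        pairs = {frozenset((a, b)) for a, b in best.items() if best.get(b) == a}
        if not pairs:
            return sorted(sorted(ms) for ms in clusters.values())  # steady clusters
        for pair in pairs:
            a, b = sorted(pair)
            clusters[a] |= clusters.pop(b)


steady = dhcs_sketch(
    {1: 20.0, 2: 20.5, 3: 30.0, 4: 30.2},
    [(1, 2), (2, 3), (3, 4)],
    diff_threshold=1.0,
)
```

Because a merge requires mutual agreement, every round with at least one merge reduces the number of clusters, so the loop terminates.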

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and then classifies new data into one of the learned classes. Figure 5 shows that classification maps the input attribute set (X) to a class label (Y).

Classification-based approaches have adapted traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, according to the type of classification model that they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until a leaf node is reached, where the class label is assigned. The internal nodes partition the data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a classification request, the closest training vectors to the query vector are located according to a distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier assigns the dataset to predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where Condition is a conjunction of attribute tests and Y is a class label.
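A rule of this form can be represented directly in code. The following Python sketch is illustrative only: the attribute names, thresholds, class labels, and the first-match ordering of rules are assumptions and are not taken from any of the surveyed techniques.

```python
def classify(record, rules, default=None):
    """Return the label of the first rule whose conjunction of tests holds."""
    for conditions, label in rules:
        if all(attr in record and test(record[attr])
               for attr, test in conditions.items()):
            return label
    return default


# Hypothetical rules over two sensed attributes.
rules = [
    ({"temperature": lambda t: t > 50, "humidity": lambda h: h < 20}, "fire_risk"),
    ({"temperature": lambda t: t <= 50}, "normal"),
]

hot_dry = classify({"temperature": 63.0, "humidity": 12.0}, rules)
mild = classify({"temperature": 22.0, "humidity": 40.0}, rules)
hot_humid = classify({"temperature": 63.0, "humidity": 80.0}, rules, default="unknown")
```

A record that satisfies no rule receives the default label, which is one simple way to handle readings outside the conditions covered by the rule set.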

SVM (support vector machine) techniques separate the data belonging to different classes by fitting a hyperplane between them that maximizes the margin of separation. The data is mapped into a higher-dimensional feature space where it can be easily separated by a hyperplane. Furthermore, a kernel function is used to compute the dot products between the mapped vectors in the feature space in order to find the hyperplane.

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT-) based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. In order to identify persons, the proposed approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build up a DT model, it recursively selects the attribute that is used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The proposed approach is validated by experiments using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated by using the Weka framework version 3.7.0 [95]. Experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deep analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data by using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, first its distance to every training example is computed. Then the k closest training examples are selected, where k is a fixed integer and k ≥ 1; among these k examples, the algorithm looks for the label that is most frequent. This label is the prediction for the test example. The algorithm is evaluated by building a classifier from the preprocessed training data generated from NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a significant result, with a correct classification rate of 92.3%.
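The prediction step described above is the standard k-nearest-neighbor majority vote, which can be sketched in a few lines of Python. The sketch uses plain feature vectors and squared Euclidean distance as assumptions; the trajectory preprocessing of [96] is not reproduced, and the function name is hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (vector, label) pairs stored during the training phase."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Compute the distance to every training example and keep the k closest.
    nearest = sorted(train, key=lambda ex: sq_dist(ex[0], query))[:k]
    # The most frequent label among the k neighbors is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
near_origin = knn_predict(train, (0.2, 0.1), k=3)
near_b = knn_predict(train, (5.5, 5.0), k=3)
```

Since training only stores the examples, all of the cost is paid at query time, which is the usual trade-off of a lazy learner.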

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments to predict the behavior patterns of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model that predicts the occupancy of the ambient intelligence environment. The occupancy is predicted by using fuzzy rules, which are modeled by using the past values of time series data. In the learning process, input from the sensors is compared with stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the proposed technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict the complex environmental patterns of a multiuser environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size; it changes the algorithm output rate according to the data rate, available memory, algorithm output rate history, and time constraints so as to fill the available memory with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. All instances are stored in main memory according to a prespecified distance threshold. The threshold represents the similarity measure that the algorithm accepts in order to consider two or more elements as one element according to their attribute values. If the algorithm finds such an element, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the element is released from memory. The algorithm is empirically validated using synthetic streaming data in the resource-constrained environment of a common handheld computer.
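The weight update described above can be sketched as follows. This is an illustrative reading of the description, not the published LWClass code: the Euclidean distance, the fixed capacity limit standing in for the AOG memory control, and the rule that an unmatched element is stored as a new entry with weight one are all assumptions.

```python
def lwclass_update(store, vector, label, threshold, capacity=100):
    """store: list of {"vec", "label", "weight"} entries, updated in place."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = min(store, key=lambda e: dist(e["vec"], vector), default=None)
    if nearest is not None and dist(nearest["vec"], vector) <= threshold:
        if nearest["label"] == label:
            nearest["weight"] += 1      # same class: reinforce the entry
        else:
            nearest["weight"] -= 1      # different class: weaken the entry
            if nearest["weight"] <= 0:  # release entries whose weight reaches zero
                store[:] = [e for e in store if e is not nearest]
    elif len(store) < capacity:         # assumed handling of unmatched elements
        store.append({"vec": tuple(vector), "label": label, "weight": 1})
    return store


memory = []
lwclass_update(memory, (1.0, 1.0), "hot", threshold=0.5)
lwclass_update(memory, (1.1, 1.0), "hot", threshold=0.5)
weight_after_two = memory[0]["weight"]
lwclass_update(memory, (1.0, 1.1), "cold", threshold=0.5)
lwclass_update(memory, (1.0, 1.1), "cold", threshold=0.5)
```

Each arriving element touches at most one stored entry, so the stream is processed in a single pass with memory bounded by the capacity limit.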

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure is constructed over the whole network. A distributed voting approach is proposed in which each sensor is a leaf of a tree and uses a decision tree (DT) to perform local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work shows how the local models enable sensors to respond to changes in the target by relearning local models. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime as compared with a centralized approach. Major issues in this framework are the need for an expensive CPU on each sensor node to compute and build the local predictive model and the extra memory required to store it.
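The combination of local predictions at the sink can be sketched as a simple or weighted vote. The sketch below only illustrates the voting step; the choice of weights (for example, a local accuracy estimate per sensor) and the function name are assumptions rather than details taken from [103].

```python
from collections import defaultdict


def vote(predictions, weights=None):
    """predictions: {sensor_id: class_label}; weights: optional {sensor_id: float}."""
    tally = defaultdict(float)
    for sensor, label in predictions.items():
        # Simple voting counts each sensor once; weighted voting uses its weight.
        tally[label] += 1.0 if weights is None else weights.get(sensor, 0.0)
    return max(tally, key=tally.get)


local_predictions = {"s1": "A", "s2": "B", "s3": "A"}
simple_result = vote(local_predictions)
weighted_result = vote(local_predictions, {"s1": 0.2, "s2": 0.9, "s3": 0.3})
```

In this example the simple vote favors the majority class, while the weighted vote lets a single sensor with a high weight override two sensors with low weights.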

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme that generates effective feature vectors of low dimension (FVLD) for wireless audio sensor networks. A distributed cluster-based algorithm for the detection and classification of vehicles is proposed. Sensors form clusters on demand in order to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to their cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, as compared with all other approaches.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using an SVM-based (support vector machine) technique in a sensor network. The authors proposed two distributed algorithms, the distributed fix partition SVM (DFP-SVM) and the weighted distributed fix partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is trained incrementally on a set of examples called support vectors. The number of support vectors is typically very small compared with the number of all sample values. In addition, the support vectors (and offset) give a compressed representation of the separating SVM hyperplane. That is why sending only the support vectors, instead of all training samples, to the next cluster head is very energy efficient: it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations, and a comparison is made with the centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally as compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius information of the sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected over a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the results for local outliers inaccurate. The technique is evaluated by using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to evaluate the computational strategy and various kernel functions. The results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and distinguishing aspects of the data mining techniques designed for WSNs that were discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed, and then comparative tables are presented to compare and differentiate existing data mining techniques for WSNs data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogenous [50] or heterogeneous [33, 48]. A homogenous attribute means sensing a single attribute, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that readings from different sensor nodes observed at the same time instant are related and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node when the reading from a correlated sensor is missing.
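As a concrete illustration, both kinds of correlation can be quantified with the Pearson correlation coefficient. This choice of measure is an assumption made here for illustration; the surveyed techniques use a variety of correlation models. Spatial correlation is computed between the series of two neighboring nodes, and temporal correlation as the lag-1 autocorrelation of a single node's series.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length series (undefined for constant series)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def lag1_autocorrelation(series):
    """Temporal correlation: each reading against the previous reading."""
    return pearson(series[:-1], series[1:])


# Hypothetical temperature readings from two neighboring nodes.
node_a = [20.0, 21.0, 22.0, 23.0]
node_b = [19.5, 20.5, 21.5, 22.5]
spatial = pearson(node_a, node_b)
temporal = lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0])
```

A coefficient close to 1 for two neighbors suggests that one node's reading can stand in for the other's when a reading is missing, which is the kind of redundancy that correlation-based techniques exploit.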

5.2. Processing Architecture. In order to apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models.

Centralized. The simplest way to analyze WSNs data is to use a centralized model. In this approach, the entire raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis in order to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called the "aggregator" is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.
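The idea of transmitting a compact local model instead of raw readings can be sketched with a simple statistical summary. The summary fields and function names below are illustrative assumptions; actual techniques transmit richer local models such as patterns, cluster summaries, or classifiers.

```python
def summarize(readings):
    """Sensor side: reduce raw readings to a fixed-size local model."""
    return {
        "count": len(readings),
        "sum": sum(readings),
        "min": min(readings),
        "max": max(readings),
    }


def merge(models):
    """Aggregator side: combine local models without access to raw data."""
    return {
        "count": sum(m["count"] for m in models),
        "sum": sum(m["sum"] for m in models),
        "min": min(m["min"] for m in models),
        "max": max(m["max"] for m in models),
    }


combined = merge([summarize([1, 2, 3]), summarize([10])])
average = combined["sum"] / combined["count"]
```

Each sensor sends four numbers regardless of how many readings it collected, and the aggregator can still recover the exact global count, mean, minimum, and maximum.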

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSNs data. Distributed approaches use one-scan algorithms for real-time processing in order to deal with the high data arrival rate; the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data


Table 2: Comparison of data mining techniques for wireless sensor networks.

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Frequent pattern mining
DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores sensors that report different values
In-network data mining [51] | Event pattern discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication
Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost, delays crucial messages
Online algorithm [46] | Interval list representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy
Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data
CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data


Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Frequent pattern mining
Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increased cost due to multiple DB scans
SP-tree [49] | Discover event patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost
Sequential pattern mining
Relational framework [58] | Multidimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming
Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants' behavior prediction | Prediction accuracy | Inefficient for complex activities
MPG [64] | Predict object's future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset
Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction


Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Sequential pattern mining
TMP-mine [65] | Predict object's future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time
Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns
MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Pattern accuracy | Candidate construction is expensive to compute
PTSP [66] | Object's future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects
Clustering
DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate
H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate


Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Clustering
Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping
CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss
EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs
Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs
Attribute-based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost
DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Node energy is ignored


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Classification
Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee correctness
Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity
NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset
LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift
FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission
Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios
Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity
One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation


mining becomes difficult because updates to this structure must be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they are used to sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is a node that acts as a medium to transmit data packets from one node to the others.

Node Task. In centralized approaches, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the types of application that benefit from WSNs data mining techniques. Here, we give some real-world applications as examples.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSNs performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central site.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of its experimental validation, that is, which dataset is used, which WSNs optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used methods to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed in order to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach for designing and testing a proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating a technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this paper that most of the techniques are validated by simulation on synthetic datasets, owing to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated on the optimization objective that has been achieved. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it a difficult task to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. It is observed from the comparative analysis that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take into account heterogeneous data and assume that the sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101] in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) For the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
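
To make the threshold and window-size sensitivity noted in (iii) and (ix) concrete, the following minimal sketch computes the support and confidence of a single rule over a sliding window of sensor events. The event encoding, the window sizes, and the function name `rule_stats` are illustrative assumptions and are not taken from any of the surveyed techniques.

```python
from collections import deque

def rule_stats(transactions, antecedent, consequent, window_size):
    """Support and confidence of antecedent -> consequent over the last
    `window_size` transactions (each transaction is a set of event labels)."""
    window = deque(transactions, maxlen=window_size)  # keeps only the newest items
    antecedent, consequent = set(antecedent), set(consequent)
    n_ante = sum(1 for t in window if antecedent <= t)
    n_both = sum(1 for t in window if (antecedent | consequent) <= t)
    support = n_both / len(window) if window else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical epochs: which sensors reported an event in each time slot.
epochs = [{"s1", "s2"}, {"s1"}, {"s1", "s2", "s3"}, {"s2"}, {"s1", "s2"}, {"s3"}]

for w in (3, 6):
    sup, conf = rule_stats(epochs, {"s1"}, {"s2"}, window_size=w)
    print(f"window={w}: support={sup:.2f}, confidence={conf:.2f}")
```

With a minimum confidence of 0.8, the rule s1 -> s2 would be reported for the window of 3 epochs but not for the window of 6, which shows how the window size and the thresholds interact.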

7. Future Research Directions

It is observed from the analysis of existing data mining work on sensor network-based applications that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to locally carry out mining and transmit only the required and partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data is continuously arriving and not available for the next scan.
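
As an illustration of single-pass processing at a node, the sketch below maintains a compact summary of a stream with Welford's online algorithm, so each reading is seen once and then discarded. Using a count/mean/variance summary as the local model is an assumption made here for illustration; the paper does not prescribe a specific summary, and the class name `LocalModel` is hypothetical.

```python
class LocalModel:
    """Running count, mean, and variance of a stream (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # readings seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # sum of squared deviations from the mean

    def update(self, x):
        # One pass: each reading updates the summary and is not stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Population variance of the readings seen so far.
        return self.m2 / self.n if self.n else 0.0

model = LocalModel()
for reading in [21.0, 21.5, 22.0, 35.0, 21.5]:  # hypothetical temperatures
    model.update(reading)

# Only three numbers need to be transmitted, regardless of stream length.
print(model.n, round(model.mean, 2), round(model.variance(), 2))
```

The memory footprint stays constant however long the stream runs, which is the property a resource-constrained node needs.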

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local model and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get the global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.

Figure 6: Proposed hybrid framework for sensor network based applications. (Diagram not reproduced. Labeled components: sensor data stream; node data processing, with data selection, duplicate removal, aggregation, summarization, data fusion, clustering, and association analysis, producing the local model; network data processing, with local model integration, network analysis, real-time decisions, and network maintenance, producing the network model; central data processing at the sink/base station, with frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, and time series analysis, producing the global model; user queries answered with approximate results; in-network and centralized processing; single-pass and multipass processing.)
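
The integration of local models into network models and then into a global model, as described above, can be sketched as a merge of per-node summaries. The example assumes each local model is a (count, mean, sum of squared deviations) triple and combines them with the standard pairwise update for merging means and variances; the cluster layout, the readings, and the function names are hypothetical, and `summarize` merely stands in for whatever single-pass summary a node computes.

```python
from functools import reduce

def summarize(readings):
    """Local model: (count, mean, sum of squared deviations) for one node."""
    n = len(readings)
    mean = sum(readings) / n
    m2 = sum((x - mean) ** 2 for x in readings)
    return (n, mean, m2)

def merge(a, b):
    """Combine two summaries without access to the raw readings."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)

# Hypothetical clusters, each holding the raw readings of its member nodes.
clusters = {
    "head_A": [[20.0, 21.0], [22.0, 23.0, 24.0]],
    "head_B": [[30.0], [31.0, 29.0]],
}

# Cluster heads integrate local models into network models ...
network_models = {
    head: reduce(merge, (summarize(node) for node in nodes))
    for head, nodes in clusters.items()
}
# ... and the sink integrates network models into the global model.
global_model = reduce(merge, network_models.values())
print(network_models, global_model)
```

Because `merge` needs only the summaries, the amount of data forwarded at each level is independent of the number of raw readings collected below it.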

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. The applications that require real-time response and actions can use the network model for decision and knowledge extraction. The applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue in WSNs [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this way of distributing data, the proposed framework is feasible for dealing with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112] that helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
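
Incremental learning, as in (iv), can take many forms. As a minimal sketch, the code below keeps an exponentially weighted mean so that the model follows a shift in the monitored process instead of becoming outdated. This is a generic technique chosen for illustration and is not the mechanism of [39] or [112]; the class name, the forgetting factor `alpha`, and the data are assumptions.

```python
class IncrementalMean:
    """Exponentially weighted mean: recent readings count more than old ones."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # forgetting factor in (0, 1]; larger adapts faster
        self.value = None    # current estimate

    def update(self, x):
        if self.value is None:
            self.value = x   # first reading initializes the model
        else:
            self.value += self.alpha * (x - self.value)
        return self.value

model = IncrementalMean(alpha=0.3)
stream = [20.0] * 5 + [30.0] * 10   # hypothetical process that shifts from 20 to 30
for x in stream:
    estimate = model.update(x)

# After the shift the estimate has moved most of the way to the new level,
# whereas a plain average over all 15 readings would be about 26.7.
print(round(estimate, 2))
```

The choice of `alpha` trades responsiveness against noise sensitivity, which is the same kind of tuning problem as the window size discussed in Section 6.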

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each one of these algorithms solves certain issues related to the appropriate WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for making real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of the past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. Surely, the set of WSN applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. de L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in conjunction with ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K Romer and F Mattern ldquoThe design space of wireless sensornetworksrdquo IEEEWireless Communications vol 11 no 6 pp 54ndash61 2004

[111] O Diallo J J P C Rodrigues and M Sene ldquoReal-time datamanagement on wireless sensor networks a surveyrdquo Journal ofNetwork andComputer Applications vol 35 no 3 pp 1013ndash10212012

[112] Y Yao L Feng B Jin and F Chen ldquoAn incremental learningapproachwith SupportVectorMachine for network data streamclassification problemrdquo Information Technology Journal vol 11no 2 pp 200ndash208 2012

mining algorithm, movement pattern generation (MPG), to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the fewest sensor nodes. The MPG is based on Apriori: it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.

4.2.3. Distributed Approaches Aim to Maximize WSNs' Performance. Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the logs of temporal movement patterns (TMPs). The discovered temporal movement rules (TMRs) are used to predict the next location of an object in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of moving objects. The integrated movement log is used as the input to the TMP-mine data mining method, which uses the pattern growth approach to discover the TMPs. From the discovered TMPs, the TMRs are then generated for predicting the next location of a moving object. Suppose that the following two rules are discovered by a vehicle tracking system:

Rule 1: (Station A → interval 10 min → Station B → interval 5 min → Station C).

Rule 2: (Station A → interval 20 min → Station B → interval 5 min → Station D).

By dispatching these rules to the corresponding sensor nodes, the tracking can be made in an energy-efficient way. For example, if a car moves with the pattern (Station A → interval 10 min → Station B → interval 5 min), which matches Rule 1, then the node in Station B has only to activate the node in Station C rather than the one in Station D or those around Station B.
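
The rule-matching step in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TMP-mine implementation: the encoding of a rule as (station, interval) steps, the names `RULES` and `predict_next`, and the optional timing tolerance are all hypothetical.

```python
# Hypothetical encoding of the two temporal movement rules from the text:
# each rule is a prefix of (station, minutes until leaving for the next hop)
# steps, followed by the predicted next station.
RULES = [
    ((("A", 10), ("B", 5)), "C"),  # Rule 1
    ((("A", 20), ("B", 5)), "D"),  # Rule 2
]


def predict_next(observed, rules=RULES, tolerance_min=0):
    """Return the stations predicted by every rule whose prefix matches.

    observed is a list of (station, interval_minutes) pairs seen so far.
    tolerance_min is an assumed slack on interval matching (not in the paper).
    """
    predictions = []
    for prefix, next_station in rules:
        if len(prefix) != len(observed):
            continue
        if all(
            rule_station == seen_station
            and abs(rule_interval - seen_interval) <= tolerance_min
            for (rule_station, rule_interval), (seen_station, seen_interval)
            in zip(prefix, observed)
        ):
            predictions.append(next_station)
    return predictions


# The node at Station B only needs to wake the nodes returned here.
stations_to_wake = predict_next([("A", 10), ("B", 5)])
```

With the observed pattern from the example, only Station C is returned, so the node in Station D can stay asleep.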

Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique using sequential patterns (PTSP). This technique helps to predict the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. The PTSP is based on the inherited patterns of the objects' movements in the network and on the use of sequential patterns to predict which sensor node the moving object will head to next.

4.3. Clustering. Clustering is unsupervised learning in which the given data is categorized into subsets so that each subset represents a cluster with distinctive properties. It has been considered a useful technique, especially for applications that require scalability to a large number of sensor nodes. Clustering also supports aggregation of data in order to summarize the overall transmitted data.

Figure 4: Data clustering for sensor networks. (Diagram labels: input sensor data, identification of data correlation, grouping data, clusters, feedback.)

In the current literature, problems related to clustering are addressed by node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67–83]. These clustering techniques vary widely in their objectives depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering, for example, by considering the data similarity of neighboring nodes, many popular node clustering algorithms that partition the sensor nodes into a number of small groups and elect a cluster head for every group do not use data mining techniques directly. In this study, we focus only on data clustering techniques that support efficient data mining and find data correlations among the nodes. Figure 4 shows the data clustering commonly used in the data mining process.

Existing work has adapted the K-mean, hierarchical, and data correlation-based methods. The K-mean algorithm takes the input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects; it works by grouping data objects into a tree of clusters. Data correlation-based clustering, in contrast, forms clusters of nodes whose sensory values are spatially and temporally correlated, that is, similar within a given threshold, and these clusters remain fixed as long as the sensory values stay within the threshold over time. When the values move outside the threshold, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider node residual energy. It is observed from the survey that both the centralized and the distributed clustering solutions aim to maximize the WSNs' performance.

4.3.1. Centralized Approaches Aim to Maximize WSNs' Performance. Liu et al. [84] proposed centralized graph-based energy-efficient data collection (EEDC). EEDC is an on-demand clustering algorithm that clusters nodes into groups whose members have similar sensor readings; the protocol thus clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes using a user-defined dissimilarity measure. EEDC models the cluster creation process as a clique-covering problem by constructing a graph G such that each sensor node is a vertex in the graph. An edge (u, v) is drawn if the dissimilarity measure between vertex u and vertex v is less than or equal to the given intracluster dissimilarity measure threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to cover all vertices in the graph with the minimum number of cliques. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, due to its centralized processing, it is not suitable for large-scale WSNs.
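
To make the clique-covering formulation concrete, the sketch below partitions nodes so that every pair inside a cluster is within max_dst of each other, which is exactly the clique condition on the graph G. Finding a minimum clique cover is NP-hard in general, so this uses a simple greedy heuristic as a stand-in; it is not the procedure of [84], and the function name, the node identifiers, and the absolute-difference dissimilarity are assumptions made for illustration.

```python
def greedy_clique_cover(readings, dissimilarity, max_dst):
    """Greedily partition nodes into cliques of the similarity graph.

    readings: dict mapping node id -> reading (any type the measure accepts).
    dissimilarity: user-defined function of two readings.
    max_dst: intracluster dissimilarity threshold.
    A node joins the first cluster in which it is within max_dst of *every*
    member (so the cluster stays a clique); otherwise it starts a new one.
    """
    clusters = []
    for node in sorted(readings):
        for cluster in clusters:
            if all(
                dissimilarity(readings[node], readings[member]) <= max_dst
                for member in cluster
            ):
                cluster.append(node)
                break
        else:
            clusters.append([node])
    return clusters


# Hypothetical temperature readings collected at the sink.
sink_readings = {"a": 20.0, "b": 20.4, "c": 25.0, "d": 25.3, "e": 20.9}
clusters = greedy_clique_cover(sink_readings, lambda x, y: abs(x - y), 0.5)
```

Node "e" ends up alone: it is within 0.5 of "b" but not of "a", so adding it to the first cluster would break the clique property.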

4.3.2. Distributed Approaches Aim to Maximize WSNs' Performance. Guo et al. [85] proposed H-cluster, a distributed algorithm to cluster sensory data. The input of this algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output of the algorithm is a set of cluster features that summarize the clusters of the input sensory dataset. The Hilbert-Map mapping algorithm is used to map a d-dimensional sensory data space into the 2-dimensional area covered by a given WSN. H-cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-cluster with the C-Corner and C-Center algorithms. During the experiments, four types of environment attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that the H-cluster algorithm is much more efficient in terms of data loss, energy, and the quality of cluster data in small WSNs. The results also show that as the amount of sensory data delivered increases, the amount of data loss also increases, and that energy efficiency decreases as the size of the WSN increases.

Yeo et al. [86] proposed a data correlation-based clustering scheme (DCC) based on the similarity of sensor data, along with a spatial suppression scheme that helps to reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head. During this phase, sensor nodes communicate with each other, and the resulting clusters are organized from sensor nodes that have similar readings. Spatial suppression is performed on the cluster head, which also computes the difference between each sensor reading and a representative value. If a cluster head has redundant data, it removes it except for the node identification. The experimental results support the hypothesis that clustering based on data correlation has better compression performance than ordinary clustering based on locality of communication; they show that DCC reduces the data size by 40% through suppression and prolongs network lifetime by 20–30%. However, for large-scale network applications (nodes > 500), DCC is inefficient because each cluster head needs more energy to collect similar data readings and to communicate with several nodes. Also, when the percentage of similar data readings is low, DCC is ineffective due to a higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing their prediction model. In their approach, the cluster heads execute a prediction model, while gateway nodes at the circumference of the clusters are responsible for the routing task. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] presented the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When the sensor values move outside the threshold, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees the result to be within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), where clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, wherein only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) that trades a lower result precision for significant savings in energy, storage, computation, and communication.
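
The query and response phases can be illustrated with a simplified sketch. It assumes the query spreads along a tree given as a child-to-parent mapping listed in dissemination order, and that a node joins its parent's cluster when its reading is within the threshold of that cluster head's reading; the function names, the tree encoding, and the use of the head's own reading as the reported value are illustrative choices, not details taken from [88].

```python
def cag_query_phase(parent, readings, threshold):
    """Assign each node to a cluster head while the query is disseminated.

    parent: dict node -> parent node (None for the root), listed so that
            every parent appears before its children (dict order is kept).
    readings: dict node -> current sensor value.
    Returns a dict node -> cluster head.
    """
    head_of = {}
    for node, par in parent.items():
        if par is not None and abs(readings[node] - readings[head_of[par]]) <= threshold:
            head_of[node] = head_of[par]   # similar enough: join parent's cluster
        else:
            head_of[node] = node           # otherwise become a new cluster head
    return head_of


def cag_response_phase(head_of, readings):
    """Only cluster heads report; one value per cluster travels up the tree."""
    return {head: readings[head] for head in set(head_of.values())}


# Hypothetical five-node tree rooted at "s".
tree = {"s": None, "a": "s", "b": "a", "c": "b", "d": "s"}
values = {"s": 20.0, "a": 20.3, "b": 21.5, "c": 21.6, "d": 19.8}
heads = cag_query_phase(tree, values, threshold=0.5)
reported = cag_response_phase(heads, values)
```

Five readings collapse to two reported values, and every suppressed reading lies within the threshold of the value reported for its cluster, which is the error bound the paragraph describes.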

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the K-Mean clustering algorithm is proposed that sends summarized data towards the sink, which reduces the communication transmission time and power consumption of sensor nodes. The sensor network is divided into clusters, and only the cluster head nodes communicate with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends to the base station the count and vector sum of its local sensory data points, as well as the sum of the squared distances from each local point to its center. On receiving data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations. Several programs are run to get the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors (N) and increases linearly with the number of centers. The major issues are the extra memory needed at the cluster head and the computation power needed to summarize data before transmitting it to the sink. Also, the algorithm requires multiple rounds of message passing between cluster heads and the base station; this may have a serious effect on communication efficiency when the number of sensors is relatively high.
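
The message exchange described above can be sketched as follows. This is a minimal illustration of the summary-based update, assuming each cluster head holds its members' data points locally and that convergence is declared when the total squared distance stops changing; the function names and the stopping tolerance are assumptions rather than details from [67].

```python
def local_summary(points, centers):
    """Computed at a cluster head: per-center [count, vector sum, squared-distance sum]."""
    dim = len(centers[0])
    summary = [[0, [0.0] * dim, 0.0] for _ in centers]
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = dists.index(min(dists))                      # nearest center
        summary[j][0] += 1
        summary[j][1] = [s + a for s, a in zip(summary[j][1], p)]
        summary[j][2] += dists[j]
    return summary


def update_centers(summaries, centers):
    """Computed at the base station from the cluster heads' summaries only."""
    new_centers, cost = [], 0.0
    for j, old in enumerate(centers):
        count = sum(s[j][0] for s in summaries)
        cost += sum(s[j][2] for s in summaries)
        if count == 0:
            new_centers.append(list(old))                # keep an empty center in place
        else:
            total = [sum(s[j][1][d] for s in summaries) for d in range(len(old))]
            new_centers.append([t / count for t in total])
    return new_centers, cost


def distributed_kmeans(head_data, centers, tol=1e-9, max_rounds=100):
    """head_data: one list of local data points per cluster head."""
    previous_cost = None
    for _ in range(max_rounds):
        summaries = [local_summary(points, centers) for points in head_data]
        centers, cost = update_centers(summaries, centers)
        if previous_cost is not None and abs(previous_cost - cost) <= tol:
            break
        previous_cost = cost
    return centers


final_centers = distributed_kmeans(
    [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]],
    [[1.0, 1.0], [9.0, 1.0]],
)
```

Each round, a cluster head sends one fixed-size triple per center regardless of how many sensors it serves, which is why the communication cost grows with the number of centers rather than with N.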

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes to form clusters. Those nodes that hear the request decide whether they should nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on their remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that attribute-based clustering yields gains over flooding-based schemes when there are subregions in the sensor network that are more targeted than others, that is, when inquiries are not uniformly distributed over time and space.

Figure 5: Classification maps input attribute set (X) to class label (Y). (Diagram labels: input attribute set (X), classification model, output class label (Y).)

Ma et al. [90] proposed the distributed hierarchical clustering and summarization algorithm (DHCS) for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop count thresholds, to model node and distance-based clustering. Initially, each node treats itself as an active cluster. Then, similar adjacent clusters are merged into larger clusters round by round. In each round, each cluster tries to combine with its most similar adjacent cluster simultaneously. Two clusters can be merged only if both consider one another as the most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any more, are called steady clusters.
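
The round-by-round mutual merging can be sketched as below. This is a simplified, centrally simulated illustration: cluster similarity is taken to be the difference between cluster means, only a difference threshold is applied (the hop count threshold is omitted), and the function name and graph encoding are hypothetical rather than taken from [90].

```python
def dhcs_sketch(readings, edges, diff_threshold):
    """Merge mutually most-similar adjacent clusters until none can merge.

    readings: dict node -> value; edges: set of frozenset({u, v}) links.
    Returns the steady clusters as sorted lists of node ids.
    """
    clusters = {node: {node} for node in readings}   # every node starts as a cluster

    def mean(cid):
        members = clusters[cid]
        return sum(readings[n] for n in members) / len(members)

    def adjacent(a, b):
        return any(frozenset((u, v)) in edges for u in clusters[a] for v in clusters[b])

    while True:
        # Each cluster picks its most similar adjacent cluster within the threshold.
        best = {}
        for a in clusters:
            candidates = [
                (abs(mean(a) - mean(b)), b)
                for b in clusters
                if b != a and adjacent(a, b) and abs(mean(a) - mean(b)) <= diff_threshold
            ]
            if candidates:
                best[a] = min(candidates)[1]
        # Merge only pairs that chose each other.
        pairs = {frozenset((a, b)) for a, b in best.items() if best.get(b) == a}
        if not pairs:
            return sorted(sorted(members) for members in clusters.values())
        for pair in pairs:
            keep, drop = sorted(pair)
            clusters[keep] |= clusters.pop(drop)


# Hypothetical four-node chain n1 - n2 - n3 - n4.
links = {frozenset(("n1", "n2")), frozenset(("n2", "n3")), frozenset(("n3", "n4"))}
steady = dhcs_sketch({"n1": 20.0, "n2": 20.2, "n3": 25.0, "n4": 25.1}, links, 1.0)
```

Because each cluster has a single best neighbor, the mutually chosen pairs in a round are disjoint and can be merged simultaneously, mirroring the rule stated in the paragraph.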

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and then classifies new data into one of the learned classes. Figure 5 shows that classification maps an input attribute set (X) to a class label (Y).

Classification-based approaches have adapted the traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, depending on the type of classification model that they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until it reaches a leaf node, where the class label is assigned. The internal nodes partition the data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a request to classify a query vector, it locates the closest training vectors according to the distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier groups the dataset into predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where Condition is a conjunction of attribute tests and Y is a class label.
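
As a small illustration of this rule form, the sketch below represents each rule as a condition (a conjunction of attribute tests) paired with a class label and returns the label of the first rule that fires. The attribute names, thresholds, class labels, and the default class are all made up for the example.

```python
# Each rule is (condition, class_label); a condition is a conjunction of
# attribute tests. Attribute names and thresholds are hypothetical.
RULES = [
    (lambda r: r["temperature"] > 60 and r["smoke"] > 0.5, "fire"),
    (lambda r: r["temperature"] > 35 and r["humidity"] < 20, "heat_warning"),
]


def classify(reading, rules=RULES, default="normal"):
    """Return the label of the first rule whose condition holds."""
    for condition, label in rules:
        if condition(reading):
            return label
    return default   # assumed fallback class when no rule fires


label = classify({"temperature": 70, "smoke": 0.8, "humidity": 10})
```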

SVM (support vector machine) techniques separate the data belonging to different classes by fitting a hyperplane between them that maximizes the margin of separation. The data is mapped into a higher-dimensional feature space where it can be easily separated by a hyperplane. Furthermore, a kernel function is used to compute the dot products between the mapped vectors in the feature space, without performing the mapping explicitly, in order to find the hyperplane.

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. In order to identify persons, the proposed approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The proposed approach is validated by experiments using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated using the Weka framework, version 3.7.0 [95]. The experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deeper analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then the k closest training examples are kept, where k is a fixed integer and k ≥ 1. Among these k examples, the algorithm looks for the most frequent label; this label is the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated with NS2 [97] and test trajectory data [98] using class labels. The experimental investigation reports a correct classification rate of 92.3%.
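
The prediction procedure just described is the standard k-nearest-neighbor vote and can be sketched directly. The example below assumes fixed-length feature vectors compared with Euclidean distance; how [96] turns trajectories into features is not specified here, so the sample data and labels are purely illustrative.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of query by majority vote among its k nearest examples.

    train: list of (feature_vector, label) pairs stored by the training phase.
    """
    # Distance to every training example, then keep the k closest.
    neighbours = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set.
training_set = [
    ((0.0, 0.0), "walk"),
    ((0.0, 1.0), "walk"),
    ((1.0, 0.0), "walk"),
    ((5.0, 5.0), "drive"),
    ((5.0, 6.0), "drive"),
]
prediction = knn_predict(training_set, (0.2, 0.1), k=3)
```

Since training only stores the examples, all of the cost is paid at query time, which is the "lazy learner" behavior noted earlier in this section.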

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments to predict the behavior patterns of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model of the occupancy of the ambient intelligence environment. The occupancy is predicted using fuzzy rules, which are modeled from the past values of the time series data. In the learning process, input from the sensors is compared with the stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the proposed technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multi-user environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size: it changes the algorithm output rate according to the data rate, available memory, algorithm output rate history, and time constraints so that the available memory is filled with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. Instances are stored in main memory according to a prespecified distance threshold. The threshold represents the similarity measure acceptable to the algorithm for considering two or more elements as one element according to their attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the instance is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer.
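
The per-element update can be sketched as follows. The weight bookkeeping follows the description above; storing a new entry with weight 1 when no stored instance lies within the threshold is an assumption made to keep the example complete, and the AOG mechanism that adapts the threshold to available memory is not modeled. The function name and the memory layout are hypothetical.

```python
import math


def lwclass_update(memory, features, label, threshold):
    """Process one labeled stream element in a single pass.

    memory: list of [features, label, weight] entries, modified in place.
    """
    nearest = min(memory, key=lambda entry: math.dist(entry[0], features), default=None)
    if nearest is None or math.dist(nearest[0], features) > threshold:
        # Assumption: an element with no similar stored instance is stored as new.
        memory.append([features, label, 1])
        return
    if nearest[1] == label:
        nearest[2] += 1          # same class: reinforce the stored instance
    else:
        nearest[2] -= 1          # different class: weaken it
        if nearest[2] == 0:
            memory.remove(nearest)   # release the instance from memory


stored = []
lwclass_update(stored, (1.0, 1.0), "hot", threshold=0.5)
lwclass_update(stored, (1.1, 1.0), "hot", threshold=0.5)
lwclass_update(stored, (1.0, 1.1), "cold", threshold=0.5)
```

After these three elements the single stored instance has weight 1: two agreeing elements raised it to 2 and one conflicting element lowered it again.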

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure over the whole network is constructed. A distributed voting approach is proposed in which each sensor, as a leaf of the tree (DT), performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work shows how the local model enables sensors to respond to a change in the target by relearning local models. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime as compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node to compute and build the local predictive model and the extra memory required to store it.
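
The combination step at the sink can be illustrated with a short sketch of simple and weighted voting. It assumes each sensor sends only its predicted class and, for the weighted case, that the sink holds one weight per sensor (for instance, a locally estimated accuracy); the function name, the sensor identifiers, and the choice of weights are assumptions, not details from [103].

```python
from collections import defaultdict


def combine_votes(local_predictions, weights=None):
    """Combine per-sensor class predictions at the sink.

    local_predictions: dict sensor id -> predicted class.
    weights: optional dict sensor id -> vote weight; None means simple voting.
    """
    tally = defaultdict(float)
    for sensor, predicted_class in local_predictions.items():
        tally[predicted_class] += 1.0 if weights is None else weights[sensor]
    return max(tally, key=tally.get)


# Hypothetical local predictions from three sensors.
predictions = {"s1": "car", "s2": "truck", "s3": "car"}
simple_result = combine_votes(predictions)
weighted_result = combine_votes(predictions, {"s1": 0.2, "s2": 0.9, "s3": 0.3})
```

The two schemes can disagree: here the simple majority is "car", while a single highly weighted sensor tips the weighted vote to "truck".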

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme to generate effective feature vectors of low dimension (FVLD) for wireless audio-sensor networks. A distributed cluster-based algorithm for the detection and classification of vehicles is proposed. Sensors form clusters on demand for the sake of running a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features, and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, among all the approaches compared.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using an SVM-based (support vector machine) technique in a sensor network. The authors proposed two distributed algorithms, the distributed fix partition SVM (DFP-SVM) and the weighted distributed fix partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is trained incrementally on a set of examples called support vectors. With an SVM, the number of support vectors is typically very small compared with the number of all sample values. Besides, the support vectors (and offset) give a compressed representation of the separating SVM hyperplane. That is why sending only the support vectors, instead of all training samples, to the next cluster head is very energy efficient: it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations, and a comparison is made with a centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally as compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius of the sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected over a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the results for local outliers inaccurate. The technique is evaluated using real sensor measurements collected from a deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. The results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and differing aspects of the data mining techniques specially designed for WSNs discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed, and then comparative tables are presented to compare and differentiate existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogeneous [50] or heterogeneous [33, 48]. A homogeneous attribute means sensing a single-value attribute, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that the readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node if the reading from a correlated sensor is missing.
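As a rough illustration of the two notions, temporal correlation can be estimated as the correlation between a node's series and the same series shifted by one step, and spatial correlation as the correlation between the time-aligned series of two neighboring nodes. The sketch below makes that assumption explicit; the function names and the choice of the Pearson coefficient are illustrative and are not taken from any of the surveyed techniques.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (var_x * var_y) ** 0.5

def temporal_correlation(series, lag=1):
    """Correlation between a node's readings and its readings `lag` steps earlier."""
    return pearson(series[lag:], series[:-lag])

def spatial_correlation(node_a, node_b):
    """Correlation between time-aligned readings of two neighboring nodes."""
    return pearson(node_a, node_b)
```

A coefficient close to 1 for two neighbors suggests that one node's reading could stand in for the other's when a reading is missing, which is the use hinted at in the paragraph above.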

5.2. Processing Architecture. In order to apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models, as follows.

Centralized. The simplest way to analyze WSNs data is to use a centralized model. In this approach, the entire raw data collected from WSNs is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis in order to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is the high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required and partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an “aggregator” is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.
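A minimal sketch of this idea, assuming the local model is nothing more than a count, sum, minimum, and maximum over a window of readings, is given below. The `LocalModel` class and the `summarize` and `merge` functions are hypothetical names chosen for illustration; real local models in the surveyed work carry event patterns and are considerably richer.

```python
from dataclasses import dataclass

@dataclass
class LocalModel:
    """Compact summary that a node sends instead of its raw readings."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self):
        return self.total / self.count

def summarize(readings):
    """Runs on a sensor node: collapse a window of raw readings into a summary."""
    return LocalModel(len(readings), sum(readings), min(readings), max(readings))

def merge(models):
    """Runs on the aggregator: combine the summaries received from several nodes."""
    return LocalModel(
        count=sum(m.count for m in models),
        total=sum(m.total for m in models),
        minimum=min(m.minimum for m in models),
        maximum=max(m.maximum for m in models),
    )
```

Each node transmits four numbers per window regardless of how many readings it took, which is where the communication saving comes from; the price is that anything not captured by the summary is lost to the central server.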

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSNs data. Distributed approaches use one-scan algorithms for real-time processing: in order to deal with the high data arrival rate, the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.
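To make the one-scan constraint concrete, the sketch below maintains a running mean and variance with Welford's method, touching each reading exactly once and storing no history. It is a generic example of a single-pass computation, not one of the surveyed algorithms, and the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """One-pass mean and variance (Welford's method); each reading is seen once."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold a newly arrived reading into the statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two readings have arrived."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

The memory footprint is three numbers per monitored attribute, independent of the stream length. A multiscan method, by contrast, would need the readings to be stored so that they can be revisited.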

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data mining becomes difficult because updates to this structure must be persisted over time.

Table 2: Comparison of data mining techniques for wireless sensor networks. The per-technique check marks for processing architecture (distributed, central), attributes (homogenous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihop), mobility (static, mobile), node role (cluster head, sensor, relay), and evaluation method and data source (simulation, analytical model, real, synthetic) are not reproduced in Tables 2–6.

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Frequent pattern mining | DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores the sensor that reports different values |
| Frequent pattern mining | In-network data mining [51] | Events patterns discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication |
| Frequent pattern mining | Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost; delays crucial messages |
| Frequent pattern mining | Online algorithm [46] | Interval list of representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy |
| Frequent pattern mining | Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data |
| Frequent pattern mining | CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data |

Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Frequent pattern mining | Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increased cost due to multiple DB scans |
| Frequent pattern mining | SP-tree [49] | Discover events patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost |
| Sequential pattern mining | Relational framework [58] | Multi-dimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming |
| Sequential pattern mining | Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants behavior prediction | Prediction accuracy | Inefficient for complex activities |
| Sequential pattern mining | MPG [64] | Predict object’s future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset |
| Sequential pattern mining | Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction |

Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Sequential pattern mining | TMP-mine [65] | Predict object’s future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time |
| Sequential pattern mining | Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns |
| Sequential pattern mining | MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute |
| Sequential pattern mining | PTSP [66] | Object’s future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects |
| Clustering | DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate |
| Clustering | H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate |

Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Clustering | Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping |
| Clustering | CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss |
| Clustering | EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs |
| Clustering | Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs |
| Clustering | Attribute based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost |
| Clustering | DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes’ energy is ignored |

Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Classification | Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness |
| Classification | Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity |
| Classification | NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset |
| Classification | LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift |
| Classification | FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission |
| Classification | Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios |
| Classification | Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity |
| Classification | One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation |

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are the nodes with limited resources; they are used to sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, cluster heads (CHs) perform aggregation/fusion of the collected sensors’ data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. This is a node that acts as a medium to transmit data packets from one node to others.

Node Task. In the centralized approach, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and can take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the type of application that benefits from WSNs data mining techniques. Here, we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSNs performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSNs optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment yields the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. This refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this paper that most of the techniques are validated by simulation on synthetic datasets, owing to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated on the optimization objective that has been achieved. Most of the techniques consider the resource constraints and the different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, and node mobility. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. It is observed from the comparative analysis that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101] in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
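The sensitivity to support and confidence thresholds noted in item (ix) can be seen in a small example. The sketch below assumes that each transaction is the set of sensor identifiers that reported an event in the same epoch; this encoding, the function names, and the threshold values are illustrative assumptions and do not come from any of the surveyed techniques.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule is unsupported
    return support(transactions, set(antecedent) | set(consequent)) / base

def rule_holds(transactions, antecedent, consequent, min_sup, min_conf):
    """True if the rule antecedent -> consequent meets both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_conf)
```

With five epochs in which sensors `s1` and `s2` fire together three times and `s1` fires four times in total, the rule `s1 -> s2` has support 0.6 and confidence of about 0.75. It is accepted with a minimum confidence of 0.7 and rejected with 0.8, which shows how a small change in the threshold changes the mined rule set.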

7. Future Research Directions

It is observed from the analysis of existing data mining work on sensor network-based applications that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to locally carry out mining processing and transmit only the required and partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data is continuously arriving and not available for the next scan.

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient as compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local model and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get the global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.

Figure 6: Proposed hybrid framework for sensor network based applications. Labels recoverable from the diagram: sensor data stream; node data processing (data selection, remove duplication, aggregation, summarization, data fusion/clustering, association analysis); network data processing (local model integration, network analysis, real-time decisions, network maintenance); central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); local model, network model, global model; network pattern identification and monitoring; query, approximate results, users, sink/base station; in-network processing, centralized processing; single pass, multipass.

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSNs data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. The applications that require real-time response and actions can use the network model for decision and knowledge extraction. The applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSNs data [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at the central server. In-network processing splits the large task into smaller ones at the node level and the cluster head, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this way of distributing data, the proposed framework is able to deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSNs data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
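Item (iv) can be illustrated with a deliberately simple stand-in for incremental learning: an exponentially weighted mean and variance that is updated with every new reading, so that older data is gradually forgotten as the monitored process drifts. This is not the mechanism of [39, 112] and not part of the proposed framework's implementation; the class name `IncrementalBaseline`, the forgetting factor `alpha`, and the threshold `k` are hypothetical.

```python
class IncrementalBaseline:
    """Exponentially weighted mean and variance that adapt to a drifting process."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # hypothetical forgetting factor in (0, 1]
        self.mean = None
        self.var = 0.0

    def update(self, x):
        """Fold one new reading into the model without revisiting old data."""
        if self.mean is None:
            self.mean = x
            return
        diff = x - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)

    def is_unusual(self, x, k=3.0):
        """True if x lies more than k standard deviations from the current mean."""
        if self.mean is None or self.var == 0.0:
            return False
        return abs(x - self.mean) > k * self.var ** 0.5
```

A model trained on readings around 20 flags a reading of 30 as unusual; after enough readings around 30 have arrived, the same model treats 30 as normal and 20 as unusual. A model fitted once offline would keep reporting the new regime as anomalous.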

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to the appropriate WSNs type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of the past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. Certainly, the set of WSNs applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSNs applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSNs performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies for emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


dissimilarity measure threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to cover all vertices in the graph with the minimum number of cliques. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, due to its centralized processing, it is not suitable for large-scale WSNs.

4.3.2. Distributed Approaches Aim to Maximize WSNs’ Performance. Guo et al. [85] proposed H-Cluster, a distributed algorithm to cluster sensory data. The input of this algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output of the algorithm is a set of cluster features that summarize the clusters of the input sensory dataset. The Hilbert-Map mapping algorithm is used to map a d-dimensional sensory data space into a 2-dimensional area covered by a given WSN. H-Cluster has two phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-Cluster with the C-Corner and C-Center algorithms. During the experiments, four types of environment attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that the H-Cluster algorithm is much more efficient in terms of data loss, energy, and the quality of cluster data in small WSNs. The results also show that as the amount of sensory data delivered increases, the amount of data loss also increases, and that energy efficiency decreases as the size of the WSN increases.

Yeo et al. [86] proposed a data correlation-based clustering scheme (DCC) based on the similarity of sensor data, along with a spatial suppression scheme which helps to reduce the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head; during this phase, sensor nodes communicate with each other, and the resulting clusters are organized from sensor nodes which have similar readings. Spatial suppression is performed at the cluster head, which also computes the difference between each sensor reading and the representative value. If a cluster head has redundant data, it removes it, keeping only the node identification. The experimental results support the hypothesis that clustering based on data correlation has better compression performance than ordinary clustering based on locality of communication: they show that DCC reduces the data size by 40% through suppression and prolongs network lifetime by 20–30%. However, for large-scale network applications (nodes > 500), DCC is inefficient because each cluster head needs more energy to collect similar data readings and also to communicate with several nodes. Also, when the percentage of similar data readings is low, DCC is ineffective due to a higher rate of cluster head creation.

Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing their prediction model. In their approach, the cluster heads execute a prediction model, while gateway nodes at the circumference of the clusters are responsible for the routing task. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.

Yoon and Shahabi [88] present the clustered aggregation (CAG) algorithm, which forms clusters of nodes sensing similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within a threshold over time (temporal correlation). By grouping nodes on similar values, CAG transmits only one reading per group. When the threshold values change, the related sensor nodes communicate with neighboring nodes associated with other clusters to change their cluster memberships. CAG guarantees the result to be within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), where clusters group nodes sensing similar values. Subsequently, CAG enters the response phase, wherein only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) which trades a lower result precision for significant energy, storage, computation, and communication savings.
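The query and response phases described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes an absolute threshold `tau`, a query that travels along a fixed dissemination tree given by a hypothetical `parent` map, and function names (`cag_query_phase`, `cag_response_phase`) chosen here for readability.

```python
def cag_query_phase(readings, parent, tau):
    """Assign each node to a cluster head while the query is disseminated.

    readings: dict node -> sensed value, listed in dissemination order
              (a parent always appears before its children).
    parent:   dict node -> parent node in the dissemination tree (None for the root).
    tau:      assumed absolute error-tolerance threshold.
    Returns a dict node -> cluster head.
    """
    head_of = {}
    for node, value in readings.items():
        p = parent.get(node)
        head = head_of.get(p) if p is not None else None
        # Join the parent's cluster only if this reading is within tau of the head's reading.
        if head is not None and abs(value - readings[head]) <= tau:
            head_of[node] = head
        else:
            head_of[node] = node  # otherwise the node becomes a new cluster head
    return head_of


def cag_response_phase(readings, head_of):
    """Only cluster heads report: one value per cluster travels up the tree."""
    return {head: readings[head] for head in set(head_of.values())}


readings = {"a": 20.0, "b": 20.4, "c": 23.0, "d": 23.3, "e": 20.9}
parent = {"a": None, "b": "a", "c": "b", "d": "c", "e": "b"}
head_of = cag_query_phase(readings, parent, tau=0.5)
report = cag_response_phase(readings, head_of)
```

In this toy run, five readings collapse into three reported values, and every suppressed reading lies within `tau` of the value reported for its cluster, which is the sense in which the result stays inside the user-specified tolerance.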

Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the K-means clustering algorithm is proposed that sends summarized data towards the sink, which reduces the communication transmission time and power consumption of sensor nodes. The sensor network is divided into clusters, and only the cluster head (CH) communicates with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends a summary to the base station, including the count and vector sum of its local sensory data points as well as the sum of the squared distances from each local point to its center. On receiving data from the CHs, the base station updates the cluster means, and the algorithm repeats until the convergence criterion is met. The efficiency of the algorithm is evaluated via simulations. Several programs are run to get the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors (N) and increases linearly with the number of centers. The major issues are the extra memory required at the cluster head and the computation power needed to summarize data before transmitting to the sink. Also, the algorithm requires multiple rounds of message passing between cluster heads and the base station; this may have a serious effect on communication efficiency when the number of sensors is relatively high.
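The exchange of summaries between cluster heads and the base station can be illustrated with a minimal sketch. It is written under stated assumptions: the function names, the stopping tolerance `tol`, and the round limit are hypothetical, and the radio layer is replaced by plain function calls.

```python
def local_summary(points, centers):
    """Run at a cluster head: summarize local points against the current centers."""
    k, dim = len(centers), len(centers[0])
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    sq_err = 0.0
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = dists.index(min(dists))          # nearest center
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
        sq_err += dists[j]
    # Message size depends on k and dim only, not on the number of local points.
    return counts, sums, sq_err


def update_centers(summaries, centers):
    """Run at the base station: combine summaries into new cluster means."""
    k, dim = len(centers), len(centers[0])
    new_centers = []
    for j in range(k):
        n = sum(s[0][j] for s in summaries)
        if n == 0:
            new_centers.append(list(centers[j]))  # keep an empty cluster's center
        else:
            new_centers.append([sum(s[1][j][d] for s in summaries) / n
                                for d in range(dim)])
    return new_centers, sum(s[2] for s in summaries)


def distributed_kmeans(node_data, centers, tol=1e-9, max_rounds=100):
    """node_data holds one list of points per cluster head."""
    prev_err = float("inf")
    for _ in range(max_rounds):
        summaries = [local_summary(pts, centers) for pts in node_data]
        centers, err = update_centers(summaries, centers)
        if abs(prev_err - err) <= tol:       # assumed convergence test
            break
        prev_err = err
    return centers


node_data = [[(1.0, 1.0), (1.0, 2.0)], [(8.0, 8.0), (9.0, 8.0)], [(2.0, 1.0)]]
centers = distributed_kmeans(node_data, [(0.0, 0.0), (10.0, 10.0)])
```

Each round costs one summary per cluster head, which reflects why the text reports a cost that grows with the number of centers rather than with the number of sensed points.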

Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The clustering is established by mapping a hierarchy of data attributes to the network topology. The base station starts the clustering process by asking nodes to form clusters. Those nodes that hear the request decide whether they should nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on the remaining battery supply. If a node nominates itself, then it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when there are subregions in the sensor network that are more targeted than others, that is, when the distribution of inquiries is not uniform over time and space.

Figure 5: Classification maps input attribute set (X) to class label (Y). (Diagram: input attribute set (X) → classification model → output class label (Y).)

Ma et al. [90] proposed the distributed hierarchical clustering and summarization (DHCS) algorithm for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop count thresholds, to model node- and distance-based clustering. Initially, each node treats itself as an active cluster. Then, similar adjacent clusters are merged into larger clusters round by round. In each round, every cluster simultaneously tries to combine with its most similar adjacent cluster. Two clusters can be merged only if each considers the other its most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
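The round-by-round mutual merging can be sketched as below. This is an illustrative simplification rather than the published algorithm: it runs centrally instead of on the nodes, uses the cluster mean as the summary, applies only a difference threshold (the hop count threshold is omitted), and all names are hypothetical.

```python
def dhcs_sketch(values, edges, diff_threshold):
    """Merge adjacent clusters whose summaries are mutually most similar.

    values: dict node -> current reading.
    edges:  set of frozenset({u, v}) pairs marking neighboring nodes.
    diff_threshold: assumed maximum allowed difference between cluster means.
    Returns the steady clusters as sorted lists of node ids.
    """
    clusters = {n: {n} for n in values}  # every node starts as its own cluster

    def mean(cid):
        members = clusters[cid]
        return sum(values[m] for m in members) / len(members)

    def adjacent(a, b):
        return any(frozenset((x, y)) in edges
                   for x in clusters[a] for y in clusters[b])

    while True:
        best = {}
        for a in clusters:
            candidates = [(abs(mean(a) - mean(b)), b) for b in clusters
                          if b != a and adjacent(a, b)
                          and abs(mean(a) - mean(b)) <= diff_threshold]
            if candidates:
                best[a] = min(candidates)[1]   # most similar adjacent cluster
        # Merge only pairs that chose each other in this round.
        pairs = {frozenset((a, b)) for a, b in best.items() if best.get(b) == a}
        if not pairs:
            return sorted(sorted(c) for c in clusters.values())
        for pair in pairs:
            keep, drop = sorted(pair)
            clusters[keep] |= clusters.pop(drop)


values = {"a": 10.0, "b": 10.2, "c": 10.3, "d": 15.0}
edges = {frozenset(("a", "b")), frozenset(("b", "c")), frozenset(("c", "d"))}
steady = dhcs_sketch(values, edges, diff_threshold=0.5)
```

In the toy chain above, `b` and `c` merge first because they pick each other, `a` joins in the next round, and `d` stays alone because its reading is too far from its only neighbor.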

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and then classifies new data into one of the learned classes. Figure 5 shows that classification maps the input attribute set (X) to the class label (Y).

Classification-based approaches adapt traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, depending on the type of classification model that they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root and moving through the tree until it reaches a leaf node, where a class label is assigned. The internal nodes partition the data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a classification request, the query vector is used to locate the closest training vectors according to a distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier assigns the data to predefined classes by using "if ... then ..." rules of the following form:

(Condition) → Y, where Condition is a conjunction of attribute tests and Y is a class label.
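A rule of this form might be encoded as in the following Python sketch. The attribute names, thresholds, class labels, and the first-match policy with a default class are all assumptions chosen for illustration; they are not taken from any of the surveyed techniques.

```python
# Each rule is (condition, label); a condition is a conjunction of
# (attribute, operator, value) tests. All values below are hypothetical.
RULES = [
    ([("temperature", ">", 50.0), ("humidity", "<", 20.0)], "fire_risk"),
    ([("temperature", "<", 0.0)], "frost"),
]

OPS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


def classify(reading, rules=RULES, default="normal"):
    """Return the label of the first rule whose tests all hold for `reading`."""
    for condition, label in rules:
        if all(OPS[op](reading[attr], value) for attr, op, value in condition):
            return label
    return default
```

A reading that satisfies no rule falls through to the default class, which is one common way to keep the rule set exhaustive.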

SVM (support vector machine) techniques partition the data belonging to different classes by fitting a hyperplane between them that maximizes the margin of separation. The data is mapped into a higher-dimensional feature space where it can be easily partitioned by a hyperplane. Furthermore, a kernel function is used to compute the dot products between the mapped vectors in the feature space when finding the hyperplane.

4.4.1. Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. To identify persons, the approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The approach is validated experimentally using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated using the Weka framework, version 3.7.0 [95]. The experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deeper analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data by using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then the k closest training examples are selected, where k is a fixed integer and k ≥ 1; among these k examples, the most frequent label is taken as the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated with NS2 [97] and test trajectory data [98] using class labels. The experiments report a correct classification rate of 92.3%.
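The store-then-vote procedure described above corresponds to a plain k-nearest-neighbor classifier, which can be sketched in a few lines of Python. This is a generic illustration, not the NNTC code: the Euclidean distance on fixed-length vectors and the function name `knn_predict` are assumptions, and trajectory-specific preprocessing is not shown.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest examples.

    train: list of (feature_vector, label) pairs stored during training
    query: feature vector of the test example
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    # Distance from the query to every stored training example (assumed Euclidean).
    by_distance = sorted(train, key=lambda example: math.dist(example[0], query))
    # Most frequent label among the k closest examples.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

Because training only stores the examples, all of the cost is paid at prediction time, which is why the text calls the nearest neighbor classifier a lazy learner.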

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments to predict the behavior patterns

of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant's interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model that predicts the occupancy of the ambient intelligence environment. The occupancy is predicted by using fuzzy rules, which are modeled on the past values of the time series data. In the learning process, input from the sensors is compared with the stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show that the technique is able to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multi-user environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size: the algorithm output rate is changed according to the data rate, available memory, algorithm output rate history, and time constraints, so that the available memory is filled with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. All instances are stored in main memory according to a prespecified distance threshold. The threshold represents the similarity measure that the algorithm accepts for treating two or more elements as one element according to their attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the instance is released from memory. The algorithm is empirically validated using synthetic streaming data in the resource-constrained environment of a common handheld computer.
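The weight update rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published LWClass algorithm: the class name `LWClassSketch`, the Euclidean distance, the decision to store an element as a new entry when no stored instance lies within the threshold, and the `predict` method are hypothetical, and the AOG-based adaptation of the output rate is omitted.

```python
import math


class LWClassSketch:
    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # each entry is [feature_vector, label, weight]

    def _nearest_within_threshold(self, features):
        best = None
        for entry in self.entries:
            d = math.dist(entry[0], features)
            if d <= self.threshold and (best is None or d < best[0]):
                best = (d, entry)
        return None if best is None else best[1]

    def update(self, features, label):
        """Process one labeled stream element in a single pass."""
        entry = self._nearest_within_threshold(features)
        if entry is None:
            # Assumption: an element with no close match becomes a new entry.
            self.entries.append([tuple(features), label, 1])
        elif entry[1] == label:
            entry[2] += 1  # same class: reinforce the stored instance
        else:
            entry[2] -= 1  # different class: weaken the stored instance
            if entry[2] == 0:
                # Weight reached zero: release the instance from memory.
                self.entries = [e for e in self.entries if e is not entry]

    def predict(self, features):
        """Assumption: return the label of the nearest stored instance."""
        if not self.entries:
            return None
        return min(self.entries, key=lambda e: math.dist(e[0], features))[1]
```

Keeping one weighted representative per neighborhood, instead of every element, is what bounds the memory footprint in this sketch.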

4.4.3. Distributed Approaches Aim to Solve WSNs' Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure is constructed over the whole network. A distributed voting approach is proposed in which each sensor is a leaf of a tree (DT) and performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The framework shows how the local models enable sensors to respond to changes in the target by relearning local models. It is especially useful for sensor networks with limited energy, computation, and bandwidth resources, and it makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved, because a predictive model, rather than the original data, is transmitted to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node to compute and build the local predictive model and the extra memory required to store it.
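The combination step at the sink can be sketched as a simple or weighted vote over the class labels received from the sensors. The function name `sink_vote`, the sensor identifiers, and the idea of using a per-sensor weight such as local accuracy are assumptions for illustration; the source only states that simple and weighted voting schemes were evaluated.

```python
from collections import defaultdict


def sink_vote(local_predictions, weights=None):
    """Combine per-sensor class predictions into one global prediction.

    local_predictions: dict sensor_id -> predicted class label
    weights: optional dict sensor_id -> vote weight (hypothetical, e.g. local
             accuracy); when omitted, every sensor has one vote
    """
    tally = defaultdict(float)
    for sensor_id, label in local_predictions.items():
        tally[label] += 1.0 if weights is None else weights[sensor_id]
    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(tally), key=lambda label: tally[label])
```

Only one label per sensor crosses the network in this scheme, which is the source of the communication savings the framework relies on.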

4.4.4. Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme that generates effective feature vectors of low dimension (FVLD) for a wireless audio network. A distributed cluster-based algorithm for the detection and classification of vehicles is proposed. Sensors form clusters on demand to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique achieved the best classification accuracy, 89.46%, among all the approaches compared.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using a support vector machine (SVM)-based technique in a sensor network. The authors proposed two distributed algorithms, the distributed fix partition SVM (DFP-SVM) and the weighted distributed fix partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is trained incrementally on a set of examples called support vectors. The number of support vectors is very small compared with the number of all sample values. Moreover, the support vectors (and the offset) give a compressed representation of the separating SVM hyperplane. Sending only the support vectors, instead of all training samples, to the next cluster head is therefore very energy efficient because it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the approach is evaluated by running a number of simulations and comparing it with a centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally as compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers

at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius information of the sphere to its parent for outlier classification. The technique identifies outliers from the data measurements collected over a long time window and is not performed in real time. It also ignores the spatial correlation of neighboring nodes, which makes the local outlier results inaccurate. The technique is evaluated using real sensor measurements collected from a deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. The results reveal that the technique achieves significant energy savings in terms of communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies the common and differing aspects of the data mining techniques specially designed for WSNs that were discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed, and then comparative tables are presented to compare and differentiate existing data mining techniques for WSNs data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogenous [50] or heterogeneous [33, 48]. A homogenous attribute means that a single attribute is sensed, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that the readings from different sensor nodes are observed at the same time instant and that the readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node if the reading from a correlated sensor is missing.
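As a rough illustration of the two notions, temporal correlation at a node can be estimated as the correlation between its readings and the same readings one step earlier, and spatial correlation as the correlation between time-aligned readings of two neighboring nodes. The sketch below uses the Pearson coefficient for both; the function names and the choice of Pearson correlation are assumptions, not a method prescribed by the surveyed papers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def temporal_correlation(series, lag=1):
    """Correlation between a node's readings and its readings `lag` steps earlier."""
    return pearson(series[lag:], series[:-lag])


def spatial_correlation(series_a, series_b):
    """Correlation between time-aligned readings of two neighbouring nodes."""
    return pearson(series_a, series_b)
```

A neighbor whose readings have been highly correlated in the past can then serve as a stand-in estimate when a node's reading goes missing.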

5.2. Processing Architecture. To apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models, as follows.

Centralized. The simplest way to analyze WSNs data is to use a centralized model. In this approach, all raw data collected from the WSNs is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.

Distributed. The other computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an "aggregator" is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.
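The idea of sending a compact local model instead of raw readings can be sketched as follows. Here the "local model" is just a summary tuple per node, which is a stand-in chosen for illustration; the function names `local_model` and `aggregate` and the choice of statistics are assumptions, and real techniques transmit richer models such as patterns or classifiers.

```python
def local_model(readings):
    """Summarize a node's raw readings as (count, sum, min, max)."""
    return (len(readings), sum(readings), min(readings), max(readings))


def aggregate(models):
    """Merge the local models received by the aggregator into one summary."""
    count = sum(m[0] for m in models)
    total = sum(m[1] for m in models)
    return {
        "count": count,
        "mean": total / count,
        "min": min(m[2] for m in models),
        "max": max(m[3] for m in models),
    }
```

Each node transmits four numbers regardless of how many readings it has collected, so the saving over the centralized model grows with the length of the sensing window.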

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSNs data. Distributed approaches use one-scan algorithms for real-time processing to deal with the high data arrival rate, and the mining results are expected to be available within short response times, whereas centralized approaches collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data

Table 2: Comparison of data mining techniques for wireless sensor networks.
(The check-mark columns of the original table are not reproduced: processing architecture (distributed, central), attributes (homogenous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihop), mobility (static, mobile), node role (cluster head, sensor, relay), evaluation method (simulation, analytical model), and data source (real, synthetic).)

Frequent pattern mining

| Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|
| DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores sensors that report different values |
| In-network data mining [51] | Events patterns discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication |
| Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost, delayed crucial messages |
| Online algorithm [46] | Interval list representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy |
| Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data |
| CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data |

Table 3: Comparison of data mining techniques for wireless sensor networks (continued).
(The check-mark columns of the original table are not reproduced.)

Frequent pattern mining

| Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|
| Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increased cost due to multiple DB scans |
| SP-tree [49] | Discover events patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost |

Sequential pattern mining

| Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|
| Relational framework [58] | Multidimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming |
| Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants behavior prediction | Prediction accuracy | Inefficient for complex activities |
| MPG [64] | Predict object's future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset |
| Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction |

Table 4: Comparison of data mining techniques for wireless sensor networks (continued).
(The check-mark columns of the original table are not reproduced.)

Sequential pattern mining

| Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|
| TMP-mine [65] | Predict object's future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time |
| Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns |
| MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute |
| PTSP [66] | Object's future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects |

Clustering

| Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|
| DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate |
| H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate |

Table 5: Comparison of data mining techniques for wireless sensor networks (continued).
(The check-mark columns of the original table are not reproduced.)

Clustering

| Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|
| Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping |
| CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss |
| EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs |
| Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs |
| Attribute-based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost |
| DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Node energy is ignored |

Table 6: Comparison of data mining techniques for wireless sensor networks (continued).
(The check-mark columns of the original table are not reproduced.)

Classification

| Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|
| Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness |
| Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity |
| NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset |
| LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift |
| FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission |
| Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios |
| Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity |
| One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation |

mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they are used to sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensor data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. This is a node that acts as a medium for transmitting data packets from one node to the others.

Node Task. In centralized approaches, the node's task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and can take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the types of application that benefit from WSNs data mining techniques. Here we give some real-world examples, as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and to make tracking more efficient and accurate.

(iv) Fourth is WSNs performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central site.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSNs optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. This is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail than real implementation. However, selecting an appropriate simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment because appropriate hardware is unavailable owing to technical and design limitations. Real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating a technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. This refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this paper that most of the studies validate their results by simulation on synthetic datasets, owing to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated in terms of the optimization objective that it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.

6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among the sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques that consider spatial correlation [51] among the sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques that consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques uses centralized approach[21 42ndash44 46 58 84 101] in which all data istransmitted to the sink node for identifying certainpatterns These techniques cause much communica-tion overhead and delay the response time Whilethe techniques that used distributed architecture opti-mize response time and energy consumption theyhave the same problem as that of the centralizedapproach if the aggregatorcluster head has a largenumber of nodes under its membership

(v) Excluding a few the performance of all of the schemesdiscussed in this paper has been evaluated with thehelp of different simulation tools Although the num-ber of simulators is available and plays an importantrole for developing and testing new technique thereis always some kind of risk involved as simulationresults may not be accurate In order to analyze aprotocol more effectively it is important to knowdifferent available tools andunderstand the associatedbenefits and limitationsDue to different performancerequirements according to specific applications ageneral tool for sensor networks is still lacking atpresent

(vi) The techniques evaluated by using analytical mod-eling [21 23 46 49 100 109] used certain sim-plification and assumption to evaluate the perfor-mance of proposed technique Such assumptions andsimplifications may lead to imprecise results withlimited confidence None of the proposed techniqueis evaluated by using real deployment Although realdeployment is complex costly and time consuming

accurate results can only be obtained by using realdeployment

(vii) Excluding a few [22 103 109] the majority oftechniques assumes that sensor nodes are stationaryand do not consider nodes mobility Applying thesetechniques for mobile networks or the networks withdynamic changed topology would be challenging

(viii) Most of the techniques used the synthetic dataAlthough synthetic data is easily available therealways been chances that results generated on syn-thetic data are not accurate

(ix) For the data mining techniques themselves fre-quent pattern mining [15ndash20] approaches suffer fromchoice of proper and flexible support and confidencethreshold Clustering techniques [11ndash14] suffer fromthe choice of an appropriate parameter of clusterwidth and computing the distance between datainstances in heterogeneous data is computationallyexpensive whereas classification-based techniques[24ndash26] require some prior knowledge to classify theincoming data stream However learning accurateclassification model is challenging if the number ofvariables is large in deployed WSNs
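The sensitivity to the support threshold noted in item (ix) can be illustrated with a small sketch. It is not taken from any of the surveyed techniques: the function `frequent_itemsets`, the sensor names, and the `epochs` data are hypothetical, and the brute-force counting is only suitable for a toy example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    Counting is brute force, which is fine for a toy example only.
    """
    counts = Counter()
    for events in transactions:
        items = sorted(set(events))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical data: the sensors that reported an event in each time slot.
epochs = [
    {"s1", "s2"},
    {"s1", "s2", "s3"},
    {"s1", "s3"},
    {"s2"},
    {"s1", "s2"},
]

# The same data yields very different pattern sets as the threshold moves.
for threshold in (0.1, 0.5, 0.7):
    found = frequent_itemsets(epochs, threshold)
    print(threshold, sorted(found))
```

With a low threshold every itemset qualifies, whereas a high threshold keeps only the two most active single sensors, which is why a fixed threshold is hard to choose in advance.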

7. Future Research Directions

The analysis of existing data mining work on sensor-network-based applications shows that shortcomings remain in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed specifically for WSNs. The technique should meet the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing because the data arrives continuously and is not available for a second scan.

Figure 6: Proposed hybrid framework for sensor-network-based applications. The figure shows three stages: node data processing of the sensor data stream (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis), which produces local models; network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring), which produces the network model; and central data processing at the sink/base station (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis), which produces the global model and returns approximate results to user queries. In-network processing is single pass; centralized processing is multipass.

Local models contain compact event patterns rather than raw data, which addresses the communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource-rich compared with the other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to obtain a global view of the entire network, called the global model. As a result, approximate query answers are returned to end users.
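The local-model idea can be sketched as follows. This is an illustrative assumption, not the authors' implementation: here a local model is simply a running count, mean, and variance maintained in a single pass (Welford's update), and a cluster head combines local models with the standard pairwise-combination formula, so raw readings never leave the node. The class `LocalModel`, the helper `summarize`, and the readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalModel:
    """Compact stream summary: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Single-pass (Welford) update; the raw reading is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "LocalModel") -> "LocalModel":
        """Combine two summaries without revisiting any raw data."""
        n = self.n + other.n
        if n == 0:
            return LocalModel()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return LocalModel(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of everything summarized so far."""
        return self.m2 / self.n if self.n else 0.0


def summarize(readings):
    """Build a local model from a stream, reading each value once."""
    model = LocalModel()
    for x in readings:
        model.update(x)
    return model


# Hypothetical temperature readings from two nodes in one cluster.
node_a = summarize([20.0, 21.0, 22.0])
node_b = summarize([30.0, 32.0])

# The cluster head merges local models into a network model; the sink
# could merge network models in the same way to obtain a global model.
network = node_a.merge(node_b)
print(network.n, network.mean, network.variance)
```

Each node transmits three numbers regardless of how many readings it has seen, which is the kind of communication saving the framework relies on. Richer local models, such as cluster summaries or pattern counts, would need their own merge rules.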

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required to generate offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time responses and actions can use the network model for decisions and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Data management is a crucial issue in WSNs [111]. To deal with large-scale WSN data, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits a large task into smaller ones at the node level and at the cluster head; these are distributed over the entire network and execute in parallel. At the node level, the storage capacity of a single node is used to compute the local model, which contains aggregated data from that node, whereas the cluster head acquires data from a group of nodes and aggregates the readings over a certain region or period. As a result, a network model is computed at each cluster head; it contains compact data from a set of nodes and reduces the volume of data to be transmitted. Network models can be integrated at the sink to obtain a global view for real-time applications. Because the sink has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to a central server, where global models can be computed for offline predictive analysis. Historical queries from users can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time responses. By distributing data in this way, the proposed framework can deal with the large amounts of data obtained from WSNs.

(iii) It can take the resource constraints of sensor nodes into account by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the quickly changing nature of WSN data, where the characteristics of the monitored process may change over time and render old models outdated. This problem can be addressed using an incremental learning mechanism [39, 112] that helps the model incorporate new information.

(v) The framework can identify spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.

We are currently implementing this hybrid framework, and the implementation will be completed in the near future.
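The need for incremental updates described in item (iv) can be illustrated with a minimal sketch. It is an assumption chosen for brevity, not the mechanism of [39] or [112]: an exponentially weighted running mean stands in for "the model", and the class `ForgettingMean`, the parameter `alpha`, and the readings are hypothetical.

```python
class ForgettingMean:
    """Exponentially weighted running mean: recent readings count more."""

    def __init__(self, alpha: float = 0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None  # no estimate until the first reading arrives

    def update(self, x: float) -> float:
        """Fold one new reading into the estimate without storing history."""
        if self.value is None:
            self.value = x
        else:
            self.value += self.alpha * (x - self.value)
        return self.value


# Hypothetical stream whose underlying level shifts from 20 to 30.
readings = [20.0] * 10 + [30.0] * 10

model = ForgettingMean(alpha=0.3)
for r in readings:
    estimate = model.update(r)

# After the shift the estimate has moved most of the way to the new level.
print(round(estimate, 2))
```

A model fitted once on the first ten readings would stay at 20, whereas the incrementally updated estimate tracks the change. A larger `alpha` adapts faster but is noisier, which mirrors the sliding-window size trade-off noted among the limitations.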

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that techniques intended for mining sensor data at the network side are helpful for real-time decision making and also serve as a prerequisite for developing effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework that can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. In addition, a number of open issues in existing studies still need to be addressed. The WSN applications presented here are neither complete nor exhaustive but merely a sample that demonstrates the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology and that sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies for emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hüllermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of the 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Römer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowé, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M J Akhlaghinia A Lotfi C Langensiepen and N SherkatldquoA fuzzy predictor model for the occupancy prediction of anintelligent inhabited environmentrdquo in Proceedings of the IEEEInternational Conference on Fuzzy Systems (FUZZ rsquo08) pp 939ndash946 June 2008

[100] M Gaber S Krishnaswamy and A Zaslavsky ldquoOn-boardmining of data streams in sensor networksrdquo in AdvancedMethods for Knowledge Discovery from Complex Data pp 307ndash335 2005

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction with ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.



Figure 5: Classification maps an input attribute set (X) to a class label (Y). [Diagram: input attribute set (X) → classification model → output class label (Y).]

to form clusters. Those nodes that hear the request decide whether they should nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes intending to become CHs wait for a random time period that is based on the remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the least number of hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis comparing it with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when some subregions of the sensor network are targeted more than others, that is, when inquiries are not uniformly distributed over time and space.
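The joining rule (a node joins the CH it can reach over the fewest hops) can be illustrated with a breadth-first search. This is only a sketch under stated assumptions: the toy topology, the node names, and the tie-breaking by CH name are hypothetical and are not taken from the scheme in [89].

```python
from collections import deque

def hop_counts(adjacency, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def assign_to_cluster_heads(adjacency, cluster_heads):
    """Map every node to the cluster head reachable over the fewest hops.

    Ties are broken by cluster-head name (an assumption of this sketch).
    """
    hops = {ch: hop_counts(adjacency, ch) for ch in cluster_heads}
    assignment = {}
    for node in adjacency:
        reachable = [(hops[ch][node], ch) for ch in cluster_heads if node in hops[ch]]
        if reachable:
            assignment[node] = min(reachable)[1]
    return assignment

# Hypothetical line topology a - b - c - d - e with cluster heads at a and e.
ADJACENCY = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
ASSIGNMENT = assign_to_cluster_heads(ADJACENCY, ["a", "e"])
```

In this toy network node c is two hops from both heads, so the tie-break decides; a real deployment would more likely use the order in which announcements arrive.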

Ma et al. [90] proposed the distributed hierarchical clustering and summarization algorithm (DHCS) for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop-count thresholds, to model node- and distance-based clustering. Initially, each node treats itself as an active cluster. Then similar adjacent clusters are merged into larger clusters round by round. In each round, every cluster tries to combine with its most similar adjacent cluster simultaneously. Two clusters can be merged only if each considers the other its most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
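The round-based mutual merging can be sketched as follows. This is a simplified illustration, not the DHCS implementation: similarity is assumed to be the absolute difference between cluster means, only a difference threshold is applied (the hop-count threshold is omitted), and all names are hypothetical.

```python
def merge_round(clusters, adjacent, threshold):
    """Run one round: merge each pair of adjacent clusters that choose each other.

    clusters: dict mapping cluster id -> list of readings
    adjacent: set of frozenset({id_a, id_b}) pairs of neighboring clusters
    Returns (new_clusters, new_adjacent, merged_anything).
    """
    clusters = dict(clusters)

    def mean(cid):
        values = clusters[cid]
        return sum(values) / len(values)

    # Each cluster picks its most similar adjacent cluster within the threshold.
    best = {}
    for cid in clusters:
        candidates = []
        for pair in adjacent:
            if cid in pair:
                (other,) = pair - {cid}
                diff = abs(mean(cid) - mean(other))
                if diff <= threshold:
                    candidates.append((diff, other))
        if candidates:
            best[cid] = min(candidates)[1]

    # Merge only mutual choices; the smaller id survives.
    pairs = [(a, b) for a, b in best.items() if a < b and best.get(b) == a]
    for a, b in pairs:
        clusters[a] = clusters[a] + clusters.pop(b)
    rename = {b: a for a, b in pairs}
    new_adjacent = set()
    for pair in adjacent:
        renamed = frozenset(rename.get(c, c) for c in pair)
        if len(renamed) == 2:
            new_adjacent.add(renamed)
    return clusters, new_adjacent, bool(pairs)

def steady_clusters(readings, links, threshold):
    """Repeat merge rounds until no merge happens; the result is the steady clusters."""
    clusters = {node: [value] for node, value in readings.items()}
    adjacent = {frozenset(link) for link in links}
    changed = True
    while changed:
        clusters, adjacent, changed = merge_round(clusters, adjacent, threshold)
    return clusters

# Hypothetical chain of four nodes with two temperature regimes.
STEADY = steady_clusters({1: 20.0, 2: 20.5, 3: 30.0, 4: 30.2}, [(1, 2), (2, 3), (3, 4)], 1.0)
```

Because a cluster can be the mutual choice of at most one neighbor, the merges within a round never conflict, which is what allows all clusters to act simultaneously.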

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined categories. A classification model is learned from a set of training data and then classifies new data into one of the learned classes. Figure 5 shows that classification maps an input attribute set (X) to a class label (Y).

Classification-based approaches have adapted traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, according to the type of classification model they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until it reaches a leaf node, where a class label is assigned. The internal nodes partition the data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples. The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a request to classify a query vector, it locates the closest training vectors according to a distance metric. The classes of these training vectors are used to assign a class to the query vector.

A rule-based classifier assigns data to predefined classes by using “if ... then ...” rules of the following form:

(Condition) → Y, where the condition is a conjunction of attribute tests and Y is a class label.
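A rule of this form can be written down directly as a list of attribute tests paired with a label. The sketch below is illustrative only: the attributes, thresholds, labels, the first-match policy, and the default class are all assumptions, not rules from any surveyed system.

```python
# Each rule is (conjunction of attribute tests, class label Y).
RULES = [
    ([("temperature", lambda v: v > 50.0), ("smoke", lambda v: v is True)], "fire"),
    ([("temperature", lambda v: v < 0.0)], "frost"),
]

def classify(record, rules=RULES, default="normal"):
    """Return the label of the first rule whose every condition holds."""
    for conditions, label in rules:
        if all(attr in record and test(record[attr]) for attr, test in conditions):
            return label
    return default  # no rule fired
```

For example, `classify({"temperature": 62.0, "smoke": True})` fires the first rule, while the same temperature without smoke falls through to the default class.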

SVM (support vector machine) techniques separate the data belonging to different classes by fitting a hyperplane between them that maximizes the margin of separation. The data is mapped into a higher-dimensional feature space where it can be separated more easily by a hyperplane. Furthermore, a kernel function is used to compute the dot products between the mapped vectors in the feature space, without constructing the mapping explicitly, in order to find the hyperplane.
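The role of the kernel function can be checked numerically on a small case. The sketch below assumes a degree-2 polynomial kernel on two-dimensional inputs, chosen only because its feature map is short enough to write out; the survey does not prescribe a particular kernel.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the kernel K(x, z) = (x . z)^2 in two dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Same quantity as dot(phi(x), phi(z)), computed in the input space."""
    return dot(x, z) ** 2

X, Z = (1.0, 2.0), (3.0, 4.0)
KERNEL_VALUE = poly_kernel(X, Z)          # (1*3 + 2*4)^2
MAPPED_VALUE = dot(phi(X), phi(Z))        # dot product in the 3-D feature space
```

Both values agree up to floating-point rounding, which is the point of the kernel: the hyperplane search only needs these dot products, so the mapped vectors never have to be built.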

4.4.1. Centralized Approaches Aim to Solve WSNs’ Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT)-based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. To identify persons, the proposed approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM).

Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The proposed approach is validated experimentally using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated using the Weka framework, version 3.7.0 [95]. Experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deeper analysis, does not guarantee good identification.

Sharma et al. [96] proposed a methodology for classifying sensor data using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then the k closest training examples are selected, where k is a fixed integer and k ≥ 1; among these k examples, the algorithm looks for the most frequent label. This label is the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated from NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a correct classification rate of 92.3%.
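The prediction step described above (compute all distances, keep the k closest, take the most frequent label) is short enough to sketch in full. The Euclidean distance and the toy two-dimensional data are assumptions of this sketch; the trajectory-specific distance used in [96] is not reproduced here.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest examples.

    training: list of (vector, label) pairs stored as-is (the lazy training phase).
    """
    by_distance = sorted(training, key=lambda example: math.dist(example[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled examples.
TRAINING = [
    ((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
    ((5.0, 5.0), "B"), ((5.0, 6.0), "B"),
]
```

Every prediction scans the whole training set, which is why the cost of a lazy learner sits at query time rather than at training time.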

Akhlaghinia et al. [99] proposed a prediction technique for smart home environments to predict the behavior patterns of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant’s interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model that predicts the occupancy of the ambient intelligence environment. The occupancy is predicted using fuzzy rules, which are modeled from the past values of time series data. In the learning process, input from the sensors is compared with stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the proposed technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multi-user environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs’ Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size and to change the algorithm output rate according to the data rate, available memory, algorithm output rate history, and the time constraints for filling the available memory with generated knowledge. The algorithm works by searching for the nearest instance stored in main memory when a new element arrives. Instances are stored in main memory according to a prespecified distance threshold. The threshold represents the similarity measure the algorithm accepts for treating two or more elements as one element, according to the elements’ attribute values. If the algorithm finds such an element, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the element is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer.
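The weight update rule can be sketched as a small in-memory store. This is an illustration rather than LWClass itself: Euclidean distance is assumed, an element with no stored instance within the threshold is assumed to be stored as a new instance with weight 1 (the text above does not say), and the AOG memory adaptation is left out.

```python
import math

class LightweightStore:
    """One-pass store of weighted labeled instances (hypothetical helper)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # each entry is [vector, label, weight]

    def update(self, vector, label):
        """Process one arriving element and adjust the nearest stored instance."""
        nearest, best = None, self.threshold
        for entry in self.entries:
            distance = math.dist(entry[0], vector)
            if distance <= best:
                nearest, best = entry, distance
        if nearest is None:
            self.entries.append([vector, label, 1])  # assumption: start a new instance
        elif nearest[1] == label:
            nearest[2] += 1                          # same class: reinforce
        else:
            nearest[2] -= 1                          # different class: weaken
            if nearest[2] == 0:
                self.entries.remove(nearest)         # release from memory

STORE = LightweightStore(threshold=1.0)
for vec, lab in [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "b")]:
    STORE.update(vec, lab)
```

After the three updates the single stored instance has been reinforced once and weakened once, so its weight is back to 1; one more conflicting element nearby would remove it.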

4.4.3. Distributed Approaches Aim to Solve WSNs’ Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure is constructed over the whole network. A distributed voting approach is proposed in which each sensor is a leaf of a tree (DT) and performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work shows how local models enable sensors to respond to changes in the target by relearning the local models. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node to compute and build the local predictive model and the extra memory required to store it.
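The difference between the simple and weighted voting schemes at the sink can be sketched as follows. The sensor names, labels, and weights are hypothetical; in particular, treating the weight as a per-sensor confidence value is an assumption of this sketch, not a detail taken from [103].

```python
from collections import defaultdict

def vote(predictions, weights=None):
    """Combine local class predictions into one global prediction.

    predictions: dict sensor id -> predicted class
    weights: optional dict sensor id -> vote weight (simple voting when omitted)
    """
    tally = defaultdict(float)
    for sensor, label in predictions.items():
        tally[label] += 1.0 if weights is None else weights[sensor]
    return max(tally, key=tally.get)

PREDICTIONS = {"s1": "intruder", "s2": "clear", "s3": "clear"}
WEIGHTS = {"s1": 0.9, "s2": 0.3, "s3": 0.4}  # assumed confidence per sensor
```

With simple voting the two "clear" votes win; with the assumed weights the single confident sensor outweighs them. Either way each sensor sends one class label rather than its raw readings.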

4.4.4. Distributed Approaches Aim to Maximize WSNs’ Performance. Malhotra et al. [104] proposed a distributed classification scheme to generate effective feature vectors of low dimension (FVLD) for wireless audio-sensor networks. A distributed cluster-based algorithm for the detection and classification of vehicles is proposed. Sensors form clusters on demand to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features, and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, compared with all other approaches.

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using an SVM-based (support vector machine) technique in a sensor network. The authors proposed two distributed algorithms, the distributed fix partition SVM (DFP-SVM) and the weighted distributed fix partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is trained incrementally on a set of examples called support vectors. With an SVM, the number of support vectors is very small compared with the number of all sample values. In addition, the support vectors (and the offset) give a compressed representation of the separating SVM hyperplane. For this reason, sending only the support vectors, instead of all training samples, to the next cluster head is very energy efficient because it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations, and a comparison is made with the centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius information of the sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected over a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the results for local outliers inaccurate. The technique is evaluated using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. The results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.
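The decision rule (a measurement outside the sphere of radius R centered at the origin is an outlier, and only R is sent upstream) can be sketched without the SVM machinery. Note the substitution: the radius below is picked by rank, allowing a fixed number of points to fall outside, which stands in for solving the one-class quarter-sphere SVM optimization in [108]. The data and parameter names are hypothetical.

```python
import math

def local_radius(readings, allowed_outside=1):
    """Choose a radius that leaves at most `allowed_outside` readings beyond it.

    Assumption: a rank-based radius replaces the SVM optimization of [108].
    """
    norms = sorted(math.hypot(*reading) for reading in readings)
    return norms[len(norms) - 1 - allowed_outside]

def outliers(readings, radius):
    """Readings whose distance from the origin exceeds the radius."""
    return [reading for reading in readings if math.hypot(*reading) > radius]

# Hypothetical non-negative (temperature, humidity) readings, already normalized.
READINGS = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (1.5, 1.5),
    (1.0, 1.5), (1.5, 1.0), (2.0, 1.5), (1.5, 2.0), (9.0, 9.0),
]
RADIUS = local_radius(READINGS, allowed_outside=1)
FLAGGED = outliers(READINGS, RADIUS)
```

A node running this needs to report only `RADIUS` to its parent, a single number, which is where the communication saving described above comes from.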

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and differing aspects of the data mining techniques for WSNs discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed, and then comparative tables are presented to compare and differentiate existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogeneous [50] or heterogeneous [33, 48]. A homogeneous attribute means sensing a single-value attribute, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that the readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node when the reading from a correlated sensor is missing.
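One common way to quantify both kinds of correlation is the Pearson coefficient: between two neighboring nodes for spatial correlation, and between a series and its one-step-shifted copy for temporal correlation. The measure and the readings below are assumptions of this sketch; the survey does not fix a particular statistic.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical temperature readings from two neighboring nodes.
NODE_A = [20.0, 20.6, 21.1, 21.9, 22.4]
NODE_B = [19.7, 20.4, 21.0, 21.5, 22.3]

SPATIAL = pearson(NODE_A, NODE_B)            # neighbor vs. neighbor
TEMPORAL = pearson(NODE_A[:-1], NODE_A[1:])  # reading vs. previous reading
```

When both coefficients are close to 1, as in this toy data, a missing reading from one node can be estimated from its neighbor or from its own recent history.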

5.2. Processing Architecture. In order to apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models.

Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, all the raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis in order to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to very large numbers of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called a local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an “aggregator” is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.
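A minimal sketch of in-network aggregation follows. The choice of summary fields (count, sum, minimum, maximum) is an assumption made for illustration; a real local model could carry event patterns or other statistics instead.

```python
def summarize(readings):
    """Local model: a fixed-size summary sent instead of the raw readings."""
    return {
        "count": len(readings),
        "sum": sum(readings),
        "min": min(readings),
        "max": max(readings),
    }

def merge(summaries):
    """Aggregator: combine local summaries without ever seeing raw data."""
    return {
        "count": sum(s["count"] for s in summaries),
        "sum": sum(s["sum"] for s in summaries),
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
    }

# Hypothetical readings held by two sensor nodes.
SENSOR_A = [20.0, 21.0, 22.0]
SENSOR_B = [18.0, 19.0, 26.0]
MERGED = merge([summarize(SENSOR_A), summarize(SENSOR_B)])
GLOBAL_MEAN = MERGED["sum"] / MERGED["count"]
```

Each node transmits four numbers regardless of how many readings it holds, and the aggregator still recovers the exact global mean and range. Statistics that cannot be merged this way, such as the median, are where the accuracy and communication trade-off mentioned above appears.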

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSN data. Distributed approaches use one-scan algorithms for real-time processing in order to deal with the high data arrival rate; the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data


Table 2: Comparison of data mining techniques for wireless sensor networks.

[The source table also carries check-mark columns for processing architecture (distributed, central), attributes (homogeneous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihop), mobility (static, mobile), node role (cluster head, sensor, relay), evaluation method (simulation, analytical model), and data source (real, synthetic); the positions of the individual marks are not recoverable in this copy.]

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Frequent pattern mining | DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores the sensor that reports different values |
| Frequent pattern mining | In-network data mining [51] | Events patterns discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication |
| Frequent pattern mining | Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost, delayed crucial messages |
| Frequent pattern mining | Online algorithm [46] | Interval list of representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy |
| Frequent pattern mining | Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data |
| Frequent pattern mining | CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data |


Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

[The source table also carries check-mark columns for processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, and data source; the positions of the individual marks are not recoverable in this copy.]

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Frequent pattern mining | Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increased cost due to multiple DB scans |
| Frequent pattern mining | SP-tree [49] | Discover events patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost |
| Sequential pattern mining | Relational framework [58] | Multi-dimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming |
| Sequential pattern mining | Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants behavior prediction | Prediction accuracy | Inefficient for complex activities |
| Sequential pattern mining | MPG [64] | Predict object’s future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset |
| Sequential pattern mining | Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction |


Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

[The source table also carries check-mark columns for processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, and data source; the positions of the individual marks are not recoverable in this copy.]

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Sequential pattern mining | TMP-mine [65] | Predict object’s future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time |
| Sequential pattern mining | Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns |
| Sequential pattern mining | MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute |
| Sequential pattern mining | PTSP [66] | Object’s future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects |
| Clustering | DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate |
| Clustering | H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate |


Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

[The source table also carries check-mark columns for processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, and data source; the positions of the individual marks are not recoverable in this copy.]

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Clustering | Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping |
| Clustering | CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss |
| Clustering | EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs |
| Clustering | Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs |
| Clustering | Attribute-based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost |
| Clustering | DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes’ energy is ignored |


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

[The source table also carries check-mark columns for processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, and data source; the positions of the individual marks are not recoverable in this copy.]

| Category | Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
| Classification | Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness |
| Classification | Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity |
| Classification | NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset |
| Classification | LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift |
| Classification | FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission |
| Classification | Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios |
| Classification | Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity |
| Classification | One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation |


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform one of three roles [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. This is a node that acts as a medium to transmit data packets from one node to others.

Node Task. In centralized approaches, the node's task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and take action based on the detected phenomena or target.
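The difference between the two node tasks can be illustrated with a minimal Python sketch. The function names, the `Report` type, the temperature samples, and the threshold rule are all hypothetical and chosen only for illustration; they are not taken from any surveyed technique.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Report:
    """A message sent from a node toward the base station (hypothetical type)."""
    node_id: int
    value: float


def sense_and_send(node_id: int, readings: Iterable[float]) -> List[Report]:
    """Centralized task: forward every raw reading to the base station."""
    return [Report(node_id, r) for r in readings]


def detect_and_report(node_id: int, readings: Iterable[float],
                      threshold: float) -> List[Report]:
    """Distributed task: decide locally and report only readings above a threshold."""
    return [Report(node_id, r) for r in readings if r > threshold]


readings = [21.0, 21.5, 22.0, 35.0, 21.8]  # made-up temperature samples
central = sense_and_send(1, readings)
local = detect_and_report(1, readings, threshold=30.0)
print(len(central), len(local))
```

In this toy setting the sense-and-send node transmits one message per reading, while the node that computes locally transmits only when its rule fires.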

5.5. Application Area. We also evaluated the types of application that benefit from WSN data mining techniques. Here we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of target locations and to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy through in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used methods to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. This is the most popular and effective approach to design and test a proposed scheme in terms of cost and time; it also provides a higher level of detail than real implementation. However, selecting a simulation framework appropriate to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment because appropriate hardware is unavailable owing to technical and design limitations. Real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating a technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. This refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this paper that most of the techniques are validated by simulation on synthetic datasets, and that most of the studies used simulation because of the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them works efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand their associated benefits and limitations. Because performance requirements differ according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining approaches [15–20] suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
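The sensitivity to the support threshold noted in item (ix) can be seen in a small Python sketch. It uses brute-force itemset counting rather than any of the surveyed algorithms, and the sensor names, epochs, and threshold values are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(epochs: List[Set[str]], min_support: float,
                      max_size: int = 2) -> Dict[FrozenSet[str], float]:
    """Return itemsets (up to max_size items) whose support is >= min_support."""
    counts: Counter = Counter()
    for epoch in epochs:
        for size in range(1, max_size + 1):
            for items in combinations(sorted(epoch), size):
                counts[frozenset(items)] += 1
    n = len(epochs)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Each epoch lists the (hypothetical) sensors that reported an event.
epochs = [{"s1", "s2"}, {"s1", "s2", "s3"}, {"s1"}, {"s2", "s3"}, {"s1", "s2"}]

loose = frequent_itemsets(epochs, min_support=0.4)
strict = frequent_itemsets(epochs, min_support=0.7)
print(len(loose), len(strict))
```

With the lower threshold the pair {s1, s2} is reported as frequent; with the higher one only the single sensors s1 and s2 survive, so the patterns found depend directly on a parameter that must be chosen in advance.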

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of the WSN and its special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required and partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.

Figure 6: Proposed hybrid framework for sensor network-based applications. [Diagram labels: sensor data stream; node data processing (data selection, remove duplication, aggregation, summarization, data fusion, clustering, association analysis, ...) producing the local model; network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring) producing the network model; central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis) producing the global model; in-network processing (single pass) and centralized processing (multipass); sink/base station; users, who issue queries and receive approximate results.]

Local models contain compact event patterns rather than raw data, which addresses the communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient compared with the other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in a multihop fashion. The network models are then integrated at the base station/sink to obtain a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
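One way to picture the local, network, and global models is as mergeable single-pass summaries. The Python sketch below is only an illustration under that assumption: it uses a running count/mean/variance summary (Welford's update, with the standard pairwise merge) as a stand-in for a local model, and the class and function names are hypothetical rather than part of the proposed framework.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass
class Summary:
    """Compact model of a stream: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Single-pass (Welford) update; the raw reading is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        """Combine two summaries without revisiting the raw data."""
        n = self.n + other.n
        if n == 0:
            return Summary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of all readings summarized so far."""
        return self.m2 / self.n if self.n else 0.0


def local_model(readings: Iterable[float]) -> Summary:
    """Node level: scan the stream once and keep only the summary."""
    s = Summary()
    for x in readings:
        s.update(x)
    return s


def integrate(models: Iterable[Summary]) -> Summary:
    """Cluster head or sink level: merge the models received from below."""
    return reduce(Summary.merge, models, Summary())


# Two hypothetical clusters; only summaries travel up the hierarchy.
cluster_a = integrate([local_model([20.0, 21.0]), local_model([22.0])])
cluster_b = integrate([local_model([30.0, 31.0, 32.0])])
global_model = integrate([cluster_a, cluster_b])
print(global_model.n, global_model.mean, global_model.variance)
```

Each level forwards three numbers regardless of how many readings it has seen, which is the sense in which a compact model reduces the data that must be transmitted.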

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decision making and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Data management is a crucial issue for WSN data [111]. To deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head level, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this distribution of data, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
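The idea in item (iv), that a model updated incrementally can follow a monitored process whose characteristics change, can be sketched in a few lines of Python. The example uses simple exponential smoothing as a stand-in; it is not the incremental mechanism of [39, 112], and the class name, the forgetting factor `alpha`, and the readings are assumptions made for illustration.

```python
from typing import Optional


class IncrementalBaseline:
    """Running estimate of the normal reading, updated one sample at a time."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha                    # hypothetical forgetting factor
        self.level: Optional[float] = None    # current estimate

    def update(self, x: float) -> float:
        """Blend the new reading into the model and return the new estimate."""
        if self.level is None:
            self.level = x
        else:
            self.level += self.alpha * (x - self.level)
        return self.level


model = IncrementalBaseline(alpha=0.5)
for x in [20.0, 20.0, 20.0]:         # old regime
    model.update(x)
for x in [30.0, 30.0, 30.0, 30.0]:   # the monitored process has shifted
    model.update(x)
print(model.level)
```

After the shift the estimate moves from 20 toward 30 without retraining on stored history; a larger `alpha` adapts faster but is more sensitive to noise.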

Currently, we are working on the implementation of this hybrid framework, which will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to the appropriate WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for making real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of the past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. The set of WSN applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


of occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupant’s interaction with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model that predicts the occupancy of the ambient intelligence environment. Occupancy is predicted using fuzzy rules, which are modeled on past values of the time series data. In the learning process, input from the sensors is compared with the stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show the ability of the proposed technique to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multi-user environment over a long period.

4.4.2. Centralized Approaches Aim to Maximize WSNs’ Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101, 102] technique to cope with the limited memory size: the algorithm output rate is adapted according to the data rate, the available memory, the algorithm output rate history, and the time constraints for filling the available memory with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. Instances are stored in main memory according to a prespecified distance threshold, which represents the similarity measure the algorithm accepts for treating two or more elements as one element on the basis of their attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the instance is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer.
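
The weight-update rule described above can be sketched as follows. This is a minimal illustration, not the implementation of [100]: the class name `LWClassSketch`, the Euclidean distance, and the choice to store an unmatched element as a new instance with weight 1 are assumptions, and the AOG adaptation of the threshold to available memory is not modeled.

```python
import math


class LWClassSketch:
    """Hypothetical one-pass, weight-based nearest-instance classifier."""

    def __init__(self, threshold):
        self.threshold = threshold   # maximum distance for treating two elements as one
        self.entries = []            # each entry is [features, label, weight]

    def _nearest(self, features):
        best, best_dist = None, float("inf")
        for entry in self.entries:
            d = math.dist(entry[0], features)
            if d < best_dist:
                best, best_dist = entry, d
        return best, best_dist

    def update(self, features, label):
        entry, dist = self._nearest(features)
        if entry is None or dist > self.threshold:
            # Assumption: an element with no close stored instance is kept as a new one.
            self.entries.append([tuple(features), label, 1])
            return
        if entry[1] == label:
            entry[2] += 1            # same class label: reinforce the instance
        else:
            entry[2] -= 1            # different class label: weaken the instance
            if entry[2] == 0:        # weight exhausted: release it from memory
                self.entries = [e for e in self.entries if e is not entry]

    def predict(self, features):
        entry, _ = self._nearest(features)
        return None if entry is None else entry[1]
```

Each arriving element triggers exactly one `update` call, so the stream is scanned only once and memory holds only the weighted instances.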

4.4.3. Distributed Approaches Aim to Solve WSNs’ Application-Based Issues. McConnell and Skillicorn [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure is constructed over the whole network. A distributed voting approach is proposed in which each sensor is a leaf of a tree (DT) and performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The framework shows how the local models enable sensors to respond to changes in the target by relearning local models. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime as compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node for computing and building the local predictive model and the extra memory required to store it.
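
How a sink might combine the transmitted class labels under simple and weighted voting can be sketched as below. The function name `combine_votes`, the dictionary layout, and the per-sensor weights are assumptions made for illustration; [103] is not quoted here for the exact weighting rule.

```python
from collections import defaultdict


def combine_votes(local_predictions, weights=None):
    """Combine per-sensor class labels at the sink.

    local_predictions: dict mapping sensor id -> predicted class label.
    weights: optional dict mapping sensor id -> vote weight (hypothetical,
             for example a local accuracy estimate); omitted means simple voting.
    """
    tally = defaultdict(float)
    for sensor_id, label in local_predictions.items():
        w = 1.0 if weights is None else weights.get(sensor_id, 1.0)
        tally[label] += w
    # Ties are broken by label order so the result is deterministic.
    return max(sorted(tally), key=lambda label: tally[label])
```

Only one label per sensor crosses the network, which is the source of the bandwidth saving described above.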

4.4.4. Distributed Approaches Aim to Maximize WSNs’ Performance. Malhotra et al. [104] proposed a distributed classification scheme to generate effective feature vectors of low dimension (FVLD) for wireless audio-sensor networks. A distributed cluster-based algorithm for detection and classification of vehicles has been proposed. Sensors form clusters on demand for the sake of running a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, k-nearest neighbor (KNN) or maximum likelihood (ML) classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features, and the second combines individual decisions. Classification using decision fusion and an ML classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, as compared with all other approaches.
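
The difference between combining features and combining decisions at a cluster head can be sketched as follows. A nearest-centroid rule stands in for the KNN and ML classifiers of [104], and the `CENTROIDS` table, class names, and averaging rule are hypothetical values chosen only to make the example self-contained.

```python
import math
from collections import Counter

# Hypothetical class prototypes in a two-dimensional feature space.
CENTROIDS = {"car": (1.0, 1.0), "truck": (4.0, 4.0)}


def classify(vec):
    """Stand-in classifier: label of the nearest centroid."""
    return min(CENTROIDS, key=lambda c: math.dist(vec, CENTROIDS[c]))


def data_fusion(vectors):
    """Cluster head averages the received feature vectors, then classifies once."""
    n = len(vectors)
    mean = tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))
    return classify(mean)


def decision_fusion(vectors):
    """Each vector is classified on its own; the cluster head takes the majority."""
    votes = Counter(classify(v) for v in vectors)
    return votes.most_common(1)[0][0]
```

With one noisy sensor, for example `[(1.2, 0.9), (0.8, 1.1), (9.0, 9.0)]`, the averaged vector is pulled toward the outlying reading, while the majority vote is not, which illustrates why the two fusion schemes can disagree.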

Flouri et al. [105–107] proposed distributed and incremental techniques for learning classification rules using a support vector machine (SVM) in a sensor network. The authors proposed two distributed algorithms, the distributed fixed-partition SVM (DFP-SVM) and the weighted distributed fixed-partition SVM (WDFP-SVM), for training an SVM applied to the classification problem in a WSN. The SVM is trained incrementally on a set of examples called support vectors. With an SVM, the number of support vectors is typically very small compared with the number of all training samples. Moreover, the support vectors (and the offset) provide a compressed representation of the separating SVM hyperplane. Sending only the support vectors, instead of all training samples, to the next cluster head is therefore very energy efficient because it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations, and a comparison is made with a centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally as compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.
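
Why the support vectors suffice can be seen from the standard SVM decision function. The form below is the textbook one, not a formula quoted from [105–107]; the symbols $\alpha_i$ (dual coefficients), $y_i$ (labels), $K$ (kernel), and $b$ (offset) are the usual notation and are assumptions here.

```latex
f(x) = \operatorname{sign}\Bigl( \sum_{i \in \mathcal{S}} \alpha_i \, y_i \, K(x_i, x) + b \Bigr),
\qquad
\mathcal{S} = \{\, i : \alpha_i > 0 \,\}
```

Training samples with $\alpha_i = 0$ do not appear in the sum, so forwarding only the samples in $\mathcal{S}$ together with $b$ to the next cluster head preserves the current decision boundary.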

Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers at each node while minimizing computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius of its sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected over a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the local outlier results inaccurate. The technique is evaluated using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. Results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.
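
The local step can be illustrated with a heavily simplified sketch. It assumes a linear kernel on mean-centered readings, so that distances in feature space reduce to Euclidean distances from the local mean, and it picks the radius as a quantile so that roughly a fraction `nu` of readings fall outside. The function names and the parameter `nu` are assumptions; this is not the formulation or the code of [108].

```python
import math


def quarter_sphere_radius(readings, nu=0.1):
    """Return (center, radius) such that about a fraction nu of readings lie outside."""
    center = tuple(sum(col) / len(readings) for col in zip(*readings))
    dists = sorted(math.dist(r, center) for r in readings)
    n = len(dists)
    n_outside = min(int(nu * n), n - 1)   # number of readings allowed outside
    radius = dists[n - n_outside - 1]     # largest distance still counted as inside
    return center, radius


def local_outliers(readings, center, radius):
    """Readings that fall outside the sphere are flagged as local outliers."""
    return [r for r in readings if math.dist(r, center) > radius]
```

In this sketch a node would report only `radius` to its parent, which matches the communication pattern described above.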

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and distinguishing aspects of the data mining techniques specially designed for WSNs that were discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects for the different techniques are discussed, and then comparative tables are presented to compare and differentiate existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogeneous [50] or heterogeneous [33, 48]. A homogeneous attribute means sensing a single attribute, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that readings from sensor nodes geographically close to each other are expected to be highly correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node when the reading from a correlated sensor is missing.
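
One common way to quantify these two notions is the Pearson coefficient, applied between two neighboring nodes (spatial) and between a series and its lagged copy (temporal). The survey does not prescribe this measure; the choice of Pearson correlation and the function names below are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spatial_correlation(node_a, node_b):
    """Correlation between time-aligned readings of two neighboring nodes."""
    return pearson(node_a, node_b)


def temporal_correlation(series, lag=1):
    """Correlation between a node's readings and the readings `lag` steps earlier."""
    return pearson(series[:-lag], series[lag:])
```

A value close to 1 for two neighbors suggests that a missing reading from one of them can be estimated from the other.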

5.2. Processing Architecture. In order to apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models.

Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, the entire raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis in order to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an “aggregator” is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.
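
The contrast between the two models can be illustrated with a small sketch. It assumes, purely for illustration, that a local model is a `(count, sum, min, max)` summary of a node's readings; the function names are hypothetical and not taken from any surveyed system.

```python
def centralized_payload(node_readings):
    """Centralized model: every raw reading travels to the central server."""
    return [r for readings in node_readings.values() for r in readings]


def local_model(readings):
    """Distributed model: compact summary computed on the node itself."""
    return (len(readings), sum(readings), min(readings), max(readings))


def aggregate(models):
    """Aggregator merges local models without ever seeing the raw data."""
    count = sum(m[0] for m in models)
    total = sum(m[1] for m in models)
    return {
        "count": count,
        "mean": total / count,
        "min": min(m[2] for m in models),
        "max": max(m[3] for m in models),
    }
```

The local model has a fixed size of four numbers per node, so the saving over the centralized payload grows with the number of readings each node collects between transmissions.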

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSN data. Distributed approaches use one-scan algorithms for real-time processing: in order to deal with the high data arrival rate, the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.
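
A one-scan computation can be illustrated with running statistics that are updated per reading and never revisit earlier data. The example uses Welford's online update for mean and variance, which is a standard technique chosen here for illustration and not one attributed to a surveyed paper; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Single-pass mean and population variance with constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0
```

A multiscan alternative would first pass over the stored data to compute the mean and then pass again to compute deviations, which requires the full data set to be kept, as in the centralized case.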

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data

Table 2: Comparison of data mining techniques for wireless sensor networks (text columns only; the check-mark columns for processing architecture, sensor data, node properties, evaluation method, and data source are not reproduced).

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Frequent pattern mining
DSARM [42] | Missing data estimation | Apriori-like | Sense and send | Traffic monitoring | Data accuracy | Ignores the sensor that reports different values
In-network data mining [51] | Events patterns discovery | Apriori-like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication
Distributed data aggregation [15] | Improve WSN performance | Apriori-like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost, delayed crucial messages
Online algorithm [46] | Interval list of representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy
Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori-like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data
CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data

Table 3: Comparison of data mining techniques for wireless sensor networks, continued (text columns only; the check-mark columns for processing architecture, sensor data, node properties, evaluation method, and data source are not reproduced).

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Frequent pattern mining
Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increased cost due to multiple DB scans
SP-tree [49] | Discover events patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost

Sequential pattern mining
Relational framework [58] | Multi-dimensional correlation discovery | Apriori-like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming
Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants behavior prediction | Prediction accuracy | Inefficient for complex activities
MPG [64] | Predict object’s future movement | Apriori-like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset
Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction

Table 4: Comparison of data mining techniques for wireless sensor networks, continued (text columns only; the check-mark columns for processing architecture, sensor data, node properties, evaluation method, and data source are not reproduced).

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Sequential pattern mining
TMP-mine [65] | Predict object’s future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time
Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns
MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute
PTSP [66] | Object’s future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects

Clustering
DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate
H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate

Table 5: Comparison of data mining techniques for wireless sensor networks, continued (text columns only; the check-mark columns for processing architecture, sensor data, node properties, evaluation method, and data source are not reproduced).

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations

Clustering
Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping
CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss
EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs
Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs
Attribute based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost
DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes energy is ignored

International Journal of Distributed Sensor Networks 17

Table6Com

paris

onof

dataminingtechniqu

esforw

ireless sensor networks (continued).

| Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations |
|---|---|---|---|---|---|---|
| Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness |
| Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity |
| NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset |
| LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift |
| FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission |
| Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios |
| Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity |
| One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation |


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform one of three roles [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node or a resource-rich node. In centralized approaches, the cluster head is a regular sensor node that only controls cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. A relay is a node that acts as a medium to transmit data packets from one node to others.

Node Task. In the centralized approach, the node's task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can also perform computation and take action based on the detected phenomena or target.
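The contrast between the two node tasks can be illustrated with a minimal Python sketch. The readings, the 30.0 threshold, and both function names are hypothetical and chosen only for illustration; they are not taken from any of the surveyed techniques.

```python
from typing import Callable, List


def sense_and_send(readings: List[float]) -> List[float]:
    """Centralized node task: forward every raw reading to the base station."""
    return list(readings)


def detect_and_report(readings: List[float],
                      is_event: Callable[[float], bool]) -> List[float]:
    """Distributed node task: evaluate readings locally and report only events."""
    return [r for r in readings if is_event(r)]


# Hypothetical temperature readings (degrees Celsius) from one node.
readings = [21.0, 21.4, 21.1, 35.2, 21.3, 36.0]

raw_msgs = sense_and_send(readings)
# Hypothetical local rule: anything above 30.0 counts as an event.
event_msgs = detect_and_report(readings, lambda r: r > 30.0)

print(len(raw_msgs), len(event_msgs))
```

In this toy run the sense-and-send node transmits all six readings, while the node that evaluates the rule locally transmits only the two readings that satisfy it, which is the communication saving that distributed approaches aim for.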

5.5. Application Area. We also evaluated the types of applications that benefit from WSN data mining techniques. Here we exemplify some real-world applications, as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used methods to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and certain simplifications are usually assumed in order to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment, due to the unavailability of appropriate hardware in terms of technical and design limitations. Real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating a technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. We observe that most of the surveyed techniques are validated by simulation on synthetic datasets, largely because of the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. It is observed from the comparative analysis that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques that consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques that consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Because performance requirements differ across specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] rely on certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques use synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining approaches [15–20] suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. Moreover, learning an accurate classification model is challenging if the number of variables in the deployed WSN is large.
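The sensitivity to support and confidence thresholds noted in item (ix) can be made concrete with a small Python sketch. It uses the standard definitions of support and confidence for an association rule; the epochs, the sensor IDs (`s1`, `s2`, `s3`), the threshold values, and all function names are hypothetical and are not drawn from any surveyed algorithm.

```python
from typing import FrozenSet, Iterable, List

# One epoch = the set of sensor IDs that reported an event in one time slot.
Epoch = FrozenSet[str]


def count(epochs: List[Epoch], itemset: Iterable[str]) -> int:
    """Number of epochs that contain every sensor in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for epoch in epochs if items <= epoch)


def support(epochs: List[Epoch], itemset: Iterable[str]) -> float:
    """Fraction of epochs that contain the whole itemset."""
    return count(epochs, itemset) / len(epochs)


def confidence(epochs: List[Epoch], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate of P(consequent | antecedent) over the epochs."""
    base = count(epochs, antecedent)
    if base == 0:
        return 0.0
    joint = count(epochs, frozenset(antecedent) | frozenset(consequent))
    return joint / base


def rule_holds(epochs: List[Epoch], antecedent: Iterable[str],
               consequent: Iterable[str],
               min_support: float, min_confidence: float) -> bool:
    """A rule is kept only if it clears both thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(epochs, both) >= min_support
            and confidence(epochs, antecedent, consequent) >= min_confidence)


# Hypothetical event epochs from three sensors.
epochs = [frozenset(e) for e in (
    {"s1", "s2", "s3"},
    {"s1", "s2"},
    {"s1", "s3"},
    {"s2", "s3"},
    {"s1", "s2", "s3"},
)]

sup = support(epochs, {"s1", "s2"})        # 3 of 5 epochs
conf = confidence(epochs, {"s1"}, {"s2"})  # 3 of the 4 epochs containing s1
kept_loose = rule_holds(epochs, {"s1"}, {"s2"}, 0.5, 0.7)
kept_strict = rule_holds(epochs, {"s1"}, {"s2"}, 0.5, 0.8)
```

With these toy epochs the rule s1 → s2 is kept when the minimum confidence is 0.7 and discarded when it is 0.8, so a small change in a threshold changes which patterns are reported.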

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that shortcomings remain in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed specifically for WSNs. The technique should meet the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and the dependencies among spatial, temporal, and attribute correlations that may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements, a hybrid data mining framework for WSNs is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.

[Figure 6 components. Sensor data stream. In-network processing: node data processing (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis, ...), which yields the local model; network data processing (local model integration, network analysis, real-time decisions, network maintenance) with network pattern identification and monitoring, which yields the network model. Centralized processing at the sink/base station: central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis), which yields the global model. Users issue queries and receive approximate results. Pass labels: single pass, multi pass.]

Figure 6: Proposed hybrid framework for sensor network based applications.

Local models contain compact event patterns rather than raw data, which addresses the communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
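The flow from local models to a network model and then to a global model can be sketched in Python under one simplifying assumption: each model is reduced to a mergeable summary (count, mean, and sum of squared deviations) maintained in a single pass with Welford's update and combined with the standard pairwise merge formula. This summary is a stand-in chosen for illustration; the paper does not specify the contents of a model, and the names `Summary`, `local_model`, and `integrate`, the cluster IDs, and the readings are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Summary:
    """Compact model: count, mean, and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> "Summary":
        # Single-pass (Welford) update: each reading is seen exactly once.
        n = self.n + 1
        delta = x - self.mean
        mean = self.mean + delta / n
        return Summary(n, mean, self.m2 + delta * (x - mean))

    def merge(self, other: "Summary") -> "Summary":
        # Combine two summaries without access to the raw readings.
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0

    @property
    def std(self) -> float:
        return math.sqrt(self.variance)


def local_model(stream: Iterable[float]) -> Summary:
    """Node level: summarize the node's own stream in one pass."""
    summary = Summary()
    for x in stream:
        summary = summary.add(x)
    return summary


def integrate(models: Iterable[Summary]) -> Summary:
    """Cluster head or sink: integrate the models received from below."""
    merged = Summary()
    for model in models:
        merged = merged.merge(model)
    return merged


# Hypothetical readings: cluster A has two nodes, cluster B has one.
clusters: Dict[str, List[List[float]]] = {
    "A": [[20.1, 20.3, 20.2], [19.8, 20.0]],
    "B": [[25.0, 25.4, 24.9, 25.1]],
}

network_models = {cid: integrate(local_model(s) for s in streams)
                  for cid, streams in clusters.items()}
global_model = integrate(network_models.values())
```

Each node sends three numbers instead of its raw stream, and the sink still obtains the same mean and variance (up to floating-point rounding) that it would compute from all readings, which is the kind of communication saving the local, network, and global model hierarchy targets.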

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decision making and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSN data [111], the proposed framework deals with large-scale data from WSNs by splitting the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits a large task into smaller ones at the node level and at the cluster heads, which are distributed over the entire network and execute in parallel. At the node level, the storage capacity of a single node is used to compute the local model, which contains aggregated data from that node, whereas a cluster head acquires data from a group of nodes and aggregates the readings over a certain region or period. As a result, a network model is computed at each cluster head; it contains compact data from a set of nodes and reduces the size of the data to be transmitted. Network models can be integrated at the sink to get a global view for real-time applications. Since the sink has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for offline predictive analysis. Historical queries from the user can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time response. With this distribution of data, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can take the resource constraints of sensor nodes into account by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the quickly changing nature of WSN data, where the characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using an incremental learning mechanism [39, 112] that helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
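As an illustration of how a correlation between neighboring nodes might be tracked over a sliding window, and of why the window size matters, the following Python sketch computes the Pearson correlation over the most recent readings of two nodes. Pearson correlation is an assumption made here for illustration; the framework description does not fix a particular correlation measure. The class name, the readings, and the window sizes are hypothetical.

```python
import math
from collections import deque
from typing import Iterable, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences (0.0 if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


class WindowedCorrelation:
    """Keeps the last `size` paired readings from two neighboring nodes."""

    def __init__(self, size: int) -> None:
        self.a: deque = deque(maxlen=size)
        self.b: deque = deque(maxlen=size)

    def push(self, x: float, y: float) -> float:
        # Old readings fall out of the window automatically (maxlen).
        self.a.append(x)
        self.b.append(y)
        if len(self.a) < 2:
            return 0.0
        return pearson(self.a, self.b)


def last_correlation(size: int, stream_a: Iterable[float],
                     stream_b: Iterable[float]) -> float:
    """Correlation over the final window after both streams are consumed."""
    window = WindowedCorrelation(size)
    r = 0.0
    for x, y in zip(stream_a, stream_b):
        r = window.push(x, y)
    return r


# Hypothetical readings from two neighboring nodes.
node_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
node_b = [2.0, 4.0, 6.0, 8.0, 7.0, 5.0]

r_short = last_correlation(3, node_a, node_b)  # only the last three readings
r_long = last_correlation(6, node_a, node_b)   # all six readings
```

On these toy readings the three-reading window yields a negative correlation while the six-reading window yields a positive one, so a correlation-based grouping of the two nodes would differ depending on the window size chosen.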

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that techniques for mining sensor data at the network side help in making real-time decisions and serve as a prerequisite for developing effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and technology for WSNs. Based on these limitations, we have proposed a hybrid framework that can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies that need to be addressed. The set of WSN applications presented here is neither complete nor exhaustive; it is merely a sample that demonstrates the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. Our intention in presenting this paper is to stimulate interest in utilizing and extending previous studies for emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining: Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hüllermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Römer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


at each node and to minimize the computational complexity. The sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius information of the sphere to its parent for outlier classification. This technique identifies outliers from the data measurements collected after a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the results for local outliers inaccurate. The technique is evaluated using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to evaluate the computational strategy and various kernel functions. Results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.

5. Comparison of Data Mining Techniques for WSNs

This section identifies several common and distinguishing aspects of the data mining techniques designed for WSNs that were discussed above. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects are discussed; then the comparative tables are presented to compare and differentiate the existing data mining techniques for WSN data.

5.1. Input Sensor Data. Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze the data. Data mining techniques usually consider the following two characteristics of data.

Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogenous [50] or heterogeneous [33, 48]. A homogenous attribute means that a single attribute is sensed, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.

Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that readings from different sensor nodes are observed at the same time instant and that readings observed at one time instant are related to the readings observed at the previous time instant, whereas spatial correlation indicates that the readings from sensor nodes geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict the future trend of sensor readings and to identify a dead node if the reading from a correlated sensor is missing.
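As a rough illustration of the two spatiotemporal notions above, the following sketch measures temporal correlation as the lag-1 autocorrelation of one node's readings and spatial correlation as the Pearson coefficient between two neighboring nodes' readings. The choice of Pearson correlation, the lag of one sample, and the function names are assumptions made for this example; the surveyed techniques may quantify correlation differently.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant series: correlation is undefined, treated as 0 here
    return cov / (var_x * var_y) ** 0.5


def temporal_correlation(readings, lag=1):
    """Assumed measure: correlation between a node's readings and the same
    readings shifted by `lag` time instants (lag-1 autocorrelation by default)."""
    return pearson(readings[:-lag], readings[lag:])


def spatial_correlation(node_a, node_b):
    """Assumed measure: correlation between time-aligned readings of two
    geographically close nodes."""
    return pearson(node_a, node_b)


# Hypothetical temperature readings from two neighboring nodes.
node_a = [20.0, 20.5, 21.0, 21.5, 22.0]
node_b = [20.3, 20.8, 21.3, 21.8, 22.3]
print(temporal_correlation(node_a))
print(spatial_correlation(node_a, node_b))
```

A value close to 1 in either measure suggests that a missing reading could be estimated from the previous instant or from the neighboring node.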

5.2. Processing Architecture. In order to apply a data mining technique to sensor data, we need to determine the model of computation. There are two general models.

Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, all raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis in order to find interesting patterns in the aggregated data. As the size of WSNs increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.

Distributed. Another computation approach uses a distributed model in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called the local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an “aggregator” is used to collect and aggregate the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is how to satisfy the mining accuracy while keeping the communication overhead, memory, and computational cost low.
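The difference between the two models can be made concrete with a small sketch that counts how many values reach the central server in each case. The summary tuple used as the local model (count, minimum, maximum, mean), the function names, and the sample readings are assumptions for illustration only; the sketch also ignores the traffic between sensors and their aggregator.

```python
def centralized_cost(cluster_readings):
    """Values sent to the central server when every raw reading is forwarded."""
    return sum(len(readings) for readings in cluster_readings.values())


def local_model(readings):
    """Assumed local model: a compact (count, min, max, mean) summary."""
    return (len(readings), min(readings), max(readings),
            sum(readings) / len(readings))


def distributed_cost(cluster_readings):
    """Local models built at each aggregator, and the number of values
    forwarded to the central server when only those models are sent."""
    models = {agg: local_model(r) for agg, r in cluster_readings.items() if r}
    return models, sum(len(m) for m in models.values())


# Hypothetical readings collected by two aggregators from their sensors.
clusters = {
    "agg1": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0],
    "agg2": [30.0, 30.0, 31.0, 31.0, 32.0, 32.0],
}
models, sent = distributed_cost(clusters)
print(centralized_cost(clusters), sent, models["agg1"])
```

The saving grows with the number of readings per aggregator, at the price of losing whatever detail the summary does not capture, which is the accuracy trade-off noted above.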

5.3. Data Mining Method. This refers to the data mining algorithm adapted or developed for the unique characteristics of WSN data. Distributed approaches use one-scan algorithms for real-time processing: in order to deal with the high data arrival rate, the mining results are expected to be available within short response times. Centralized approaches, in contrast, collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.
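To show what a one-scan algorithm looks like in practice, the sketch below implements lossy counting, a well-known single-pass method for approximate frequency counts over a stream. It is offered only as an example of the one-scan style; the class name, parameters, and sample stream are assumptions, and it is not claimed to be the exact variant used by any surveyed technique.

```python
import math


class LossyCounter:
    """Single-pass approximate frequency counting (lossy counting).

    Each item is seen once; memory is bounded by pruning rare items at
    bucket boundaries. Counts are underestimated by at most epsilon * n.
    """

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = (self.n + self.width - 1) // self.width  # ceil(n / width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop items that cannot be frequent.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def frequent(self, support):
        """Items whose true frequency may reach `support` (a fraction of n)."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


# Hypothetical stream of event labels arriving at a node.
counter = LossyCounter(epsilon=0.25)
for event in ["a", "b", "a", "c", "a", "d", "a", "e"]:
    counter.add(event)
print(counter.frequent(support=0.5))
```

A multiscan method such as classical Apriori would instead re-read the stored data once per candidate length, which is why it suits offline analysis at a central site rather than on-node processing.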

5.4. Node Properties. The proposed techniques are largely influenced by the following types of node properties.

Connectivity. Single-hop communication is direct communication between the sensor node and the base station. It is simple and easy to implement but limited by communication distance. Multihop communication uses some nodes as relays when transmitting data packets from the source to the sink, which is more complex.

Mobility. Node mobility increases the complexity of designing an appropriate data mining technique for WSNs. The majority of techniques assume that sensor nodes are static; only a few techniques consider node mobility. When nodes are mobile, maintaining a certain structure for data


Table 2: Comparison of data mining techniques for wireless sensor networks.

[Check-mark columns of the original table (processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, data source) are not recoverable from the source.]

| Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Frequent pattern mining | DSARM [42] | Missing data estimation | Apriori like | Sense and send | Traffic monitoring | Data accuracy | Ignore the sensor that reports different values |
| Frequent pattern mining | In-network data mining [51] | Events patterns discovery | Apriori like | Aggregation, local pattern mining | Environmental monitoring | Scalability | High memory and communication |
| Frequent pattern mining | Distributed data aggregation [15] | Improve WSN performance | Apriori like | Support-based aggregation | WSNs performance monitoring | Data size | Increases buffer cost, delayed crucial messages |
| Frequent pattern mining | Online algorithm [46] | Interval list of representation of WSNs data | Lossy counting | Periodical sensing | WSNs monitoring | Time and memory | Data redundancy |
| Frequent pattern mining | Lightweight rule learning [48] | Identify highly correlated rules for sensing | Apriori like | Query-based data sensing | Control WSNs operations | Energy | Not validated well on real data |
| Frequent pattern mining | CARM [43] | Missing data estimation | FP-growth based | Sense and send | Data analysis | Data accuracy | Inefficient for handling high-speed data |


Table 3: Comparison of data mining techniques for wireless sensor networks (continued).

[Check-mark columns of the original table (processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, data source) are not recoverable from the source.]

| Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Frequent pattern mining | Association rules mining framework [50] | Fault and future event prediction | FP-growth using PLT-structure | Aggregation | Monitor WSNs quality of service | No. of messages | Increase cost due to multiple DB scan |
| Frequent pattern mining | SP-tree [49] | Discover events patterns | FP-growth based | Sense and send | Generic monitoring | Memory | High tree construction cost |
| Sequential pattern mining | Relational framework [58] | Multi-dimensional correlation discovery | Apriori like | Sense and send | Environmental monitoring | Data representation | Memory and time consuming |
| Sequential pattern mining | Episode discovery (ED) [21] | Action prediction | Generalized sequential pattern (GSP) | Sense and send | Inhabitants behavior prediction | Prediction accuracy | Inefficient for complex activities |
| Sequential pattern mining | MPG [64] | Predict object’s future movement | Apriori like | Clustering | Real-time object tracking | Tracking time and energy | Not analyzed on real dataset |
| Sequential pattern mining | Contextual patterns discovery [22] | Anomaly detection | PSP | Sense and send | Railway maintenance | Anomaly precision | Missing real-time anomaly prediction |


Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

[Check-mark columns of the original table (processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, data source) are not recoverable from the source.]

| Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Sequential pattern mining | TMP-mine [65] | Predict object’s future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time |
| Sequential pattern mining | Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns |
| Sequential pattern mining | MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute |
| Sequential pattern mining | PTSP [66] | Object’s future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects |
| Clustering | DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate |
| Clustering | H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate |


Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

[Check-mark columns of the original table (processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, data source) are not recoverable from the source.]

| Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Clustering | Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping |
| Clustering | CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss |
| Clustering | EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs |
| Clustering | Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs |
| Clustering | Attribute based clustering [89] | WSNs bandwidth gain | Hierarchal clustering | Data clustering | Monitoring and tracking | Communication | High computation cost |
| Clustering | DHCS [90] | Uniform data distribution | Hierarchal clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes energy is ignored |


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

[Check-mark columns of the original table (processing architecture, attributes, correlation, connectivity, mobility, node role, evaluation method, data source) are not recoverable from the source.]

| Category | Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations |
|---|---|---|---|---|---|---|---|
| Classification | Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness |
| Classification | Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity |
| Classification | NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset |
| Classification | LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | Non adaption to concept drift |
| Classification | FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission |
| Classification | Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios |
| Classification | Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity |
| Classification | One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation |


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform one of three roles [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head (CH) can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors’ data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. This is a node that acts as a medium to transmit data packets from one node to others.

Node Task. In the centralized approach, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and can take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the types of application that benefit from WSN data mining techniques. Here we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central site.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed in order to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. This is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment because appropriate hardware is unavailable owing to technical and design limitations. Real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. This refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this paper that most of the techniques are validated by simulation on synthetic datasets, owing to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among the sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among the sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101] in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand their associated benefits and limitations. Because performance requirements differ according to the specific application, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by analytical modeling [21, 23, 46, 49, 100, 109] use certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or networks with dynamically changing topology would be challenging.

(viii) Most of the techniques use synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. Moreover, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
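Shortcomings (iii) and (ix) both concern parameter choice. The following minimal Python sketch illustrates the point on a toy example; it is not taken from any of the surveyed techniques. The helper `frequent_items`, the sensor names, and the event data are all hypothetical, and the sketch only counts single items rather than full itemsets.

```python
from collections import Counter, deque


def frequent_items(stream, window_size, min_support):
    """Return items whose relative support in the last `window_size`
    transactions is at least `min_support` (a fraction in [0, 1])."""
    window = deque(maxlen=window_size)  # keeps only the most recent transactions
    for transaction in stream:
        window.append(frozenset(transaction))
    counts = Counter(item for t in window for item in t)
    n = len(window)
    return {item for item, c in counts.items() if c / n >= min_support}


# Hypothetical epochs: each set lists the sensors that reported an event.
stream = [
    {"s1", "s2"}, {"s1"}, {"s1", "s3"}, {"s2", "s3"},
    {"s3"}, {"s3", "s2"}, {"s3"}, {"s2", "s3"},
]

# Same data, different thresholds: the set of "frequent" sensors changes.
print(sorted(frequent_items(stream, 8, 0.3)))  # ['s1', 's2', 's3']
print(sorted(frequent_items(stream, 8, 0.6)))  # ['s3']
# Same threshold, smaller window: s1 drops out because it is no longer recent.
print(sorted(frequent_items(stream, 4, 0.3)))  # ['s2', 's3']
```

Even on eight transactions, the reported patterns depend on both the window size and the support threshold, which is why a fixed choice made before deployment can be fragile.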

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that existing techniques still have shortcomings. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.

Figure 6: Proposed hybrid framework for sensor network-based applications. [Figure not reproduced. Labeled components: sensor data stream; node data processing (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis); local model; network data processing (local model integration, network analysis, real-time decisions, network maintenance); network model; sink/base station; central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); global model; in-network processing (single pass) and centralized processing (multipass); users, queries, and approximate results.]

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that has more resources than the other sensor nodes. As a result, a network model is computed that is more abstract than a local model and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
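The flow from local models to network models to a global model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the "model" is assumed to be a simple statistical summary (count, mean, and sum of squared deviations), the names `Summary`, `local_model`, and `integrate` are hypothetical, and the readings are made up. Each node builds its summary in a single pass (Welford's update), and summaries are merged without access to raw readings.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact model: count, mean, and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Single-pass (Welford) update; the raw reading is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        """Combine two summaries without access to the raw readings."""
        n = self.n + other.n
        if n == 0:
            return Summary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of everything summarized so far."""
        return self.m2 / self.n if self.n else 0.0


def local_model(readings):
    """Node level: summarize a node's own stream in one pass."""
    s = Summary()
    for x in readings:
        s.update(x)
    return s


def integrate(models):
    """Cluster head or sink level: merge the summaries received."""
    total = Summary()
    for m in models:
        total = total.merge(m)
    return total


# Hypothetical temperature readings from four nodes in two clusters.
cluster_a = [local_model([20.0, 21.0, 22.0]), local_model([19.0, 21.0])]
cluster_b = [local_model([30.0, 31.0]), local_model([29.0])]
network_models = [integrate(cluster_a), integrate(cluster_b)]  # one per cluster head
global_model = integrate(network_models)                       # at the sink

print(global_model.n, global_model.mean)  # 8 24.125
```

Each hop forwards three numbers instead of the raw stream, which is the kind of communication saving the framework aims for. Richer local models (event patterns, cluster centroids) would need their own merge rules.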

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decisions and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSN data [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head level, which are distributed over the entire network and execute in parallel. At the node level, the storage capacity of a single node is used to compute the local model, which contains aggregated data from that node, whereas the cluster head acquires data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the size of the data to be transmitted. Network models can be integrated at the sink to get a global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for offline predictive analysis. Historical queries from the user can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time response. By distributing the data in this way, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using an incremental learning mechanism [39, 112] that helps the model incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
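To make the idea of incremental learning in item (iv) concrete, the sketch below uses an exponentially weighted moving average as a deliberately simple stand-in. It is not the incremental frequent-pattern or SVM techniques of [39, 112]; the class name `IncrementalBaseline`, the forgetting factor `alpha`, and the readings are assumptions made for illustration. The point is only that a model updated per reading follows a change in the monitored process, whereas a model fitted once over all data does not.

```python
class IncrementalBaseline:
    """Exponentially weighted estimate of a sensor's typical reading."""

    def __init__(self, alpha: float = 0.2):
        # alpha in (0, 1]: larger values forget old data faster (assumed value).
        self.alpha = alpha
        self.level = None

    def update(self, x: float) -> float:
        """Fold one new reading into the model in O(1) time and memory."""
        if self.level is None:
            self.level = x
        else:
            self.level += self.alpha * (x - self.level)
        return self.level


# Hypothetical stream: the monitored process shifts from about 20 to about 30.
stream = [20.0] * 10 + [30.0] * 30

baseline = IncrementalBaseline(alpha=0.2)
for reading in stream:
    baseline.update(reading)

static_mean = sum(stream) / len(stream)  # a model fitted once over all data

print(round(baseline.level, 2))  # close to 30: the model has adapted
print(static_mean)               # 27.5: still dominated by outdated readings
```

The update needs constant memory and one multiplication per reading, which suits a resource-constrained node; the choice of `alpha` trades responsiveness against noise sensitivity.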

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for developing effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy and an overall analysis and review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. The set of WSN applications presented here is neither complete nor exhaustive, but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology and that sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO ’09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


Table 2: Comparison of data mining techniques for wireless sensor networks. Check-mark columns of the original table (processing architecture, sensor data attributes, correlation, connectivity, mobility, node role, evaluation method, and data source) are not shown.

Frequent pattern mining

DSARM [42]. Objective: missing data estimation. DM method: Apriori-like. Node task: sense and send. Application area: traffic monitoring. Optimization objective: data accuracy. Limitations: ignores the sensor that reports different values.

In-network data mining [51]. Objective: events patterns discovery. DM method: Apriori-like. Node task: aggregation, local pattern mining. Application area: environmental monitoring. Optimization objective: scalability. Limitations: high memory and communication.

Distributed data aggregation [15]. Objective: improve WSN performance. DM method: Apriori-like. Node task: support-based aggregation. Application area: WSNs performance monitoring. Optimization objective: data size. Limitations: increases buffer cost, delays crucial messages.

Online algorithm [46]. Objective: interval list representation of WSNs data. DM method: lossy counting. Node task: periodical sensing. Application area: WSNs monitoring. Optimization objective: time and memory. Limitations: data redundancy.

Lightweight rule learning [48]. Objective: identify highly correlated rules for sensing. DM method: Apriori-like. Node task: query-based data sensing. Application area: control WSNs operations. Optimization objective: energy. Limitations: not validated well on real data.

CARM [43]. Objective: missing data estimation. DM method: FP-growth based. Node task: sense and send. Application area: data analysis. Optimization objective: data accuracy. Limitations: inefficient for handling high-speed data.


Table 3: Comparison of data mining techniques for wireless sensor networks (continued). Check-mark columns of the original table (processing architecture, sensor data attributes, correlation, connectivity, mobility, node role, evaluation method, and data source) are not shown.

Frequent pattern mining

Association rules mining framework [50]. Objective: fault and future event prediction. DM method: FP-growth using PLT-structure. Node task: aggregation. Application area: monitor WSNs quality of service. Optimization objective: number of messages. Limitations: increased cost due to multiple DB scans.

SP-tree [49]. Objective: discover events patterns. DM method: FP-growth based. Node task: sense and send. Application area: generic monitoring. Optimization objective: memory. Limitations: high tree construction cost.

Sequential pattern mining

Relational framework [58]. Objective: multidimensional correlation discovery. DM method: Apriori-like. Node task: sense and send. Application area: environmental monitoring. Optimization objective: data representation. Limitations: memory and time consuming.

Episode discovery (ED) [21]. Objective: action prediction. DM method: generalized sequential pattern (GSP). Node task: sense and send. Application area: inhabitants behavior prediction. Optimization objective: prediction accuracy. Limitations: inefficient for complex activities.

MPG [64]. Objective: predict object's future movement. DM method: Apriori-like. Node task: clustering. Application area: real-time object tracking. Optimization objective: tracking time and energy. Limitations: not analyzed on real dataset.

Contextual patterns discovery [22]. Objective: anomaly detection. DM method: PSP. Node task: sense and send. Application area: railway maintenance. Optimization objective: anomaly precision. Limitations: missing real-time anomaly prediction.


Table 4: Comparison of data mining techniques for wireless sensor networks (continued). Check-mark columns of the original table (processing architecture, sensor data attributes, correlation, connectivity, mobility, node role, evaluation method, and data source) are not shown.

Sequential pattern mining

TMP-mine [65]. Objective: predict object's future movement. DM method: pattern growth using TMP-tree construction. Node task: rule-based node activation. Application area: real-time object tracking. Optimization objective: energy. Limitations: high missing rate and time.

Pattern learner [23]. Objective: behavior recognition. DM method: tree projection. Node task: sense and send. Application area: behavior monitoring. Optimization objective: number of patterns learned. Limitations: complex and redundant patterns.

MSAP [63]. Objective: fault prediction. DM method: candidate construction. Node task: sense and send. Application area: telecommunication. Optimization objective: patterns accuracy. Limitations: candidate construction is expensive to compute.

PTSP [66]. Objective: object's future movement prediction. DM method: sequential pattern generation. Node task: rule-based node activation. Application area: object tracking. Optimization objective: energy. Limitations: inefficient to predict high-speed objects.

Clustering

DCC [86]. Objective: WSNs longevity. DM method: data correlation-based clustering. Node task: data suppression. Application area: generic WSNs application. Optimization objective: energy and data size. Limitations: high clustering rate.

H-cluster [85]. Objective: in-network communication. DM method: data correlation-based clustering. Node task: data summarization. Application area: real-time monitoring. Optimization objective: communication. Limitations: high data loss rate.


Table 5: Comparison of data mining techniques for wireless sensor networks (continued). Check-mark columns of the original table (processing architecture, sensor data attributes, correlation, connectivity, mobility, node role, evaluation method, and data source) are not shown.

Clustering

Prediction model [87]. Objective: prediction-based monitoring. DM method: heuristic scheme. Node task: local prediction model. Application area: environmental monitoring. Optimization objective: communication. Limitations: cluster overlapping.

CAG [88]. Objective: WSNs bandwidth gain. DM method: data correlation-based clustering. Node task: data aggregation. Application area: generic WSNs applications. Optimization objective: communication. Limitations: sensory data loss.

EEDC [84]. Objective: on-demand clustering. DM method: data correlation-based clustering. Node task: sense and send. Application area: surveillance data analysis. Optimization objective: energy. Limitations: inefficient for large WSNs.

Clustering sensory data [67]. Objective: communication efficiency. DM method: K-means. Node task: data summarization. Application area: data analysis. Optimization objective: communication. Limitations: inefficient for large WSNs.

Attribute based clustering [89]. Objective: WSNs bandwidth gain. DM method: hierarchical clustering. Node task: data clustering. Application area: monitoring and tracking. Optimization objective: communication. Limitations: high computation cost.

DHCS [90]. Objective: uniform data distribution. DM method: hierarchical clustering. Node task: data clustering and summarization. Application area: interactive data analysis. Optimization objective: message reduction. Limitations: nodes' energy is ignored.


Table 6: Comparison of data mining techniques for wireless sensor networks (continued). Check-mark columns of the original table (processing architecture, sensor data attributes, correlation, connectivity, mobility, node role, evaluation method, and data source) are not shown.

Classification

Person identification algorithms [109]. Objective: identify human behavior. DM method: decision tree. Node task: sense and send. Application area: health care. Optimization objective: classification accuracy. Limitations: does not guarantee the correctness.

Prediction framework [103]. Objective: distributed prediction. DM method: decision tree. Node task: local prediction. Application area: generic. Optimization objective: prediction accuracy. Limitations: computational complexity.

NNTC [96]. Objective: real-time classification. DM method: nearest neighbor. Node task: sense and send. Application area: generic. Optimization objective: classification accuracy. Limitations: not evaluated on real dataset.

LWClass [100]. Objective: preserve WSNs resources. DM method: KNN. Node task: sense and send. Application area: ubiquitous environments. Optimization objective: resource awareness. Limitations: no adaptation to concept drift.

FVLD [104]. Objective: low-dimension feature vector generation. DM method: KNN, ML. Node task: classification. Application area: vehicle classification. Optimization objective: energy. Limitations: high cost of feature vector transmission.

Fuzzy predictor model [99]. Objective: occupancy prediction. DM method: fuzzy rules. Node task: sense and send. Application area: health care. Optimization objective: prediction accuracy. Limitations: inefficient for complex scenarios.

Online learning [105]. Objective: incremental classification. DM method: SVM. Node task: classification. Application area: environmental monitoring. Optimization objective: energy. Limitations: computational complexity.

One-class quarter-sphere SVM [108]. Objective: anomaly detection. DM method: SVM. Node task: local anomaly detection. Application area: habitat monitoring. Optimization objective: energy. Limitations: ignores spatial correlation.


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, cluster heads (CHs) perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is a node that acts as a medium to transmit data packets from one node to another.

Node Task. In the centralized approach, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the type of application that benefits from WSNs data mining techniques. Here, we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and to make tracking more efficient and accurate.

(iv) Fourth is WSNs performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSNs optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment can yield the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this paper that most of the techniques are validated by simulation on synthetic datasets, and that most of the studies used simulation due to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that has been achieved. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, and node mobility. The large variations in the performance metrics make it a difficult task to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. It is observed from the comparative analysis that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) For the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
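The threshold sensitivity noted in item (ix) can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed techniques: the function name `mine_rules`, the sensor identifiers, and the toy epochs are all hypothetical, and only pairwise rules are considered for brevity.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Return pairwise rules as (antecedent, consequent, support, confidence)."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    items = sorted(set().union(*sets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue  # the pair is not frequent enough to form a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)))
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical epochs: each set lists the sensors that reported an event.
epochs = [
    {"s1", "s2", "s3"},
    {"s1", "s2"},
    {"s1", "s3"},
    {"s2", "s3"},
    {"s1", "s2", "s3"},
]

strict = mine_rules(epochs, min_support=0.5, min_confidence=0.8)
loose = mine_rules(epochs, min_support=0.5, min_confidence=0.7)
too_rare = mine_rules(epochs, min_support=0.7, min_confidence=0.7)
```

On this toy data every pair has support 0.6 and every rule has confidence of about 0.75, so `strict` and `too_rare` are empty while `loose` contains all six rules. A small change in either threshold therefore switches the whole rule set on or off, which is the difficulty the item describes.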

7. Future Research Directions

It is observed from the analysis of existing data mining work on sensor network-based applications that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data is continuously arriving and is not available for a second scan.
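As an illustration of single-pass processing on a node, the following Python sketch maintains a compact summary of a reading stream with Welford's online algorithm, so that no reading has to be stored or rescanned. The class name `RunningSummary`, the fields of the returned dictionary, and the sample readings are assumptions made for this example; the paper does not prescribe the content of a local model.

```python
class RunningSummary:
    """Single-pass summary of a reading stream (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, reading):
        # Each reading is consumed once and then discarded.
        self.count += 1
        delta = reading - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reading - self.mean)
        self.minimum = min(self.minimum, reading)
        self.maximum = max(self.maximum, reading)

    def local_model(self):
        # Hypothetical local model: a few numbers instead of the raw stream.
        variance = self.m2 / self.count if self.count else 0.0
        return {
            "count": self.count,
            "mean": self.mean,
            "variance": variance,  # population variance
            "min": self.minimum,
            "max": self.maximum,
        }


def summarize(stream):
    summary = RunningSummary()
    for reading in stream:
        summary.update(reading)
    return summary.local_model()


# Hypothetical temperature readings arriving one at a time.
model = summarize(iter([20.0, 22.0, 24.0, 26.0]))
```

For the four sample readings the summary holds a count of 4, a mean of 23, and a population variance of 5. Memory use stays constant regardless of stream length, which is the property that makes a single-pass method suitable for resource-constrained nodes.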

Figure 6: Proposed hybrid framework for sensor network based applications. In-network processing (single pass) comprises node data processing on the sensor data stream (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis), which produces the local models, and network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring), which produces the network model. Centralized processing (multipass) at the sink/base station comprises central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis), which produces the global model; user queries are answered with approximate results.

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient as compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local model and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get the global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
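The local, network, and global models described above can be sketched as mergeable summaries. The Python example below is a minimal illustration under stated assumptions: the `Summary` type, the functions `local_model` and `merge`, the cluster names, and the readings are all hypothetical, and a real local model would carry event patterns rather than simple aggregates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical model content: aggregates that can be merged losslessly."""
    count: int
    total: float
    low: float
    high: float

    @property
    def mean(self):
        return self.total / self.count


def local_model(readings):
    # Computed on a sensor node from its own readings.
    return Summary(len(readings), sum(readings), min(readings), max(readings))


def merge(models):
    # Used at a cluster head (network model) and at the sink (global model).
    models = list(models)
    return Summary(
        sum(m.count for m in models),
        sum(m.total for m in models),
        min(m.low for m in models),
        max(m.high for m in models),
    )


# Hypothetical deployment: readings per node, grouped by cluster.
clusters = {
    "cluster_a": [[20.0, 21.0], [22.0, 23.0]],
    "cluster_b": [[30.0, 31.0, 32.0]],
}

network_models = {
    name: merge(local_model(readings) for readings in nodes)
    for name, nodes in clusters.items()
}
global_model = merge(network_models.values())
```

Each level forwards one fixed-size summary instead of the raw readings below it, so the volume sent toward the sink grows with the number of clusters rather than with the number of readings. The price is that queries at the sink can only be answered from what the summary retains, which matches the approximate answers mentioned in the text.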

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSNs data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decision and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue in WSNs [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this way of distributing data, the proposed framework is able to deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSNs data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
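The incremental learning mentioned in item (iv) can be sketched with a nearest-centroid classifier whose class centroids are updated one labeled sample at a time, without retraining on stored data. This is an illustrative stand-in and not the SVM-based mechanism of [112]; the class name `IncrementalCentroidClassifier`, the labels, and the (temperature, humidity) feature values are hypothetical.

```python
class IncrementalCentroidClassifier:
    """Nearest-centroid classifier updated one labeled sample at a time."""

    def __init__(self):
        self.counts = {}     # label -> number of samples seen
        self.centroids = {}  # label -> running mean of the feature vectors

    def learn_one(self, features, label):
        n = self.counts.get(label, 0) + 1
        self.counts[label] = n
        centroid = self.centroids.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            # Running-mean update; earlier samples are never revisited.
            centroid[i] += (x - centroid[i]) / n

    def predict_one(self, features):
        if not self.centroids:
            raise ValueError("no samples learned yet")

        def squared_distance(label):
            return sum(
                (x - c) ** 2 for x, c in zip(features, self.centroids[label])
            )

        return min(self.centroids, key=squared_distance)


# Hypothetical (temperature, humidity) samples with labels.
samples = [
    ((20.0, 40.0), "normal"),
    ((21.0, 42.0), "normal"),
    ((45.0, 10.0), "fire"),
    ((47.0, 12.0), "fire"),
]

clf = IncrementalCentroidClassifier()
for features, label in samples:
    clf.learn_one(features, label)

prediction = clf.predict_one((44.0, 15.0))
```

With these samples the query point is assigned to the "fire" class. Note that a plain running mean weights all past samples equally; tracking a process whose characteristics drift over time would additionally need some form of forgetting, such as a decay factor or a sliding window, which this sketch leaves out.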

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. Surely, the set of WSNs applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSNs applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSNs performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hüllermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Römer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Römer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


Table 3: Comparison of data mining techniques for wireless sensor networks (continued). Only the text-valued columns are listed; the check-mark columns (processing architecture: distributed, central; attributes: homogeneous, heterogeneous; correlation: attribute, spatial, temporal; connectivity: single hop, multihop; mobility: static, mobile; node role: cluster head, sensor, relay; evaluation method and data source: simulation, analytical model, real, synthetic) are not reproduced here.

Frequent pattern mining

Association rules mining framework [50]. Objective: fault and future event prediction. DM method: FP-growth using PLT-structure. Node task: aggregation. Application area: monitoring WSN quality of service. Optimization objective: number of messages. Limitations: increased cost due to multiple DB scans.

SP-tree [49]. Objective: discover event patterns. DM method: FP-growth based. Node task: sense and send. Application area: generic monitoring. Optimization objective: memory. Limitations: high tree construction cost.

Sequential pattern mining

Relational framework [58]. Objective: multi-dimensional correlation discovery. DM method: Apriori-like. Node task: sense and send. Application area: environmental monitoring. Optimization objective: data representation. Limitations: memory and time consuming.

Episode discovery (ED) [21]. Objective: action prediction. DM method: generalized sequential pattern (GSP). Node task: sense and send. Application area: inhabitants' behavior prediction. Optimization objective: prediction accuracy. Limitations: inefficient for complex activities.

MPG [64]. Objective: predict object's future movement. DM method: Apriori-like. Node task: clustering. Application area: real-time object tracking. Optimization objective: tracking time and energy. Limitations: not analyzed on real dataset.

Contextual patterns discovery [22]. Objective: anomaly detection. DM method: PSP. Node task: sense and send. Application area: railway maintenance. Optimization objective: anomaly precision. Limitations: missing real-time anomaly prediction.

Table 4: Comparison of data mining techniques for wireless sensor networks (continued). Only the text-valued columns are listed; the check-mark columns are not reproduced here.

Sequential pattern mining

TMP-mine [65]. Objective: predict object's future movement. DM method: pattern growth using TMP-tree construction. Node task: rule-based node activation. Application area: real-time object tracking. Optimization objective: energy. Limitations: high missing rate and time.

Pattern learner [23]. Objective: behavior recognition. DM method: tree projection. Node task: sense and send. Application area: behavior monitoring. Optimization objective: number of patterns learned. Limitations: complex and redundant patterns.

MSAP [63]. Objective: fault prediction. DM method: candidate construction. Node task: sense and send. Application area: telecommunication. Optimization objective: pattern accuracy. Limitations: candidate construction is expensive to compute.

PTSP [66]. Objective: object's future movement prediction. DM method: sequential pattern generation. Node task: rule-based node activation. Application area: object tracking. Optimization objective: energy. Limitations: inefficient to predict high-speed objects.

Clustering

DCC [86]. Objective: WSNs longevity. DM method: data correlation-based clustering. Node task: data suppression. Application area: generic WSNs application. Optimization objective: energy and data size. Limitations: high clustering rate.

H-cluster [85]. Objective: in-network communication. DM method: data correlation-based clustering. Node task: data summarization. Application area: real-time monitoring. Optimization objective: communication. Limitations: high data loss rate.

Table 5: Comparison of data mining techniques for wireless sensor networks (continued). Only the text-valued columns are listed; the check-mark columns are not reproduced here.

Clustering

Prediction model [87]. Objective: prediction-based monitoring. DM method: heuristic scheme. Node task: local prediction model. Application area: environmental monitoring. Optimization objective: communication. Limitations: cluster overlapping.

CAG [88]. Objective: WSNs bandwidth gain. DM method: data correlation-based clustering. Node task: data aggregation. Application area: generic WSNs applications. Optimization objective: communication. Limitations: sensory data loss.

EEDC [84]. Objective: on-demand clustering. DM method: data correlation-based clustering. Node task: sense and send. Application area: surveillance data analysis. Optimization objective: energy. Limitations: inefficient for large WSNs.

Clustering sensory data [67]. Objective: communication efficiency. DM method: K-means. Node task: data summarization. Application area: data analysis. Optimization objective: communication. Limitations: inefficient for large WSNs.

Attribute-based clustering [89]. Objective: WSNs bandwidth gain. DM method: hierarchical clustering. Node task: data clustering. Application area: monitoring and tracking. Optimization objective: communication. Limitations: high computation cost.

DHCS [90]. Objective: uniform data distribution. DM method: hierarchical clustering. Node task: data clustering and summarization. Application area: interactive data analysis. Optimization objective: message reduction. Limitations: node energy is ignored.

Table 6: Comparison of data mining techniques for wireless sensor networks (continued). Only the text-valued columns are listed; the check-mark columns are not reproduced here.

Classification

Person identification algorithms [109]. Objective: identify human behavior. DM method: decision tree. Node task: sense and send. Application area: health care. Optimization objective: classification accuracy. Limitations: does not guarantee correctness.

Prediction framework [103]. Objective: distributed prediction. DM method: decision tree. Node task: local prediction. Application area: generic. Optimization objective: prediction accuracy. Limitations: computational complexity.

NNTC [96]. Objective: real-time classification. DM method: nearest neighbor. Node task: sense and send. Application area: generic. Optimization objective: classification accuracy. Limitations: not evaluated on real dataset.

LWClass [100]. Objective: preserve WSNs resources. DM method: KNN. Node task: sense and send. Application area: ubiquitous environments. Optimization objective: resource awareness. Limitations: no adaptation to concept drift.

FVLD [104]. Objective: low-dimension feature vector generation. DM method: KNN, ML. Node task: classification. Application area: vehicle classification. Optimization objective: energy. Limitations: high cost of feature vector transmission.

Fuzzy predictor model [99]. Objective: occupancy prediction. DM method: fuzzy rules. Node task: sense and send. Application area: health care. Optimization objective: prediction accuracy. Limitations: inefficient for complex scenarios.

Online learning [105]. Objective: incremental classification. DM method: SVM. Node task: classification. Application area: environmental monitoring. Optimization objective: energy. Limitations: computational complexity.

One-class quarter-sphere SVM [108]. Objective: anomaly detection. DM method: SVM. Node task: local anomaly detection. Application area: habitat monitoring. Optimization objective: energy. Limitations: ignores spatial correlation.

mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head (CH) can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors' data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is a node that acts as a medium to transmit data packets from one node to others.

Node Task. In the centralized approach, the node's task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and take action based on the detected phenomena or target.
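The difference between the two node tasks can be illustrated with a small sketch. The Python below is not taken from any surveyed technique; the function names `sense_and_send` and `aggregate_at_cluster_head`, the choice of the mean as the aggregate, and the sample readings are all assumptions made for illustration. It compares how many values reach the base station when every node forwards raw readings with how many arrive when a cluster head summarizes each sensing round.

```python
from statistics import mean


def sense_and_send(rounds):
    """Centralized task: every raw reading is forwarded to the base station."""
    return [reading for round_ in rounds for reading in round_]


def aggregate_at_cluster_head(rounds):
    """Distributed task (assumed aggregate: mean): the cluster head
    forwards one summary value per sensing round."""
    return [mean(round_) for round_ in rounds]


# Hypothetical data: two sensing rounds from a cluster of three sensors.
rounds = [[20.0, 21.0, 22.0], [30.0, 30.0, 33.0]]

raw = sense_and_send(rounds)
summary = aggregate_at_cluster_head(rounds)
print(len(raw), "raw values vs", len(summary), "aggregated values")
```

In this toy setting the base station receives six values under sense-and-send and two under in-network aggregation, which is the kind of communication saving the distributed approaches aim for, at the price of extra computation on the cluster head.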

5.5. Application Area. We also evaluated the types of applications that benefit from WSN data mining techniques. Here, we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their bodies to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy through in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of datasets are generally used, that is, synthetic and real. Most of the techniques surveyed in this paper validate their results by simulation on synthetic datasets, largely because of the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated on the optimization objective that it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) For the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
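The threshold sensitivity noted in item (ix) can be illustrated with a minimal sketch. It assumes a hypothetical set of epochs, each listing the sensors that reported an event, and uses brute-force itemset counting rather than any of the surveyed algorithms [15–20]; the function name frequent_itemsets and the sensor identifiers are illustrative assumptions only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (up to max_size items) with relative support >= min_support.

    Brute-force counting for illustration; not one of the surveyed algorithms.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical epochs: sensors that reported an event in each time slot.
epochs = [
    {"s1", "s2", "s3"},
    {"s1", "s2"},
    {"s2", "s3"},
    {"s1", "s2", "s4"},
    {"s3"},
]

loose = frequent_itemsets(epochs, min_support=0.4)
strict = frequent_itemsets(epochs, min_support=0.6)

# Itemsets that are reported only under the looser threshold.
print(sorted(set(loose) - set(strict)))
```

With these assumed epochs, lowering the minimum support from 0.6 to 0.4 admits the pair (s2, s3) as frequent, which shows how a fixed threshold changes the set of reported patterns.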

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data is continuously arriving and is not available for a second scan.

[Figure 6 (diagram). Node data processing: data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis. Network data processing: local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring. Central data processing: frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis. Other labels: sensor data stream, local model, network model, global model, in-network processing, centralized processing, single pass, multipass, sink/base station, query, approximate results, users.]

Figure 6: Proposed hybrid framework for sensor network-based applications.

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
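The flow from local models to the global model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each model is assumed to be a mergeable summary (count, sum, and sum of squares) of a single attribute, and the names Summary, local_model, integrate, and the cluster layout are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    """Mergeable summary of one attribute (assumed model format, not from the paper)."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def update(self, x: float) -> None:
        # Single pass: each reading is seen once and then discarded.
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other: "Summary") -> "Summary":
        # Combining two summaries needs no access to the raw readings.
        return Summary(self.count + other.count,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    @property
    def mean(self) -> float:
        return self.total / self.count


def local_model(stream) -> Summary:
    """Node level: summarize a sensor data stream in one pass."""
    summary = Summary()
    for reading in stream:
        summary.update(reading)
    return summary


def integrate(models) -> Summary:
    """Cluster head or sink: integrate lower-level models into one model."""
    return reduce(Summary.merge, models, Summary())


# Hypothetical deployment: two cluster heads and three sensor nodes.
clusters = {
    "ch1": {"n1": [20.0, 21.0, 19.0], "n2": [22.0, 23.0]},
    "ch2": {"n3": [30.0, 31.0]},
}

# Network model per cluster head, built only from local models.
network_models = {
    ch: integrate(local_model(readings) for readings in nodes.values())
    for ch, nodes in clusters.items()
}

# Global model at the sink, built only from network models.
global_model = integrate(network_models.values())
print(global_model.count, round(global_model.mean, 2))
```

In this sketch only three numbers per model travel upward, regardless of how many readings a node has seen, which is the communication saving the local models are meant to provide. Richer models, such as event patterns or cluster descriptions, would need their own merge rule.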

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decision and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSN data [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this distribution of data, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
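How an incrementally updated model follows a changing process, as in item (iv), can be illustrated with a small sketch. It substitutes a simple exponentially weighted running mean for the incremental learning mechanisms of [39, 112], which are not reproduced here; the class name ForgettingMean, the forgetting factor alpha, and the synthetic stream are assumptions made for illustration.

```python
class ForgettingMean:
    """Exponentially weighted running mean (illustrative stand-in, not from the paper)."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha  # assumed forgetting factor; larger values adapt faster
        self.mean = None

    def update(self, x: float) -> float:
        # Incremental update: O(1) time and memory per reading.
        if self.mean is None:
            self.mean = x
        else:
            self.mean += self.alpha * (x - self.mean)
        return self.mean


# Synthetic stream: the monitored level shifts from 20 to 30 (concept drift).
stream = [20.0] * 10 + [30.0] * 30

model = ForgettingMean(alpha=0.2)
for reading in stream:
    model.update(reading)

# A model fitted once on the whole history, for comparison.
batch_mean = sum(stream) / len(stream)

print(round(model.mean, 2), batch_mean)
```

The batch mean over the whole stream stays between the old and new levels, whereas the incrementally updated estimate moves close to the new level. The price is the choice of alpha, which plays a role similar to the sliding window size discussed among the limitations.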

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. Certainly, the set of WSN applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining: Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hüllermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. de L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of the 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Römer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid energy-efficient distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in conjunction with ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO ’09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.

Table 4: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations

Sequential pattern mining
TMP-mine [65] | Predict object’s future movement | Pattern growth using TMP-tree construction | Rule-based node activation | Real-time object tracking | Energy | High missing rate and time
Pattern learner [23] | Behavior recognition | Tree projection | Sense and send | Behavior monitoring | No. of patterns learned | Complex and redundant patterns
MSAP [63] | Fault prediction | Candidate construction | Sense and send | Telecommunication | Patterns accuracy | Candidate construction is expensive to compute
PTSP [66] | Object’s future movement prediction | Sequential pattern generation | Rule-based node activation | Object tracking | Energy | Inefficient to predict high-speed objects

Clustering
DCC [86] | WSNs longevity | Data correlation-based clustering | Data suppression | Generic WSNs application | Energy and data size | High clustering rate
H-cluster [85] | In-network communication | Data correlation-based clustering | Data summarization | Real-time monitoring | Communication | High data loss rate

The printed table also has check-mark columns for processing architecture (distributed, central), sensor data attributes (homogenous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihops), mobility (static, mobile), node role (cluster head, sensor, relay), evaluation method (simulation, analytical model), and data source (real, synthetic); the individual marks are not legible in this copy.

Table 5: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations

Clustering
Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping
CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss
EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs
Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs
Attribute based clustering [89] | WSNs bandwidth gain | Hierarchal clustering | Data clustering | Monitoring and tracking | Communication | High computation cost
DHCS [90] | Uniform data distribution | Hierarchal clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes energy is ignored

The printed table also has check-mark columns for processing architecture (distributed, central), sensor data attributes (homogenous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihops), mobility (static, mobile), node role (cluster head, sensor, relay), evaluation method (simulation, analytical model), and data source (real, synthetic); the individual marks are not legible in this copy.

Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations

Classification
Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness
Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity
NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset
LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | Non adaption to concept drift
FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission
Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios
Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity
One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation

The printed table also has check-mark columns for processing architecture (distributed, central), sensor data attributes (homogenous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihops), mobility (static, mobile), node role (cluster head, sensor, relay), evaluation method (simulation, analytical model), and data source (real, synthetic); the individual marks are not legible in this copy.

mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of role [33], as follows.

(i) Regular Sensor. These are the nodes with limited resources; they are used to sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors’ data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is the node that acts as a medium to transmit data packets from one node to the others.

Node Task. In the centralized approach, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and can take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the type of application that benefits from WSN data mining techniques. Here we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment yields the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. Most of the techniques surveyed in this paper validate their results by simulation on synthetic datasets, largely because of the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that it achieves. Most of the techniques consider the resource constraints and different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it a difficult task to present a comprehensive evaluation.

6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogenous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogenous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) For the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster-width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. Moreover, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
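The threshold sensitivity mentioned in item (ix) can be illustrated with a small sketch. It is not taken from any of the surveyed algorithms: it assumes, purely for illustration, that each time slot is recorded as the set of sensors that reported an event, and the names `epochs`, `support`, `confidence`, and `rule_is_kept` are hypothetical. The sketch computes the support and confidence of one candidate rule and shows how the choice of thresholds decides whether the rule is kept.

```python
from typing import FrozenSet, List


def support(epochs: List[FrozenSet[str]], items: FrozenSet[str]) -> float:
    """Fraction of epochs in which every sensor in `items` reported an event."""
    if not epochs:
        return 0.0
    hits = sum(1 for epoch in epochs if items <= epoch)
    return hits / len(epochs)


def confidence(epochs: List[FrozenSet[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Estimate of P(consequent | antecedent) from the recorded epochs."""
    base = support(epochs, antecedent)
    if base == 0.0:
        return 0.0
    return support(epochs, antecedent | consequent) / base


# Hypothetical data: four time slots, each listing the sensors that fired.
epochs = [
    frozenset({"s1", "s2"}),
    frozenset({"s1", "s2", "s3"}),
    frozenset({"s1"}),
    frozenset({"s2", "s3"}),
]

# Candidate rule: "s1 fires => s2 fires".
rule_support = support(epochs, frozenset({"s1", "s2"}))
rule_confidence = confidence(epochs, frozenset({"s1"}), frozenset({"s2"}))


def rule_is_kept(min_support: float, min_confidence: float) -> bool:
    """A rule survives only if it clears both user-chosen thresholds."""
    return rule_support >= min_support and rule_confidence >= min_confidence
```

With this toy data the rule is kept for thresholds (0.4, 0.6) but dropped once the minimum support is raised to 0.6, which is the kind of sensitivity to threshold choice that item (ix) describes.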

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.
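To make the single-pass constraint concrete, the following minimal sketch keeps a running mean and variance of a sensor stream using Welford's online update, so that each reading is seen once and then discarded. The paper does not prescribe this particular algorithm; the class name `RunningStats` and the sample readings are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass summary of a stream (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new reading into the summary; the reading is not stored."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the readings seen so far."""
        return self.m2 / self.count if self.count else 0.0


# Hypothetical temperature readings arriving one at a time.
stats = RunningStats()
for reading in [21.0, 22.0, 23.0, 24.0]:
    stats.update(reading)
```

The node only ever holds three numbers regardless of stream length, which is what makes a summary of this kind a plausible candidate for a transmitted local model on a memory-constrained sensor.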

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node which is resource sufficient as compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local model and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.

Figure 6: Proposed hybrid framework for sensor network based applications. [Diagram labels: Sensor data stream; Node data processing (data selection, remove duplication, aggregation, summarization, data fusion/clustering, association analysis); Local model; Network data processing (local model integration, network analysis, real-time decisions, network maintenance); Network pattern identification/monitoring; Network model; Central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); Global model; Sink/base station; Query; Approximate results; Users; In-network processing; Centralized processing; Single pass; Multi pass.]

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and action can use the network model for decision and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSN data [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head level, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get a global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to a central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time response. With this distribution of data, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using an incremental learning mechanism [39, 112] that helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
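The local model, network model, and global model hierarchy described in this section can be sketched with a mergeable summary. This is only one possible reading under stated assumptions, not the authors' implementation: the `Summary` fields, the function names `local_model`, `merge`, and `integrate`, and the readings are all hypothetical, and each node is assumed to report at least one reading.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Compact, mergeable description of a set of readings."""
    count: int
    total: float
    low: float
    high: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def local_model(readings: Iterable[float]) -> Summary:
    """Computed on a single node from its own (non-empty) readings."""
    values = list(readings)
    return Summary(len(values), sum(values), min(values), max(values))


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries without access to the raw readings."""
    return Summary(a.count + b.count, a.total + b.total,
                   min(a.low, b.low), max(a.high, b.high))


def integrate(models: Iterable[Summary]) -> Summary:
    """Used at a cluster head (network model) and again at the sink (global model)."""
    return reduce(merge, models)


# Two hypothetical clusters; each cluster head integrates its nodes' local models.
network_a = integrate([local_model([20.0, 21.0]), local_model([22.0])])
network_b = integrate([local_model([30.0, 31.0, 32.0])])

# The sink integrates the network models into a global view.
global_model = integrate([network_a, network_b])
```

Because `merge` is associative, the same function serves at both the cluster head and the sink, and only four numbers per model travel over the network instead of the raw readings.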

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. The set of WSN applications presented here is neither complete nor exhaustive, but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid energy-efficient distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO ’09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


Table 5: Comparison of data mining techniques for wireless sensor networks (continued). Category: clustering.

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Prediction model [87] | Prediction-based monitoring | Heuristic scheme | Local prediction model | Environmental monitoring | Communication | Cluster overlapping
CAG [88] | WSNs bandwidth gain | Data correlation-based clustering | Data aggregation | Generic WSNs applications | Communication | Sensory data loss
EEDC [84] | On-demand clustering | Data correlation-based clustering | Sense and send | Surveillance data analysis | Energy | Inefficient for large WSNs
Clustering sensory data [67] | Communication efficiency | K-means | Data summarization | Data analysis | Communication | Inefficient for large WSNs
Attribute-based clustering [89] | WSNs bandwidth gain | Hierarchical clustering | Data clustering | Monitoring and tracking | Communication | High computation cost
DHCS [90] | Uniform data distribution | Hierarchical clustering | Data clustering and summarization | Interactive data analysis | Message reduction | Nodes' energy is ignored

The table also has check-mark columns for processing architecture (distributed, central), sensor data attributes (homogenous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihop), mobility (static, mobile), role (cluster head, sensor, relay), and evaluation method and data source (simulation, analytical model, real, synthetic); the per-row marks are not recoverable in this copy.


Table 6: Comparison of data mining techniques for wireless sensor networks (continued). Category: classification.

Approach | Objective | DM method | Node task | Application area | Optimization objective | Limitations
Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness
Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity
NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset
LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift
FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission
Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios
Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity
One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation

The table also has check-mark columns for processing architecture (distributed, central), sensor data attributes (homogenous, heterogeneous), correlation (attribute, spatial, temporal), connectivity (single hop, multihop), mobility (static, mobile), role (cluster head, sensor, relay), and evaluation method and data source (simulation, analytical model, real, synthetic); the per-row marks are not recoverable in this copy.


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform one of three roles [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head (CH) can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensor data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. This is a node that acts as a medium to transmit data packets from one node to the others.

Node Task. In centralized approaches, the node task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and can take action based on the detected phenomena or target.
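The contrast between the two node tasks can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not code from any surveyed technique: the function names, the summary fields, and the sample readings are all hypothetical. It counts the messages a sense-and-send scheme would forward and compares them with the single summary a cluster head would send after aggregating its members' readings.

```python
from statistics import mean


def centralized_messages(readings_per_node: dict[str, list[float]]) -> int:
    """Sense and send: every raw reading is forwarded to the base station."""
    return sum(len(values) for values in readings_per_node.values())


def cluster_head_aggregate(readings_per_node: dict[str, list[float]]) -> dict:
    """Distributed: the cluster head fuses all member readings into one summary."""
    all_values = [v for values in readings_per_node.values() for v in values]
    return {
        "count": len(all_values),
        "mean": mean(all_values),
        "min": min(all_values),
        "max": max(all_values),
    }


# Hypothetical temperature readings from three member nodes of one cluster.
cluster = {"n1": [20.0, 21.0], "n2": [22.0, 23.0], "n3": [24.0]}

raw_msgs = centralized_messages(cluster)   # one message per raw reading
summary = cluster_head_aggregate(cluster)  # a single aggregated message
```

The sketch ignores routing, packet sizes, and the energy the cluster head spends on computation, all of which a real evaluation would have to account for.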

5.5. Application Area. We also evaluated the types of application that benefit from WSN data mining techniques. Here we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments, and sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy by using in-network processing, in which aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.
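To make the idea of triggering an alarm upon detection more concrete, the following hedged sketch shows a sliding-window detector that a node could run locally. The z-score rule is our illustrative choice and is not a method proposed by the surveyed works; the class name, window size, threshold, and sample stream are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class WindowAnomalyDetector:
    """Flags a reading that deviates strongly from the recent window."""

    def __init__(self, window_size: int = 5, threshold: float = 3.0):
        self.window = deque(maxlen=window_size)  # most recent normal readings
        self.threshold = threshold               # allowed deviation in std units

    def update(self, value: float) -> bool:
        """Return True if the value is anomalous with respect to the window."""
        is_anomaly = False
        if len(self.window) == self.window.maxlen:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                is_anomaly = True
        if not is_anomaly:
            # Only normal readings update the window, so a spike
            # does not distort the baseline for later readings.
            self.window.append(value)
        return is_anomaly


detector = WindowAnomalyDetector(window_size=5, threshold=3.0)
stream = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 35.0, 20.3]
alarms = [i for i, v in enumerate(stream) if detector.update(v)]
```

With this sample stream, only the spike at index 6 is reported. Keeping the state to a fixed-size window reflects the limited memory of sensor nodes, at the cost of ignoring the spatial correlation between neighboring nodes.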

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail than real implementation. However, selecting an appropriate simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment because appropriate hardware is unavailable owing to technical and design limitations. Real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment yields the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used: synthetic and real. We observed in this survey that most of the techniques are validated by simulation on synthetic datasets, and that most studies chose simulation because of the limited processing power of sensor nodes.

Optimization Objective SinceWSNs are constrained in termsof different resources the technique is also evaluated in theoptimization objective that has been achieved Most of thetechniques consider the resource constraint and differentdesign philosophies of network None of them can workefficiently for all of the performance metrics like networksize communication overhead energy efficiency memoryconsumption node mobility and and so forth The largevariations in the performance metrics make it a difficult taskto present a comprehensive evaluation


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. It is observed from the comparative analysis that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
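To make the threshold sensitivity noted in item (ix) concrete, the following minimal Python sketch computes support and confidence for one candidate rule over a window of sensor event sets and shows how the chosen thresholds decide whether the rule is reported. The sensor names, the window contents, and the helper names `count`, `support`, `confidence`, and `rule_passes` are illustrative assumptions and are not taken from any of the surveyed techniques.

```python
from typing import FrozenSet, List

Epoch = FrozenSet[str]  # set of sensors that reported an event in one time slot


def count(window: List[Epoch], itemset: FrozenSet[str]) -> int:
    """Number of epochs in the window that contain every sensor in itemset."""
    return sum(1 for epoch in window if itemset <= epoch)


def support(window: List[Epoch], itemset: FrozenSet[str]) -> float:
    """Fraction of epochs in the window that contain itemset."""
    return count(window, itemset) / len(window) if window else 0.0


def confidence(window: List[Epoch], antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """count(antecedent and consequent) / count(antecedent); 0.0 if undefined."""
    n_antecedent = count(window, antecedent)
    if n_antecedent == 0:
        return 0.0
    return count(window, antecedent | consequent) / n_antecedent


def rule_passes(window: List[Epoch], antecedent: FrozenSet[str],
                consequent: FrozenSet[str],
                min_sup: float, min_conf: float) -> bool:
    """True if the rule antecedent -> consequent meets both thresholds."""
    return (support(window, antecedent | consequent) >= min_sup
            and confidence(window, antecedent, consequent) >= min_conf)


# Hypothetical sliding window of five epochs.
WINDOW = [
    frozenset({"s1", "s2"}),
    frozenset({"s1", "s2", "s3"}),
    frozenset({"s1"}),
    frozenset({"s2", "s3"}),
    frozenset({"s1", "s2"}),
]
A, B = frozenset({"s1"}), frozenset({"s2"})

# The same rule s1 -> s2 is kept or dropped depending on the thresholds.
kept = rule_passes(WINDOW, A, B, min_sup=0.5, min_conf=0.7)
dropped = rule_passes(WINDOW, A, B, min_sup=0.5, min_conf=0.8)
```

In this toy window the rule has support 3/5 and confidence 3/4, so a small change in the confidence threshold flips the outcome, which is the sensitivity the item describes. The window length plays a similar role for techniques that rely on temporal correlation.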

7. Future Research Directions

It is observed from the analysis of existing data mining work on sensor network-based applications that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations, which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required and partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.
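As a minimal sketch of what a single-pass local model could look like, the following Python fragment keeps a running count, mean, variance, minimum, and maximum of one sensor stream using Welford's online update, so each reading is processed once and never stored. The class name `LocalModel`, its fields, and the sample readings are illustrative assumptions; the proposed framework does not prescribe a particular summary.

```python
from dataclasses import dataclass


@dataclass
class LocalModel:
    """Running summary of one sensor stream (hypothetical structure)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0               # sum of squared deviations from the running mean
    lo: float = float("inf")
    hi: float = float("-inf")

    def update(self, reading: float) -> None:
        """Welford's online update: the reading is used once and then discarded."""
        self.n += 1
        delta = reading - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reading - self.mean)
        self.lo = min(self.lo, reading)
        self.hi = max(self.hi, reading)

    @property
    def variance(self) -> float:
        """Population variance of the readings seen so far (0.0 if none)."""
        return self.m2 / self.n if self.n else 0.0


model = LocalModel()
for reading in (20.0, 22.0, 24.0):    # hypothetical temperature samples
    model.update(reading)
print(model.n, model.mean, model.variance, model.lo, model.hi)
```

Only the five fields of `LocalModel` would need to be transmitted, whatever the number of readings, which is the property that makes such summaries attractive on memory- and energy-constrained nodes.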

Figure 6: Proposed hybrid framework for sensor network-based applications. [Diagram not recoverable from the extracted text. Labels: sensor data stream; node data processing (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis); local model; network data processing (local model integration, network analysis, real-time decisions, network maintenance); network pattern identification; monitoring; network model; central data processing (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); global model; sink/base station; in-network processing; centralized processing; single pass; multipass; users; query; approximate results.]

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient as compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in a multihop fashion. The network models are then integrated at the base station/sink to get the global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
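The integration of local models into network models and then into a global model can be illustrated with mergeable summaries. The sketch below is a hedged example and not the authors' implementation: `Summary`, `summarize`, `merge`, and `integrate` are hypothetical names, the readings are made up, and a real local model would carry event patterns rather than only a count, sum, minimum, and maximum.

```python
from functools import reduce
from typing import Iterable, NamedTuple


class Summary(NamedTuple):
    """Compact, mergeable model of a set of readings (hypothetical structure)."""
    n: int
    total: float
    lo: float
    hi: float

    @property
    def mean(self) -> float:
        return self.total / self.n if self.n else float("nan")


EMPTY = Summary(0, 0.0, float("inf"), float("-inf"))


def summarize(readings: Iterable[float]) -> Summary:
    """Local model: built on a sensor node from its own readings."""
    values = list(readings)
    if not values:
        return EMPTY
    return Summary(len(values), float(sum(values)), min(values), max(values))


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two models without access to the underlying raw data."""
    return Summary(a.n + b.n, a.total + b.total, min(a.lo, b.lo), max(a.hi, b.hi))


def integrate(models: Iterable[Summary]) -> Summary:
    """Network model at a cluster head, or global model at the sink."""
    return reduce(merge, models, EMPTY)


# Two hypothetical clusters; each inner list holds the readings of one node.
clusters = [
    [[20.0, 22.0], [24.0]],
    [[30.0, 10.0]],
]
network_models = [integrate(summarize(node) for node in cluster)
                  for cluster in clusters]
global_model = integrate(network_models)
```

Because `merge` is associative, the global model obtained from the network models matches a summary computed over all raw readings, while only four numbers per model cross the network.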

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decision making and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSN data [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and at the cluster head, which are distributed over the entire network and executed in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this distribution of data, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem of the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
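The effect of incremental updating described in item (iv) can be illustrated with a deliberately simple stand-in for a model: an exponentially weighted running estimate that is adjusted with every new reading instead of being retrained. This is only a sketch under stated assumptions; the class `IncrementalMean`, the smoothing factor `alpha`, and the readings are hypothetical, and the mechanisms cited in the text (for example, incremental frequent pattern mining) are more elaborate than this.

```python
class IncrementalMean:
    """Exponentially weighted running estimate (illustrative stand-in for a model)."""

    def __init__(self, initial: float, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.estimate = initial
        self.alpha = alpha      # larger alpha forgets old data faster

    def update(self, reading: float) -> float:
        """Move the estimate toward the new reading and return it."""
        self.estimate += self.alpha * (reading - self.estimate)
        return self.estimate


OFFLINE_ESTIMATE = 20.0             # hypothetical value learned offline
drifted_stream = [30.0] * 20        # hypothetical readings after the process changes

model = IncrementalMean(initial=OFFLINE_ESTIMATE, alpha=0.5)
for reading in drifted_stream:
    model.update(reading)

# Error of the outdated offline value versus the incrementally updated one.
static_error = abs(OFFLINE_ESTIMATE - drifted_stream[-1])
incremental_error = abs(model.estimate - drifted_stream[-1])
print(static_error, incremental_error)
```

The offline value stays at its original error after the change, while the incrementally updated estimate converges to the new level. The choice of `alpha` trades responsiveness against noise sensitivity and would have to be tuned per application.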

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for making real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. Certainly, the set of WSN applications presented here is neither complete nor exhaustive but merely a sample of applications that demonstrate the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies for emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hüllermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Römer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowé, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K Sharma M Rajpoot and L K Sharma ldquoNearest neighbourclassification for wireless sensor network datardquo InternationalJournal of Computer Trends and Technology no 2 2011

[97] NS2 Simulator httpwwwisiedunsnamns[98] O P V L K Sharma S Schieder and A K Akasapu ldquoA nearest

neighbour classification for trajectory datardquo in Springer CCISvol 101 pp 180ndash185 2010

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


Table 6: Comparison of data mining techniques for wireless sensor networks (continued).

Column groups in the original layout: Approach; Objective; DM method; Processing architecture (Distributed, Central); Sensor data: Attributes (Homogeneous, Heterogeneous) and Correlation (Attribute, Spatial, Temporal); Node properties: Connectivity (Single hop, Multihop), Mobility (Static, Mobile), Role (Cluster head, Sensor, Relay), and Node task; Application area; Implementation: Evaluation method (Simulation, Analytical model, Real), Data source (Real, Synthetic), and Opt. objective; Limitations. The check-mark subcolumns are not reproduced here.

Category: Classification

Approach | Objective | DM method | Node task | Application area | Opt. objective | Limitations
Person identification algorithms [109] | Identify human behavior | Decision tree | Sense and send | Health care | Classification accuracy | Does not guarantee the correctness
Prediction framework [103] | Distributed prediction | Decision tree | Local prediction | Generic | Prediction accuracy | Computational complexity
NNTC [96] | Real-time classification | Nearest neighbor | Sense and send | Generic | Classification accuracy | Not evaluated on real dataset
LWClass [100] | Preserve WSNs resources | KNN | Sense and send | Ubiquitous environments | Resource awareness | No adaptation to concept drift
FVLD [104] | Low-dimension feature vector generation | KNN, ML | Classification | Vehicle classification | Energy | High cost of feature vector transmission
Fuzzy predictor model [99] | Occupancy prediction | Fuzzy rules | Sense and send | Health care | Prediction accuracy | Inefficient for complex scenarios
Online learning [105] | Incremental classification | SVM | Classification | Environmental monitoring | Energy | Computational complexity
One-class quarter-sphere SVM [108] | Anomaly detection | SVM | Local anomaly detection | Habitat monitoring | Energy | Ignores spatial correlation


mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform one of three roles [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head (CH) can be a regular sensor node or a node rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors’ data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. This is a node that acts as a medium to transmit data packets from one node to others.

Node Task. In centralized approaches, the node's task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the types of application that benefit from WSN data mining techniques. Here we exemplify some real-world applications, as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of target locations and to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments, and sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy through in-network processing, in which only aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment, because appropriate hardware is often unavailable owing to technical and design limitations. A real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating a technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. This refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used: synthetic and real. We observed that most of the surveyed techniques are validated by simulation on synthetic datasets, largely because of the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective it achieves. Most of the techniques consider the resource constraints and the different design philosophies of the network. None of them works efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, and node mobility. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101], in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand their associated benefits and limitations. Due to the different performance requirements of specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of a proper and flexible support and confidence threshold. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and the dependencies among spatial, temporal, and attribute correlations which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.

Figure 6: Proposed hybrid framework for sensor network based applications. [Figure labels: sensor data stream; node data processing (data selection, remove duplication, aggregation, summarization, data fusion, clustering, association analysis); local model; network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring); network model; central data processing at the sink/base station (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); global model; in-network processing; centralized processing; single pass; multi pass; users; query; approximate results.]

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that has more resources than the other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to obtain a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
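The local, network, and global model hierarchy can be sketched with a very simple kind of model: a running count, mean, and variance. The Python sketch below is an illustrative assumption rather than the framework's implementation. It uses Welford's single-pass update at the node and the standard pairwise merge of such summaries at the cluster head and sink; the class name Summary, the helper names, and the readings are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Summary:
    """Compact model of a stream: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Single-pass (Welford) update; the raw reading is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        """Combine two summaries without access to the raw readings."""
        if self.n == 0:
            return Summary(other.n, other.mean, other.m2)
        if other.n == 0:
            return Summary(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of everything summarized so far."""
        return self.m2 / self.n if self.n else 0.0


def local_model(readings: Iterable[float]) -> Summary:
    """Node level: one pass over the node's own stream."""
    summary = Summary()
    for x in readings:
        summary.update(x)
    return summary


def integrate(models: Iterable[Summary]) -> Summary:
    """Cluster head or sink level: merge lower-level models."""
    total = Summary()
    for model in models:
        total = total.merge(model)
    return total


# Hypothetical readings from three nodes in two clusters.
network_a = integrate([local_model([20.0, 22.0]), local_model([24.0, 26.0])])
network_b = integrate([local_model([28.0, 30.0])])
global_model = integrate([network_a, network_b])
```

Each node transmits three numbers instead of its raw stream, and the sink still recovers the exact mean and variance of all readings. Richer local models, such as frequent patterns or cluster prototypes, would need their own merge rules.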

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decision making and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue for WSN data [111], in order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits a large task into smaller ones at the node level and the cluster head level; these are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to a central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be addressed by the central server, whereas instant queries can be handled by the sink to support real-time response. With this distribution of data, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model to incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
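As a rough illustration of the incremental learning idea in item (iv), the Python sketch below keeps an exponentially weighted mean and variance that are updated one reading at a time, so that older readings gradually lose influence. This is a deliberately simple stand-in and not the mechanisms of [39, 112]; the class name IncrementalBaseline and the parameters alpha, k, and min_std are assumptions chosen for the example.

```python
from typing import Optional


class IncrementalBaseline:
    """Exponentially weighted mean and variance, updated per reading."""

    def __init__(self, alpha: float = 0.1) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha                  # weight given to each new reading
        self.mean: Optional[float] = None   # no model until the first reading
        self.var = 0.0

    def update(self, x: float) -> None:
        """Fold one reading into the model without storing the stream."""
        if self.mean is None:
            self.mean = x
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)

    def is_anomalous(self, x: float, k: float = 3.0, min_std: float = 0.5) -> bool:
        """Flag readings more than k standard deviations from the current mean.

        min_std is an assumed floor that avoids a zero-width band on flat data.
        """
        if self.mean is None:
            return False
        std = max(self.var ** 0.5, min_std)
        return abs(x - self.mean) > k * std


# Hypothetical stream whose level shifts from 20 to 30 (concept drift).
model = IncrementalBaseline(alpha=0.2)
for _ in range(50):
    model.update(20.0)
flag_before_drift = model.is_anomalous(30.0)
for _ in range(50):
    model.update(30.0)
flag_after_drift = model.is_anomalous(30.0)
```

A reading of 30 is flagged while the model has only seen values near 20, but once the stream has settled at the new level the same reading is treated as normal. A model trained once offline would keep raising the alarm, which is the outdated-model problem the item describes.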

Currently, we are working on the implementation of this hybrid framework, which will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for taking real-time decisions and also serve as a prerequisite for the development of effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. The set of WSN applications presented here is neither complete nor exhaustive but merely a sample that demonstrates the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and that sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O Boyinbode H Le and M Takizawa ldquoA survey on clusteringalgorithms for wireless sensor networksrdquo International Journalof Space-Based and SituatedComputing vol 1 no 2 pp 130ndash1362007

[35] M M Gaber A Zaslavsky and S Krishnaswamy ldquoA survey ofclassificationmethods in data streamsrdquo inData Streams pp 39ndash59 Springer 2007

[36] R Agrawal and R Srikant ldquoFast algorithms for mining associ-ation rulesrdquo in Proceedings of the 20th International ConferenceVery Large Data Bases (VLDB rsquo94) pp 487ndash499 Citeseer 1994

[37] R J Bayardo Jr ldquoEfficiently mining long patterns fromdatabasesrdquo SIGMOD Record vol 27 no 2 pp 85ndash93 1998

[38] S Brin RMotwani andC Silverstein ldquoBeyondmarket basketsgeneralizing association rules to correlationsrdquo SIGMODRecordvol 26 no 2 pp 265ndash276 1997

[39] W Cheung and O R Zaiane ldquoIncremental mining of frequentpatterns without candidate generation or support constraintrdquoin Proceedings of 7th International Database Engineering andApplications Symposium pp 111ndash116 2003

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and AdHoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.

mining becomes difficult because updates on this structure should be persisted over time.

Node Role. A node can perform three types of roles [33], as follows.

(i) Regular Sensor. These are nodes with limited resources; they sense the phenomena and send the sensed data to the base station.

(ii) Cluster Head. A cluster head can be a regular sensor node or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls the cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the collected sensors’ data. Therefore, they are equipped with significantly more computation and communication resources.

(iii) Relay. It is a node that acts as a medium to transmit data packets from one node to the others.

Node Task. In the centralized approach, the node's task is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can perform computation and take action based on the detected phenomena or target.

5.5. Application Area. We also evaluated the types of applications that benefit from WSN data mining techniques. Here, we exemplify some real-world applications as follows.

(i) First is environmental monitoring [5–7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.

(ii) Second is habitat and health monitoring [1, 2, 99, 109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.

(iii) Third is object tracking [3, 4, 65, 66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and also to make tracking more efficient and accurate.

(iv) Fourth is WSN performance [46, 48, 50, 51]. WSNs are usually unattended and deployed in harsh environments, and sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy through in-network processing, in which only aggregated data is sent to the central side.

(v) Fifth is data analysis [67, 84, 90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.

(vi) Sixth is real-time monitoring [64, 65, 85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.
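To make the idea of detecting events within a time window concrete, the following is a minimal sketch rather than any of the surveyed techniques. It assumes a fixed-size sliding window of recent readings and flags a reading that deviates from the window mean by more than a threshold; the class name `SlidingWindowDetector`, the parameters `window_size` and `threshold`, and the sample readings are all hypothetical.

```python
from collections import deque


class SlidingWindowDetector:
    """Flags a reading as an event when it deviates from the window mean.

    window_size and threshold are illustrative parameters, not values
    taken from any surveyed technique.
    """

    def __init__(self, window_size=5, threshold=3.0):
        # deque with maxlen drops the oldest reading automatically
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, reading):
        """Return True if the reading differs from the window mean by more than the threshold."""
        is_event = False
        # Only judge a reading once the window is full
        if len(self.window) == self.window.maxlen:
            mean = sum(self.window) / len(self.window)
            is_event = abs(reading - mean) > self.threshold
        self.window.append(reading)
        return is_event


# Hypothetical temperature stream with one spike
detector = SlidingWindowDetector(window_size=3, threshold=3.0)
flags = [detector.update(r) for r in [20.0, 20.5, 20.2, 20.4, 27.0, 20.3]]
print(flags)
```

Only the last `window_size` readings are kept, so memory use stays constant regardless of stream length, which is the property that matters on a resource-constrained node.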

5.6. Implementation. Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.

Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used techniques to analyze the performance of data mining techniques for WSNs.

(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.

(ii) Simulation. It is the most popular and effective approach to design and test any proposed scheme in terms of cost and time; it also provides a higher level of detail as compared with real implementation. However, the appropriate selection of a simulation framework according to the problem and network characteristics is a critical task.

(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment due to the unavailability of appropriate hardware in terms of technical and design limitations. Usually, real deployment requires hundreds of sensor nodes, and cost becomes another important issue. In a nutshell, evaluating any technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.

Data Source. It refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used, that is, synthetic and real. It is observed in this paper that most of the techniques are validated by simulation on synthetic datasets, owing to the limited processing power of sensor nodes.

Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated on the optimization objective that it achieves. Most of the techniques consider the resource constraints and the different design philosophies of the network. None of them can work efficiently for all of the performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it a difficult task to present a comprehensive evaluation.

6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. It is observed from the comparative analysis that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) The techniques which consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques which consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101] in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand their associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
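The sensitivity to support and confidence thresholds noted in item (ix) can be illustrated with a small sketch. It uses the standard definitions (support of an itemset is the fraction of transactions containing it; confidence of a rule A → C is the count of transactions containing A and C divided by the count containing A). The sensor names, the `epochs` data, and the helper `rule_passes` are hypothetical and chosen only for illustration.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) computed from raw counts."""
    n_antecedent = count(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / n_antecedent


# Hypothetical epochs: each set lists the sensors that reported an event.
epochs = [
    {"s1", "s2"},
    {"s1", "s2", "s3"},
    {"s1"},
    {"s2", "s3"},
    {"s1", "s2"},
]

rule_support = support(epochs, {"s1", "s2"})
rule_confidence = confidence(epochs, {"s1"}, {"s2"})


def rule_passes(min_support, min_confidence):
    """Check the rule s1 -> s2 against user-chosen thresholds."""
    return rule_support >= min_support and rule_confidence >= min_confidence


print(rule_support, rule_confidence)
print(rule_passes(0.5, 0.7), rule_passes(0.5, 0.8))
```

With these made-up epochs the same rule is accepted under one confidence threshold and rejected under a slightly stricter one, which is the kind of threshold dependence the item describes.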

7. Future Research Directions

It is observed from the analysis of existing data mining work on sensor network-based applications that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should be based on the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data is continuously arriving and not available for a second scan.
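As an illustration of what single-pass processing means on a node, the sketch below maintains a running mean and variance using Welford's method, so that each reading is seen once and then discarded. This is a generic streaming technique chosen for illustration, not the algorithm of the proposed framework; the class name `RunningStats` and the sample readings are assumptions.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's method) with O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new reading into the summary; the reading is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two readings have arrived."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical readings arriving one at a time
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)

print(stats.n, stats.mean, stats.variance)
```

The triple (count, mean, sum of squared deviations) is a compact summary that a node could transmit in place of the raw readings.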

Figure 6: Proposed hybrid framework for sensor network based applications. [Diagram labels recoverable from the original figure: sensor data stream; node data processing, single pass (data selection, remove duplication, aggregation, summarization, data fusion, clustering, association analysis); local model; network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring); network model; in-network processing; centralized processing at the sink/base station; central data processing, multipass (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis); global model; users; query; approximate results.]

Local models contain compact event patterns rather than raw data, which addresses the issue of communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that is resource sufficient compared with other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to get the global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
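The flow from local models to network models to a global model can be sketched with mergeable summaries. The example below is only one possible reading of the description above: it assumes that a local model is a count of coarse event labels per node and that integration is a simple sum of counts. The function names `local_model` and `merge_models`, the threshold of 30, and the readings are hypothetical.

```python
from collections import Counter


def local_model(readings, threshold):
    """Summarize one node's readings as counts of coarse event labels."""
    return Counter("high" if r > threshold else "normal" for r in readings)


def merge_models(models):
    """Combine child summaries into one; reused at the cluster head and at the sink."""
    merged = Counter()
    for model in models:
        merged.update(model)  # Counter.update adds counts
    return merged


# Hypothetical deployment: two clusters with two nodes each.
cluster_a = [local_model([21, 22, 35], 30), local_model([20, 19, 18], 30)]
cluster_b = [local_model([33, 34, 36], 30), local_model([25, 31, 24], 30)]

# Each cluster head integrates its nodes' local models into a network model.
network_models = [merge_models(cluster_a), merge_models(cluster_b)]

# The sink integrates the network models into the global model.
global_model = merge_models(network_models)

print(network_models)
print(global_model)
```

Because the merge is associative, the same function serves at every level of the hierarchy, and only the small label counts travel over the network rather than the twelve raw readings.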

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decisions and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing on the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Data management is a crucial issue for WSN data [111]. In order to deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and at the cluster head level, which are distributed over the entire network and executed in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas the cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the data size to be transmitted. Network models can be integrated at the sink to get the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time response. With this way of distributing data, the proposed framework is able to deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the problem quicklychanging nature of WSNs data where characteristicsof the monitored process may change over timeand render the old models outdated This problemcan be addressed using the incremental learning

International Journal of Distributed Sensor Networks 21

mechanism [39 112] that helps the model to updatenew information

(v) The framework can identify spatial-temporal correlation at the local model level by using data correlation-based clustering, whereas attribute correlation can be identified at the global model level by using multipass data mining algorithms.
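To make the idea of incremental learning in item (iv) concrete, the following is a minimal sketch of a model that is updated one reading at a time and gradually forgets old data. It is only an illustration and is not the method of [39] or [112]; the class name `IncrementalMean`, the smoothing weight `alpha`, and the sample readings are all assumptions introduced here.

```python
class IncrementalMean:
    """Exponentially weighted running mean, updated one reading at a time.

    Hypothetical stand-in for an incrementally learned model: each update
    costs O(1) time and memory, so no past readings need to be stored.
    """

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight of the newest reading (assumed value)
        self.mean = None    # no model exists until the first reading arrives

    def update(self, reading):
        if self.mean is None:
            self.mean = float(reading)
        else:
            # Move the estimate toward the new reading; the influence of
            # older readings decays geometrically.
            self.mean += self.alpha * (reading - self.mean)
        return self.mean


model = IncrementalMean(alpha=0.5)
for reading in [20.0, 22.0]:
    model.update(reading)
after_stable_phase = model.mean

# The monitored process changes: readings now stay at 30.
for reading in [30.0] * 20:
    model.update(reading)
after_drift = model.mean
```

With a larger `alpha` the model adapts faster to change but is more sensitive to noise; choosing it is an application-specific trade-off.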

We are currently implementing this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that techniques intended for mining sensor data at the network side are helpful for making real-time decisions and also serve as a prerequisite for developing effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework that can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies that need to be addressed. The set of WSN applications presented here is neither complete nor exhaustive; it is merely a sample of applications that demonstrates the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO_MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid energy-efficient distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


6. Limitations of Existing Data Mining Techniques for WSNs

Tables 2–6 show the characteristics of data mining techniques designed for WSNs. The comparative analysis shows that the existing techniques have the following shortcomings.

(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogeneous [42, 46, 49–51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.

(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65–67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.

(iii) Techniques that consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques that consider temporal correlation among sensor data suffer from the choice of the size of the sliding window.

(iv) The majority of techniques use a centralized approach [21, 42–44, 46, 58, 84, 101] in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause much communication overhead and delay the response time. While the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of nodes under its membership.

(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the different available tools and understand the associated benefits and limitations. Due to different performance requirements according to specific applications, a general tool for sensor networks is still lacking at present.

(vi) The techniques evaluated by using analytical modeling [21, 23, 46, 49, 100, 109] used certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated by using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.

(vii) Excluding a few [22, 103, 109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or networks with dynamically changing topology would be challenging.

(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.

(ix) As for the data mining techniques themselves, frequent pattern mining [15–20] approaches suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11–14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24–26] require some prior knowledge to classify the incoming data stream. However, learning an accurate classification model is challenging if the number of variables is large in deployed WSNs.
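The sensitivity to support and confidence thresholds noted in item (ix) can be illustrated with a small sketch. It uses the standard definitions of support and confidence for an association rule; the sensor names (`s1`, `s2`, `s3`), the epochs, and the threshold values are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


# Hypothetical data: each epoch lists the sensors that reported an event.
epochs = [
    frozenset({"s1", "s2", "s3"}),
    frozenset({"s1", "s2"}),
    frozenset({"s1", "s3"}),
    frozenset({"s2", "s3"}),
    frozenset({"s1", "s2", "s3"}),
]

rule_support = support(epochs, {"s1", "s2"})           # 3 of 5 epochs
rule_confidence = confidence(epochs, {"s1"}, {"s2"})   # 3 of the 4 epochs with s1


def rule_accepted(min_support, min_confidence):
    """Check the rule s1 -> s2 against a pair of thresholds."""
    return rule_support >= min_support and rule_confidence >= min_confidence


# The same rule is kept or discarded depending on the chosen thresholds.
kept_with_loose_thresholds = rule_accepted(0.5, 0.7)
kept_with_strict_thresholds = rule_accepted(0.5, 0.9)
```

Here the rule s1 -> s2 survives the looser confidence threshold and is discarded by the stricter one, which is the kind of sensitivity the item describes.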

7. Future Research Directions

The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should meet the following requirements.

(i) The technique should combine offline learning mechanisms with distributed and online data processing.

(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.

(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations which may exist between adjacent nodes.

(iv) During online mining, the technique should be capable of incremental learning.

(v) The technique should have low computational complexity and be easy to implement.

Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing, as the data arrives continuously and is not available for a second scan.
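As an illustration of what a single-pass algorithm looks like, the sketch below uses the Misra-Gries frequent-item summary, which reads each item once and keeps only a bounded number of counters. It is an assumed stand-in, not an algorithm prescribed by the framework; the event labels and the parameter `k` are hypothetical.

```python
def misra_gries(stream, k):
    """One-pass frequent-item summary that keeps at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to be among the returned keys; the stored counts are lower
    bounds on the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical event stream observed at a node; it is consumed exactly once.
events = ["hot", "hot", "ok", "hot", "cold", "hot", "ok", "hot"]
summary = misra_gries(iter(events), k=3)
```

Because the summary has a fixed size, it fits the memory limits of a sensor node, at the cost of returning approximate rather than exact counts.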

Figure 6: Proposed hybrid framework for sensor network-based applications. [Diagram not reproduced. In-network processing (single pass): the sensor data stream feeds node data processing (data selection, duplicate removal, aggregation, summarization, data fusion, clustering, association analysis), which produces local models; network data processing (local model integration, network analysis, real-time decisions, network maintenance, network pattern identification, monitoring) produces the network model. Centralized processing (multipass): central data processing at the sink/base station (frequent pattern mining, classification, clustering, incremental learning, prediction, anomaly detection, time series analysis) produces the global model, which returns approximate results to user queries.]

Local models contain compact event patterns rather than raw data, which addresses the communication overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that has more resources than the other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to obtain a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
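The integration of local models into network models and then into a global model can be sketched as a merge of pattern counts. This is only an illustration under the assumption that a model is a table of event-pattern frequencies; the pattern labels, the two-cluster layout, and the function names are invented for the example.

```python
from collections import Counter


def integrate(models):
    """Merge the pattern-count models of the level below into one model."""
    merged = Counter()
    for model in models:
        merged.update(model)  # adds the counts pattern by pattern
    return merged


def approximate_answer(model, k):
    """Return the k most frequent patterns with their relative frequency."""
    total = sum(model.values())
    return [(pattern, n / total) for pattern, n in model.most_common(k)]


# Local models computed on three hypothetical nodes.
node_1 = Counter(hot=3, dry=1)
node_2 = Counter(hot=2, wet=1)
node_3 = Counter(wet=4, hot=1)

# Network models at two cluster heads, then the global model at the sink.
network_a = integrate([node_1, node_2])
network_b = integrate([node_3])
global_model = integrate([network_a, network_b])
```

Here `approximate_answer(global_model, 1)` reports that the pattern `hot` accounts for half of all observed events. The answer is approximate in the sense used above because it is derived from summaries, not from the raw readings.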

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed and online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time response and actions can use the network model for decisions and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Since data management is a crucial issue in WSNs [111], the proposed framework deals with large-scale WSN data by splitting the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits the large task into smaller ones at the node level and the cluster head level, which are distributed over the entire network and execute in parallel. At the node level, the storage capacities of single nodes are used to compute the local model, which contains aggregated data from a single node, whereas a cluster head acquires the data from a group of nodes and aggregates data readings over a certain region or period. As a result, a network model is computed at each cluster head, which contains compact data from a set of nodes and reduces the size of the data to be transmitted. Network models can be integrated at the sink to obtain the global view for real-time applications. Since the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for predictive offline analysis. Historical queries from the user can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time response. By distributing the data in this way, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.

(iii) It can consider the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the quickly changing nature of WSN data, where characteristics of the monitored process may change over time and render the old models outdated. This problem can be addressed using the incremental learning mechanism [39, 112], which helps the model incorporate new information.

(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
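The idea in item (iv), updating a model with each new reading so that outdated behaviour fades, can be sketched with exponential smoothing. This is a deliberately simple stand-in chosen for brevity and is not the incremental learning mechanism of [39, 112]; the class name and the weight `alpha` are assumptions made for the example.

```python
class IncrementalEstimate:
    """Exponentially weighted estimate that follows a drifting process."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight of the newest reading, 0 < alpha <= 1
        self.estimate = None

    def update(self, x):
        """Move the estimate toward the new reading and return it."""
        if self.estimate is None:
            self.estimate = x  # the first reading initialises the model
        else:
            self.estimate += self.alpha * (x - self.estimate)
        return self.estimate


model = IncrementalEstimate(alpha=0.5)
for reading in [20.0, 20.0, 20.0]:
    model.update(reading)  # the estimate settles at 20.0
```

If the monitored process then shifts to 30.0, successive calls to `model.update(30.0)` return 25.0 and then 27.5, so the old level is forgotten gradually without rebuilding the model from stored data. A larger `alpha` adapts faster but is more sensitive to noise.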

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for making real-time decisions and also serve as a prerequisite for developing effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework that can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies that need to be addressed. The set of WSN applications presented here is certainly not exhaustive; it is merely a sample of applications that demonstrates the usefulness and possible uses of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing the previous studies and developing them into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of the 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of the Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of the 1st International Workshop on Knowledge Discovery in Data Streams, held in conjunction with ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.

Submit your manuscripts athttpwwwhindawicom

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2013Part I

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

DistributedSensor Networks

International Journal of

ISRN Signal Processing

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Mechanical Engineering

Advances in

Modelling amp Simulation in EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Advances inOptoElectronics

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2013

ISRN Sensor Networks

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawi Publishing Corporation httpwwwhindawicom Volume 2013

The Scientific World Journal

ISRN Robotics

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

International Journal of

Antennas andPropagation

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

ISRN Electronics

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

thinspJournalthinspofthinsp

Sensors

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Active and Passive Electronic Components

Chemical EngineeringInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Electrical and Computer Engineering

Journal of

ISRN Civil Engineering

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Advances inAcoustics ampVibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2013

Page 20: ReviewArticle Data Mining Techniques for Wireless Sensor ...home.etf.bg.ac.rs/~vm/os/dmsw/Data Mining... · have a large impact on type of data mining algorithm to choose;therefore,onehastodecidetheprocessing

20 International Journal of Distributed Sensor Networks

Node data processingData selectionRemove duplicationAggregationSummarizationData fusionclusteringAssociation analysismiddot middot middotmiddot middot middot

middot middot middot

Sensor datastream

Global model

Approximateresults

Network model Local modelQuery

Users

Sinkbasestation

In-network processingCentralized processing

Central data processingFrequent pattern miningClassificationClusteringIncremental learningPredicationAnomaly detectionTime series analysis

Network data processingLocal model integrationNetwork analysisReal time decisionsNetwork maintenance

Network patternidentification

monitoring

Sing

le p

ass

Mul

ti pa

ss

Figure 6 Proposed hybrid framework for sensor network based applications

overhead associated with data transfer. Local models are distributed over the entire network and are integrated at a special node that has more resources than the other sensor nodes. As a result, a network model is computed that is more abstract than the local models and is transferred to the base station/sink in multihop fashion. The network models are then integrated at the base station/sink to obtain a global view of the entire network, named the global model. As a result, approximate query answers are returned to end users.
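The local model, network model, and global model hierarchy described above can be illustrated with a small sketch. This is not the authors' implementation, which the paper states is still in progress; it only assumes that each level keeps a mergeable summary (count, sum, minimum, maximum) so that a cluster head or sink can combine models without receiving raw readings. The names `Summary`, `local_model`, and `merge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical compact model: enough to answer approximate aggregate queries."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def local_model(readings):
    """Node level: compress one window of raw readings into a summary."""
    values = list(readings)
    return Summary(len(values), sum(values), min(values), max(values))


def merge(summaries):
    """Cluster head or sink: combine summaries from the level below."""
    parts = list(summaries)
    return Summary(
        sum(s.count for s in parts),
        sum(s.total for s in parts),
        min(s.minimum for s in parts),
        max(s.maximum for s in parts),
    )


# Two clusters: the first has two nodes, the second has one node.
network_model_a = merge([local_model([20.0, 21.0]), local_model([22.0])])
network_model_b = merge([local_model([30.0, 32.0, 34.0])])

# The sink integrates the network models into a global view.
global_model = merge([network_model_a, network_model_b])
```

In this sketch only four numbers per model travel upward regardless of how many readings a node collected, which is the kind of reduction in transmitted data that the framework aims for. Richer models (histograms, cluster centroids, frequent itemsets) would need their own merge rule.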

This framework addresses the following shortcomings of the existing techniques.

(i) It combines offline learning mechanisms with distributed, online data processing. The dynamic nature of WSN data requires real-time analysis methodologies and systems. Centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Applications that require real-time responses and actions can use the network model for decision making and knowledge extraction. Applications that need extensive data analysis for their decision making can use the global model and perform central processing at the base station/sink. The network model forwards the processed information to the global model for extensive predictive insight.

(ii) Data management is a crucial issue for WSN data [111]. To deal with large-scale data from WSNs, the proposed framework splits the data processing tasks across multiple locations: in-network processing and processing at a central server. In-network processing splits a large task into smaller ones at the node level and at the cluster head; these smaller tasks are distributed over the entire network and execute in parallel. At the node level, the storage capacity of a single node is used to compute the local model, which contains aggregated data from that node, whereas the cluster head acquires data from a group of nodes and aggregates the readings over a certain region or period. As a result, a network model is computed at each cluster head; it contains compact data from a set of nodes and reduces the amount of data to be transmitted. Network models can be integrated at the sink to obtain a global view for real-time applications. Because the sink at the network level has restricted resources and cannot process large-scale data for predictive analysis, the network models are sent to a central server, where global models can be computed for offline predictive analysis. Historical queries from users can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time responses. By distributing the data in this way, the proposed framework can feasibly handle the large amounts of data obtained from WSNs.

(iii) It can take the resource constraints of sensor nodes into account by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.

(iv) The framework can address the quickly changing nature of WSN data, where the characteristics of the monitored process may change over time and render old models outdated. This problem can be addressed using an incremental learning mechanism [39, 112] that helps the model incorporate new information.

(v) The framework can identify spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
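The incremental updating mentioned in item (iv) can be illustrated with a very small sketch. It does not reproduce the methods of [39, 112]; as a stand-in it uses an exponentially weighted running mean, in which each new reading moves the model toward the current behaviour of the monitored process and older readings are gradually forgotten. The class name `DecayingMean` and the parameter `alpha` are hypothetical.

```python
class DecayingMean:
    """Hypothetical incremental model: an exponentially weighted running mean."""

    def __init__(self, alpha: float):
        # alpha close to 1 forgets old data quickly; alpha close to 0 adapts slowly.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, reading: float) -> float:
        """Fold one new reading into the model in constant time and memory."""
        if self.value is None:
            self.value = reading
        else:
            self.value += self.alpha * (reading - self.value)
        return self.value


# The monitored process shifts from about 10 to about 20 halfway through.
model = DecayingMean(alpha=0.5)
history = [model.update(r) for r in [10.0, 10.0, 20.0, 20.0]]
```

Because each update touches only one stored value, a scheme of this kind fits the memory and energy limits of a sensor node, whereas retraining a model from the full history would not.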

Currently, we are working on the implementation of this hybrid framework, and the implementation will be completed in the near future.

8. Conclusion

The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to the appropriate WSN type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for making real-time decisions and also serve as a prerequisite for developing effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy, an overall analysis, and a review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and appropriate technology for WSNs. Based on these limitations, we have proposed a hybrid framework that can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies that need to be addressed. Certainly, the set of WSN applications presented here is neither complete nor exhaustive; it is merely a sample of applications that demonstrates the usefulness and possible applications of data mining methods in sensor networks.

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and that sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing the previous studies and developing them into emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining: Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hullermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid energy-efficient distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: a fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


International Journal of Distributed Sensor Networks 21

mechanism [39 112] that helps the model to updatenew information

(v) The framework can identified the spatial-temporalcorrelation at local model by using data correlation-based clustering whereas attribute correlation can beidentified at global model by using the multipass datamining algorithms

Currently we are working on implementation of thishybrid framework and the implementationwill be completedin the near future

8 Conclusion

The emerging need for the data mining techniques in thefield of WSNs resulted in the development of numerousalgorithms Each one of these algorithms solves certainissues related to the appropriate WSNs type and applicationIn this paper we analyzed discussed and compared therelated existing research approaches We observed that thetechniques intended for mining sensor data at the networkside are helpful for taking real-time decision aswell as serve asprerequisite for development of effective mechanism for datastorage retrieval query and transaction processing at centralside Moreover we have presented problem-based taxonomyan overall analysis and review of the past research and theirlimitations which can provide insights for endusers in apply-ing or developing an appropriate data mining method andappropriate technology forWSNs Based on these limitationswe have proposed a hybrid framework which can addressthe shortcomings of existing work We have also discussedthe challenges for implementing data mining techniques inresource-constrained WSNs Besides there are a number ofopen issues in existing studies which need to be addressedSurely the number of WSNs applications presented hereis neither complete nor exhaustive but merely a sample ofapplications that demonstrate the usefulness and possibleapplications of data mining method in sensor network

We believe that WSN applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSN performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies for emerging applications.

Acknowledgments

This work was supported in part by the Joint Funds of NSFC-Microsoft Research Asia under Grant no. 60933012, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20110142110062, and the International S&T Cooperation Program of Hubei Province under Grant no. 2010BFA008.

References

[1] A. Rozyyev, H. Hasbullah, and F. Subhan, “Indoor child tracking in wireless sensor network using fuzzy logic technique,” Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011.

[2] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[3] S. H. Chauhdary, A. K. Bashir, S. C. Shah, and M. S. Park, “EOATR: energy efficient object tracking by auto adjusting transmission range in wireless sensor network,” Journal of Applied Sciences, vol. 9, no. 24, pp. 4247–4252, 2009.

[4] P. K. Biswas and S. Phoha, “Self-organizing sensor networks for integrated target surveillance,” IEEE Transactions on Computers, vol. 55, no. 8, pp. 1033–1047, 2006.

[5] L. T. Lee and C. W. Chen, “Synchronizing sensor networks with pulse coupled and cluster based approaches,” Information Technology Journal, vol. 7, no. 5, pp. 737–745, 2008.

[6] N. Sabri, S. A. Aljunid, B. Ahmad, A. Yahya, R. Kamaruddin, and M. S. Salim, “Wireless sensor actor network based on fuzzy inference system for greenhouse climate control,” Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011.

[7] D. Kumar, “Monitoring forest cover changes using remote sensing and GIS: a global prospective,” Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011.

[8] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.

[9] T. Arampatzis, J. Lygeros, and S. Manesis, “A survey of applications of wireless sensors and wireless sensor networks,” in Proceedings of the 20th IEEE International Symposium on Intelligent Control (ISIC ’05), pp. 719–724, June 2005.

[10] Y.-C. Tseng, M.-S. Pan, and Y.-Y. Tsai, “Wireless sensor networks for emergency navigation,” Computer, vol. 39, no. 7, pp. 55–62, 2006.

[11] T. Yairi, Y. Kato, and K. Hori, “Fault detection by mining association rules from house-keeping data,” in Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, pp. 18–21, 2001.

[12] O. Horovitz, S. Krishnaswamy, and M. M. Gaber, “A fuzzy approach for interpretation of ubiquitous data stream clustering and its application in road safety,” Intelligent Data Analysis, vol. 11, no. 1, pp. 89–108, 2007.

[13] J. Gama, P. P. Rodrigues, and L. Lopes, “Clustering distributed sensor data streams using local processing and reduced communication,” Intelligent Data Analysis, vol. 15, no. 1, pp. 3–28, 2011.

[14] Z. A. Aghbari, I. Kamel, and T. Awad, “On clustering large number of data streams,” Intelligent Data Analysis, vol. 16, no. 1, pp. 69–91, 2012.

[15] A. Boukerche and S. Samarah, “An efficient data extraction mechanism for mining association rules from wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3936–3941, June 2007.

[16] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM ’04), pp. 59–66, November 2004.

[17] M. Deypir and M. H. Sadreddini, “EclatDS: an efficient sliding window based frequent pattern mining method for data streams,” Intelligent Data Analysis, vol. 15, no. 4, pp. 571–587, 2011.

[18] J. Gama, A. Ganguly, O. Omitaomu, R. Vatsavai, and M. Gaber, “Knowledge discovery from data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 403–404, 2009.

[19] B. George, J. M. Kang, and S. Shekhar, “Spatio-temporal sensor graphs (STSG): a data model for the discovery of spatio-temporal patterns,” Intelligent Data Analysis, vol. 13, no. 3, pp. 457–475, 2009.

[20] A. Mahmood, K. Shi, and S. Khatoon, “Mining data generated by sensor networks: a survey,” Information Technology Journal, vol. 11, pp. 1534–1543, 2012.

[21] D. J. Cook, M. Youngblood, E. O. Heierman III et al., “MavHome: an agent-based smart home,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom ’03), pp. 521–524, March 2003.

[22] J. Rabatel, S. Bringay, and P. Poncelet, “SO MAD: sensor mining for anomaly detection in railway data,” in Advances in Data Mining. Applications and Theoretical Aspects, pp. 191–205, 2009.

[23] V. Guralnik and K. Z. Haigh, “Learning models of human behaviour with sequential patterns,” in Proceedings of the AAAI-02 Workshop on Automation as Caregiver, pp. 24–30, 2002.

[24] S. Huang and Y. Dong, “An active learning system for mining time-changing data streams,” Intelligent Data Analysis, vol. 11, no. 4, pp. 401–419, 2007.

[25] J. Beringer and E. Hüllermeier, “Efficient instance-based learning on data streams,” Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[26] E. J. Spinosa, A. P. D. L. F. de Carvalho, and J. Gama, “Novelty detection with application to data streams,” Intelligent Data Analysis, vol. 13, no. 3, pp. 405–422, 2009.

[27] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.

[28] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010.

[29] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: a survey,” ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[30] V. Maojo and J. Sanandre, “A survey of data mining techniques,” Medical Data Analysis, Lecture Notes in Computer Science, vol. 1933, pp. 17–22, 2000.

[31] W. Jinlong, X. Congfu, C. Weidong, and P. Yunhe, “Survey of the study on frequent pattern mining in data streams,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC ’04), pp. 5917–5922, October 2004.

[32] J. Cheng, Y. Ke, and W. Ng, “A survey on algorithms for mining frequent itemsets over data streams,” Knowledge and Information Systems, vol. 16, no. 1, pp. 1–27, 2008.

[33] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[34] O. Boyinbode, H. Le, and M. Takizawa, “A survey on clustering algorithms for wireless sensor networks,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2, pp. 130–136, 2007.

[35] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “A survey of classification methods in data streams,” in Data Streams, pp. 39–59, Springer, 2007.

[36] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94), pp. 487–499, Citeseer, 1994.

[37] R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[38] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” SIGMOD Record, vol. 26, no. 2, pp. 265–276, 1997.

[39] W. Cheung and O. R. Zaiane, “Incremental mining of frequent patterns without candidate generation or support constraint,” in Proceedings of the 7th International Database Engineering and Applications Symposium, pp. 111–116, 2003.

[40] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceeding of SIGMOD, pp. 207–216.

[41] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: a frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.

[42] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD ’05), 2005.

[43] N. Jiang, “Discovering association rules in data streams based on closed pattern mining,” in Proceedings of the SIGMOD Workshop on Innovative Database Research, 2007.

[44] N. Jiang and L. Gruenwald, “Estimating missing data in data streams,” Advances in Databases: Concepts, Systems and Applications, pp. 981–987, 2007.

[45] N. Jiang and L. Gruenwald, “CFI-stream: mining closed frequent itemsets in data streams,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 592–597, August 2006.

[46] K. Loo, I. Tong, and B. Kao, “Online algorithms for mining inter-stream associations from large sensor networks,” in Advances in Knowledge Discovery and Data Mining, pp. 291–302, 2005.

[47] G. S. Manku and R. Motwani, “Approximate frequency counts over data streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, pp. 346–357, 2002.

[48] S. K. Chong, S. Krishnaswamy, S. W. Loke, and M. M. Gaber, “Using association rules for energy conservation in wireless sensor networks,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC ’08), pp. 971–975, March 2008.

[49] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Efficient mining of association rules from wireless sensor networks,” in Proceedings of the 11th International Conference on Advanced Communication Technology (ICACT ’09), pp. 719–724, February 2009.

[50] A. Boukerche and S. Samarah, “A novel algorithm for mining association rules in Wireless Ad Hoc Sensor Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 865–877, 2008.

[51] K. Romer, “Distributed mining of spatio-temporal event patterns in sensor networks,” in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks (EAWMS ’06), 2006.

[52] BTnode platform, http://www.btnode.ethz.ch/.

[53] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the IEEE 11th International Conference on Data Engineering, pp. 3–14, March 1995.


[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com/.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceeding of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.


[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams, held in conjunction with ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.


Page 22: ReviewArticle Data Mining Techniques for Wireless Sensor ...home.etf.bg.ac.rs/~vm/os/dmsw/Data Mining... · have a large impact on type of data mining algorithm to choose;therefore,onehastodecidetheprocessing

22 International Journal of Distributed Sensor Networks

streamsrdquo Intelligent Data Analysis vol 15 no 4 pp 571ndash5872011

[18] J Gama A Ganguly O Omitaomu R Vatsavai and M GaberldquoKnowledge discovery from data streamsrdquo Intelligent DataAnalysis vol 13 no 3 pp 403ndash404 2009

[19] B George J M Kang and S Shekhar ldquoSpatio-temporal sensorgraphs (STSG) a data model for the discovery of spatio-temporal patternsrdquo Intelligent Data Analysis vol 13 no 3 pp457ndash475 2009

[20] A Mahmood K Shi and S Khatoon ldquoMining data generatedby sensor networks a surveyrdquo Information Technology Journalvol 11 pp 1534ndash1543 2012

[21] D J Cook M Youngblood E O Heierman III et alldquoMavHome an agent-based smart homerdquo in Proceedings of the1st IEEE International Conference on Pervasive Computing andCommunications (PerCom rsquo03) pp 521ndash524 March 2003

[22] J Rabatel S Bringay and P Poncelet ldquoSO MAD sensorminingfor anomaly detection in railway datardquo in Advances in DataMining Applications andTheoretical Aspects pp 191ndash205 2009

[23] V Guralnik and K Z Haigh ldquoLearning models of humanbehaviour with sequential patternsrdquo in Proceedings of the AAAI-02 Workshop on Automation as Caregiver pp 24ndash30 2002

[24] S Huang and Y Dong ldquoAn active learning system for miningtime-changing data streamsrdquo Intelligent Data Analysis vol 11no 4 pp 401ndash419 2007

[25] J Beringer and E Hullermeier ldquoEfficient instance-based learn-ing on data streamsrdquo Intelligent Data Analysis vol 11 no 6 pp627ndash650 2007

[26] E J Spinosaa A PD L F deCarvalhoa and J Gamab ldquoNoveltydetection with application to data streamsrdquo Intelligent DataAnalysis vol 13 no 3 pp 405ndash422 2009

[27] M Xie S Han B Tian and S Parvin ldquoAnomaly detectionin wireless sensor networks a surveyrdquo Journal of Network andComputer Applications vol 34 no 4 pp 1302ndash1325 2011

[28] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[29] V Chandola A Banerjee and V Kumar ldquoAnomaly detection asurveyrdquo ACM Computing Surveys vol 41 no 3 article 15 2009

[30] VMaojo and J Sanandre ldquoA survey of data mining techniquesrdquoMedical Data Analysis Lecture Notes in Computer Science vol1933 pp 17ndash22 2000

[31] W Jinlong X Congfu C Weidong and P Yunhe ldquoSurveyof the study on frequent pattern mining in data streamsrdquo inProceedings of the IEEE International Conference on SystemsMan and Cybernetics (SMC rsquo04) pp 5917ndash5922 October 2004

[32] J Cheng Y Ke and W Ng ldquoA survey on algorithms formining frequent itemsets over data streamsrdquo Knowledge andInformation Systems vol 16 no 1 pp 1ndash27 2008

[33] A A Abbasi andM Younis ldquoA survey on clustering algorithmsfor wireless sensor networksrdquo Computer Communications vol30 no 14-15 pp 2826ndash2841 2007

[34] O Boyinbode H Le and M Takizawa ldquoA survey on clusteringalgorithms for wireless sensor networksrdquo International Journalof Space-Based and SituatedComputing vol 1 no 2 pp 130ndash1362007

[35] M M Gaber A Zaslavsky and S Krishnaswamy ldquoA survey ofclassificationmethods in data streamsrdquo inData Streams pp 39ndash59 Springer 2007

[36] R Agrawal and R Srikant ldquoFast algorithms for mining associ-ation rulesrdquo in Proceedings of the 20th International ConferenceVery Large Data Bases (VLDB rsquo94) pp 487ndash499 Citeseer 1994

[37] R J Bayardo Jr ldquoEfficiently mining long patterns fromdatabasesrdquo SIGMOD Record vol 27 no 2 pp 85ndash93 1998

[38] S Brin RMotwani andC Silverstein ldquoBeyondmarket basketsgeneralizing association rules to correlationsrdquo SIGMODRecordvol 26 no 2 pp 265ndash276 1997

[39] W Cheung and O R Zaiane ldquoIncremental mining of frequentpatterns without candidate generation or support constraintrdquoin Proceedings of 7th International Database Engineering andApplications Symposium pp 111ndash116 2003

[40] R Agrawal T Imielinski and A Swami ldquoMining associationrules between sets of items in large databasesrdquo in Proceeding ofSIGMOD pp 207ndash216

[41] J Han J Pei Y Yin and R Mao ldquoMining frequent pat-terns without candidate generation a frequent-pattern treeapproachrdquo Data Mining and Knowledge Discovery vol 8 no 1pp 53ndash87 2004

[42] M Halatchev and L Gruenwald ldquoEstimating missing valuesin related sensor data streamsrdquo in Proceedings of the 11thInternational Conference on Management of Data (COMADrsquo05) 2005

[43] N Jiang ldquoDiscovering association rules in data streams basedon closed pattern miningrdquo in Proceedings of the SIGMODWorkshop on Innovative Database Research 2007

[44] N Jiang and L Gruenwald ldquoEstimating missing data in datastreamsrdquo Advances in Databases Concepts Systems and Appli-cations pp 981ndash987 2007

[45] N Jiang and L Gruenwald ldquoCFI-stream mining closed fre-quent itemsets in data streamsrdquo in Proceedings of the 12th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo06) pp 592ndash597 August 2006

[46] K Loo I Tong and B Kao ldquoOnline algorithms for min-ing inter-stream associations from large sensor networksrdquo inAdvances in KnowledgeDiscovery andDataMining pp 291ndash3022005

[47] G S Manku and R Motwani ldquoApproximate frequency countsover data streamsrdquo in Proceedings of the 28th InternationalConference on Very Large Data Bases pp 346ndash357 2002

[48] S K Chong S Krishnaswamy S W Loke and M M GaberldquoUsing association rules for energy conservation in wirelesssensor networksrdquo in Proceedings of the 23rd Annual ACMSymposium on Applied Computing (SAC rsquo08) pp 971ndash975March 2008

[49] S K Tanbeer C F Ahmed B-S Jeong and Y-K Lee ldquoEfficientmining of association rules from wireless sensor networksrdquo inProceedings of the 11th International Conference on AdvancedCommunication Technology (ICACT rsquo09) pp 719ndash724 February2009

[50] A Boukerche and S Samarah ldquoA novel algorithm for miningassociation rules in Wireless Ad Hoc Sensor Networksrdquo IEEETransactions on Parallel and Distributed Systems vol 19 no 7pp 865ndash877 2008

[51] K Romer ldquoDistributed mining of spatio-temporal event pat-terns in sensor networksrdquo in Proceedings of the 1st Euro-American Workshop on Middleware for Sensor Networks(EAWMS rsquo06) 2006

[52] BTnode platform httpwwwbtnodeethzch[53] R Agrawal and R Srikant ldquoMining sequential patternsrdquo in

Proceedings of the IEEE 11th International Conference on DataEngineering pp 3ndash14 March 1995

International Journal of Distributed Sensor Networks 23

[54] R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” in Proceedings of the Advances in Database Technology (EDBT ’96), pp. 1–17, 1996.

[55] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns,” Principles of Data Mining and Knowledge Discovery, pp. 176–184, 1998.

[56] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), pp. 355–359, August 2000.

[57] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.

[58] F. Esposito, T. M. A. Basile, N. Di Mauro, and S. Ferilli, “A relational approach to sensor network data mining,” Information Retrieval and Mining in Distributed Environments, pp. 163–181, 2010.

[59] F. Esposito, N. Di Mauro, T. M. A. Basile, and S. Ferilli, “Multi-dimensional relational sequence mining,” Fundamenta Informaticae, vol. 89, no. 1, pp. 23–43, 2008.

[60] R. Agrawal, H. Mannila, R. Srikant et al., “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, pp. 307–328, AAAI Press, Menlo Park, Calif, USA, 1996.

[61] Mica2Dot, CrossBow, 2005, http://www.xbow.com.

[62] Intel Berkeley Research Lab Data, http://db.csail.mit.edu/labdata/labdata.html.

[63] P. H. Wu, W. C. Peng, and M. S. Chen, “Mining sequential alarm patterns in a telecommunication database,” in Databases in Telecommunications II, pp. 37–51, 2001.

[64] V. S. Tseng and E. H.-C. Lu, “Energy-efficient real-time object tracking in multi-level sensor networks by mining and predicting movement patterns,” Journal of Systems and Software, vol. 82, no. 4, pp. 697–706, 2009.

[65] V. S. Tseng and K. W. Lin, “Energy efficient strategies for object tracking in sensor networks: a data mining approach,” Journal of Systems and Software, vol. 80, no. 10, pp. 1678–1698, 2007.

[66] S. Samarah, M. Al-Hajri, and A. Boukerche, “A predictive energy-efficient technique to support object-tracking sensor networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 2, pp. 656–663, 2011.

[67] A. Taherkordi, R. Mohammadi, and F. Eliassen, “A communication-efficient distributed clustering algorithm for sensor networks,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA ’08), pp. 634–638, March 2008.

[68] G. Gupta and M. Younis, “Load-balanced clustering of wireless sensor networks,” in Proceedings of the International Conference on Communications (ICC ’03), vol. 3, pp. 1848–1852, May 2003.

[69] S. Bandyopadhyay and E. J. Coyle, “An energy efficient hierarchical clustering algorithm for wireless sensor networks,” in Proceedings of the 22nd Annual Joint Conference on the IEEE Computer and Communications Societies, pp. 1713–1723, April 2003.

[70] S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh, “Optimal energy aware clustering in sensor networks,” Sensors, vol. 2, no. 7, pp. 258–269, 2002.

[71] O. Younis and S. Fahmy, “HEED: a hybrid energy-efficient distributed clustering approach for ad hoc sensor networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004.

[72] M. Younis, M. Youssef, and K. Arisha, “Energy-aware management for cluster-based sensor networks,” Computer Networks, vol. 43, no. 5, pp. 649–668, 2003.

[73] Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff, “On energy provisioning and relay node placement for wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2579–2590, 2005.

[74] T. Wu and S. Biswas, “A self-reorganizing slot allocation protocol for multi-cluster sensor networks,” in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN ’05), pp. 309–316, April 2005.

[75] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient clustering-based heuristic for data gathering and aggregation in sensor networks,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ’03), vol. 3, pp. 1948–1953, 2003.

[76] M. Demirbas, A. Arora, and V. Mittal, “FLOC: A fast local clustering service for wireless sensor networks,” in Proceedings of Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS ’04), 2004.

[77] P. Ding, J. Holliday, and A. Celik, “Distributed energy-efficient hierarchical clustering for wireless sensor networks,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 466–467, July 2005.

[78] H. Chan and A. Perrig, “ACE: an emergent algorithm for highly uniform cluster formation,” Wireless Sensor Networks, vol. 2920, pp. 154–171, 2004.

[79] H. Chan, M. Luk, and A. Perrig, “Using clustering information for sensor network localization,” in Proceedings of the 1st IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS ’05), pp. 109–125, July 2005.

[80] H. Huang and J. Wu, “A probabilistic clustering algorithm in wireless sensor networks,” in Proceedings of IEEE 62nd Semiannual Vehicular Technology Conference (VTC ’05), p. 1796, 2005.

[81] A. Youssef, M. Younis, M. Youssef, and A. Agrawala, “Distributed formation of overlapping multi-hop clusters in wireless sensor networks,” in Proceedings of the 49th Annual IEEE Global Communication Conference (Globecom ’06), pp. 1–6, December 2006.

[82] S. Dai, P. Wang, L. Gao, and S. Zheng, “Mining clustering algorithm in wireless sensor networks,” in Proceedings of the IEEE International Conference on Granular Computing (GRC ’08), pp. 178–182, August 2008.

[83] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS ’00), vol. 2, p. 223, January 2000.

[84] C. Liu, K. Wu, and J. Pei, “A dynamic clustering and scheduling approach to energy saving in data collection from wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 374–385, September 2005.

[85] L. Guo, C. Ai, X. Wang, Z. Cai, and Y. Li, “Real time clustering of sensory data in wireless sensor networks,” in Proceedings of the IEEE 28th International Performance Computing and Communications Conference (IPCCC ’09), pp. 33–40, December 2009.

[86] M. H. Yeo, M. S. Lee, S. J. Lee, and J. S. Yoo, “Data correlation-based clustering in sensor networks,” in Proceedings of the International Symposium on Computer Science and its Applications (CSA ’08), pp. 332–337, October 2008.

[87] P. Beyens, A. Nowe, and K. Steenhaut, “High-density wireless sensor networks: a new clustering approach for prediction-based monitoring,” in Proceedings of the 2nd European Workshop on Wireless Sensor Networks (EWSN ’05), pp. 188–196, February 2005.

[88] S. Yoon and C. Shahabi, “The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks,” ACM Transactions on Sensor Networks, vol. 3, no. 1, Article ID 1210672, 2007.

[89] K. Wang, S. A. Ayyash, T. D. C. Little, and P. Basu, “Attribute-based clustering for information dissemination in wireless sensor networks,” in Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ’05), pp. 498–509, Santa Clara, Calif, USA, September 2005.

[90] X. Ma, S. Li, Q. Luo et al., “Distributed hierarchical clustering and summarization in sensor networks,” in Advances in Data and Web Management, pp. 168–175, 2007.

[91] L. K. Sharma, O. P. Vyas, S. Schieder et al., “Nearest neighbour classification for trajectory data,” Information and Communication Technologies, vol. 101, pp. 180–185, 2010.

[92] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[93] J. R. M. Bauchet, S. Giroux, H. Pigot et al., “Pervasive assistance in smart homes for people with intellectual disabilities: a case study on meal preparation,” International Journal of Assistive Robotics and Mechatronics, vol. 9, no. 4, pp. 42–54, 2008.

[94] D. J. Cook and M. Schmitter-Edgecombe, “Assessing the quality of activities in a smart environment,” Methods of Information in Medicine, vol. 48, no. 5, pp. 480–485, 2009.

[95] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques With Java Implementation, Morgan Kaufmann, 2000.

[96] K. Sharma, M. Rajpoot, and L. K. Sharma, “Nearest neighbour classification for wireless sensor network data,” International Journal of Computer Trends and Technology, no. 2, 2011.

[97] NS2 Simulator, http://www.isi.edu/nsnam/ns/.

[98] O. P. V. L. K. Sharma, S. Schieder, and A. K. Akasapu, “A nearest neighbour classification for trajectory data,” in Springer CCIS, vol. 101, pp. 180–185, 2010.

[99] M. J. Akhlaghinia, A. Lotfi, C. Langensiepen, and N. Sherkat, “A fuzzy predictor model for the occupancy prediction of an intelligent inhabited environment,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ ’08), pp. 939–946, June 2008.

[100] M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “On-board mining of data streams in sensor networks,” in Advanced Methods for Knowledge Discovery from Complex Data, pp. 307–335, 2005.

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of 1st International Workshop on Knowledge Discovery in Data Streams held in Conjunction ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO ’09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.

Page 23: ReviewArticle Data Mining Techniques for Wireless Sensor ...home.etf.bg.ac.rs/~vm/os/dmsw/Data Mining... · have a large impact on type of data mining algorithm to choose;therefore,onehastodecidetheprocessing

International Journal of Distributed Sensor Networks 23

[54] R Srikant and R Agrawal ldquoMining sequential patterns gen-eralizations and performance improvementsrdquo in Proceedings ofthe Advances in Database Technology (EDBT rsquo96) pp 1ndash17 1996

[55] F Masseglia F Cathala and P Poncelet ldquoThe PSP approachfor mining sequential patternsrdquo Principles of Data Mining andKnowledge Discovery pp 176ndash184 1998

[56] J Han J Pei B Mortazavi-Asl Q Chen U Dayal and M-CHsu ldquoFreeSpan frequent pattern-projected sequential patternminingrdquo in Proceedings of the Sixth ACMSIGKDD InternationalConference onKnowledgeDiscovery andDataMining (KDD rsquo01)pp 355ndash359 August 2000

[57] J Pei J Han B Mortazavi-Asl et al ldquoPrefixSpan min-ing sequential patterns efficiently by prefix-projected patterngrowthrdquo in Proceedings of the 17th International Conference onData Engineering pp 215ndash224 April 2001

[58] F Esposito T M A Basile N Di Mauro and S Ferilli ldquoA rela-tional approach to sensor network data miningrdquo InformationRetrieval and Mining in Distributed Environments pp 163ndash1812010

[59] F Esposito N Di Mauro T M A Basile and S FerillildquoMulti-dimensional relational sequence miningrdquo FundamentaInformaticae vol 89 no 1 pp 23ndash43 2008

[60] R Agrawal H Mannila R Srikant et al ldquoFast discovery ofassociation rulesrdquo inAdvances in KnowledgeDiscovery andDataMining pp 307ndash328 AAAI PressMenlo Park Calif USA 1996

[61] Mica2Dot CrossBow 2005 httpwwwxbowcom[62] Intel Berkeley Research Lab Data httpdbcsailmitedulab-

datalabdatahtml[63] P H Wu W C Peng and M S Chen ldquoMining sequential

alarm patterns in a telecommunication databaserdquo in Databasesin Telecommunications II pp 37ndash51 2001

[64] V S Tseng and E H-C Lu ldquoEnergy-efficient real-time objecttracking in multi-level sensor networks by mining and predict-ing movement patternsrdquo Journal of Systems and Software vol82 no 4 pp 697ndash706 2009

[65] V S Tseng and K W Lin ldquoEnergy efficient strategies for objecttracking in sensor networks a data mining approachrdquo Journalof Systems and Software vol 80 no 10 pp 1678ndash1698 2007

[66] S Samarah M Al-Hajri and A Boukerche ldquoA predictiveenergy-efficient technique to support object-tracking sensornetworksrdquo IEEE Transactions on Vehicular Technology vol 60no 2 pp 656ndash663 2011

[67] A Taherkordi R Mohammadi and F Eliassen ldquoA commu-nication-efficient distributed clustering algorithm for sensornetworksrdquo in Proceedings of the 22nd International Conferenceon Advanced Information Networking and Applications Work-shopsSymposia (AINA rsquo08) pp 634ndash638 March 2008

[68] G Gupta and M Younis ldquoLoad-balanced clustering of wirelesssensor networksrdquo in Proceedings of the International Conferenceon Communications (ICC rsquo03) vol 3 pp 1848ndash1852 May 2003

[69] S Bandyopadhyay and E J Coyle ldquoAn energy efficient hier-archical clustering algorithm for wireless sensor networksrdquo inProceedings of the 22nd Annual Joint Conference on the IEEEComputer and Communications Societies pp 1713ndash1723 April2003

[70] S Ghiasi A Srivastava X Yang and M Sarrafzadeh ldquoOptimalenergy aware clustering in sensor networksrdquo Sensors vol 2 no7 pp 258ndash269 2002

[71] O Younis and S Fahmy ldquoHEED a hybrid energy-efficientdistributed clustering approach for ad hoc sensor networksrdquoIEEE Transactions on Mobile Computing vol 3 no 4 pp 366ndash379 2004

[72] M Younis M Youssef and K Arisha ldquoEnergy-aware manage-ment for cluster-based sensor networksrdquo Computer Networksvol 43 no 5 pp 649ndash668 2003

[73] Y T Hou Y Shi H D Sherali and S F Midkiff ldquoOn energyprovisioning and relay node placement for wireless sensornetworksrdquo IEEE Transactions on Wireless Communications vol4 no 5 pp 2579ndash2590 2005

[74] T Wu and S Biswas ldquoA self-reorganizing slot allocation proto-col for multi-cluster sensor networksrdquo in Proceedings of the 4thInternational Symposium on Information Processing in SensorNetworks (IPSN rsquo05) pp 309ndash316 April 2005

[75] K Dasgupta K Kalpakis and P Namjoshi ldquoAn efficientclustering-based heuristic for data gathering and aggregationin sensor networksrdquo in Proceedings of the IEEE Wireless Com-munications and Networking Conference (WCNC rsquo03) vol 3 pp1948ndash1953 2003

[76] M Demirbas A Arora and V Mittal ldquoFLOC A fast local clus-tering service for wireless sensor networksrdquo in Proceedings ofWorkshop on Dependability Issues in Wireless Ad Hoc Networksand Sensor Networks (DIWANS rsquo04) 2004

[77] P Ding J Holliday and A Celik ldquoDistributed energy-efficienthierarchical clustering for wireless sensor networksrdquo in Pro-ceedings of the 1st IEEE International Conference on DistributedComputing in Sensor Systems (DCOSS rsquo05) pp 466ndash467 July2005

[78] H Chan and A Perrig ldquoACE an emergent algorithm for highlyuniform cluster formationrdquoWireless Sensor Networks vol 2920pp 154ndash171 2004

[79] H Chan M Luk and A Perrig ldquoUsing clustering informationfor sensor network localizationrdquo in Proceedings of the 1st IEEEInternational Conference on Distributed Computing in SensorSystems (DCOSS rsquo05) pp 109ndash125 July 2005

[80] H Huang and J Wu ldquoA probabilistic clustering algorithmin wireless sensor networksrdquo in Proceeding of IEEE 62ndSemiannual Vehicular Technology Conference (VTC rsquo05) p 17962005

[81] A Youssef M Younis M Youssef and A Agrawala ldquoDis-tributed formation of overlappingmulti-hop clusters in wirelesssensor networksrdquo in Proceedings of the 49th Annual IEEE GlobalCommunication Conference (Globecom rsquo06) pp 1ndash6 December2006

[82] S Dai P Wang L Gao and S Zheng ldquoMining clusteringalgorithm in wireless sensor networksrdquo in Proceedings of theIEEE International Conference on Granular Computing (GRCrsquo08) pp 178ndash182 August 2008

[83] W R Heinzelman A Chandrakasan and H Balakrish-nan ldquoEnergy-efficient communication protocol for wirelessmicrosensor networksrdquo in Proceedings of the 33rd AnnualHawaii International Conference on System Siences (HICSS rsquo00)vol 2 p 223 January 2000

[84] C Liu K Wu and J Pei ldquoA dynamic clustering and schedulingapproach to energy saving in data collection from wirelesssensor networksrdquo in Proceedings of the 2nd Annual IEEE Com-munications Society Conference on Sensor and AdHoc Commu-nications and Networks (SECON rsquo05) pp 374ndash385 September2005

[85] L Guo C Ai X Wang Z Cai and Y Li ldquoReal time clusteringof sensory data in wireless sensor networksrdquo in Proceedingsof the IEEE 28th International Performance Computing andCommunications Conference (IPCCC rsquo09) pp 33ndash40 December2009

24 International Journal of Distributed Sensor Networks

[86] M H Yeo M S Lee S J Lee and J S Yoo ldquoData correlation-based clustering in sensor networksrdquo in Proceedings of the Inter-national Symposium on Computer Science and its Applications(CSA rsquo08) pp 332ndash337 October 2008

[87] P Beyens A Nowe and K Steenhaut ldquoHigh-density wirelesssensor networks a new clustering approach for prediction-based monitoringrdquo in Proceedings of the 2nd European Work-shop on Wireless Sensor Networks (EWSN rsquo05) pp 188ndash196February 2005

[88] S Yoon and C Shahabi ldquoThe Clustered AGgregation (CAG)technique leveraging spatial and temporal correlations in wire-less sensor networksrdquo ACM Transactions on Sensor Networksvol 3 no 1 Article ID 1210672 2007

[89] K Wang S A Ayyash T D C Little and P Basu ldquoAttribute-based clustering for information dissemination in wirelesssensor networksrdquo in Proceedings of the 2nd Annual IEEE Com-munications Society Conference on Sensor and AdHoc Commu-nications and Networks (SECON rsquo05) pp 498ndash509 Santa ClaraCalif USA September 2005

[90] X Ma S Li Q Luo et al ldquoDistributed hierarchical clusteringand summarization in sensor networksrdquo in Advances in Dataand Web Management pp 168ndash175 2007

[91] L K Sharma O P Vyas S Schieder et al ldquoNearest neighbourclassification for trajectory datardquo Information and Communica-tion Technologies vol 101 pp 180ndash185 2010

[92] B Chikhaoui S Wang and H Pigot ldquoA new algorithm basedon sequential pattern mining for person identification in ubiq-uitous environmentsrdquo in Proceedings of the 4th InternationalWorkshop on Knowledge Discovery form Sensor Data (ACMSensorKDD rsquo10) pp 20ndash28 Washington DC USA 2010

[93] J R M Bauchet S Giroux H Pigot et al ldquoPervasive assistancein smart homes for people with intellectual disabilities a casestudy on meal preparationrdquo International Journal of AssistiveRobotics and Mechatronics vol 9 no 4 pp 42ndash54 2008

[94] D J Cook andM Schmitter-Edgecombe ldquoAssessing the qualityof activities in a smart environmentrdquoMethods of Information inMedicine vol 48 no 5 pp 480ndash485 2009

[95] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques With Java Implementation Mor-gan Kaufmann 2000

[96] K Sharma M Rajpoot and L K Sharma ldquoNearest neighbourclassification for wireless sensor network datardquo InternationalJournal of Computer Trends and Technology no 2 2011

[97] NS2 Simulator httpwwwisiedunsnamns[98] O P V L K Sharma S Schieder and A K Akasapu ldquoA nearest

neighbour classification for trajectory datardquo in Springer CCISvol 101 pp 180ndash185 2010

[99] M J Akhlaghinia A Lotfi C Langensiepen and N SherkatldquoA fuzzy predictor model for the occupancy prediction of anintelligent inhabited environmentrdquo in Proceedings of the IEEEInternational Conference on Fuzzy Systems (FUZZ rsquo08) pp 939ndash946 June 2008

[100] M Gaber S Krishnaswamy and A Zaslavsky ldquoOn-boardmining of data streams in sensor networksrdquo in AdvancedMethods for Knowledge Discovery from Complex Data pp 307ndash335 2005

[101] M M Gaber S Krishnaswamy and A Zaslavsky ldquoAdaptivemining techniques for data streams using algorithm outputgranularityrdquo in Proceedings of the Australasian Data MiningWorkshop 2003

[102] M M Gaber A Zaslavsky and S Krishnaswamy ldquoResource-aware knowledge discovery in data streamsrdquo in Proceedingsof 1st International Workshop on Knowledge Discovery in DataStreams held in Conjunction ECML and PKDD 2004

[103] S M McConnell and D B Skillicorn ldquoA distributed approachfor prediction in sensor networksrdquo in Proceedings of the Work-shop on Data Mining in Sensor Networks Newport Beach CalifUSA 2005

[104] B Malhotra I Nikolaidis and J Harms ldquoDistributed classifi-cation of acoustic targets in wireless audio-sensor networksrdquoComputer Networks vol 52 no 13 pp 2582ndash2593 2008

[105] K Flouri B Beferull-Lozano and T Tsakalides ldquoTraininga SVM-based classifier in distributed sensor networksrdquo inProceedings of the 14th International Conference onDigital SignalProcessing (DSP rsquo09) pp 1ndash5 2006

[106] K Flouri B Beferull-Lozano and T Tsakalides ldquoEnergy-efficient distributed support vectormachines for wireless sensornetworksrdquo in Proceedings of the EuropeanWorkshop onWirelessSensor Networks 2006

[107] K Flouri B Beferull-Lozano and T Tsakalides ldquoDistributedconsensus algorithms for SVM training in wireless sensornetworksrdquo in Proceedings of the 16th European Signal ProcessingConference (EUSIPCO 09) 2008

[108] S Rajasegarar C Leckie M Palaniswami and J C BezdekldquoQuarter sphere based distributed anomaly detection in wire-less sensor networksrdquo in Proceedings of the IEEE InternationalConference on Communications (ICC rsquo07) pp 3864ndash3869 June2007

[109] B Chikhaoui S Wang and H Pigot ldquoA new algorithm basedon sequential pattern mining for person identification in ubiq-uitous environmentsrdquo in Proceedings of the 4th InternationalWorkshop on Knowledge Discovery form Sensor Data (ACMSensorKDD rsquo10) pp 20ndash28 Washington DC USA 2010

[110] K Romer and F Mattern ldquoThe design space of wireless sensornetworksrdquo IEEEWireless Communications vol 11 no 6 pp 54ndash61 2004

[111] O Diallo J J P C Rodrigues and M Sene ldquoReal-time datamanagement on wireless sensor networks a surveyrdquo Journal ofNetwork andComputer Applications vol 35 no 3 pp 1013ndash10212012

[112] Y Yao L Feng B Jin and F Chen ldquoAn incremental learningapproachwith SupportVectorMachine for network data streamclassification problemrdquo Information Technology Journal vol 11no 2 pp 200ndash208 2012

Page 24: ReviewArticle Data Mining Techniques for Wireless Sensor ...home.etf.bg.ac.rs/~vm/os/dmsw/Data Mining... · have a large impact on type of data mining algorithm to choose;therefore,onehastodecidetheprocessing

24 International Journal of Distributed Sensor Networks

[86] M H Yeo M S Lee S J Lee and J S Yoo ldquoData correlation-based clustering in sensor networksrdquo in Proceedings of the Inter-national Symposium on Computer Science and its Applications(CSA rsquo08) pp 332ndash337 October 2008

[87] P Beyens A Nowe and K Steenhaut ldquoHigh-density wirelesssensor networks a new clustering approach for prediction-based monitoringrdquo in Proceedings of the 2nd European Work-shop on Wireless Sensor Networks (EWSN rsquo05) pp 188ndash196February 2005

[88] S Yoon and C Shahabi ldquoThe Clustered AGgregation (CAG)technique leveraging spatial and temporal correlations in wire-less sensor networksrdquo ACM Transactions on Sensor Networksvol 3 no 1 Article ID 1210672 2007

[89] K Wang S A Ayyash T D C Little and P Basu ldquoAttribute-based clustering for information dissemination in wirelesssensor networksrdquo in Proceedings of the 2nd Annual IEEE Com-munications Society Conference on Sensor and AdHoc Commu-nications and Networks (SECON rsquo05) pp 498ndash509 Santa ClaraCalif USA September 2005

[90] X Ma S Li Q Luo et al ldquoDistributed hierarchical clusteringand summarization in sensor networksrdquo in Advances in Dataand Web Management pp 168ndash175 2007

[91] L K Sharma O P Vyas S Schieder et al ldquoNearest neighbourclassification for trajectory datardquo Information and Communica-tion Technologies vol 101 pp 180ndash185 2010

[92] B Chikhaoui S Wang and H Pigot ldquoA new algorithm basedon sequential pattern mining for person identification in ubiq-uitous environmentsrdquo in Proceedings of the 4th InternationalWorkshop on Knowledge Discovery form Sensor Data (ACMSensorKDD rsquo10) pp 20ndash28 Washington DC USA 2010

[93] J R M Bauchet S Giroux H Pigot et al ldquoPervasive assistancein smart homes for people with intellectual disabilities a casestudy on meal preparationrdquo International Journal of AssistiveRobotics and Mechatronics vol 9 no 4 pp 42ndash54 2008

[94] D J Cook andM Schmitter-Edgecombe ldquoAssessing the qualityof activities in a smart environmentrdquoMethods of Information inMedicine vol 48 no 5 pp 480ndash485 2009

[95] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques With Java Implementation Mor-gan Kaufmann 2000

[96] K Sharma M Rajpoot and L K Sharma ldquoNearest neighbourclassification for wireless sensor network datardquo InternationalJournal of Computer Trends and Technology no 2 2011

[97] NS2 Simulator httpwwwisiedunsnamns[98] O P V L K Sharma S Schieder and A K Akasapu ldquoA nearest

neighbour classification for trajectory datardquo in Springer CCISvol 101 pp 180ndash185 2010

[99] M J Akhlaghinia A Lotfi C Langensiepen and N SherkatldquoA fuzzy predictor model for the occupancy prediction of anintelligent inhabited environmentrdquo in Proceedings of the IEEEInternational Conference on Fuzzy Systems (FUZZ rsquo08) pp 939ndash946 June 2008

[100] M Gaber S Krishnaswamy and A Zaslavsky ldquoOn-boardmining of data streams in sensor networksrdquo in AdvancedMethods for Knowledge Discovery from Complex Data pp 307ndash335 2005

[101] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky, “Adaptive mining techniques for data streams using algorithm output granularity,” in Proceedings of the Australasian Data Mining Workshop, 2003.

[102] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Resource-aware knowledge discovery in data streams,” in Proceedings of the 1st International Workshop on Knowledge Discovery in Data Streams, held in conjunction with ECML and PKDD, 2004.

[103] S. M. McConnell and D. B. Skillicorn, “A distributed approach for prediction in sensor networks,” in Proceedings of the Workshop on Data Mining in Sensor Networks, Newport Beach, Calif, USA, 2005.

[104] B. Malhotra, I. Nikolaidis, and J. Harms, “Distributed classification of acoustic targets in wireless audio-sensor networks,” Computer Networks, vol. 52, no. 13, pp. 2582–2593, 2008.

[105] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Training a SVM-based classifier in distributed sensor networks,” in Proceedings of the 14th International Conference on Digital Signal Processing (DSP ’09), pp. 1–5, 2006.

[106] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Energy-efficient distributed support vector machines for wireless sensor networks,” in Proceedings of the European Workshop on Wireless Sensor Networks, 2006.

[107] K. Flouri, B. Beferull-Lozano, and T. Tsakalides, “Distributed consensus algorithms for SVM training in wireless sensor networks,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 09), 2008.

[108] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proceedings of the IEEE International Conference on Communications (ICC ’07), pp. 3864–3869, June 2007.

[109] B. Chikhaoui, S. Wang, and H. Pigot, “A new algorithm based on sequential pattern mining for person identification in ubiquitous environments,” in Proceedings of the 4th International Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD ’10), pp. 20–28, Washington, DC, USA, 2010.

[110] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 54–61, 2004.

[111] O. Diallo, J. J. P. C. Rodrigues, and M. Sene, “Real-time data management on wireless sensor networks: a survey,” Journal of Network and Computer Applications, vol. 35, no. 3, pp. 1013–1021, 2012.

[112] Y. Yao, L. Feng, B. Jin, and F. Chen, “An incremental learning approach with Support Vector Machine for network data stream classification problem,” Information Technology Journal, vol. 11, no. 2, pp. 200–208, 2012.

