Research Article
A Dynamic Processing System for Sensor Data in IoT
Minbo Li,1,2 Yanling Liu,1 and Yuanfeng Cai1
1 Software School, Fudan University, Shanghai 201203, China
2 Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China
Correspondence should be addressed to Minbo Li; limb@fudan.edu.cn
Received 5 December 2014; Revised 23 April 2015; Accepted 1 June 2015
Academic Editor: Xiuzhen Cheng
Copyright © 2015 Minbo Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
With the development of the Internet of Things (IoT), innumerable Wireless Sensor Networks (WSNs) are deployed to capture information about the status of the surrounding physical environment. The data from WSNs, called sensor data, are generated at high frequency. Similar to the data of other open-loop applications, for example, network monitoring data, sensor data are heterogeneous, redundant, real-time, massive, and streaming. Hence, sensor data cannot be treated as IoT business data, which brings complexity and difficulty to information sharing in the open-loop environment. This paper proposes a dynamic sensor data processing (SDP) system to capture and process sensor data continuously on the basis of data streaming technology. The Particle Swarm Optimization (PSO) algorithm is employed to train a threshold dynamically for data compression, avoiding redundancy. With the help of rule settings, the proposed SDP is able to detect exception situations. Meanwhile, the storage models in SQL and NoSQL databases are analyzed and compared in order to find an appropriate type of database for sensor data storage. The experimental results show that our SDP can compress sensor data by dynamically balancing the accuracy and the compression rate, and that the model on the NoSQL database has better performance than the model on the SQL database.
1. Introduction
The Internet of Things (IoT) [1-3] is a concept that aims to seamlessly integrate the virtual world of information technology with the real world of things. The perception layer of the IoT consists of a large number of Wireless Sensor Networks (WSNs) [4]. Sensors in WSNs are adopted to sense the environment, bridging machines and the physical world, especially in manufacturing and logistics applications. Sensors frequently collect heat, power, light, sound, speed, and so on as signals to provide raw information for information systems. These signals are sensor data, which reveal the environmental status. In traditional sensor applications, such as greenhouse monitoring, sensor data were processed and stored in a single system. Accordingly, there was little need for exchange of data across organizational boundaries. Hence, traditional sensor applications are often closed-loop applications. IoT applications, which always involve various stakeholders and require information exchange and sharing, are open-loop applications. Similar to other applications in the open-loop environment, the sensor data in IoT applications have the following five characteristics.
Sensor Data Are Heterogeneous. The types and functions of sensors are diverse. Temperature and humidity sensors and illumination sensors are common in agricultural and logistic applications, while blood sensors monitor blood pressure, heart rate, and other body indices. Different types of devices monitor various indices. Besides, the protocols that the devices use are diverse. The data from various sensors are therefore in heterogeneous formats.
Sensor Data Are Redundant. The state of the environment or of the human being that a sensor is monitoring is stable over a short time. For example, the temperature of a room may remain unchanged or fluctuate only slightly within 5 minutes in common situations. Since sensors collect information every few seconds, a database that stores all of those data will face a heavy load, grow quickly, and perform poorly.
Sensor Data Are Real-Time Data. Sensor data are collected over time, so they are in fact real-time data. Processing sensor data in time can bring benefits to the IoT. For example, once an abnormal situation occurs, a timely exception detection can help operators make decisions or adjust measures quickly to reduce the loss.
Hindawi Publishing Corporation, International Journal of Distributed Sensor Networks, Volume 2015, Article ID 750452, 10 pages, http://dx.doi.org/10.1155/2015/750452
Sensor Data Are Massive and Streaming. As the production costs of sensors come down, more and more sensors are applied to capture complete environmental information. Meanwhile, sensors are set to collect data every few seconds. Hence, sensor data are generated and streamed to the Internet continuously, and the amount of sensor data is large.
In order to maximize the use of sensor data, processing systems are required to take these characteristics into account. We propose an IoT sensor data processing (SDP) system based on data streaming technology, which performs dynamic data compression to reduce redundancy with the help of the Particle Swarm Optimization (PSO) algorithm and detects exception situations in real time. Meanwhile, we seek an appropriate type of database to store massive sensor data. The experimental results show that the compression method can train a threshold that balances the accuracy and the compression rate, and that the model on the NoSQL database has distinctly better performance than the model on the SQL database.
The remainder of the paper is organized as follows. Section 2 describes the related work. Section 3 introduces the preliminary notions. Section 4 presents the framework of the proposed processing system, SDP. In Section 5, we conduct experiments on real sensor data and present the results. Section 6 concludes this paper.
2. Related Work
Existing research on sensor data processing can be summarized into two categories: the bottom-up approach and the top-down approach. The bottom-up category models sensors and sensor networks from the device perspective. SensorML [5], IEEE 1451 [6], and GSN [7] fall under the bottom-up category. SensorML models the sensor-related processes for consistent handling. IEEE 1451 defines a transducer interface to access and manage smart transducers. GSN abstracts sensor data sources, specification, and query tools into XML descriptions, offering plug-and-play detection and deployment. The research in the bottom-up category addressed the heterogeneous and streaming characteristics; the other three were not considered. The top-down category builds from the application perspective by analyzing functional requirements. The sensor network services platform (SNSP) [8], Cougar [9], sensor information networking architecture and application (SINA) [10], and COSMOS [11] belong to the top-down category. SNSP proposed a set of fundamental services and interface primitives to provide query operations on sensor networks. Cougar offered an in-network query processing mechanism. SINA attempted to find an optimal way to facilitate querying, monitoring, and tasking of sensor networks. COSMOS provided a method to integrate data over heterogeneous sensor networks and defined a standardized communication protocol and message formats. The research in this category focused only on the heterogeneous characteristic; the other four were not considered.
The methods mentioned above could not handle sensor data according to all of the characteristics or support dynamic processing in an open-loop environment. Compared with those frameworks, the proposed SDP system, which belongs to the bottom-up category, pays attention to all five characteristics and differs in the following aspects:
(1) Based on a unified data model, a data compression method employing the PSO algorithm is presented to reduce redundancy.
(2) The proposed system adopts data stream technology to monitor exception situations in real time on streaming data.
(3) Relational and NoSQL databases are compared and analyzed to seek a better model for massive sensor data storage.
3. Preliminary Notions
3.1. Particle Swarm Optimization. Particle Swarm Optimization (PSO) [12-14], proposed by Kennedy, is an artificial intelligence algorithm that aims to find the optimal state in a $D$-dimensional search space with the help of the interactions between particles in the swarm.
Every individual particle $i$ moves toward the $\overrightarrow{pbest}_i$ position and the $\overrightarrow{gbest}$ position. The $\overrightarrow{pbest}_i$ position is the best position found by particle $i$ so far. The $\overrightarrow{gbest}$ position is the best position found by the swarm so far. Particle $i$ moves according to its velocity $\vec{v}_i$ and current position $\vec{x}_i$. The velocity and position of particles can be updated by the following equations:
$$\vec{v}_i = w\,\vec{v}_i + c_1\,\mathrm{rand}_1\,(\overrightarrow{pbest}_i - \vec{x}_i) + c_2\,\mathrm{rand}_2\,(\overrightarrow{gbest} - \vec{x}_i), \quad (1)$$
$$\vec{x}_i = \vec{x}_i + \vec{v}_i. \quad (2)$$
Here $\vec{v}_i$ refers to the velocity of particle $i$ and $\vec{x}_i$ is the position of particle $i$. $\overrightarrow{pbest}_i$ is the personal best position of particle $i$ and $\overrightarrow{gbest}$ is the global best position of the swarm. The inertia weight $w$ is used to control exploration and exploitation [15]. The particles maintain high velocities with a larger $w$ and low velocities with a smaller $w$. A larger $w$ can prevent particles from becoming trapped in local optima, and a smaller $w$ encourages particles to exploit the same search space area. The constants $c_1$ and $c_2$ are used to decide whether particles prefer moving toward a $pbest$ position or the $gbest$ position. $\mathrm{rand}_1$ and $\mathrm{rand}_2$ are random variables between 0 and 1. The structure of PSO is shown in Algorithm 1. The $gbest$ after the iterations is the optimal position we seek.
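The PSO loop described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used inertia-weight update (new velocity = w*v + c1*rand1*(pbest - x) + c2*rand2*(gbest - x), new position = x + v), a one-dimensional search space, and clamping of velocity and position. The function name `pso_minimize`, its parameters, and the default values are hypothetical.

```python
import random


def pso_minimize(fitness, x_min, x_max, n_particles=20, n_iter=100,
                 w=0.4, c1=1.8, c2=1.8, v_max=0.2, seed=0):
    """Minimize a one-dimensional fitness function with inertia-weight PSO.

    All names and defaults are illustrative assumptions.
    Returns the best position found and its fitness value.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(x_min, x_max) for _ in range(n_particles)]
    vs = [rng.uniform(-v_max, v_max) for _ in range(n_particles)]
    pbest = list(xs)                       # personal best positions
    pbest_val = [fitness(x) for x in xs]   # fitness at personal bests
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + pull toward pbest + pull toward gbest.
            v = (w * vs[i]
                 + c1 * r1 * (pbest[i] - xs[i])
                 + c2 * r2 * (gbest - xs[i]))
            vs[i] = max(-v_max, min(v_max, v))                 # assumed clamp
            xs[i] = max(x_min, min(x_max, xs[i] + vs[i]))      # assumed clamp
            val = fitness(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i], val
    return gbest, gbest_val


if __name__ == "__main__":
    best, value = pso_minimize(lambda x: (x - 0.3) ** 2, 0.0, 0.5)
    print(best, value)
```

A fixed seed is used only to make the sketch reproducible; a real deployment would likely draw fresh random numbers on each run.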
3.2. Data Compression Measurement Indices. To evaluate the accuracy and the compression rate of data compression, we use the Amplitude Error index to represent accuracy and the Compression Ratio index to represent the compression rate.
Begin
  Initialize a population of particles with random positions and velocities on $D$ dimensions in the search space
  For each iteration $k$
  begin
    For each particle $i$ in the swarm
    Begin
      (1) Calculate new $\vec{v}_i$ using (1)
      (2) Update the position $\vec{x}_i$ according to (2)
      (3) Evaluate the value $fv_i$ of fitness function $f$ with current position $\vec{x}_i$
3.2.1. Amplitude Error Index. Amplitude Error (AE) [16] is a measure of similarity between the input numerical list and the output numerical list of a method. It is often used to evaluate the accuracy of ECG data compression [17]. A smaller AE value between the list recovered from the output of a compression method and the original ECG list indicates that the method achieves higher accuracy. Since our sensor data compression is similar to ECG data compression, we borrow this index to evaluate our compression accuracy.
Given an input numeric list $\mathrm{IList} = \{u_1, u_2, \ldots, u_n\}$, where $u_i$ is the $i$th number in IList, and a numeric list $\mathrm{RList} = \{v_1, v_2, \ldots, v_n\}$, where $v_i$ is the value recovered from the compressed data, AE is expressed by
$$\mathrm{AE} = \frac{\sum_{i=1}^{n} (u_i - v_i)^2}{n}. \quad (3)$$
3.2.2. Compression Ratio Index. Compression Ratio (CR) [18] also comes from ECG data compression. It is a measure of the change in the amount of data. A smaller CR value means that a larger amount of data has been compressed away.
Given an original numeric list $\mathrm{OList} = \{u_1, u_2, \ldots, u_n\}$, where $u_i$ is the $i$th number in OList, and a compressed numeric list $\mathrm{CList} = \{v_1, v_2, \ldots, v_m\}$, where $v_i$ is a remaining value and $m \le n$, CR is expressed by
$$\mathrm{CR} = \frac{m}{n}. \quad (4)$$
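The two indices can be computed directly from their definitions. The following Python sketch is only an illustration; the function names `amplitude_error` and `compression_ratio` are hypothetical and do not come from the paper.

```python
def amplitude_error(original, recovered):
    """Mean of squared differences between original and recovered values (AE)."""
    if not original or len(original) != len(recovered):
        raise ValueError("lists must be non-empty and of equal length")
    return sum((u - v) ** 2 for u, v in zip(original, recovered)) / len(original)


def compression_ratio(original, compressed):
    """Fraction of values that remain after compression (CR = m / n)."""
    if not original:
        raise ValueError("original list must not be empty")
    return len(compressed) / len(original)


if __name__ == "__main__":
    print(amplitude_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(compression_ratio([1, 2, 3, 4], [1, 4]))
```

A lower AE means the recovered list is closer to the original, and a lower CR means fewer values were kept, so the two indices pull in opposite directions as the threshold grows.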
3.3. Fundamental Elements in the IoT. In an IoT information system, there are four fundamental concepts [19-21].
Entities. These include all RFID-tagged entities, such as items, cases, pallets, and even patients with RFID [22] bracelets.
Readers. There are two types of readers. One type is the RFID reader; the other type is the sensor reader, which refers to the base station of WSNs. RFID readers use radio-frequency signals to communicate with RFID tags and also create business events that describe the life cycle of an entity. Base stations collect data from the gateways of WSNs and stream all the data out to the Internet.
WSNs. WSNs collect and aggregate environment information through sensors and communicate with base stations through gateways.
Container. A container is a symbolic representation of where entities, readers, or sensors are located. It may be a warehouse or a truck. The common identification of a container is the Global Location Number (GLN) [23] code defined by Global Standard 1 (GS1) [24]. In general, more than one reader and at least one sensor are deployed in one container for tracking entities and monitoring the environment.
4. Sensor Data Processing System
Figure 1 shows the components of the proposed IoT sensor data processing (SDP) system and its relationship with the applications in the open-loop environment. We treat each piece of sensor data as an event, called a sensor event. In the open-loop environment, there are WSNs deployed in physical containers and applications requiring information exchange. The SDP system consists of three components: Observation Capturer, Real-Time Exception Monitor, and Dynamic Compression. Sensor data streamed into SDP are in a unified format called Observation. A single index that describes the state of a physical phenomenon is called a Phenomenon Event in SDP. Observation Capturer receives all the Observations from distributed base stations and splits them into different Phenomenon Events according to the type of index. Real-Time Exception Monitor detects exceptional Phenomenon Events and pushes exception warnings out. Dynamic Compression employs the PSO algorithm to train a threshold for compressing Phenomenon Events.
4.1. Unified Sensor Data Definitions. We first introduce the new concept of Observation (abbreviated as O), which is a data object recording the basic information of a sensor and the values of phenomena. A phenomenon is a collected condition state. For example, a temperature of value 23.2°C is a phenomenon.
Figure 1: Framework of SDP system in IoT. (Diagram labels: WSN, Gateway, Base station, Raw observations, SDP, Observation Capturer, Phenomenon Events, Dynamic Compression, Compressed Phenomenon Events, Real-Time Exception Monitor, Phenomenon standard of location, Warning, Sensor data repository, Information exchange, Applications.)
Definition 1 (Observation). Consider
O = <S_addr, Time, P>,
P = <ph_type, ph_value, ph_unit>+.
Here S_addr refers to the address of the sensor. The addresses of sensors in different WSNs differ from each other. The most common types of S_addr are <gateway_ip, sensor_set_off> and sensor_ip. Time is the time at which an Observation was collected. <ph_type, ph_value, ph_unit> is a triple, where ph_type refers to the type of a phenomenon, ph_value refers to the value of a phenomenon, and ph_unit denotes the unit of ph_value. The sign + indicates that there may be more than one triple of phenomena generated by a sensor.
We split Observations into a list of Phenomenon Events (abbreviated as PE).
Definition 2 (Phenomenon Event). Consider
PE = <Ph_Type, S_addr, Time, Value, Unit>.
Here Ph_Type refers to the type of a phenomenon. S_addr and Time are inherited from the Observation. Value is the numeric value of a phenomenon recorded by a sensor. Unit denotes the unit of Value.
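The two definitions map naturally onto small record types. The Python sketch below is one possible rendering under stated assumptions: the class and field names, the use of strings for addresses and timestamps, and the helper `split_observation` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Observation:
    """O = <S_addr, Time, P>, where P is one or more phenomenon triples."""
    s_addr: str
    time: str
    phenomena: Tuple[Tuple[str, float, str], ...]  # (ph_type, ph_value, ph_unit)


@dataclass(frozen=True)
class PhenomenonEvent:
    """PE = <Ph_Type, S_addr, Time, Value, Unit>."""
    ph_type: str
    s_addr: str
    time: str
    value: float
    unit: str


def split_observation(o: Observation) -> List[PhenomenonEvent]:
    """Produce one Phenomenon Event per triple; address and time are inherited."""
    return [PhenomenonEvent(ph_type, o.s_addr, o.time, value, unit)
            for ph_type, value, unit in o.phenomena]


if __name__ == "__main__":
    obs = Observation("192.168.0.12 0x001", "2013-06-19 21:50:24",
                      (("air temperature", 25.5, "C"), ("pressure", 1004.4, "hPa")))
    for pe in split_observation(obs):
        print(pe)
```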
4.2.1. Observation Capture. Base stations, which serve as the managing gateways of WSNs, parse pieces of raw sensor data into the form of Os. Once Observation Capturer catches an O, it splits O into several PEs and puts them into time windows according to their Ph_Type and S_addr attributes. A time window collects PEs for a preset time and outputs a list when the time is up or the list is full. Phenomena with the same Ph_Type and S_addr are pushed into the same time window. When a time window is full or the time is up, Observation Capturer gathers the PEs in the window into a PE list and passes this list to Dynamic Compression. The PEs in this list contain values of the same phenomenon from the same sensor, so the values are comparable.
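The windowing behaviour described above can be sketched as follows. This is an illustrative Python sketch, not the authors' data-stream implementation: the class name, numeric timestamps in seconds, and the rule that a window is closed by the first event arriving after it expires are all assumptions.

```python
class ObservationCapturer:
    """Groups (time, value) pairs into windows keyed by (Ph_Type, S_addr)."""

    def __init__(self, window_seconds=30.0, max_size=1000):
        self.window_seconds = window_seconds
        self.max_size = max_size
        self._windows = {}  # key -> (window start time, list of (time, value))

    def push(self, ph_type, s_addr, timestamp, value):
        """Add one event and return the PE lists that were closed by it."""
        key = (ph_type, s_addr)
        flushed = []
        window = self._windows.get(key)
        # Time is up: emit the old window before opening a new one.
        if window is not None and timestamp - window[0] >= self.window_seconds:
            flushed.append(window[1])
            window = None
        if window is None:
            window = (timestamp, [])
            self._windows[key] = window
        window[1].append((timestamp, value))
        # List is full: emit it immediately.
        if len(window[1]) >= self.max_size:
            flushed.append(window[1])
            del self._windows[key]
        return flushed


if __name__ == "__main__":
    capturer = ObservationCapturer(window_seconds=30, max_size=100)
    for t in range(0, 35, 5):
        for pe_list in capturer.push("air temperature", "sensor-1", t, 25.0):
            print(len(pe_list), "events passed to Dynamic Compression")
```

Because a window is only closed when a later event arrives, a production system would probably also need a timer to flush idle windows; that is left out here for brevity.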
4.2.2. Dynamic Compression. The Dynamic Compression (DC) component compresses data by discarding redundant PEs, in order to reduce the load on the Real-Time Exception Monitor and the data repository.
One data compression method is to record only the first value in a run of identical consecutive values. Hence, once a new value arrives in the sequence, it becomes the next recorded value. Using this method, PEs generated by a sensor containing two different air temperature values, 23.3°C and 23.4°C, would be two recorded Phenomenon Events. However, 0.1°C, the difference between 23.3°C and 23.4°C, does not represent a significant change of environment in some situations, such as in a greenhouse.
The other way, called Threshold Compression, is to preset a threshold and then discard the values whose difference from the last recorded value is less than the threshold. This is the most common data compression method. It has three limitations when applied to IoT sensor data. Firstly, the bigger the threshold is, the more values are thrown away, causing lower accuracy. Conversely, when the threshold is reduced, higher accuracy is achieved but more redundant data are stored. The previous method is a special case of this method with the smallest threshold, zero. So finding an appropriate threshold is a balance between accuracy and storage size. Secondly, there are various types of phenomena in the IoT, and presetting thresholds for all phenomena requires a lot of work. Thirdly, sensor data change all the time, so a threshold that is appropriate now may be improper for future data sequences.
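Threshold Compression with a fixed threshold can be sketched in a few lines of Python. The function name and the (time, value) tuple layout are hypothetical. One assumption is made explicit: a value is kept only when it differs from the last recorded value by strictly more than the threshold, so that a threshold of zero reduces to the first method (recording only changes), as the text states.

```python
def threshold_compress(events, threshold):
    """Keep the first event, then every event whose value differs from the
    last kept value by more than `threshold`.

    `events` is a list of (time, value) pairs in time order.
    """
    kept = []
    for time, value in events:
        # Assumption: ties (difference equal to the threshold) are discarded.
        if not kept or abs(value - kept[-1][1]) > threshold:
            kept.append((time, value))
    return kept


if __name__ == "__main__":
    readings = [(0, 23.3), (1, 23.3), (2, 23.4), (3, 23.9), (4, 24.0)]
    print(threshold_compress(readings, 0.5))  # coarse threshold
    print(threshold_compress(readings, 0.0))  # records every change
```

The comparison is always against the last recorded value rather than the previous raw value, so slow drifts eventually cross the threshold and get recorded.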
Here DC dynamically seeks a threshold for each phenomenon type to balance accuracy and compression rate, with the help of the PSO algorithm and the crossbreeding of particle swarms. The primary goal of DC is to get the minimum AE. Based on this goal, we try to reduce the redundancy of PEs in the PE list, making CR smaller.
Once a PEList arrives, DC executes the following steps to get a compressed PE list, CPEList.
Step 1. Initialize two particle populations of size $m$ with random positions and velocities on $D = [x_{\mathrm{Min}}, x_{\mathrm{Max}}]$, where $x_{\mathrm{Min}}$ is the minimum value of the search space and $x_{\mathrm{Max}}$ is the maximum value of the search space. One population, $\mathrm{CRP} = \{\mathrm{CRP}_1, \mathrm{CRP}_2, \ldots, \mathrm{CRP}_m\}$, adopts CR as its fitness function and is used to find the threshold in $D$ that gives the minimum CR. The other population, $\mathrm{AEP} = \{\mathrm{AEP}_1, \mathrm{AEP}_2, \ldots, \mathrm{AEP}_m\}$, employs AE as its fitness function and is assigned to search for the threshold in $D$ that achieves the minimum AE.
Step 2. Update the velocity and position of all particles in CRP and AEP with (1) and (2).
Step 3. For each $i \in (1, m)$, let $\mathrm{CRP}_i$ and $\mathrm{AEP}_i$ ...
Step 5. Generate compressed PE lists for all particles in CRP with their positions as thresholds. Evaluate the CRs for the particles in CRP using their compressed PE lists with (4).
Step 6. Generate compressed PE lists for all particles in AEP with their positions as thresholds. For each particle, reconstruct the time-value line with the PEs in its CPEList using the process described in Section 4.2.3 and calculate the AE between PEList and CPEList.
Step 7. For each particle in CRP, compare its current CR value with the CR of its $pbest$ position. If the current value is smaller, update its $pbest$ with the current position.
Step 8. For each particle in AEP, compare its current AE value with the AE of its $pbest$ position. If the current value is smaller, update its $pbest$ with the current position.
Step 9. Determine the global best particle gbCRP in CRP, that is, the one with the smallest CR value.
Step 10. Determine the global best particle gbAEP in AEP, that is, the one with the smallest AE value.
Step 11. Use the position of gbCRP as the threshold to generate a compressed list CRCPEList from PEList and calculate the AE value between CRCPEList and PEList.
Step 12. Compare the AE value from Step 11 to the AE value of gbAEP. If the AE value of gbAEP is bigger, set the position of gbAEP to gbCRP's position.
Figure 2: Example of time-value reconstruction. (Line chart; x-axis: Time; y-axis: Air temperature (°C).)
Step 13. Repeat Steps 2 to 12 until a sufficiently good AE is reached or the maximum number of iterations is met.
The position of gbAEP after the above 13 steps is the optimal threshold, which helps us compress PEList into CPEList while balancing accuracy and compression rate.
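To make the fitness evaluation in the steps above concrete, the Python sketch below scores a single candidate threshold by compressing a PE list, reconstructing the values by linear interpolation, and computing AE and CR. It is a simplified illustration only: it does not implement the two swarms or the crossbreeding, the function name is hypothetical, values after the last kept point are assumed to hold that point's value, and a value is kept only when it differs from the last kept value by more than the threshold.

```python
def evaluate_threshold(events, threshold):
    """Return (AE, CR) for one candidate threshold.

    `events` is a non-empty list of (time, value) pairs with strictly
    increasing numeric times.
    """
    if not events:
        raise ValueError("events must not be empty")

    # Compress: keep values that moved more than `threshold` since the last kept one.
    kept = []
    for t, v in events:
        if not kept or abs(v - kept[-1][1]) > threshold:
            kept.append((t, v))

    # Reconstruct a value for every original time by linear interpolation.
    recovered = []
    j = 0
    for t, _ in events:
        while j + 1 < len(kept) and kept[j + 1][0] <= t:
            j += 1
        if j + 1 < len(kept):
            (t0, v0), (t1, v1) = kept[j], kept[j + 1]
            recovered.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            recovered.append(kept[j][1])  # assumed: hold the last kept value

    ae = sum((v - r) ** 2 for (_, v), r in zip(events, recovered)) / len(events)
    cr = len(kept) / len(events)
    return ae, cr


if __name__ == "__main__":
    zigzag = [(0, 25.0), (1, 25.3), (2, 25.0), (3, 25.3)]
    for th in (0.0, 0.5):
        print(th, evaluate_threshold(zigzag, th))
```

On the zigzag data, the larger threshold keeps fewer points (lower CR) at the cost of a higher AE, which is the trade-off the swarm search is meant to balance.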
4.2.3. Time-Value Reconstruction. All of the PEs in a PE list are of the same type, and the two key attributes of a PE are time and value. So the reconstruction of a PE is to recalculate the phenomenon value at a given time. The time-value pairs in the compressed PE list are discrete points. For reconstruction, we take the Time attribute as the $X$ coordinate and the Value attribute as the $Y$ coordinate. For all PEs in a compressed PE list, we mark the (Time, Value) points one by one and draw lines between adjacent points. The $Y$ values on the connected line are the reconstructed values of PEs at given times.
For example, given a compressed Phenomenon Event list CPEList, the points on the line in Figure 2 are the reconstructed Phenomenon Events.
For example, the value at time 2013-06-19 21:50:24 on the line chart is 25.5. So the reconstructed PE is <"air temperature", 192.168.0.12 0x001, 25.5, C, 2013-06-19 21:50:24>.
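Drawing straight lines between adjacent (Time, Value) points amounts to piecewise linear interpolation. The Python sketch below illustrates this under stated assumptions: times are numeric (for example, seconds since some reference), the function name is hypothetical, and queries outside the recorded range return the nearest endpoint value.

```python
def reconstruct_value(points, t):
    """Linearly interpolate the value at time `t`.

    `points` is a non-empty list of (time, value) pairs with strictly
    increasing numeric times.
    """
    if not points:
        raise ValueError("points must not be empty")
    if t <= points[0][0]:
        return points[0][1]    # assumed: clamp before the first point
    if t >= points[-1][0]:
        return points[-1][1]   # assumed: clamp after the last point
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("times must be strictly increasing")


if __name__ == "__main__":
    compressed = [(0, 25.0), (10, 26.0), (30, 25.0)]
    print(reconstruct_value(compressed, 5))
```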
4.3. Real-Time Exception Monitor. As explained, sensor data reflect the phenomenon status in the physical space. Hence, sensor data are closely associated with the containers. The monitored containers require that the phenomenon states remain within certain intervals. For example, a warehouse in a cold chain loaded with meat requires the air temperature to be maintained within [-18°C, 0°C]. Once the air temperature falls outside this interval, warnings must be pushed out to the applications.
Real-Time Exception Monitor (REM) receives the compressed Phenomenon Event list CPEList from Dynamic Compression (DC) to detect exception situations. Although the elements in CPEList are not the complete set of captured Phenomenon Events, they are the key elements selected with a threshold. Therefore, the Phenomenon Events in CPEList can represent the actual exception situations.
REM maintains a Rule Repository and a core execution component, the Real-Time Exception Monitor Core (REMC). The Rule Repository stores a set of rules defining the standard phenomenon intervals. Every rule is formatted as an RL:
RL = <GLN, Ph_Type, Min, Max, Unit>.
Here GLN is a thirteen-digit number used to identify parties and physical locations. Ph_Type denotes the type of a phenomenon. Min and Max describe a standard range of Ph_Type, expressed in Unit. For example, a rule can specify a standard air temperature interval of [-18°C, 0°C] for a warehouse with GLN 6901404000029 in a meat cold chain.
REMC focuses on the execution of exception monitoring. Once a compressed PE list arrives, REMC first uses the common sensor address of the elements in CPEList to get the respective GLN number. Then REMC retrieves the RLs for the container from the Rule Repository with the GLN and the common Ph_Type of the PEs. From all hit RLs, REMC takes the maximum Min and the minimum Max to form the narrowest standard range, which holds for all types of entities. For each PE in the compressed PE list, if the value is not in the range, REMC sends out a warning.
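The rule lookup and range check performed by REMC can be sketched as follows. This Python sketch is illustrative only: the `Rule` type, its field names, the in-memory rule list, and the second rule in the usage example are hypothetical, and the sensor-address-to-GLN lookup is assumed to have happened already.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    """RL = <GLN, Ph_Type, Min, Max, Unit>."""
    gln: str
    ph_type: str
    min_value: float
    max_value: float
    unit: str


def effective_range(rules: List[Rule], gln: str,
                    ph_type: str) -> Optional[Tuple[float, float]]:
    """Intersect all matching rules: largest Min and smallest Max."""
    hits = [r for r in rules if r.gln == gln and r.ph_type == ph_type]
    if not hits:
        return None
    return max(r.min_value for r in hits), min(r.max_value for r in hits)


def find_exceptions(values: List[float],
                    value_range: Tuple[float, float]) -> List[float]:
    """Return the values that fall outside the closed standard range."""
    low, high = value_range
    return [v for v in values if not (low <= v <= high)]


if __name__ == "__main__":
    rules = [
        Rule("6901404000029", "air temperature", -18.0, 0.0, "C"),
        Rule("6901404000029", "air temperature", -20.0, -2.0, "C"),  # made-up second rule
    ]
    rng = effective_range(rules, "6901404000029", "air temperature")
    print(rng, find_exceptions([-10.0, -1.0, 1.0], rng))
```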
4.4. Sensor Data Storage. As described in Section 3, an information system manages several containers, and at least one sensor is deployed in each container. Meanwhile, the sensors collect information on various phenomena every few seconds. As a result, the number of Phenomenon Events generated each day is large. Although PEs have been compressed by Dynamic Compression, the stored number still increases fast. To find a better way to store and retrieve Phenomenon Events efficiently, supporting environment information discovery for entities, we attempt to maintain sensor data in two ways: an SQL database and a NoSQL database.
4.4.1. SQL Database. The structure of sensor data in the SQL database is designed as in Figure 3. A container contains one or more sensors, and sensors generate several types of events. Considering that insertion and selection operations focus on a single type of phenomenon, we classify events into different tables according to their types to reduce the size of each Phenomenon Event table. In this way, if more phenomena are monitored, more tables will be created. To improve data retrieval efficiency and reduce the storage space, we define fields using variable-length characters instead of fixed-length characters, float instead of decimal numbers, and timestamp rather than datetime.
Insertion. PEs in the compressed PE list are stored directly into the corresponding table, grouped by their Ph_Type.
Query. The most common query on sensor data, like the following example, takes a GLN number and a time slot to search the records of a specific phenomenon. The GLNCode field in the Container table and the time field in the Phenomenon Event tables therefore appear often in queries.
    SELECT * FROM AirTemperatureEvent a
    JOIN sensor s ON a.SensorID = s.SensorID
    JOIN Container c ON s.ContainerID = c.ContainerID
    WHERE c.GLNCode = '1'
    AND a.time BETWEEN '2013-06-19 20:00:00' AND '2013-06-20 16:59:59'
To decrease the event retrieval time, we add indices on the GLNCode field and the time field.
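The relational layout and the indexed query can be tried out with a small self-contained sketch. The paper uses MySQL; the Python example below substitutes the standard-library `sqlite3` module with an in-memory database purely so that it runs without a server. The column lists, index names, and sample rows are assumptions, since the schema figure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Container (ContainerID INTEGER PRIMARY KEY, GLNCode VARCHAR(13));
CREATE TABLE Sensor (SensorID INTEGER PRIMARY KEY, ContainerID INTEGER, SAddr VARCHAR(64));
CREATE TABLE AirTemperatureEvent (SensorID INTEGER, time TIMESTAMP, value FLOAT, unit VARCHAR(8));
-- Indices on the two fields that the common query filters on.
CREATE INDEX idx_container_gln ON Container (GLNCode);
CREATE INDEX idx_airtemp_time ON AirTemperatureEvent (time);
""")

# Made-up sample data for illustration.
conn.execute("INSERT INTO Container VALUES (1, '6901404000029')")
conn.execute("INSERT INTO Sensor VALUES (1, 1, '192.168.0.12 0x001')")
conn.executemany(
    "INSERT INTO AirTemperatureEvent VALUES (?, ?, ?, ?)",
    [(1, "2013-06-19 21:50:24", 25.5, "C"),
     (1, "2013-06-21 08:00:00", 24.9, "C")],
)

# GLN plus time slot, as in the common query described in the text.
rows = conn.execute(
    """
    SELECT a.time, a.value
    FROM AirTemperatureEvent a
    JOIN Sensor s ON a.SensorID = s.SensorID
    JOIN Container c ON s.ContainerID = c.ContainerID
    WHERE c.GLNCode = ? AND a.time BETWEEN ? AND ?
    """,
    ("6901404000029", "2013-06-19 20:00:00", "2013-06-20 16:59:59"),
).fetchall()
print(rows)
```

Timestamps are stored as text in a sortable year-first format, so the BETWEEN comparison behaves as a time-range filter in this sketch.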
4.4.2. NoSQL Database. The NoSQL database maintains a sensor-GLN collection and a Phenomenon Event collection. Algorithm 2 shows an example of the tuples stored in the NoSQL database. Since a NoSQL database stores data in collections, this paper divides the Phenomenon Events and the basic information of sensors and containers into two collections. The sensor-GLN collection maps the relationship between the sensor address and the GLN number of the physical container, managing [S_addr, GLN] tuples. The Phenomenon Event collection stores [Ph_Type, S_addr, Ph_Value, Ph_Unit, Time] tuples. S_addr, which represents the sensor deployed in a container and the data source of Phenomenon Events, is the key value that links physical containers to the actual values of phenomena, while physical containers are identified by GLN code.
Insertion. PEs in the compressed PE list are stored directly into the Phenomenon Event collection without extra operations, since all tuples in this collection record their Ph_Type.
Query. The most common query is again finding the Phenomenon Events that happened in a container during a time slot. This query requires two query operations. Firstly, get the S_addr list for the GLN from the sensor-GLN collection. Secondly, for each S_addr, find all event records generated over this period.
To reduce the number of queries, there is another way to store PEs: find the GLN number with S_addr before inserting a PE and store [Ph_Type, GLN, Ph_Value, Ph_Unit, Time] tuples in the Phenomenon Event collection. However, in the current situation, insertion operations on sensor data happen more frequently than queries. Hence, we adopt the first design on the NoSQL database.
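The two-step lookup over the two collections can be illustrated without a database server. In the Python sketch below, plain lists of dictionaries stand in for the MongoDB collections; the function name, the sample documents, and the use of sortable year-first time strings are assumptions made for illustration.

```python
def find_events(sensor_gln, events, gln, ph_type, start, end):
    """Two-step query: resolve the GLN to sensor addresses, then filter events.

    `sensor_gln` holds {"S_addr", "GLN"} documents and `events` holds
    {"Ph_Type", "S_addr", "Ph_Value", "Ph_Unit", "Time"} documents.
    """
    # Step 1: collect every sensor address deployed in the container.
    addrs = {doc["S_addr"] for doc in sensor_gln if doc["GLN"] == gln}
    # Step 2: select events of the requested type from those sensors in the time slot.
    return [e for e in events
            if e["S_addr"] in addrs
            and e["Ph_Type"] == ph_type
            and start <= e["Time"] <= end]


if __name__ == "__main__":
    sensor_gln = [{"S_addr": "192.168.0.12 0x001", "GLN": "6901404000029"}]
    events = [{"Ph_Type": "air temperature", "S_addr": "192.168.0.12 0x001",
               "Ph_Value": 25.5, "Ph_Unit": "C", "Time": "2013-06-19 21:50:24"}]
    print(find_events(sensor_gln, events, "6901404000029", "air temperature",
                      "2013-06-19 20:00:00", "2013-06-20 16:59:59"))
```

Storing the GLN inside each event document would remove step 1 at query time, at the cost of an extra lookup on every insertion, which is the trade-off discussed above.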
Table 1: Parameters of PSO.
w      c1     c2     xMin   xMax   vMax
0.4    1.8    1.8    0.0    0.5    0.2
5. Evaluation
5.1. Experiment Setup. We implemented a prototype of the proposed sensor data processing system, SDP. Our experimental platform consists of a PC running Windows 7 Professional with 4.00 GB of memory and an Intel(R) Core(TM) i3-3220 CPU at 3.30 GHz. The parameters of PSO are set as in Table 1. The selected SQL database is MySQL, and the representative NoSQL database is MongoDB.
5.2. Performance Evaluation. In order to evaluate the performance of our SDP system, three experiments are conducted. The first experiment shows whether DC can compress data dynamically according to the characteristics of data sequences. The second experiment shows how the number of PEs affects the compression speed and how the selection of database affects the storage speed. The third experiment shows how the choice of database affects the tracing speed.
Figure 4: Compression result of pressure data. (Plot title: Pressure thinning; x-axis: Time; y-axis: Pressure value (hPa); series: True pressure, Record pressure.)
The testing data came from the Wuxi Institute of Fudan University. These data were recorded every 5 seconds by the sensors in a greenhouse. For each experiment, we set the window size to 30 seconds. To test the performance of our framework, all PEs were submitted to the sliding window in 30 seconds.
There are many factors affecting the performance of the proposed IoT sensor data processing (SDP) system, such as the characteristics of the PEs, the number of PEs, and the settings of the database. In our evaluation experiments, we focused on how the number of PEs and the selection of database affect performance. Meanwhile, we evaluate whether the data compression (DC) can compress data dynamically according to the characteristics of data sequences. The initial parameters of PSO are set as in Table 1 before the experiments.
The first experiment was conducted with 1867 air temperature records and 1868 pressure records as PEs. We measured the accuracy (AE) and the compression rate (CR) when DC outputs a compressed PE list for a phenomenon type. The experimental results are shown in Figures 4 and 5. In Figure 4, DC keeps all PEs whose pressure value differs from the previously recorded value, while in Figure 5 DC finds the trend of the air temperature and records only 5 points. DC can thus find a threshold that balances the accuracy and the compression rate. The value of the AE index is approximately 0.00321 and the value of CR is near 0.5133 in Figure 4, while the data compression of air temperature in Figure 5 achieves an AE value of 128197 and a CR value of 0.00268. This part of the experiment shows that the proposed dynamic data compression can identify the trends of data and balance the accuracy and the compression rate, so that redundancy in sensor data can be avoided.
The second experiment was conducted with the number of sensor data varying from 500 to 1000000. We measured the time at which DC begins compressing, the time at which DC finishes compressing, and the time after storage. The second time minus the first was taken as the compression time, and the last time minus the second was taken as the storage time. The result is shown in Figure 6; it indicates that the compression time of DC has a linear relationship with the number of PEs. The time for storing PEs into the MySQL database increases faster than the time for storing data into MongoDB.
Figure 5: Compression result of air temperature data. (Plot title: Air temperature thinning; x-axis: Time; y-axis: Temperature value (°C); series: True values, Recorded values.)
Figure 6: Compression and storage time with various numbers of PEs. (x-axis: Numbers of PEs; y-axis: Time (ms); series: Compression, MySQL storage, MongoDB storage.)
The third experiment was conducted using a query totrace the air temperature information during the lifecycleof an entity with an EPC The number of recorded PEs indatabase varied from 69880 to 1985247 The query time ofMySQL and MongoDB is shown in Figure 7 We learnedthat the tracing time of MongoDB grows faster than thetime of MySQL The reason is that MongoDB had to executetwo queries when the number of tuples increases Howevercomparing to the storage time the time cost by the sensordata queries of MongoDB is short
Hence, when an application mainly has to store a large amount of sensor data, it is better to store the data in a nonrelational database, whereas when environment information discovery queries outnumber storage operations, a relational database performs better.
[Figure 7: Query time with various numbers of tuples. Horizontal axis: number of tuples in database (69880, 190354, 1133228, 1985247). Vertical axis: query time (ms), from 0 to 8000. Series: MySQL, MongoDB.]

6. Conclusion

In this paper we present a new IoT sensor data processing (SDP) system that processes sensor data dynamically in the context of the Internet of Things. First, heterogeneous sensor data are captured and transformed into a unified data format. Second, the Particle Swarm Optimization algorithm, extended with a crossbreeding operation, is employed for data compression, which avoids redundancy and helps to reduce the load on the database. Third, the proposed SDP system detects exception situations by means of standard phenomenon rules set for containers. Meanwhile, an appropriate type of database for sensor data storage in the IoT is sought and analyzed. The experimental results show that the proposed compression method can find a threshold that achieves a high compression rate while keeping accuracy, and that a NoSQL database performs better for sensor data storage while a relational database performs better when executing environment information discovery queries.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Atzori, A. Iera, and G. Morabito, "The internet of things: a survey," Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[2] M. C. Domingo, "An overview of the internet of things for people with disabilities," Journal of Network and Computer Applications, vol. 35, no. 2, pp. 584–596, 2012.
[3] L. Zheng, H. Zhang, W. Han et al., "Technologies, applications, and governance in the internet of things," in Internet of Things - Global Technological and Societal Trends from Smart Environments and Spaces to Green ICT, River Publishers, 2011.
[4] J. Yick, B. Mukherjee, and D. Ghosal, "Wireless sensor network survey," Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.
[5] M. Botts and A. Robin, Sensor Model Language (SensorML) Implementation Specification, OpenGIS Implementation Specification, 2007.
[6] M. Botts and A. Robin, "Sensor Model Language (SensorML) Implementation Specification, OpenGIS Implementation Specification, 2007," IEEE Instrumentation and Measurement Society, IEEE Standard for a Smart Transducer Interface for Sensors and Actuators-Common Functions, Communication Protocols, and Transducer Electronic Data Sheet (TEDS) Formats, IEEE Std 1451.0, 2007.
[7] K. Aberer, M. Hauswirth, and A. Salehi, "Infrastructure for data processing in large-scale interconnected sensor networks," in Proceedings of the 8th International Conference on Mobile Data Management (MDM '07), pp. 198–205, May 2007.
[8] M. Sgroi, A. Wolisz, A. Sangiovanni-Vincentelli, and J. M. Rabaey, "A service-based universal application interface for ad hoc wireless sensor and actuator networks," in Ambient Intelligence, pp. 149–172, Springer, Berlin, Germany, 2005.
[9] Y. Yao and J. Gehrke, "The cougar approach to in-network query processing in sensor networks," ACM SIGMOD Record, vol. 31, no. 3, pp. 9–18, 2002.
[10] C.-C. Shen, C. Srisathapornphat, and C. Jaikaeo, "Sensor information networking architecture and applications," IEEE Personal Communications, vol. 8, no. 4, pp. 52–59, 2001.
[11] M. Kim, J. W. Lee, Y. J. Lee, and J.-C. Ryou, "Cosmos: a middleware for integrated data processing over heterogeneous sensor networks," ETRI Journal, vol. 30, no. 5, pp. 696–706, 2008.
[12] G. Venter and J. Sobieszczanski-Sobieski, "Particle swarm optimization," AIAA Journal, vol. 41, no. 8, pp. 1583–1589, 2003.
[13] J. Kennedy, "Particle swarm optimization," in Encyclopedia of Machine Learning, pp. 760–766, Springer, New York, NY, USA, 2010.
[14] F. Van den Bergh and A. P. Engelbrecht, "A study of particle swarm optimization particle trajectories," Information Sciences, vol. 176, no. 8, pp. 937–971, 2006.
[15] Y. Shi and R. C. Eberhart, "Fuzzy adaptive particle swarm optimization," in Proceedings of the Congress on Evolutionary Computation, vol. 1, pp. 101–106, IEEE, Seoul, Republic of Korea, May 2001.
[16] L. Gang, F. Jing, L. Ling, and Y. Qilian, "Fast realization of the LADT ECG data compression method," IEEE Engineering in Medicine and Biology Magazine, vol. 13, no. 2, pp. 255–258, 1994.
[17] S. M. S. Jalaleddine, C. G. Hutchens, R. D. Strattan, and W. A. Coberly, "ECG data compression techniques - a unified approach," IEEE Transactions on Biomedical Engineering, vol. 37, no. 4, pp. 329–343, 1990.
[18] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici, "Improving compression ratio, area overhead, and test application time for system-on-a-chip test data compression/decompression," in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 604–611, IEEE Computer Society, Paris, France, 2002.
[19] F. Wang and P. Liu, "Temporal management of RFID data," in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB '05), pp. 1128–1139, September 2005.
[20] E. Welbourne, L. Battle, G. Cole et al., "Building the internet of things using RFID: the RFID ecosystem experience," IEEE Internet Computing, vol. 13, no. 3, pp. 48–55, 2009.
[21] I. Grønbæk, "Architecture for the Internet of Things (IoT): API and interconnect," in Proceedings of the 2nd International Conference on Sensor Technologies and Applications (SENSORCOMM '08), pp. 802–807, Cap Esterel, France, August 2008.
[22] R. Want, "An introduction to RFID technology," IEEE Pervasive Computing, vol. 5, no. 1, pp. 25–33, 2006.
[23] Global Location Numbers (GLN), GS1, http://www.gs1.org/docs/idkeys/GS1_Global_Location_Numbers.pdf.
exception detection can help operators make decisions or adjust measures quickly to reduce losses.

Sensor Data Are Massive and Streaming. As the production costs of sensors come down, more and more sensors are deployed to capture complete environmental information. Meanwhile, sensors are set to collect data every few seconds. Hence sensor data are generated continuously and streamed to the Internet, and the amount of sensor data is large.

In order to maximize the use of sensor data, processing systems must take these characteristics into account. We propose an IoT sensor data processing (SDP) system based on data streaming technology that performs dynamic data compression, reducing redundancy with the help of the Particle Swarm Optimization (PSO) algorithm, and detects exception situations in real time. Meanwhile, we seek an appropriate type of database to store massive sensor data. The experimental results show that the compression method can train a threshold that balances accuracy and compression rate, and that the model on a NoSQL database performs distinctly better than the model on an SQL database.

The remainder of the paper is organized as follows. Section 2 describes the related work. Section 3 introduces the preliminary notions. Section 4 presents the framework of the proposed processing system SDP. Section 5 reports experiments on real sensor data and presents the results. Section 6 concludes this paper.
2. Related Work

Existing research on sensor data processing can be summarized into two categories: the bottom-up approach and the top-down approach. The bottom-up category models sensors and sensor networks from the device perspective. SensorML [5], IEEE 1451 [6], and GSN [7] fall under the bottom-up category. SensorML models sensor-related processes for consistent handling. IEEE 1451 defines a transducer interface to access and manage smart transducers. GSN abstracts sensor data sources, specifications, and query tools into XML descriptions, offering plug-and-play detection and deployment. Research in the bottom-up category addressed the heterogeneous and streaming characteristics of sensor data; the other three characteristics were not considered. The top-down category builds from the application perspective by analyzing functional requirements. The sensor network services platform (SNSP) [8], Cougar [9], the sensor information networking architecture and applications (SINA) [10], and COSMOS [11] belong to the top-down category. SNSP proposed a set of fundamental services and interface primitives to provide query operations on sensor networks. Cougar offered an in-network query processing mechanism. SINA attempted to find an optimal way to facilitate querying, monitoring, and tasking of sensor networks. COSMOS provided a method to integrate data over heterogeneous sensor networks and defined a standardized communication protocol and message formats. Research in this category focused on the heterogeneous characteristic; the other four characteristics were not considered.

The methods mentioned above cannot handle sensor data according to all of these characteristics or support dynamic processing in an open-loop environment. Compared with those frameworks, the proposed SDP system, which belongs to the bottom-up category, pays attention to all five characteristics and differs in the following aspects.
(1) Based on a unified data model, a data compression method employing the PSO algorithm is presented to reduce redundancy.

(2) The proposed system adopts data stream technology to monitor streaming data for exception situations in real time.

(3) Relational and NoSQL databases are compared and analyzed to seek a better model for massive sensor data storage.
3. Preliminary Notions

3.1. Particle Swarm Optimization. Particle Swarm Optimization (PSO) [12–14], proposed by Kennedy, is an artificial intelligence algorithm that searches for the optimal state in a $D$-dimensional search space through the interactions between particles in a swarm.

Every individual particle $i$ moves toward the $\overrightarrow{pbest}_i$ position and the $\overrightarrow{gbest}$ position. The $\overrightarrow{pbest}_i$ position is the best position found by particle $i$ so far. The $\overrightarrow{gbest}$ position is the best position found by the swarm so far. Particle $i$ moves according to its velocity $\vec{v}_i$ and current position $\vec{x}_i$. The velocity and position of particles are updated by the following equations:

$$\vec{v}_i = w\,\vec{v}_i + c_1\,\mathrm{rand}_1\,(\overrightarrow{pbest}_i - \vec{x}_i) + c_2\,\mathrm{rand}_2\,(\overrightarrow{gbest} - \vec{x}_i), \quad (1)$$

$$\vec{x}_i = \vec{x}_i + \vec{v}_i. \quad (2)$$

Here $\vec{v}_i$ refers to the velocity of particle $i$ and $\vec{x}_i$ is the position of particle $i$. $\overrightarrow{pbest}_i$ is the personal best position of particle $i$ and $\overrightarrow{gbest}$ is the global best position of the swarm. The inertia weight $w$ is used to control exploration and exploitation [15]. Particles maintain high velocities with a larger $w$ and low velocities with a smaller $w$. A larger $w$ can prevent particles from becoming trapped in local optima, and a smaller $w$ encourages particles to exploit the same area of the search space. The constants $c_1$ and $c_2$ decide whether particles prefer moving toward a $pbest$ position or the $gbest$ position. $\mathrm{rand}_1$ and $\mathrm{rand}_2$ are random variables between 0 and 1. The structure of PSO is shown in Algorithm 1. The $gbest$ after the iterations is the optimal position we seek.
3.2. Data Compression Measurement Indices. To evaluate the accuracy and the compression rate of data compression, we employ the Amplitude Error index to represent accuracy and the Compression Ratio index to represent compression rate.
Begin
  Initialize a population of particles with random positions and velocities on $D$ dimensions in the search space
  For each iteration $k$
  Begin
    For each particle $i$ in the swarm
    Begin
      (1) Calculate new $\vec{v}_i$ using (1)
      (2) Update the position $\vec{x}_i$ according to (2)
      (3) Evaluate the value $fv_i$ of fitness function $f$ with current position $\vec{x}_i$
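The loop structure of Algorithm 1 can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function name `pso_minimize`, the default values `w=0.7`, `c1=c2=1.5`, the velocity clamp, the clamping of positions to the search interval, and the fixed random seed are all assumptions chosen for a compact, repeatable example.

```python
import random


def pso_minimize(fitness, x_min, x_max, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, v_max=None, seed=0):
    """Minimise a one-dimensional fitness function with inertia-weight PSO."""
    rng = random.Random(seed)
    v_max = v_max if v_max is not None else (x_max - x_min)
    xs = [rng.uniform(x_min, x_max) for _ in range(n_particles)]
    vs = [rng.uniform(-v_max, v_max) for _ in range(n_particles)]
    pbest = xs[:]                              # personal best positions
    pbest_f = [fitness(x) for x in xs]         # personal best fitness values
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g], pbest_f[g]      # global best found so far

    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + pull toward pbest + pull toward gbest.
            v = (w * vs[i]
                 + c1 * r1 * (pbest[i] - xs[i])
                 + c2 * r2 * (gbest - xs[i]))
            vs[i] = max(-v_max, min(v_max, v))
            # Position update, clamped to the search interval (an assumption).
            xs[i] = max(x_min, min(x_max, xs[i] + vs[i]))
            f = fitness(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i], f
    return gbest, gbest_f


# Toy fitness with its minimum at x = 0.3 inside the interval [0, 1].
best_x, best_f = pso_minimize(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
```

In this sketch the global best is refreshed as soon as any particle improves on it; updating it once per iteration would be an equally valid reading of the algorithm.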
3.2.1. Amplitude Error Index. Amplitude Error (AE) [16] is a measure of similarity between the input numerical list and the output numerical list of a method. It is often used to evaluate the accuracy of ECG data compression [17]. A smaller AE value between the original ECG list and the list recovered from the output of a compression method means that the method achieves higher accuracy. Because our sensor data compression is similar to ECG data compression, we borrow this index to evaluate our compression accuracy.

Given an input numeric list $\mathrm{IList} = \{u_1, u_2, \ldots, u_n\}$, where $u_i$ is the $i$th number in IList, and a numeric list $\mathrm{RList} = \{v_1, v_2, \ldots, v_n\}$, where $v_i$ is the value recovered from the compressed data, AE is expressed by

$$\mathrm{AE} = \frac{\sum_{i=1}^{n} (u_i - v_i)^2}{n}. \quad (3)$$
3.2.2. Compression Ratio Index. Compression Ratio (CR) [18] also comes from ECG data compression. It is a measure of the change in the amount of data. A smaller CR value means that a larger amount of data has been compressed away.

Given an original numeric list $\mathrm{OList} = \{u_1, u_2, \ldots, u_n\}$, where $u_i$ is the $i$th number in OList, and a compressed numeric list $\mathrm{CList} = \{v_1, v_2, \ldots, v_m\}$, where $v_i$ is a remaining value and $m \le n$, CR is expressed by

$$\mathrm{CR} = \frac{m}{n}. \quad (4)$$
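The two indices can be computed directly from their definitions. The following is a minimal sketch; the function names `amplitude_error` and `compression_ratio` are hypothetical and not taken from the paper.

```python
def amplitude_error(original, recovered):
    """Mean squared difference between two equal-length numeric lists (AE)."""
    if len(original) != len(recovered) or not original:
        raise ValueError("lists must be non-empty and of equal length")
    return sum((u - v) ** 2 for u, v in zip(original, recovered)) / len(original)


def compression_ratio(n_original, n_compressed):
    """Fraction of values that remain after compression (CR = m / n)."""
    if n_original <= 0 or not 0 <= n_compressed <= n_original:
        raise ValueError("need 0 <= n_compressed <= n_original and n_original > 0")
    return n_compressed / n_original


ae = amplitude_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # (0 + 0 + 4) / 3
cr = compression_ratio(1867, 5)                          # 5 of 1867 values kept
```

Keeping 5 values out of 1867 gives a CR of about 0.00268, which is the arithmetic behind the air temperature figure reported in the evaluation.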
3.3. Fundamental Elements in the IoT. In an IoT information system there are four fundamental concepts [19–21].

Entities. These include all RFID-tagged entities such as items, cases, pallets, and even patients with RFID [22] bracelets.

Readers. There are two types of readers. One type is the RFID reader; the other is the sensor reader, which refers to the base station of WSNs. RFID readers use radio-frequency signals to communicate with RFID tags and also create business events that describe the life cycle of an entity. Base stations collect data from the gateways of WSNs and stream all the data out to the Internet.

WSNs. WSNs collect and aggregate environment information through sensors and communicate with base stations through gateways.

Container. A container represents the place where entities, readers, or sensors are located. It may be a warehouse or a truck. The common identification of a container is the Global Location Number (GLN) [23] code defined by Global Standard 1 (GS1) [24]. In general, more than one reader and at least one sensor are deployed in one container for tracking entities and monitoring the environment.
4. Sensor Data Processing System

Figure 1 shows the components of the proposed IoT sensor data processing (SDP) system and its relationship with the applications in the open-loop environment. We treat each piece of sensor data as an event, called a sensor event. In the open-loop environment there are WSNs deployed in physical containers and applications that require information exchange. The SDP system consists of three components: Observation Capturer, Real-Time Exception Monitor, and Dynamic Compression. Sensor data streamed into SDP are in a unified format called Observation. A single index that describes the state of a physical phenomenon is called a Phenomenon Event in SDP. Observation Capturer receives all the Observations from distributed base stations and splits them into Phenomenon Events according to the type of index. Real-Time Exception Monitor detects exceptional Phenomenon Events and pushes out exception warnings. Dynamic Compression employs the PSO algorithm to train a threshold for compressing Phenomenon Events.
4.1. Unified Sensor Data Definitions. We first introduce the new concept of Observation (O for short), which is a data object recording the basic information of a sensor and the values of phenomena. A phenomenon is a collected condition state. For example, a temperature of value 23.2 °C is a phenomenon.

[Figure 1: Framework of SDP system in IoT. Labeled elements: WSNs, gateways, base stations, raw observations, Observation Capturer, Phenomenon Events, Dynamic Compression, compressed Phenomenon Events, Real-Time Exception Monitor, phenomenon standard of location, warning, sensor data repository, applications, information exchange.]
Definition 1 (Observation). Consider

O = <S_addr, Time, P>,

P = <ph_type, ph_value, ph_unit>+.

Here S_addr refers to the address of the sensor. The addresses of sensors in different WSNs differ from each other. The most common types of S_addr are <gateway_ip, sensor_set_off> and sensor_ip. Time is the time at which an Observation was collected. <ph_type, ph_value, ph_unit> is a triple, where ph_type refers to the type of a phenomenon, ph_value refers to the value of a phenomenon, and ph_unit denotes the unit of ph_value. The sign + indicates that a sensor may generate more than one phenomenon triple.

We split Observations into a list of Phenomenon Events (PE for short).

Definition 2 (Phenomenon Event). Consider

PE = <Ph_Type, S_addr, Time, Value, Unit>.

Here Ph_Type refers to the type of a phenomenon. S_addr and Time are inherited from the Observation. Value is the numeric value of a phenomenon recorded by a sensor. Unit denotes the unit of Value.
4.2.1. Observation Capture. Base stations, which manage the gateways of WSNs, parse pieces of raw sensor data into Os. Once Observation Capturer catches an O, it splits the O into several PEs and puts them into time windows according to their Ph_Type and S_addr attributes. A time window collects PEs for a preset time and outputs a list when the time is up or the list is full. Phenomena with the same Ph_Type and S_addr are pushed into the same time window. When a time window is full or its time is up, Observation Capturer gathers the PEs in the window into a PE list and passes this list to Dynamic Compression. The PEs in this list contain values of the same phenomenon from the same sensor, so the values are comparable.
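The splitting and windowing behaviour described above can be sketched as follows. The class and field names are hypothetical, times are plain numbers of seconds, and closing a window by comparing event times (rather than by a wall-clock timer) is an assumption made to keep the example deterministic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    s_addr: str
    time: float                                    # seconds since some epoch
    phenomena: Tuple[Tuple[str, float, str], ...]  # (ph_type, ph_value, ph_unit)


@dataclass(frozen=True)
class PhenomenonEvent:
    ph_type: str
    s_addr: str
    time: float
    value: float
    unit: str


def split_observation(o: Observation) -> List[PhenomenonEvent]:
    """Turn one Observation into one PhenomenonEvent per phenomenon triple."""
    return [PhenomenonEvent(t, o.s_addr, o.time, v, u) for t, v, u in o.phenomena]


class TimeWindows:
    """Buffers PEs per (ph_type, s_addr) and emits a list when a window closes."""

    def __init__(self, span_seconds: float, max_size: int):
        self.span, self.max_size = span_seconds, max_size
        self.buffers: Dict[Tuple[str, str], List[PhenomenonEvent]] = defaultdict(list)

    def push(self, pe: PhenomenonEvent) -> List[PhenomenonEvent]:
        """Add a PE; return the closed window's PE list, or [] if still open."""
        key = (pe.ph_type, pe.s_addr)
        buf = self.buffers[key]
        # Time is up: emit the old window and start a new one with this PE.
        if buf and pe.time - buf[0].time >= self.span:
            self.buffers[key] = [pe]
            return buf
        buf.append(pe)
        # Window is full: emit it and start an empty one.
        if len(buf) >= self.max_size:
            self.buffers[key] = []
            return buf
        return []
```

With a 30-second span and readings every 5 seconds, the window for one sensor and phenomenon type closes on the seventh reading and hands over the first six.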
4.2.2. Dynamic Compression. The Dynamic Compression (DC) component compresses data by discarding redundant PEs, which reduces the load on the Real-Time Exception Monitor and the data repository.

One data compression method is to record only the first value in a sequence of consecutive identical values. Hence, once a new value arrives in the sequence, it becomes the next recorded value. With this method, PEs generated by a sensor containing two different air temperature values, 23.3 °C and 23.4 °C, would be two recorded Phenomenon Events. However, 0.1 °C, the difference between 23.3 °C and 23.4 °C, does not represent a significant change of environment in some situations, such as in a greenhouse.

The other way, called Threshold Compression, is to preset a threshold and then discard the values whose difference from the last recorded value is less than the threshold. This is the most common data compression method. It has three limitations when applied to IoT sensor data. Firstly, the bigger the threshold is, the more values are thrown away, causing lower accuracy. In other words, when the threshold is reduced, higher accuracy is achieved but more redundant data are stored. The previous method is a special case of this method with the smallest threshold, zero. So finding an appropriate threshold is a balance between accuracy and storage size. Secondly, there are various types of phenomena in the IoT, and presetting thresholds for all of them requires a lot of work. Thirdly, sensor data change all the time, so a threshold that is appropriate now may be improper for a future data sequence.
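Threshold Compression with a fixed threshold can be sketched in a few lines. The function name `threshold_compress` is hypothetical, and the sketch keeps a value only when it differs from the last kept value by strictly more than the threshold, a choice made so that a threshold of zero reproduces the first method (one record per run of identical values).

```python
def threshold_compress(values, threshold):
    """Keep the first value, then every value that differs from the last
    kept value by more than `threshold`."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    kept = []
    for v in values:
        if not kept or abs(v - kept[-1]) > threshold:
            kept.append(v)
    return kept


readings = [23.3, 23.4, 23.9, 24.0, 24.6]
coarse = threshold_compress(readings, 0.5)   # drops the 0.1 degree steps
exact = threshold_compress(readings, 0.0)    # keeps every change in value
```

The trade-off named in the text is visible here: the larger threshold keeps three of five readings, while the zero threshold keeps all five because every reading differs from its predecessor.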
Here DC dynamically seeks a threshold for each phenomenon type that balances accuracy and compression rate, with the help of the PSO algorithm and the crossbreeding of particle swarms. The primary goal of DC is to obtain the minimum AE. Subject to this goal, we try to reduce the redundancy of PEs in the PE list, making CR smaller.

Once a PEList arrives, DC executes the following steps to obtain a compressed PE list, CPEList.
Step 1. Initialize two particle populations of size $m$ with random positions and velocities on $D = [x_{\mathrm{Min}}, x_{\mathrm{Max}}]$, where $x_{\mathrm{Min}}$ is the minimum value of the search space and $x_{\mathrm{Max}}$ is the maximum value of the search space. One population, $\mathrm{CRP} = \{\mathrm{CRP}_1, \mathrm{CRP}_2, \ldots, \mathrm{CRP}_m\}$, adopts CR as its fitness function and is used to find the threshold in $D$ that gives the minimum CR. The other population, $\mathrm{AEP} = \{\mathrm{AEP}_1, \mathrm{AEP}_2, \ldots, \mathrm{AEP}_m\}$, employs AE as its fitness function and searches for the threshold in $D$ that achieves the minimum AE.

Step 2. Update the velocity and position of all particles in CRP and AEP with (1) and (2).

Step 3. For each $i \in (1, m)$, let $\mathrm{CRP}_i$ and $\mathrm{AEP}$

Step 5. Generate compressed PE lists for all particles in CRP with their positions as thresholds. Evaluate the CR of each particle in CRP from its compressed PE list with (4).

Step 6. Generate compressed PE lists for all particles in AEP with their positions as thresholds. For each particle, reconstruct the time-value line from the PEs in its CPEList using the process described in Section 4.2.3 and calculate the AE between PEList and CPEList.

Step 7. For each particle in CRP, compare its current CR value with the CR of its $pbest$ position. If the current value is smaller, update $pbest$ to its current position.

Step 8. For each particle in AEP, compare its current AE value with the AE of its $pbest$ position. If the current value is smaller, update $pbest$ to its current position.

Step 9. Determine the global best particle gbCRP in CRP, the one with the smallest CR value.

Step 10. Determine the global best particle gbAEP in AEP, the one with the smallest AE value.

Step 11. Use the position of gbCRP as the threshold to generate a compressed list CRCPEList from PEList and calculate the AE value between CRCPEList and PEList.

Step 12. Compare the AE value from Step 11 with the AE value of gbAEP. If the AE value of gbAEP is bigger, set the position of gbAEP to the position of gbCRP.
[Figure 2: Example of time-value reconstruction. Horizontal axis: time, from 20:38:24 to 23:31:12. Vertical axis: air temperature (°C), from 24.9 to 25.9. Series: temperature.]
Step 13. Repeat Steps 2 to 12 until a sufficiently good AE is obtained or the maximum number of iterations is reached.

The position of gbAEP after the above 13 steps is the optimal threshold, which is used to compress PEList into CPEList while balancing accuracy and compression rate.
4.2.3. Time-Value Reconstruction. All of the PEs in a PE list are of the same type, and the two key attributes of a PE are its time and its value. Reconstructing a PE therefore means recalculating the phenomenon value at a given time. The time-value pairs in the compressed PE list are discrete points. For reconstruction, we take the Time attribute as the $X$ coordinate and the Value attribute as the $Y$ coordinate. For all PEs in a compressed PE list, we mark the (Time, Value) points one by one and draw straight lines between adjacent points. The $Y$ values on the connected line are the reconstructed values of PEs at the corresponding times.

For example, given a compressed Phenomenon Event list CPEList, the points on the line in Figure 2 are the reconstructed Phenomenon Events. For instance, the value at time 2013-06-19 21:50:24 on the line chart is 25.5, so the reconstructed PE is <"air temperature", 192.168.0.12, 0x001, 25.5, C, 2013-06-19 21:50:24>.
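Drawing straight lines between adjacent points amounts to piecewise-linear interpolation. The sketch below shows one way to do it and to score a compressed list against the original; the function names are hypothetical, times are plain numbers, points are assumed sorted by strictly increasing time, and holding the first or last value outside the recorded range is an assumption the paper does not state.

```python
from bisect import bisect_right


def reconstruct(points, t):
    """Value at time t on the line through sorted (time, value) points."""
    times = [p[0] for p in points]
    if t <= times[0]:                 # before the first point: hold its value
        return points[0][1]
    if t >= times[-1]:                # after the last point: hold its value
        return points[-1][1]
    j = bisect_right(times, t)        # times[j - 1] <= t < times[j]
    (t0, v0), (t1, v1) = points[j - 1], points[j]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def ae_after_reconstruction(original, compressed):
    """AE between original values and values read back from the line."""
    errors = [(v - reconstruct(compressed, t)) ** 2 for t, v in original]
    return sum(errors) / len(errors)


original = [(0, 25.0), (5, 25.5), (10, 26.0), (20, 25.7), (30, 25.0)]
compressed = [(0, 25.0), (10, 26.0), (30, 25.0)]   # hypothetical kept points
midpoint = reconstruct(compressed, 5)              # halfway between 25.0 and 26.0
error = ae_after_reconstruction(original, compressed)
```

Only the reading at time 20 deviates from the reconstructed line (25.7 against 25.5), so it is the sole contributor to the AE in this toy case.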
4.3. Real-Time Exception Monitor. As explained, sensor data reflect the phenomenon status in the physical space. Hence sensor data are closely associated with the containers. The monitored containers require that the phenomenon states stay within certain intervals. For example, a warehouse in a cold chain loaded with meat requires the air temperature to be maintained within [−18 °C, 0 °C]. Once the air temperature is out of this interval, that is, beyond its limits, warnings must be pushed out to the applications.
Real-Time Exception Monitor (REM) receives the compressed Phenomenon Event list CPEList from Dynamic Compression (DC) to detect exception situations. Although the elements in CPEList are not the complete set of captured Phenomenon Events, they are the key elements selected with a threshold. Therefore the Phenomenon Events in CPEList can represent the actual exception situations.

REM maintains a Rule Repository and a core execution component, Real-Time Exception Monitor Core (REMC for short). The Rule Repository stores a set of rules defining the standard phenomenon intervals. Every rule is formatted as an RL:

RL = <GLN, Ph_Type, Min, Max, Unit>.

Here GLN is a thirteen-digit number used to identify parties and physical locations. Ph_Type denotes the type of a phenomenon. Min and Max describe a standard range of Ph_Type, expressed in Unit. For example, a standard air temperature interval of [−18 °C, 0 °C] for a warehouse with GLN 6901404000029 in a meat cold chain is expressed as a rule with that GLN, the air temperature type, Min −18, Max 0, and unit °C.

REMC focuses on executing the exception monitoring. Once a compressed PE list arrives, REMC first uses the common sensor address of the elements in CPEList to obtain the corresponding GLN number. Then REMC retrieves the RLs for the container from the Rule Repository using the GLN and the common Ph_Type of the PEs. From all matching RLs, REMC takes the maximum Min and the minimum Max to form the narrowest standard range, which is valid for all types of entities. For each PE in the compressed PE list, if the value is not in the range, REMC sends out a warning.
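The range selection and check performed by REMC can be sketched as follows. The names `Rule`, `standard_range`, and `find_exceptions` are hypothetical, as is the second rule in the usage example; the sketch assumes the GLN has already been resolved from the sensor address.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    gln: str
    ph_type: str
    min_value: float
    max_value: float
    unit: str


def standard_range(rules: List[Rule], gln: str,
                   ph_type: str) -> Optional[Tuple[float, float]]:
    """Intersect all matching rules: largest Min and smallest Max."""
    hits = [r for r in rules if r.gln == gln and r.ph_type == ph_type]
    if not hits:
        return None
    return max(r.min_value for r in hits), min(r.max_value for r in hits)


def find_exceptions(values: List[float],
                    value_range: Tuple[float, float]) -> List[float]:
    """Return the values that fall outside the closed interval [low, high]."""
    low, high = value_range
    return [v for v in values if not low <= v <= high]


rules = [
    Rule("6901404000029", "air_temperature", -18.0, 0.0, "C"),
    Rule("6901404000029", "air_temperature", -20.0, -2.0, "C"),  # hypothetical
]
rng = standard_range(rules, "6901404000029", "air_temperature")
warnings = find_exceptions([-10.0, -1.0, -19.0, -18.0], rng)
```

Taking the largest Min and smallest Max yields the intersection of the matching intervals, so a value that passes the check satisfies every rule registered for the container.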
4.4. Sensor Data Storage. As described in Section 3, an information system manages several containers, and at least one sensor is deployed in each container. Meanwhile, the sensors collect information on various phenomena every few seconds. As a result, the number of Phenomenon Events generated each day is large. Although PEs have been compressed by Dynamic Compression, the number stored still increases fast. To find a better way to store and retrieve Phenomenon Events efficiently in support of environment information discovery for entities, we maintain sensor data in two ways: in an SQL database and in a NoSQL database.

4.4.1. SQL Database. The structure of sensor data in the SQL database is designed as in Figure 3. A container contains one or more sensors, and sensors generate several types of events. Because insertion and selection operations focus on a single type of phenomenon, we classify events into different tables according to their types to reduce the size of each Phenomenon Event table. In this way, if more phenomena are monitored, more tables are created. To improve data retrieval efficiency and reduce storage space, we define fields using variable-length characters instead of fixed-length characters, float instead of decimal, and timestamp rather than datetime.

Insertion. PEs in the compressed PE list are stored directly into the corresponding table, grouped by their Ph_Type.

Query. The most common query on sensor data, like the following example, takes a GLN number and a time slot to search the records of a specific phenomenon. GLNCode in the Container table and time in the Phenomenon Event tables appear often.

Select * from AirTemperatureEvent a join sensor s on a.SensorID = s.SensorID join Container c on s.ContainerID = c.ContainerID where c.GLNCode = '1' and a.time between '2013-06-19 20:00:00' AND '2013-06-20 16:59:59'

To decrease the event retrieval time, we add indices on the GLNCode field and the time field.
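The table layout, the two indices, and the common query can be tried out with a small in-memory database. This is only a sketch: SQLite from the Python standard library stands in for MySQL, and the column lists, index names, and sample rows are assumptions, since Figure 3 is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Container (ContainerID INTEGER PRIMARY KEY, GLNCode TEXT);
CREATE TABLE Sensor (SensorID INTEGER PRIMARY KEY, ContainerID INTEGER, SAddr TEXT);
CREATE TABLE AirTemperatureEvent (
    EventID INTEGER PRIMARY KEY, SensorID INTEGER,
    time TEXT, Value REAL, Unit TEXT);
-- Indices on the two fields that the common query filters on.
CREATE INDEX idx_container_gln ON Container (GLNCode);
CREATE INDEX idx_airtemp_time ON AirTemperatureEvent (time);
""")
conn.execute("INSERT INTO Container VALUES (1, '6901404000029')")
conn.execute("INSERT INTO Sensor VALUES (1, 1, '192.168.0.12/0x001')")
conn.executemany(
    "INSERT INTO AirTemperatureEvent (SensorID, time, Value, Unit) "
    "VALUES (?, ?, ?, ?)",
    [(1, "2013-06-19 19:59:00", 25.1, "C"),
     (1, "2013-06-19 21:50:24", 25.5, "C"),
     (1, "2013-06-20 17:30:00", 25.9, "C")])

# Trace the air temperature of one container during a time slot.
rows = conn.execute("""
    SELECT a.time, a.Value FROM AirTemperatureEvent a
    JOIN Sensor s ON a.SensorID = s.SensorID
    JOIN Container c ON s.ContainerID = c.ContainerID
    WHERE c.GLNCode = ? AND a.time BETWEEN ? AND ?
    ORDER BY a.time""",
    ("6901404000029", "2013-06-19 20:00:00", "2013-06-20 16:59:59")).fetchall()
```

Timestamps are stored as ISO-style text in this sketch, which sorts chronologically and therefore works with BETWEEN; a MySQL deployment would use its native timestamp type as the text describes.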
4.4.2. NoSQL Database. The NoSQL database maintains a sensor-GLN collection and a Phenomenon Event collection. Algorithm 2 shows an example of a stored tuple in the NoSQL database. Since a NoSQL database stores data in collections, this paper divides the Phenomenon Events and the basic information of sensors and containers into two collections. The sensor-GLN collection maps the relationship between the sensor address and the GLN number of the physical container, managing [S_addr, GLN] tuples. The Phenomenon Event collection stores [Ph_Type, S_addr, Ph_Value, Ph_Unit, Time] tuples. S_addr, which represents the sensor deployed in a container and the data source of Phenomenon Events, is the key value that links physical containers to the actual values of phenomena, while physical containers are identified by their GLN code.

Insertion. PEs in the compressed PE list are stored directly into the Phenomenon Event collection without extra operations, since all tuples in this collection record their Ph_Type.

Query. The most common query is again to find the Phenomenon Events that happened in a container during a time slot. This query requires two query operations. Firstly, get the S_addr list for the GLN from the sensor-GLN collection. Secondly, for each S_addr, find all event records generated over this period.

To reduce the number of queries, there is another way to store PEs: find the GLN number for the S_addr before inserting a PE and store [Ph_Type, GLN, Ph_Value, Ph_Unit, Time] tuples in the Phenomenon Event collection. However, in the current situation, insertion operations on sensor data happen more frequently than queries. Hence we adopt the first design on the NoSQL database.
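The two-step lookup of the first design can be illustrated with plain Python lists of dictionaries standing in for the two collections. This is a sketch of the access pattern only, not MongoDB code: the field names follow the tuples in the text, while the sample addresses, the second GLN, and the function name `events_in_container` are hypothetical.

```python
sensor_gln = [
    {"S_addr": "192.168.0.12/0x001", "GLN": "6901404000029"},
    {"S_addr": "192.168.0.12/0x002", "GLN": "6901404000029"},
    {"S_addr": "192.168.0.40/0x001", "GLN": "6901404000036"},
]

phenomenon_events = [
    {"Ph_Type": "air_temperature", "S_addr": "192.168.0.12/0x001",
     "Ph_Value": 25.5, "Ph_Unit": "C", "Time": "2013-06-19 21:50:24"},
    {"Ph_Type": "air_temperature", "S_addr": "192.168.0.12/0x002",
     "Ph_Value": 25.2, "Ph_Unit": "C", "Time": "2013-06-19 19:00:00"},
    {"Ph_Type": "air_temperature", "S_addr": "192.168.0.40/0x001",
     "Ph_Value": 24.8, "Ph_Unit": "C", "Time": "2013-06-19 21:00:00"},
]


def events_in_container(gln, ph_type, start, end):
    """Two lookups: sensors of the container, then their events in the slot."""
    # Query 1: resolve the sensor addresses deployed in the container.
    addrs = {d["S_addr"] for d in sensor_gln if d["GLN"] == gln}
    # Query 2: fetch events of the requested type from those sensors.
    return [e for e in phenomenon_events
            if e["S_addr"] in addrs and e["Ph_Type"] == ph_type
            and start <= e["Time"] <= end]


found = events_in_container("6901404000029", "air_temperature",
                            "2013-06-19 20:00:00", "2013-06-20 16:59:59")
```

The alternative design in the text would copy the GLN into every event at insertion time, removing the first lookup at the cost of one extra resolution per insert.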
Table 1: Parameters of PSO.

$w$     $c_1$   $c_2$   $x_{\mathrm{Min}}$   $x_{\mathrm{Max}}$   $V_{\mathrm{Max}}$
0.4     1.8     1.8     0.0                  0.5                  0.2
5. Evaluation

5.1. Experiment Setup. We implemented a prototype of the proposed sensor data processing system SDP. Our experimental platform consists of a PC running Windows 7 Professional with 4.00 GB memory and an Intel(R) Core(TM) i3-3220 CPU at 3.30 GHz. The parameters of PSO are set as in Table 1. The selected SQL database is MySQL and the representative NoSQL database is MongoDB.
5.2. Performance Evaluation. In order to evaluate the performance of our SDP system, three experiments are conducted. The first experiment shows whether DC can compress data dynamically according to the characteristics of data sequences. The second experiment shows how the number of PEs affects the compression speed and how the choice of database affects the storage speed. The third experiment shows how the setting of the database affects the tracing speed.

[Figure 4: Compression result of pressure data. Plot title: Pressure thinning. Horizontal axis: time, from 2013-06-19 20:38:24 to 2013-06-20 00:00:00. Vertical axis: pressure value (hPa), from 1004.0 to 1005.2. Series: true pressure, recorded pressure.]
Testing data came from the Wuxi Institute of Fudan University. These data were recorded every 5 seconds by the sensors in a greenhouse. For each experiment we set the window size to 30 seconds. To test the performance of our framework, all PEs were submitted to the sliding window within 30 seconds.

Many factors affect the performance of the proposed IoT sensor data processing (SDP) system, such as the characteristics of the PEs, the number of PEs, and the settings of the database. In our evaluation experiments we focused on how the number of PEs and the choice of database affect performance. Meanwhile, we evaluate whether Dynamic Compression (DC) can compress data dynamically according to the characteristics of data sequences. The initial parameters of PSO are set as in Table 1 before each experiment.
The first experiment was conducted with the 1867 piecesof air temperature records and 1868 pieces of pressure recordsas PEs We measured the accuracy AE and the compressionrate CR when DC outputs a compressed PE list for aphenomenon type The experimental results are shown inFigures 4 and 5 We can see that DC keeps all PEs whosevalue of pressure differs from their previous recorded valuewhile DC finds the trend of air temperature in Figure 5 andrecords only 5 points DC can find a threshold to balance theaccuracy and compression rate Meanwhile the value of AEindex approximates 000321 and the value of CR is near 05133in Figure 4 And the data compression of air temperature inFigure 5 can achieve AE value of 128197 and CR value of000268 This part of experiment shows that the proposeddynamic data compression can identify the trends of dataand balance the accuracy and compression rate Redundancyfrom sensor data can be avoided
The second experiment was conducted with the number of sensor data records varying from 500 to 1000000. We recorded three timestamps: when DC begins compressing, when DC finishes compressing, and when storage completes. The difference between the second and the first was taken as the compression time, and the difference between the last and the second was taken as the storage time. The result is shown in Figure 6; it shows that the compression time of DC has a linear relationship with the number of PEs. The time for storing PEs into the MySQL database increases faster than the time for storing data into MongoDB.
Figure 5: Compression result of air temperature data (temperature value in °C versus time; true values and recorded values).
Figure 6: Compression and storage time (ms) with various numbers of PEs (compression, MySQL storage, and MongoDB storage).
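The three-timestamp measurement described for the second experiment can be sketched as follows. The function name `timed_pipeline` and the two callables passed to it are hypothetical placeholders, since the source does not show how the prototype was instrumented.

```python
import time
from typing import Callable, List, Tuple


def timed_pipeline(compress: Callable[[List[float]], List[float]],
                   store: Callable[[List[float]], None],
                   events: List[float]) -> Tuple[float, float]:
    """Return (compression_ms, storage_ms) from three timestamps."""
    t_start = time.perf_counter()        # DC begins compressing
    compressed = compress(events)
    t_compressed = time.perf_counter()   # DC has finished compressing
    store(compressed)
    t_stored = time.perf_counter()       # storage has completed
    compression_ms = (t_compressed - t_start) * 1000.0
    storage_ms = (t_stored - t_compressed) * 1000.0
    return compression_ms, storage_ms


# Stand-in compressor and storage used purely for illustration.
sink: List[float] = []
print(timed_pipeline(lambda ev: ev[::2], sink.extend, [float(i) for i in range(1000)]))
```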
The third experiment used a query to trace the air temperature information during the lifecycle of an entity with an EPC. The number of recorded PEs in the database varied from 69880 to 1985247. The query times of MySQL and MongoDB are shown in Figure 7. The tracing time of MongoDB grows faster than that of MySQL as the number of tuples increases, because MongoDB has to execute two queries. However, compared to the storage time, the time cost of the sensor data queries on MongoDB is short.
Hence, when applications face a large amount of sensor data storage, it is better to store sensor data in nonrelational databases, whereas when environment information discovery queries outnumber storage operations, relational databases perform better.
6. Conclusion
In this paper, we present a new IoT sensor data processing (SDP) system to process sensor data dynamically in the context of the Internet of Things. First, heterogeneous sensor data are captured and transformed into a unified data format. Second, the Particle Swarm Optimization algorithm, extended with a crossbreeding operation, is employed for data compression, avoiding redundancy and helping to reduce the load on the database. The proposed SDP system detects exception situations by setting the standard phenomenon rules of containers. Meanwhile, an appropriate type of database for sensor data storage in the IoT is sought and analyzed. The experimental results show that the proposed compression method can find a threshold that achieves a high compression rate while keeping accuracy, and that the NoSQL database performs better in sensor data storage, while the relational database does better when executing environment information discovery queries.
Figure 7: Query time (ms) with various numbers of tuples in the database (MySQL and MongoDB).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Atzori, A. Iera, and G. Morabito, “The internet of things: a survey,” Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[2] M. C. Domingo, “An overview of the internet of things for people with disabilities,” Journal of Network and Computer Applications, vol. 35, no. 2, pp. 584–596, 2012.
[3] L. Zheng, H. Zhang, W. Han et al., “Technologies, applications, and governance in the internet of things,” in Internet of Things - Global Technological and Societal Trends from Smart Environments and Spaces to Green ICT, River Publishers, 2011.
[4] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.
[5] M. Botts and A. Robin, Sensor Model Language (SensorML) Implementation Specification, OpenGIS Implementation Specification, 2007.
[6] M. Botts and A. Robin, “Sensor Model Language (SensorML) Implementation Specification, OpenGIS Implementation Specification, 2007,” IEEE Instrumentation and Measurement Society, IEEE Standard for a Smart Transducer Interface for Sensors and Actuators-Common Functions, Communication Protocols, and Transducer Electronic Data Sheet (TEDS) Formats, IEEE Std 1451.0, 2007.
[7] K. Aberer, M. Hauswirth, and A. Salehi, “Infrastructure for data processing in large-scale interconnected sensor networks,” in Proceedings of the 8th International Conference on Mobile Data Management (MDM ’07), pp. 198–205, May 2007.
[8] M. Sgroi, A. Wolisz, A. Sangiovanni-Vincentelli, and J. M. Rabaey, “A service-based universal application interface for ad hoc wireless sensor and actuator networks,” in Ambient Intelligence, pp. 149–172, Springer, Berlin, Germany, 2005.
[9] Y. Yao and J. Gehrke, “The cougar approach to in-network query processing in sensor networks,” ACM SIGMOD Record, vol. 31, no. 3, pp. 9–18, 2002.
[10] C.-C. Shen, C. Srisathapornphat, and C. Jaikaeo, “Sensor information networking architecture and applications,” IEEE Personal Communications, vol. 8, no. 4, pp. 52–59, 2001.
[11] M. Kim, J. W. Lee, Y. J. Lee, and J.-C. Ryou, “Cosmos: a middleware for integrated data processing over heterogeneous sensor networks,” ETRI Journal, vol. 30, no. 5, pp. 696–706, 2008.
[12] G. Venter and J. Sobieszczanski-Sobieski, “Particle swarm optimization,” AIAA Journal, vol. 41, no. 8, pp. 1583–1589, 2003.
[13] J. Kennedy, “Particle swarm optimization,” in Encyclopedia of Machine Learning, pp. 760–766, Springer, New York, NY, USA, 2010.
[14] F. Van den Bergh and A. P. Engelbrecht, “A study of particle swarm optimization particle trajectories,” Information Sciences, vol. 176, no. 8, pp. 937–971, 2006.
[15] Y. Shi and R. C. Eberhart, “Fuzzy adaptive particle swarm optimization,” in Proceedings of the Congress on Evolutionary Computation, vol. 1, pp. 101–106, IEEE, Seoul, Republic of Korea, May 2001.
[16] L. Gang, F. Jing, L. Ling, and Y. Qilian, “Fast realization of the LADT ECG data compression method,” IEEE Engineering in Medicine and Biology Magazine, vol. 13, no. 2, pp. 255–258, 1994.
[17] S. M. S. Jalaleddine, C. G. Hutchens, R. D. Strattan, and W. A. Coberly, “ECG data compression techniques - a unified approach,” IEEE Transactions on Biomedical Engineering, vol. 37, no. 4, pp. 329–343, 1990.
[18] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici, “Improving compression ratio, area overhead, and test application time for system-on-a-chip test data compression/decompression,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 604–611, IEEE Computer Society, Paris, France, 2002.
[19] F. Wang and P. Liu, “Temporal management of RFID data,” in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB ’05), pp. 1128–1139, September 2005.
[20] E. Welbourne, L. Battle, G. Cole et al., “Building the internet of things using RFID: the RFID ecosystem experience,” IEEE Internet Computing, vol. 13, no. 3, pp. 48–55, 2009.
[21] I. Grønbæk, “Architecture for the Internet of Things (IoT): API and interconnect,” in Proceedings of the 2nd International Conference on Sensor Technologies and Applications (SENSORCOMM ’08), pp. 802–807, Cap Esterel, France, August 2008.
[22] R. Want, “An introduction to RFID technology,” IEEE Pervasive Computing, vol. 5, no. 1, pp. 25–33, 2006.
10 International Journal of Distributed Sensor Networks
[23] Global Location Numbers (GLN), GS1, http://www.gs1.org/docs/idkeys/GS1_Global_Location_Numbers.pdf.
3.2.1. Amplitude Error Index. Amplitude Error (AE) [16] is a measure of similarity between the input numerical list and the output numerical list of a method. It is often used to evaluate the accuracy of ECG data compression [17]. A smaller AE value between the list recovered from the output of a compression method and the original ECG list means that the method achieves higher accuracy. Because our sensor data compression is similar to ECG data compression, we borrow this index to evaluate our compression accuracy.
Given an input numeric list $\mathrm{IList} = \{u_1, u_2, \ldots, u_n\}$, where $u_i$ is the $i$th number in IList, and a numeric list $\mathrm{RList} = \{v_1, v_2, \ldots, v_n\}$, where $v_i$ is the value recovered from the compressed data, AE is expressed by

\[ \mathrm{AE} = \frac{\sum_{i=1}^{n} (u_i - v_i)^2}{n}. \tag{3} \]
3.2.2. Compression Ratio Index. Compression Ratio (CR) [18] also comes from ECG data compression. It is a measure of the change in the amount of data. A smaller CR value means that a larger amount of data has been compressed away.
Given an original numeric list $\mathrm{OList} = \{u_1, u_2, \ldots, u_n\}$, where $u_i$ is the $i$th number in OList, and a compressed numeric list $\mathrm{CList} = \{v_1, v_2, \ldots, v_m\}$, where $v_i$ is a remaining value and $m \le n$, CR is expressed by

\[ \mathrm{CR} = \frac{m}{n}. \tag{4} \]
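The two indices can be computed as in the minimal sketch below. It follows equations (3) and (4) under the assumption that the original and recovered lists have the same length $n$; the function names `amplitude_error` and `compression_ratio` are illustrative and do not come from the paper.

```python
from typing import Sequence


def amplitude_error(original: Sequence[float], recovered: Sequence[float]) -> float:
    """Mean squared difference between original and recovered values (Eq. 3)."""
    if len(original) == 0 or len(original) != len(recovered):
        raise ValueError("lists must be non-empty and of equal length")
    return sum((u - v) ** 2 for u, v in zip(original, recovered)) / len(original)


def compression_ratio(original_count: int, compressed_count: int) -> float:
    """CR = m / n (Eq. 4); a smaller value means more data was discarded."""
    if original_count <= 0 or not 0 <= compressed_count <= original_count:
        raise ValueError("require n > 0 and 0 <= m <= n")
    return compressed_count / original_count


print(amplitude_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(compression_ratio(6, 3))
```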
3.3. Fundamental Elements in the IoT. In an IoT information system, there are four fundamental concepts [19–21].
Entities. These include all RFID-tagged entities, such as items, cases, pallets, and even patients with RFID [22] bracelets.
Readers. There are two types of readers: RFID readers and sensor readers, the latter being the base stations of WSNs. RFID readers use radio-frequency signals to communicate with RFID tags and create business events that describe the life cycle of an entity. Base stations collect data from the gateways of WSNs and stream all the data out to the Internet.
WSNs. WSNs collect and aggregate environment information through sensors and communicate with base stations through gateways.
Container. A container represents the place where entities, readers, or sensors are located. It may be a warehouse or a truck. The common identification of a container is the Global Location Number (GLN) [23] code defined by Global Standards 1 (GS1) [24]. In general, more than one reader and at least one sensor are deployed in one container for tracking entities and monitoring the environment.
4. Sensor Data Processing System
Figure 1 shows the components of the proposed IoT sensor data processing (SDP) system and its relationship with the applications in the open-loop environment. We treat each piece of sensor data as an event, called a sensor event. In the open-loop environment, WSNs are deployed in physical containers, and applications require information exchange. The SDP system consists of three components: Observation Capturer, Real-Time Exception Monitor, and Dynamic Compression. Sensor data streamed into SDP are in a unified format called Observation. A single index that describes the state of a physical phenomenon is called a Phenomenon Event in SDP. Observation Capturer receives all the Observations from distributed base stations and splits them into different Phenomenon Events according to the type of index. Real-Time Exception Monitor detects exceptional Phenomenon Events and pushes out exception warnings. Dynamic Compression employs the PSO algorithm to train a threshold for compressing Phenomenon Events.
4.1. Unified Sensor Data Definitions. We first introduce the new concept of Observation (abbreviated O), which is a data object recording the basic information of a sensor and the values of phenomena. A phenomenon is a collected condition state. For example, a temperature of value 23.2°C is a phenomenon.
Figure 1: Framework of SDP system in IoT.
Definition 1 (Observation). Consider

O = <S_addr, Time, P>,

P = <ph_type, ph_value, ph_unit>+.
Here S_addr refers to the address of the sensor. The addresses of sensors in different WSNs differ from each other. The most common types of S_addr are <gateway_ip, sensor_set_off> and sensor_ip. Time is the time at which an Observation was collected. <ph_type, ph_value, ph_unit> is a triple, where ph_type refers to the type of a phenomenon, ph_value refers to its value, and ph_unit denotes the unit of ph_value. The sign + indicates that a sensor may generate more than one phenomenon triple.
We split Observations into a list of Phenomenon Events (abbreviated PE).
Definition 2 (Phenomenon Event). Consider

PE = <Ph_Type, S_addr, Time, Value, Unit>.
Here Ph_Type refers to the type of a phenomenon. S_addr and Time are inherited from the Observation. Value is the numeric value of a phenomenon recorded by a sensor. Unit denotes the unit of Value.
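Definitions 1 and 2 map naturally onto two small record types plus a splitting function. The sketch below is one possible encoding; the class names, field names, and the string form of the sensor address are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Observation:
    """O = <S_addr, Time, P>, with P holding one or more phenomenon triples."""
    s_addr: str
    time: str
    phenomena: Tuple[Tuple[str, float, str], ...]  # (ph_type, ph_value, ph_unit)


@dataclass(frozen=True)
class PhenomenonEvent:
    """PE = <Ph_Type, S_addr, Time, Value, Unit>."""
    ph_type: str
    s_addr: str
    time: str
    value: float
    unit: str


def split_observation(obs: Observation) -> List[PhenomenonEvent]:
    """Produce one PE per triple; S_addr and Time are inherited from the Observation."""
    return [PhenomenonEvent(ph_type, obs.s_addr, obs.time, value, unit)
            for ph_type, value, unit in obs.phenomena]


obs = Observation("192.168.0.12/0x001", "2013-06-19 21:50:24",
                  (("air temperature", 25.5, "C"), ("pressure", 1004.4, "hPa")))
for pe in split_observation(obs):
    print(pe)
```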
4.2.1. Observation Capture. Base stations, which are the managing gateways of WSNs, parse pieces of raw sensor data into the form of Os. Once Observation Capturer catches an O, it splits the O into several PEs and puts them into time windows according to their Ph_Type and S_addr attributes. A time window collects PEs for a preset time and outputs a list when the time is up or the list is full. Phenomena with the same Ph_Type and S_addr are pushed into the same time window. When a time window is full or its time is up, Observation Capturer gathers the PEs in the window into a PE list and passes this list to Dynamic Compression. The PEs in this list contain values of the same phenomenon from the same sensor, so the values are comparable.
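The time-window behaviour described above can be sketched as a small buffer keyed by (Ph_Type, S_addr). This is a simplified illustration: the class name `WindowBuffer`, the tuple layout of an event, and the choice to detect "time is up" when the next event arrives (rather than with a timer) are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str, float, float]  # (ph_type, s_addr, time_in_seconds, value)


class WindowBuffer:
    def __init__(self, span_seconds: float = 30.0, max_size: int = 100) -> None:
        self.span = span_seconds
        self.max_size = max_size
        self.windows: Dict[Tuple[str, str], List[Event]] = defaultdict(list)

    def push(self, event: Event) -> List[Event]:
        """Add an event; return the completed PE list if a window closed, else []."""
        key = (event[0], event[1])  # same Ph_Type and S_addr share one window
        window = self.windows[key]
        if window and event[2] - window[0][2] >= self.span:
            self.windows[key] = [event]  # time is up: emit and start a new window
            return window
        window.append(event)
        if len(window) >= self.max_size:  # list is full: emit immediately
            self.windows[key] = []
            return window
        return []


buf = WindowBuffer(span_seconds=30.0)
for t in range(0, 35, 5):  # one reading every 5 seconds
    pe_list = buf.push(("air temperature", "192.168.0.12/0x001", float(t), 25.0))
    if pe_list:
        print(len(pe_list), "PEs passed to Dynamic Compression")
```

With readings every 5 seconds and a 30-second window, the reading at second 30 closes the window that was opened at second 0.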
4.2.2. Dynamic Compression. The Dynamic Compression (DC) component compresses data by discarding redundant PEs, reducing the load on the Real-Time Exception Monitor and the data repository.
One data compression method is to record only the first value in a sequence of consecutive identical values. Once a different value arrives, it becomes the next recorded value. With this method, PEs generated by a sensor containing two different air temperature values, 23.3°C and 23.4°C, would be two recorded Phenomenon Events. However, 0.1°C, the difference between 23.3°C and 23.4°C, does not indicate a significant change of environment in some situations, such as in a greenhouse.
The other method, called Threshold Compression, presets a threshold and discards the values whose difference from the last recorded value is less than the threshold. This is the most common data compression method. It has three limitations when applied to IoT sensor data. First, the bigger the threshold is, the more values are thrown away, causing lower accuracy; conversely, when the threshold is reduced, higher accuracy is achieved but more redundant data are stored. The previous method is a special case of this method with the smallest threshold, zero. Finding an appropriate threshold is therefore a balance between accuracy and storage size. Second, there are various types of phenomena in the IoT, and presetting thresholds for all of them requires a lot of work. Third, sensor data change all the time, so a threshold that is appropriate now may be improper for future data sequences.
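Threshold Compression can be sketched in a few lines. One detail is an assumption: the sketch keeps a value only when its difference from the last recorded value is strictly greater than the threshold, so that a threshold of zero reduces to the first method (recording only the first of a run of identical values). The function name is illustrative.

```python
from typing import List, Sequence


def threshold_compress(values: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of the values that are recorded."""
    if len(values) == 0:
        return []
    kept = [0]  # the first value is always recorded
    for i in range(1, len(values)):
        # Compare against the last recorded value, not the previous sample.
        if abs(values[i] - values[kept[-1]]) > threshold:
            kept.append(i)
    return kept


readings = [23.3, 23.3, 23.4, 23.4, 23.9]
print(threshold_compress(readings, 0.0))  # every change is recorded
print(threshold_compress(readings, 0.2))  # the 0.1 degree change is discarded
```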
Here DC dynamically seeks a threshold for each phenomenon type to balance accuracy and compression rate, with the help of the PSO algorithm and the crossbreeding of particle swarms. The primary goal of DC is to get the minimum AE. Based on this goal, we try to reduce the redundancy of PEs in the PE list, making CR smaller.
Once a PEList arrives, DC executes the following steps to get a compressed PE list, CPEList.

Step 1. Initialize two particle populations of size $m$ with random positions and velocities on $D = [x_{\mathrm{Min}}, x_{\mathrm{Max}}]$, where $x_{\mathrm{Min}}$ represents the minimum value of the search space and $x_{\mathrm{Max}}$ refers to the maximum value of the search space. One population, $\mathrm{CRP} = \{\mathrm{CRP}_1, \mathrm{CRP}_2, \ldots, \mathrm{CRP}_m\}$, adopts CR as its fitness function and is used to find the threshold in $D$ that gives the minimum CR. The other population, $\mathrm{AEP} = \{\mathrm{AEP}_1, \mathrm{AEP}_2, \ldots, \mathrm{AEP}_m\}$, employs AE as its fitness function and is assigned to search for the threshold in $D$ that achieves the minimum AE.

Step 2. Update the velocity and position of all particles in CRP and AEP with (1) and (2).

Step 3. For each $i \in (1, m)$, let $\mathrm{CRP}_i$ and AEP

Step 5. Generate compressed PE lists for all particles in CRP with their positions as thresholds. Evaluate the CRs for the particles in CRP using their compressed PE lists with (4).

Step 6. Generate compressed PE lists for all particles in AEP with their positions as thresholds. For each particle, reconstruct the time-value line with the PEs in its CPEList using the process described in Section 4.2.3, and calculate the AE between PEList and CPEList.

Step 7. For each particle in CRP, compare its current CR value with the CR of its pbest position. If the current value is smaller, update pbest with the current value and position.

Step 8. For each particle in AEP, compare its current AE value with the AE of its pbest position. If the current value is smaller, update pbest with the current value and position.

Step 9. Determine the global best particle gbCRP in CRP, the one with the smallest CR value.

Step 10. Determine the global best particle gbAEP in AEP, the one with the smallest AE value.

Step 11. Use the position of gbCRP as the threshold to generate a compressed list CRCPEList from PEList, and calculate the AE value between CRCPEList and PEList.

Step 12. Compare the AE value from Step 11 to the AE value of gbAEP. If the AE value of gbAEP is bigger, set the position of gbAEP to gbCRP's position.
Step 13. Repeat Step 2 to Step 12 until a sufficiently good AE is reached or a maximum number of iterations is met.

The position of gbAEP after the above 13 steps is the optimal threshold, which lets us compress PEList into CPEList while balancing accuracy and compression rate.
Figure 2: Example of time-value reconstruction (air temperature in °C versus time).
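The overall shape of the threshold search can be approximated by the simplified sketch below. Several parts are assumptions rather than the published method: the velocity and position update uses the common inertia-weight PSO form in place of equations (1) and (2); the crossbreeding step is left out because its details are not recoverable from the text here; the sample index stands in for time; and the last point is always kept so that interpolation covers the whole list. Without crossbreeding the result mostly follows the AE population, so the sketch illustrates the control flow rather than the balance reported in the experiments. All function names are hypothetical, and the default parameters are the Table 1 values.

```python
import random
from typing import Callable, Dict, List, Sequence

Fitness = Callable[[Sequence[float], float], float]


def compress(values: Sequence[float], threshold: float) -> List[int]:
    """Indices kept by threshold compression (first and last always kept)."""
    kept = [0]
    for i in range(1, len(values)):
        if abs(values[i] - values[kept[-1]]) > threshold:
            kept.append(i)
    if kept[-1] != len(values) - 1:
        kept.append(len(values) - 1)
    return kept


def cr(values: Sequence[float], threshold: float) -> float:
    """Compression ratio m / n for a given threshold."""
    return len(compress(values, threshold)) / len(values)


def ae(values: Sequence[float], threshold: float) -> float:
    """Mean squared error after linear reconstruction between kept points."""
    kept = compress(values, threshold)
    total = 0.0
    for a, b in zip(kept, kept[1:]):
        for i in range(a, b):
            rec = values[a] + (values[b] - values[a]) * (i - a) / (b - a)
            total += (values[i] - rec) ** 2
    return total / len(values)


def search_threshold(values: Sequence[float], m: int = 8, iterations: int = 20,
                     x_min: float = 0.0, x_max: float = 0.5, v_max: float = 0.2,
                     w: float = 0.4, c1: float = 1.8, c2: float = 1.8,
                     seed: int = 0) -> float:
    rng = random.Random(seed)

    def new_swarm(fitness: Fitness) -> List[Dict[str, float]]:
        swarm = []
        for _ in range(m):
            x = rng.uniform(x_min, x_max)
            swarm.append({"x": x, "v": rng.uniform(-v_max, v_max),
                          "px": x, "pf": fitness(values, x)})
        return swarm

    def step(swarm: List[Dict[str, float]], fitness: Fitness, gx: float) -> float:
        for p in swarm:
            v = (w * p["v"] + c1 * rng.random() * (p["px"] - p["x"])
                 + c2 * rng.random() * (gx - p["x"]))
            p["v"] = max(-v_max, min(v_max, v))
            p["x"] = max(x_min, min(x_max, p["x"] + p["v"]))
            f = fitness(values, p["x"])
            if f < p["pf"]:  # Steps 7-8: update the personal best
                p["px"], p["pf"] = p["x"], f
        return min(swarm, key=lambda q: q["pf"])["px"]  # Steps 9-10

    crp, aep = new_swarm(cr), new_swarm(ae)
    gb_cr = min(crp, key=lambda q: q["pf"])["px"]
    gb_ae = min(aep, key=lambda q: q["pf"])["px"]
    for _ in range(iterations):
        gb_cr = step(crp, cr, gb_cr)
        gb_ae = step(aep, ae, gb_ae)
        # Steps 11-12: adopt gbCRP's position if it yields a smaller AE.
        if ae(values, gb_ae) > ae(values, gb_cr):
            gb_ae = gb_cr
    return gb_ae


data = [25.0 + 0.1 * ((i // 5) % 4) for i in range(60)]
print(search_threshold(data))
```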
4.2.3. Time-Value Reconstruction. All of the PEs in a PE list are of the same type, and the two key attributes of a PE are time and value. Reconstructing a PE therefore means recalculating the phenomenon value at a given time. The time-value pairs in the compressed PE list are discrete points. For reconstruction, we take the Time attribute as the X coordinate and the Value attribute as the Y coordinate. For all PEs in a compressed PE list, we mark the (Time, Value) points one by one and draw a line between each two adjacent points. The Y values on the connected line are the reconstructed values of PEs at given times.
For example, given a compressed Phenomenon Event list CPEList, the points on the line in Figure 2 are the reconstructed Phenomenon Events.
For example, the value at time 2013-06-19 21:50:24 on the line chart is 25.5. So the reconstructed PE is <"air temperature", 192.168.0.12 0x001, 25.5, C, 2013-06-19 21:50:24>.
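Drawing lines between adjacent (Time, Value) points and reading off the Y value amounts to piecewise linear interpolation. The sketch below assumes times are given as ascending numbers (for instance seconds) and, as a further assumption, holds the nearest recorded value for times outside the covered span; the function name is illustrative.

```python
from bisect import bisect_right
from typing import Sequence


def reconstruct(times: Sequence[float], values: Sequence[float], t: float) -> float:
    """Linearly interpolate the phenomenon value at time t from compressed points."""
    if len(times) == 0 or len(times) != len(values):
        raise ValueError("times and values must be non-empty and of equal length")
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
    t0, t1 = times[j - 1], times[j]
    v0, v1 = values[j - 1], values[j]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Two recorded points ten minutes apart; query the midpoint.
print(reconstruct([0.0, 600.0], [25.0, 26.0], 300.0))
```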
4.3. Real-Time Exception Monitor. As explained, sensor data reflect the phenomenon status in the physical space. Hence sensor data are closely associated with the containers. The monitored containers require that the phenomenon states remain stable within certain intervals. For example, a warehouse in a cold chain loaded with meat requires the air temperature to be maintained within [−18°C, 0°C]. Once the air temperature is out of this interval, especially beyond the limits, warnings must be pushed out to the applications.
Real-Time Exception Monitor (REM) receives the compressed Phenomenon Event list CPEList from Dynamic Compression (DC) to detect exception situations. Although the elements in CPEList are not the complete set of captured Phenomenon Events, they are the key elements retained under a threshold. Therefore, the Phenomenon Events in CPEList can represent the actual exception situations.
REM maintains a Rule Repository and a core execution component, the Real-Time Exception Monitor Core (REMC). The Rule Repository stores a set of rules defining the standard phenomenon intervals. Every rule is formatted as an RL:

RL = <GLN, Ph_Type, Min, Max, Unit>.
Here GLN is a thirteen-digit number used to identify parties and physical locations. Ph_Type denotes the type of a phenomenon. Min and Max describe a standard range, expressed in Unit, for Ph_Type. For example, a rule can describe the standard air temperature interval [−18°C, 0°C] of a warehouse with GLN 6901404000029 in a meat cold chain.

REMC performs the exception monitoring. Once a compressed PE list arrives, REMC first uses the common sensor address of the elements in CPEList to get the respective GLN number. Then REMC retrieves the RLs for the container from the Rule Repository with the GLN and the common Ph_Type of the PEs. From all matching RLs, REMC takes the maximum Min and the minimum Max to form the minimum standard range for all types of entities. For each PE in the compressed PE list, if the value is not in this range, REMC sends out a warning.
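The rule lookup and range check performed by REMC can be sketched as follows. The class and function names are illustrative, the GLN is assumed to have been resolved from the sensor address already, and the second rule in the example is hypothetical, added only to show how overlapping rules narrow the range.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """RL = <GLN, Ph_Type, Min, Max, Unit>."""
    gln: str
    ph_type: str
    min_value: float
    max_value: float
    unit: str


def standard_range(rules: Sequence[Rule], gln: str,
                   ph_type: str) -> Optional[Tuple[float, float]]:
    """Combine all matching rules into (maximum Min, minimum Max)."""
    hits = [r for r in rules if r.gln == gln and r.ph_type == ph_type]
    if not hits:
        return None
    return max(r.min_value for r in hits), min(r.max_value for r in hits)


def find_exceptions(values: Sequence[float], bounds: Tuple[float, float]) -> List[float]:
    """Return the values outside the standard range; each would trigger a warning."""
    low, high = bounds
    return [v for v in values if not low <= v <= high]


rules = [
    Rule("6901404000029", "air temperature", -18.0, 0.0, "C"),
    Rule("6901404000029", "air temperature", -20.0, -2.0, "C"),  # hypothetical second rule
]
bounds = standard_range(rules, "6901404000029", "air temperature")
print(bounds)
if bounds is not None:
    print(find_exceptions([-10.0, -1.0, -19.5], bounds))
```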
4.4. Sensor Data Storage. As described in Section 3, an information system manages several containers, and at least one sensor is deployed in each container. Meanwhile, the sensors collect information on various phenomena every few seconds. As a result, the number of Phenomenon Events generated each day is large. Although PEs have been compressed by Dynamic Compression, the stored number still increases fast. To find a better way to store and retrieve Phenomenon Events efficiently, supporting environment information discovery for entities, we maintain sensor data in two ways: in an SQL database and in a NoSQL database.
4.4.1. SQL Database. The structure of sensor data in the SQL database is designed as in Figure 3. A container contains one or more sensors, and sensors generate several types of events. Because insertion and selection operations focus on a single type of phenomenon, we classify events into different tables according to their types to reduce the size of each Phenomenon Event table. In this way, if more phenomena are monitored, more tables will be created. To improve data retrieval efficiency and reduce storage space, we define fields using VARCHAR instead of CHAR, FLOAT instead of DECIMAL, and TIMESTAMP rather than DATETIME.
Insertion. PEs in the compressed PE list are stored directly into the corresponding table, grouped by their Ph_Type.

Query. The most common query on sensor data, like the following example, takes a GLN number and a time slot to search the records of a specific phenomenon. GLNCode in the Container table and time in the Phenomenon Event tables appear often.
SELECT * FROM AirTemperatureEvent a
  JOIN sensor s ON a.SensorID = s.SensorID
  JOIN Container c ON s.ContainerID = c.ContainerID
  WHERE c.GLNCode = '1'
  AND a.time BETWEEN '2013-06-19 20:00:00' AND '2013-06-20 16:59:59'
To decrease the event retrieval time, we add indices on the GLNCode field and the time field.
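The relational layout, the two indices, and the trace query can be tried out with the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the column names other than SensorID, ContainerID, GLNCode, and time are assumptions, since the schema figure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Container (ContainerID INTEGER PRIMARY KEY, GLNCode VARCHAR(13));
CREATE TABLE Sensor (SensorID INTEGER PRIMARY KEY,
                     ContainerID INTEGER REFERENCES Container(ContainerID));
-- One table per phenomenon type keeps each event table small.
CREATE TABLE AirTemperatureEvent (SensorID INTEGER REFERENCES Sensor(SensorID),
                                  value FLOAT, unit VARCHAR(8), time TIMESTAMP);
-- Indices on the two fields that appear most often in queries.
CREATE INDEX idx_container_gln ON Container (GLNCode);
CREATE INDEX idx_airtemp_time ON AirTemperatureEvent (time);
""")
conn.execute("INSERT INTO Container VALUES (1, '6901404000029')")
conn.execute("INSERT INTO Sensor VALUES (1, 1)")
conn.executemany("INSERT INTO AirTemperatureEvent VALUES (?, ?, ?, ?)", [
    (1, 25.5, "C", "2013-06-19 21:50:24"),
    (1, 25.1, "C", "2013-06-21 08:00:00"),  # outside the queried time slot
])

rows = conn.execute("""
    SELECT a.value, a.time FROM AirTemperatureEvent a
    JOIN Sensor s ON a.SensorID = s.SensorID
    JOIN Container c ON s.ContainerID = c.ContainerID
    WHERE c.GLNCode = ? AND a.time BETWEEN ? AND ?
""", ("6901404000029", "2013-06-19 20:00:00", "2013-06-20 16:59:59")).fetchall()
print(rows)
```

Timestamps are stored as ISO-formatted text here, so the BETWEEN comparison works lexicographically; a MySQL TIMESTAMP column would compare natively.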
4.4.2. NoSQL Database. The NoSQL database maintains a sensor-GLN collection and a Phenomenon Event collection. Algorithm 2 shows example tuples stored in the NoSQL database. Since a NoSQL database stores data in collections, we divide the Phenomenon Events and the basic information of sensors and containers into two collections. The sensor-GLN collection maps sensor addresses to the GLN numbers of physical containers by managing [S_addr, GLN] tuples. The Phenomenon Event collection stores [Ph_Type, S_addr, Ph_Value, Ph_Unit, Time] tuples. S_addr, which represents the sensor deployed in a container and the data source of Phenomenon Events, is the key that links physical containers, identified by their GLN codes, to the actual values of phenomena.
Insertion. PEs in the compressed PE list are stored directly into the Phenomenon Event collection without extra operations, since all tuples in this collection record their Ph_Type.

Query. The most common query is again finding the Phenomenon Events that happened in a container during a time slot. This query requires two query operations. First, get the S_addr list for the GLN from the sensor-GLN collection. Second, for each S_addr, find all event records generated over this period.

To reduce the number of queries, there is another way to store PEs: look up the GLN number with S_addr before inserting a PE, and store [Ph_Type, GLN, Ph_Value, Ph_Unit, Time] tuples in the Phenomenon Event collection. However, in the current situation, insertion operations on sensor data happen more frequently than queries. Hence, we adopt the first design on the NoSQL database.
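The two-step lookup of the first design can be illustrated with plain Python lists of dictionaries standing in for the two collections, which avoids assuming any particular MongoDB driver API. The sample documents and the function name are hypothetical.

```python
from typing import Dict, List, Union

Doc = Dict[str, Union[str, float]]

# Sensor-GLN collection: [S_addr, GLN] tuples.
sensor_gln: List[Doc] = [
    {"S_addr": "192.168.0.12/0x001", "GLN": "6901404000029"},
    {"S_addr": "192.168.0.12/0x002", "GLN": "6901404000029"},
    {"S_addr": "192.168.0.13/0x001", "GLN": "6901404000036"},
]

# Phenomenon Event collection: [Ph_Type, S_addr, Ph_Value, Ph_Unit, Time] tuples.
phenomenon_events: List[Doc] = [
    {"Ph_Type": "air temperature", "S_addr": "192.168.0.12/0x001",
     "Ph_Value": 25.5, "Ph_Unit": "C", "Time": "2013-06-19 21:50:24"},
    {"Ph_Type": "air temperature", "S_addr": "192.168.0.13/0x001",
     "Ph_Value": 24.9, "Ph_Unit": "C", "Time": "2013-06-19 22:00:00"},
]


def events_in_container(gln: str, ph_type: str, start: str, end: str) -> List[Doc]:
    # Query 1: which sensor addresses belong to this container?
    addrs = {d["S_addr"] for d in sensor_gln if d["GLN"] == gln}
    # Query 2: events of the requested type from those sensors within the time slot.
    return [e for e in phenomenon_events
            if e["S_addr"] in addrs and e["Ph_Type"] == ph_type
            and start <= str(e["Time"]) <= end]


print(events_in_container("6901404000029", "air temperature",
                          "2013-06-19 20:00:00", "2013-06-20 16:59:59"))
```

Storing the GLN directly in each event document, as in the alternative design, would remove the first lookup at the cost of an extra lookup on every insertion.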
Table 1: Parameters of PSO.

w      c1     c2     xMin   xMax   vMax
0.4    1.8    1.8    0.0    0.5    0.2
5. Evaluation
5.1. Experiment Setup. We implemented a prototype of the proposed sensor data processing (SDP) system. Our experimental platform consists of a PC running Windows 7 Professional with 4.00 GB of memory and an Intel(R) Core(TM) i3-3220 CPU @ 3.30 GHz processor. The parameters of PSO are set as in Table 1. The selected SQL database is MySQL, and the representative NoSQL database is MongoDB.
5.2. Performance Evaluation. To evaluate the performance of our SDP system, three experiments were conducted. The first experiment shows whether DC can compress data dynamically according to the characteristics of data sequences. The second experiment shows how the number of PEs affects the compression speed and how the choice of database affects the storage speed. The third experiment shows how the choice of database affects the tracing speed.
Figure 4: Compression result of pressure data (pressure value in hPa versus time; true pressure and recorded pressure).
Testing data came from Wuxi Institute of Fudan Univer-sity These data were recorded every 5 seconds by the sensorsin a greenhouse For each experiment we set the window sizeto 30 seconds To test the performance of our framework allPEs were submitted to the sliding window in 30 seconds
There are many factors affecting the performance of theproposed IoT sensor data processing SDP system such as thecharacteristics of the PEs the number of PEs and the settingsof database In our evaluation experiments we focused onhow the number of PEs and the selections of database affectperformance Meanwhile we try to evaluate whether the datacompression (DC) can compress data dynamically accordingto the characteristics of data sequencesThe initial parametersof PSO are set as in Table 1 before experiment
The first experiment was conducted with the 1867 piecesof air temperature records and 1868 pieces of pressure recordsas PEs We measured the accuracy AE and the compressionrate CR when DC outputs a compressed PE list for aphenomenon type The experimental results are shown inFigures 4 and 5 We can see that DC keeps all PEs whosevalue of pressure differs from their previous recorded valuewhile DC finds the trend of air temperature in Figure 5 andrecords only 5 points DC can find a threshold to balance theaccuracy and compression rate Meanwhile the value of AEindex approximates 000321 and the value of CR is near 05133in Figure 4 And the data compression of air temperature inFigure 5 can achieve AE value of 128197 and CR value of000268 This part of experiment shows that the proposeddynamic data compression can identify the trends of dataand balance the accuracy and compression rate Redundancyfrom sensor data can be avoided
The second experiment was conducted with the numberof sensor data varying from 500 to 1000000 We measuredthe time when DC begins compressing the time of DC aftercompression and the time after storage The second onesubtracted by the first one was considered as the compressiontime and the last one subtracted by the second one was
248250252254256258260
Time
True values
Air temperature thinning
Recorded values
2013
-06-
1920
38
24
2013
-06-
1921
07
12
2013
-06-
1921
36
00
2013
-06-
1922
04
48
2013
-06-
1922
33
36
2013
-06-
1923
02
24
2013
-06-
1923
31
12
2013
-06-
2000
00
00
Tem
pera
ture
val
ue (∘
C)
Figure 5 Compression result of air temperature data
020000400006000080000
100000120000140000
500 1000 2000 4000 6000 8000 10000
Tim
e (m
s)
Numbers of PEs
CompressionMySQL storageMongoDB storage
Figure 6 Compression and storage time with various numbers ofPEs
taken as the storage time The result is shown in Figure 6and it reflected that the compression time of DC has linearrelationship with the number of PEs The time storing PEsinto MySQL database increases faster than the time storingdata into MongoDB
The third experiment was conducted using a query totrace the air temperature information during the lifecycleof an entity with an EPC The number of recorded PEs indatabase varied from 69880 to 1985247 The query time ofMySQL and MongoDB is shown in Figure 7 We learnedthat the tracing time of MongoDB grows faster than thetime of MySQL The reason is that MongoDB had to executetwo queries when the number of tuples increases Howevercomparing to the storage time the time cost by the sensordata queries of MongoDB is short
Hence when applications are faced with a large amountof sensor data storage it is better to store sensor datainto nonrelational databases while when the number ofenvironment information discovery queries is larger thanstorage rational databases have better performance
6 Conclusion
In this paper we present a new IoT sensor data processing(SDP) system to process sensor data dynamically in thecontext of the Internet ofThings First heterogeneous sensor
International Journal of Distributed Sensor Networks 9
0
1000
2000
3000
4000
5000
6000
7000
8000
69880 190354 1133228 1985247
Que
ry ti
me (
ms)
Number of tuples in database
MysqlMongoDB
Figure 7 Query time with various numbers of tuples
data are captured and transformed into unified data formatSecond Particle Swarm Optimization algorithm is employedto do data compression avoiding redundancy and helpingto reduce the load of database by adding crossbreedingoperation on PSO algorithm The proposed SDP systemdetects exception situations by setting the standard phe-nomenon rules of containers Meanwhile an appropriatetype of database suitable for the sensor data storage in theIoT is sought and analyzed in this paper The experimentalresults show that the proposed compression method can finda threshold achieving high compression rate and keepingaccuracy and NoSQL database has better performance insensor data storagewhile relational database does betterwhenexecuting environment information discovery queries
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L. Atzori, A. Iera, and G. Morabito, “The internet of things: a survey,” Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[2] M. C. Domingo, “An overview of the internet of things for people with disabilities,” Journal of Network and Computer Applications, vol. 35, no. 2, pp. 584–596, 2012.
[3] L. Zheng, H. Zhang, W. Han et al., “Technologies, applications, and governance in the internet of things,” in Internet of Things - Global Technological and Societal Trends from Smart Environments and Spaces to Green ICT, River Publishers, 2011.
[4] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.
[5] M. Botts and A. Robin, Sensor Model Language (SensorML) Implementation Specification, OpenGIS Implementation Specification, 2007.
[6] IEEE Instrumentation and Measurement Society, IEEE Standard for a Smart Transducer Interface for Sensors and Actuators-Common Functions, Communication Protocols, and Transducer Electronic Data Sheet (TEDS) Formats, IEEE Std 1451.0, 2007.
[7] K. Aberer, M. Hauswirth, and A. Salehi, “Infrastructure for data processing in large-scale interconnected sensor networks,” in Proceedings of the 8th International Conference on Mobile Data Management (MDM ’07), pp. 198–205, May 2007.
[8] M. Sgroi, A. Wolisz, A. Sangiovanni-Vincentelli, and J. M. Rabaey, “A service-based universal application interface for ad hoc wireless sensor and actuator networks,” in Ambient Intelligence, pp. 149–172, Springer, Berlin, Germany, 2005.
[9] Y. Yao and J. Gehrke, “The cougar approach to in-network query processing in sensor networks,” ACM SIGMOD Record, vol. 31, no. 3, pp. 9–18, 2002.
[10] C.-C. Shen, C. Srisathapornphat, and C. Jaikaeo, “Sensor information networking architecture and applications,” IEEE Personal Communications, vol. 8, no. 4, pp. 52–59, 2001.
[11] M. Kim, J. W. Lee, Y. J. Lee, and J.-C. Ryou, “Cosmos: a middleware for integrated data processing over heterogeneous sensor networks,” ETRI Journal, vol. 30, no. 5, pp. 696–706, 2008.
[12] G. Venter and J. Sobieszczanski-Sobieski, “Particle swarm optimization,” AIAA Journal, vol. 41, no. 8, pp. 1583–1589, 2003.
[13] J. Kennedy, “Particle swarm optimization,” in Encyclopedia of Machine Learning, pp. 760–766, Springer, New York, NY, USA, 2010.
[14] F. Van den Bergh and A. P. Engelbrecht, “A study of particle swarm optimization particle trajectories,” Information Sciences, vol. 176, no. 8, pp. 937–971, 2006.
[15] Y. Shi and R. C. Eberhart, “Fuzzy adaptive particle swarm optimization,” in Proceedings of the Congress on Evolutionary Computation, vol. 1, pp. 101–106, IEEE, Seoul, Republic of Korea, May 2001.
[16] L. Gang, F. Jing, L. Ling, and Y. Qilian, “Fast realization of the LADT ECG data compression method,” IEEE Engineering in Medicine and Biology Magazine, vol. 13, no. 2, pp. 255–258, 1994.
[17] S. M. S. Jalaleddine, C. G. Hutchens, R. D. Strattan, and W. A. Coberly, “ECG data compression techniques-a unified approach,” IEEE Transactions on Biomedical Engineering, vol. 37, no. 4, pp. 329–343, 1990.
[18] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici, “Improving compression ratio, area overhead, and test application time for system-on-a-chip test data compression/decompression,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 604–611, IEEE Computer Society, Paris, France, 2002.
[19] F. Wang and P. Liu, “Temporal management of RFID data,” in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB ’05), pp. 1128–1139, September 2005.
[20] E. Welbourne, L. Battle, G. Cole et al., “Building the internet of things using RFID: the RFID ecosystem experience,” IEEE Internet Computing, vol. 13, no. 3, pp. 48–55, 2009.
[21] I. Grønbæk, “Architecture for the Internet of Things (IoT): API and interconnect,” in Proceedings of the 2nd International Conference on Sensor Technologies and Applications (SENSORCOMM ’08), pp. 802–807, Cap Esterel, France, August 2008.
[22] R. Want, “An introduction to RFID technology,” IEEE Pervasive Computing, vol. 5, no. 1, pp. 25–33, 2006.
[23] Global Location Numbers (GLN), GS1, http://www.gs1.org/docs/idkeys/GS1_Global_Location_Numbers.pdf.
Figure 1: Framework of SDP system in IoT. (Components shown: WSNs with gateways and base stations delivering raw observations; the SDP components Observation Capturer, Dynamic Compression, and Real-Time Exception Monitor; the sensor data repository; and applications receiving warnings and compressed Phenomenon Events.)
and the values of phenomena. A phenomenon is a collected condition state. For example, a temperature of value 23.2 °C is a phenomenon.
Definition 1 (Observation). Consider

O = <S_addr, Time, P>,
P = <ph_type, ph_value, ph_unit>+.

Here S_addr refers to the address of the sensor. The addresses of sensors in different WSNs differ from each other. The most common types of S_addr are <gateway_ip, sensor_set_off> and sensor_ip. Time is the collection time of an Observation. <ph_type, ph_value, ph_unit> is a triple, where ph_type refers to the type of a phenomenon, ph_value refers to the value of a phenomenon, and ph_unit denotes the unit of ph_value. The sign + indicates that a sensor may generate more than one phenomenon triple.
We split Observations into a Phenomenon Event (PE for short) list.

Definition 2 (Phenomenon Event). Consider

PE = <Ph_Type, S_addr, Time, Value, Unit>.

Here Ph_Type refers to the type of a phenomenon. S_addr and Time are inherited from the Observation. Value is the numeric value of a phenomenon recorded by a sensor. Unit denotes the unit of Value.
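The two definitions can be sketched as plain data types. The following is a minimal illustration, not the authors' implementation: the class names, the field names, and the string form used for S_addr are assumptions chosen to mirror Definitions 1 and 2.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Observation:
    """O = <S_addr, Time, P>, where P holds one or more phenomenon triples."""
    s_addr: str
    time: str
    phenomena: Tuple[Tuple[str, float, str], ...]  # (ph_type, ph_value, ph_unit)+


@dataclass(frozen=True)
class PhenomenonEvent:
    """PE = <Ph_Type, S_addr, Time, Value, Unit>."""
    ph_type: str
    s_addr: str
    time: str
    value: float
    unit: str


def split_observation(o: Observation) -> List[PhenomenonEvent]:
    """Emit one PE per phenomenon triple; S_addr and Time are inherited from O."""
    return [PhenomenonEvent(ph_type, o.s_addr, o.time, value, unit)
            for ph_type, value, unit in o.phenomena]


# Hypothetical observation carrying two phenomena from one sensor.
obs = Observation("192.168.0.12/0x001", "2013-06-19 21:50:24",
                  (("air temperature", 25.5, "C"), ("pressure", 1004.4, "hPa")))
events = split_observation(obs)
```

Each triple in P becomes its own PE, so an Observation with two phenomena yields two events that share the sensor address and time.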
4.2.1. Observation Capture. Base stations, which serve as the managing gateways of WSNs, parse pieces of raw sensor data into Observations (Os). Once the Observation Capturer catches an O, it splits it into several PEs and puts them into time windows according to their Ph_Type and S_addr attributes. A time window collects PEs for a preset time and outputs a list when the time is up or the list is full. Phenomena with the same Ph_Type and S_addr are pushed into the same time window. When a time window is full or its time is up, the Observation Capturer gathers the PEs in the window into a PE list and passes this list to Dynamic Compression. The PEs in this list contain values of the same phenomenon from the same sensor, so the values are comparable.
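The windowing behaviour described above might be sketched as follows. This is an assumption-laden illustration: the class name WindowedCapturer, the tuple layout of a PE, and the choice to check expiry only when a new PE arrives (rather than with a timer) are all hypothetical simplifications.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PE = Tuple[str, str, float, float]  # (ph_type, s_addr, time_in_seconds, value)


class WindowedCapturer:
    """Buffers PEs per (ph_type, s_addr); flushes a window when full or expired."""

    def __init__(self, max_size: int, max_age: float) -> None:
        self.max_size = max_size
        self.max_age = max_age
        self.windows: Dict[Tuple[str, str], List[PE]] = defaultdict(list)

    def push(self, pe: PE) -> List[PE]:
        """Add a PE; return the flushed PE list, or [] if the window stays open."""
        key = (pe[0], pe[1])
        window = self.windows[key]
        window.append(pe)
        # Expiry is measured against the first PE buffered in this window.
        expired = pe[2] - window[0][2] >= self.max_age
        if len(window) >= self.max_size or expired:
            return self.windows.pop(key)
        return []


capturer = WindowedCapturer(max_size=3, max_age=30.0)
out = [capturer.push(("air temperature", "s1", t, v))
       for t, v in [(0.0, 25.1), (5.0, 25.1), (10.0, 25.2), (15.0, 25.3)]]
```

Here the third PE fills the window and triggers a flush, and the fourth PE opens a fresh window for the same key.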
4.2.2. Dynamic Compression. The Dynamic Compression (DC) component compresses data by discarding redundant PEs, which reduces the load on the Real-Time Exception Monitor and the data repository.

One data compression method is to record only the first value in a run of identical consecutive values. Once a different value arrives, it becomes the next recorded value. With this method, PEs generated by a sensor with two different air temperature values, 23.3 °C and 23.4 °C, would yield two recorded Phenomenon Events. However, the 0.1 °C difference between 23.3 °C and 23.4 °C does not indicate a significant change of environment in some situations, such as in a greenhouse.

The other way, called Threshold Compression, is to preset a threshold and then discard the values whose difference from the last recorded value is less than the threshold. This is the most common data compression method. It has three limitations when applied to IoT sensor data. Firstly, the bigger the threshold is, the more values are thrown away, which lowers accuracy. Conversely, a smaller threshold achieves higher accuracy but stores more redundant data. The previous method is a special case of this method with the smallest threshold, zero. So finding an appropriate threshold is a balance between accuracy and storage size. Secondly, there are various types of phenomena in the IoT, and presetting thresholds for all of them takes a lot of work. Thirdly, sensor data change all the time, so a threshold that is appropriate now may be improper for a future data sequence.
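Threshold Compression can be illustrated with a short sketch. The function name and the (time, value) tuple layout are assumptions; the strict comparison is chosen so that a threshold of zero reproduces the first method (record only when the value changes).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time, value)


def threshold_compress(points: List[Point], threshold: float) -> List[Point]:
    """Keep a point only if it differs from the last recorded value by more than threshold."""
    if not points:
        return []
    kept = [points[0]]  # the first value is always recorded
    for point in points[1:]:
        if abs(point[1] - kept[-1][1]) > threshold:
            kept.append(point)
    return kept


readings = [(0.0, 23.3), (5.0, 23.3), (10.0, 23.4), (15.0, 23.9)]
coarse = threshold_compress(readings, 0.2)  # ignores the 0.1 degree step
exact = threshold_compress(readings, 0.0)   # records every change of value
```

With a threshold of 0.2 the step from 23.3 to 23.4 is dropped, while a threshold of zero keeps it, which shows the accuracy and storage trade-off described above.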
Here DC dynamically seeks a threshold for each phenomenon type to balance accuracy and compression rate, with the help of the PSO algorithm and the crossbreeding of particle swarms. The primary goal of DC is to obtain the minimum AE. Subject to this goal, we try to reduce the redundancy of PEs in the PE list, giving a smaller CR.

Once a PEList arrives, DC executes the following steps to get a compressed PE list, CPEList.
Step 1. Initialize two particle populations of size m with random positions and velocities on D = [xMin, xMax], where xMin is the minimum value of the search space and xMax is the maximum value of the search space. One population, CRP = {CRP_1, CRP_2, ..., CRP_m}, adopts CR as its fitness function and searches D for the threshold that gives the minimum CR. The other population, AEP = {AEP_1, AEP_2, ..., AEP_m}, employs AE as its fitness function and searches D for the threshold that gives the minimum AE.

Step 2. Update the velocity and position of all particles in CRP and AEP with (1) and (2).

Step 3. For each i ∈ (1, m), let CRP_i and AEP [...]

Step 5. Generate compressed PE lists for all particles in CRP with their positions as thresholds. Evaluate CR for the particles in CRP using their compressed PE lists with (4).

Step 6. Generate compressed PE lists for all particles in AEP with their positions as thresholds. For each particle, reconstruct the time-value line from the PEs in its CPEList using the process described in Section 4.2.3, and calculate AE between PEList and CPEList.

Step 7. For each particle in CRP, compare its current CR value with the CR of its pbest position. If the current value is smaller, update pbest to its current position.

Step 8. For each particle in AEP, compare its current AE value with the AE of its pbest position. If the current value is smaller, update pbest to its current position.

Step 9. Determine the global best particle gbCRP in CRP, the one with the smallest CR value.

Step 10. Determine the global best particle gbAEP in AEP, the one with the smallest AE value.

Step 11. Use the position of gbCRP as the threshold to generate a compressed list CRCPEList from PEList, and calculate the AE value between CRCPEList and PEList.

Step 12. Compare the AE value from Step 11 with the AE value of gbAEP. If the AE value of gbAEP is bigger, set the position of gbAEP to gbCRP's position.
Figure 2: Example of time-value reconstruction (x-axis: Time; y-axis: Air temperature (°C)).
Step 13. Repeat Step 2 to Step 12 until a sufficiently good AE is reached or the maximum number of iterations is met.

The position of gbAEP after the above 13 steps is the optimal threshold, which is used to compress PEList into CPEList while balancing accuracy and compression rate.
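The loop over Steps 1, 2, and 5 to 13 can be sketched roughly as below. This is a simplified illustration under several stated assumptions, not the authors' algorithm: the crossbreeding of Steps 3 and 4 is omitted; the velocity update is the standard PSO form, assumed to correspond to (1) and (2); CR is assumed to be the ratio of kept to original PEs; AE is assumed to be the mean absolute error after linear reconstruction; and the stopping rule uses only a fixed iteration count. All function names are hypothetical. The default parameters follow Table 1.

```python
import random
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (time, value), with strictly increasing times


def compress(points: List[Point], th: float) -> List[Point]:
    kept = [points[0]]
    for p in points[1:]:
        if abs(p[1] - kept[-1][1]) > th:
            kept.append(p)
    return kept


def interpolate(kept: List[Point], t: float) -> float:
    """Piecewise-linear reconstruction; holds the end values outside the kept range."""
    if t <= kept[0][0]:
        return kept[0][1]
    for (t0, v0), (t1, v1) in zip(kept, kept[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return kept[-1][1]


def cr(points: List[Point], th: float) -> float:
    return len(compress(points, th)) / len(points)  # assumed form of CR


def ae(points: List[Point], th: float) -> float:
    kept = compress(points, th)  # assumed form of AE: mean absolute error
    return sum(abs(v - interpolate(kept, t)) for t, v in points) / len(points)


def search_threshold(points: List[Point], m: int = 10, iters: int = 30,
                     w: float = 0.4, c1: float = 1.8, c2: float = 1.8,
                     x_min: float = 0.0, x_max: float = 0.5,
                     v_max: float = 0.2, seed: int = 0) -> float:
    rng = random.Random(seed)
    Fitness = Callable[[List[Point], float], float]

    def new_swarm(fit: Fitness) -> List[Dict[str, float]]:  # Step 1
        swarm = []
        for _ in range(m):
            x = rng.uniform(x_min, x_max)
            swarm.append({"x": x, "v": rng.uniform(-v_max, v_max),
                          "pbest": x, "pbest_f": fit(points, x)})
        return swarm

    def best(swarm: List[Dict[str, float]]) -> Dict[str, float]:
        return min(swarm, key=lambda p: p["pbest_f"])

    def advance(swarm: List[Dict[str, float]], fit: Fitness) -> None:
        g = best(swarm)["pbest"]
        for p in swarm:
            v = (w * p["v"] + c1 * rng.random() * (p["pbest"] - p["x"])
                 + c2 * rng.random() * (g - p["x"]))
            p["v"] = max(-v_max, min(v_max, v))               # clamp velocity
            p["x"] = max(x_min, min(x_max, p["x"] + p["v"]))  # stay inside D
            f = fit(points, p["x"])
            if f < p["pbest_f"]:
                p["pbest"], p["pbest_f"] = p["x"], f

    crp, aep = new_swarm(cr), new_swarm(ae)
    for _ in range(iters):                                    # Step 13
        advance(crp, cr)                                      # Steps 2, 5, 7
        advance(aep, ae)                                      # Steps 2, 6, 8
        gb_crp, gb_aep = best(crp), best(aep)                 # Steps 9 and 10
        ae_of_crp = ae(points, gb_crp["pbest"])               # Step 11
        if gb_aep["pbest_f"] > ae_of_crp:                     # Step 12
            gb_aep["pbest"], gb_aep["pbest_f"] = gb_crp["pbest"], ae_of_crp
    return best(aep)["pbest"]


samples = [(float(i * 5), 25.0 + 0.1 * (i % 4)) for i in range(40)]
threshold = search_threshold(samples)
```

Because the crossbreeding step is left out, this sketch tends toward the AE-optimal end of the search space; it shows the bookkeeping of the two populations rather than the balance the full method is designed to reach.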
4.2.3. Time-Value Reconstruction. All of the PEs in a PE list are of the same type, and the two key attributes of a PE are its time and its value. Reconstructing a PE therefore means recalculating the phenomenon value at a given time. The time-value pairs in the compressed PE list are discrete points. For reconstruction, we take the Time attribute as the X coordinate and the Value attribute as the Y coordinate. For all PEs in a compressed PE list, we mark the (Time, Value) points one by one and draw lines between adjacent points. The Y values on the connected line are the reconstructed values of PEs at the corresponding times.

For example, given a compressed Phenomenon Event list CPEList, the points on the line in Figure 2 are the reconstructed Phenomenon Events.

For example, the value at time 2013-06-19 21:50:24 on the line chart is 25.5. So the reconstructed PE is <“air temperature”, 192.168.0.12 0x001, 25.5, C, 2013-06-19 21:50:24>.
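The reconstruction amounts to linear interpolation between adjacent kept points. The sketch below assumes a hypothetical two-point compressed list whose values are illustrative only and are not the data behind Figure 2; the function name reconstruct is also an assumption.

```python
from datetime import datetime
from typing import List, Tuple

FMT = "%Y-%m-%d %H:%M:%S"


def reconstruct(cpe: List[Tuple[str, float]], when: str) -> float:
    """Linearly interpolate the value at `when` from time-sorted (time, value) pairs."""
    t = datetime.strptime(when, FMT)
    pts = [(datetime.strptime(ts, FMT), v) for ts, v in cpe]
    for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            frac = (t - t0).total_seconds() / (t1 - t0).total_seconds()
            return v0 + (v1 - v0) * frac
    raise ValueError("time lies outside the compressed PE list")


# Hypothetical compressed list; 21:50:24 lies midway between the two points.
cpe_list = [("2013-06-19 21:36:00", 25.0), ("2013-06-19 22:04:48", 26.0)]
value = reconstruct(cpe_list, "2013-06-19 21:50:24")
```

The queried time is 864 seconds into a 1728-second segment, so the reconstructed value is halfway between the two recorded values.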
4.3. Real-Time Exception Monitor. As explained, sensor data reflect the phenomenon status in the physical space. Hence sensor data are closely associated with the containers. The monitored containers require that the phenomenon states stay stable within certain intervals. For example, a warehouse in a cold chain loaded with meat requires the air temperature to be maintained within [−18 °C, 0 °C]. Once the air temperature leaves this interval, warnings must be pushed out to the applications.
Real-Time Exception Monitor (REM) receives the compressed Phenomenon Event list CPEList from Dynamic Compression (DC) to detect exception situations. Although the elements in CPEList are not the complete set of captured Phenomenon Events, they are still the key elements selected with a threshold. Therefore, the Phenomenon Events in CPEList can represent the actual exception situations.

REM maintains a Rule Repository and a core execution component, Real-Time Exception Monitor Core (REMC for short). The Rule Repository stores a set of rules defining the standard phenomenon intervals. Every rule is formatted as an RL:

RL = <GLN, Ph_Type, Min, Max, Unit>.

Here GLN is a thirteen-digit number used to identify parties and physical locations. Ph_Type denotes the type of a phenomenon. Min and Max describe a standard range of Ph_Type, expressed in Unit. For example, a warehouse with GLN 6901404000029 in a meat cold chain has a standard air temperature interval of [−18 °C, 0 °C].
REMC performs the exception monitoring. Once a compressed PE list arrives, REMC first uses the common sensor address of the elements in CPEList to get the respective GLN number. Then REMC retrieves the RLs for the container from the Rule Repository using the GLN and the common Ph_Type of the PEs. From all matching RLs, REMC takes the maximum Min and the minimum Max to form the narrowest standard range, which holds for all types of entities. For each PE in the compressed PE list, if the value is not in the range, REMC sends out a warning.
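The range intersection performed by REMC can be sketched as follows. The Rule type, the function names, and the second rule in the example are hypothetical; only the first rule reflects the warehouse interval given in the text.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    """RL = <GLN, Ph_Type, Min, Max, Unit>."""
    gln: str
    ph_type: str
    min_value: float
    max_value: float
    unit: str


def effective_range(rules: List[Rule], gln: str,
                    ph_type: str) -> Optional[Tuple[float, float]]:
    """Intersect all matching rules: maximum Min and minimum Max."""
    hits = [r for r in rules if r.gln == gln and r.ph_type == ph_type]
    if not hits:
        return None
    return max(r.min_value for r in hits), min(r.max_value for r in hits)


def out_of_range(values: List[float], bounds: Tuple[float, float]) -> List[float]:
    """Return the values that would trigger a warning."""
    low, high = bounds
    return [v for v in values if not (low <= v <= high)]


rules = [
    Rule("6901404000029", "air temperature", -18.0, 0.0, "C"),
    Rule("6901404000029", "air temperature", -20.0, -2.0, "C"),  # hypothetical second rule
]
bounds = effective_range(rules, "6901404000029", "air temperature")
alerts = out_of_range([-10.0, -1.0, 3.0], bounds)
```

With both rules in force the effective range narrows to [−18, −2], so a reading of −1 raises a warning even though it satisfies the first rule alone.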
4.4. Sensor Data Storage. As described in Section 3, an information system manages several containers, and at least one sensor is deployed in each container. Meanwhile, the sensors collect information on various phenomena every few seconds. As a result, the number of Phenomenon Events generated each day is large. Although PEs have been compressed in Dynamic Compression, the stored number still increases fast. To find a better way to store and retrieve Phenomenon Events efficiently, supporting environment information discovery for entities, we maintain sensor data in two ways: an SQL database and a NoSQL database.
4.4.1. SQL Database. The structure of sensor data in the SQL database is designed as in Figure 3. A container contains one or more sensors, and sensors generate several types of events. Because insertion and selection operations each focus on a single type of phenomenon, we classify events into different tables according to their types to reduce the size of each Phenomenon Event table. In this way, the more phenomena are monitored, the more tables are created. To improve data retrieval efficiency and reduce storage space, we define fields using VARCHAR instead of CHAR, FLOAT instead of DECIMAL, and TIMESTAMP rather than DATETIME.
Insertion. PEs in the compressed PE list are stored directly into the corresponding table, grouped by their Ph_Type.

Query. The most common query on sensor data, like the following example, takes a GLN number and a time slot and searches the records of a specific phenomenon. GLNCode in the Container table and time in the Phenomenon Event tables appear often in such queries.

    SELECT * FROM AirTemperatureEvent a
      JOIN Sensor s ON a.SensorID = s.SensorID
      JOIN Container c ON s.ContainerID = c.ContainerID
      WHERE c.GLNCode = '1'
        AND a.time BETWEEN '2013-06-19 20:00:00' AND '2013-06-20 16:59:59'

To decrease the event retrieval time, we add indices on the GLNCode field and the time field.
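The relational layout and the indexed query can be tried out with a small sketch. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server; the column lists, index names, and sample rows are assumptions, since the exact schema of Figure 3 is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Container (ContainerID INTEGER PRIMARY KEY, GLNCode VARCHAR(13));
CREATE TABLE Sensor (SensorID INTEGER PRIMARY KEY, ContainerID INTEGER);
CREATE TABLE AirTemperatureEvent (
    SensorID INTEGER, value FLOAT, unit VARCHAR(8), time TIMESTAMP);
-- Indices on the two fields that appear in the common query.
CREATE INDEX idx_container_gln ON Container (GLNCode);
CREATE INDEX idx_airtemp_time ON AirTemperatureEvent (time);
""")

# Hypothetical sample data: one container, one sensor, two events.
conn.execute("INSERT INTO Container VALUES (1, '6901404000029')")
conn.execute("INSERT INTO Sensor VALUES (1, 1)")
conn.executemany(
    "INSERT INTO AirTemperatureEvent VALUES (1, ?, 'C', ?)",
    [(25.5, "2013-06-19 21:50:24"), (25.1, "2013-06-21 08:00:00")],
)

rows = conn.execute(
    """
    SELECT a.value, a.time FROM AirTemperatureEvent a
    JOIN Sensor s ON a.SensorID = s.SensorID
    JOIN Container c ON s.ContainerID = c.ContainerID
    WHERE c.GLNCode = ? AND a.time BETWEEN ? AND ?
    """,
    ("6901404000029", "2013-06-19 20:00:00", "2013-06-20 16:59:59"),
).fetchall()
```

Only the event inside the time slot is returned. Timestamps are stored as ISO-formatted text here, so the BETWEEN comparison relies on their lexicographic order.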
4.4.2. NoSQL Database. The NoSQL database maintains a sensor-GLN collection and a Phenomenon Event collection. Algorithm 2 shows an example of the tuples stored in the NoSQL database. Since a NoSQL database stores data in collections, this paper divides the Phenomenon Events and the basic information of sensors and containers into two collections. The sensor-GLN collection maps sensor addresses to the GLN numbers of physical containers by managing [S_addr, GLN] tuples. The Phenomenon Event collection stores [Ph_Type, S_addr, Ph_Value, Ph_Unit, Time] tuples. S_addr, which represents the sensor deployed in a container and the data source of Phenomenon Events, is the key that links physical containers to the actual values of phenomena, while physical containers are identified by their GLN code.
Insertion. PEs in the compressed PE list are stored directly into the Phenomenon Event collection without extra operations, since all tuples in this collection record their Ph_Type.

Query. The most common query is again finding the Phenomenon Events that happened in a container during a time slot. This query requires two query operations. Firstly, get the S_addr list for the GLN from the sensor-GLN collection. Secondly, for each S_addr, find all event records generated over this period.

To reduce the number of queries, there is another way to store PEs: look up the GLN number with S_addr before inserting a PE, and store [Ph_Type, GLN, Ph_Value, Ph_Unit, Time] tuples in the Phenomenon Event collection. However, in the current situation, insertion operations of sensor data happen more frequently than queries. Hence, we adopt the first design on the NoSQL database.
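The two-step lookup of the first design can be sketched with plain in-memory lists standing in for the two collections. No MongoDB driver is used here; the document field names follow the tuples described above, while the sample documents and the function name are hypothetical.

```python
from typing import Dict, List

# Stand-ins for the sensor-GLN collection and the Phenomenon Event collection.
sensor_gln: List[Dict[str, str]] = [
    {"S_addr": "192.168.0.12/0x001", "GLN": "6901404000029"},
    {"S_addr": "192.168.0.12/0x002", "GLN": "6901404000029"},
    {"S_addr": "192.168.0.13/0x001", "GLN": "6901404000036"},
]
events: List[Dict[str, object]] = [
    {"Ph_Type": "air temperature", "S_addr": "192.168.0.12/0x001",
     "Ph_Value": 25.5, "Ph_Unit": "C", "Time": "2013-06-19 21:50:24"},
    {"Ph_Type": "air temperature", "S_addr": "192.168.0.13/0x001",
     "Ph_Value": 24.0, "Ph_Unit": "C", "Time": "2013-06-19 21:50:24"},
    {"Ph_Type": "air temperature", "S_addr": "192.168.0.12/0x002",
     "Ph_Value": 25.1, "Ph_Unit": "C", "Time": "2013-06-21 08:00:00"},
]


def events_in_container(gln: str, ph_type: str, start: str,
                        end: str) -> List[Dict[str, object]]:
    """Two-step lookup: resolve the GLN to sensor addresses, then filter events."""
    # Query 1: the S_addr list of the container identified by this GLN.
    addrs = {doc["S_addr"] for doc in sensor_gln if doc["GLN"] == gln}
    # Query 2: events of the given type from those sensors within the time slot.
    return [e for e in events
            if e["S_addr"] in addrs and e["Ph_Type"] == ph_type
            and start <= str(e["Time"]) <= end]


found = events_in_container("6901404000029", "air temperature",
                            "2013-06-19 20:00:00", "2013-06-20 16:59:59")
```

The insert path stays a single write, at the cost of the extra address lookup on every query, which matches the trade-off that motivates the first design.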
Table 1: Parameters of PSO.

w     c1    c2    xMin  xMax  VMax
0.4   1.8   1.8   0.0   0.5   0.2
5. Evaluation

5.1. Experiment Setup. We implemented a prototype of the proposed sensor data processing (SDP) system. Our experimental platform consists of a PC running Windows 7 Professional with 4.00 GB memory and an Intel(R) Core(TM) i3-3220 CPU at 3.30 GHz. The parameters of PSO are set as in Table 1. The selected SQL database is MySQL, and the representative NoSQL database is MongoDB.
5.2. Performance Evaluation. In order to evaluate the performance of our SDP system, three experiments were conducted. The first experiment shows whether DC can compress data dynamically according to the characteristics of data sequences. The second experiment shows how the number of PEs affects the compression speed and how the choice of database affects the storage speed. The third experiment shows how the database setting affects the tracing speed.

Figure 4: Compression result of pressure data (x-axis: Time; y-axis: Pressure value (hPa); series: true pressure, recorded pressure).
The testing data came from the Wuxi Institute of Fudan University. These data were recorded every 5 seconds by the sensors in a greenhouse. For each experiment, we set the window size to 30 seconds. To test the performance of our framework, all PEs were submitted to the sliding window within 30 seconds.

Many factors affect the performance of the proposed SDP system, such as the characteristics of the PEs, the number of PEs, and the database settings. In our evaluation experiments, we focused on how the number of PEs and the choice of database affect performance. Meanwhile, we evaluate whether Dynamic Compression (DC) can compress data dynamically according to the characteristics of data sequences. The initial parameters of PSO are set as in Table 1 before each experiment.
The first experiment was conducted with 1867 air temperature records and 1868 pressure records as PEs. We measured the accuracy AE and the compression rate CR when DC outputs a compressed PE list for a phenomenon type. The experimental results are shown in Figures 4 and 5. In Figure 4, DC keeps every PE whose pressure value differs from the previously recorded value, whereas in Figure 5 DC finds the trend of air temperature and records only 5 points. DC can thus find a threshold that balances accuracy and compression rate. For the pressure data in Figure 4, the AE value is approximately 0.00321 and the CR value is near 0.5133. For the air temperature data in Figure 5, the compression achieves an AE value of 128197 and a CR value of 0.00268. This experiment shows that the proposed dynamic data compression can identify the trends of data and balance accuracy and compression rate, so redundancy in sensor data can be avoided.
The second experiment was conducted with the number of sensor data varying from 500 to 1000000. We recorded three times: when DC begins compressing, when DC finishes compressing, and when storage completes. The difference between the second and the first was taken as the compression time, and the difference between the third and the second was taken as the storage time. The result is shown in Figure 6. It shows that the compression time of DC grows linearly with the number of PEs, and that the time to store PEs into the MySQL database increases faster than the time to store them into MongoDB.

Figure 5: Compression result of air temperature data (x-axis: Time; y-axis: Temperature value (°C); series: true values, recorded values).

Figure 6: Compression and storage time with various numbers of PEs (x-axis: Numbers of PEs; y-axis: Time (ms); series: Compression, MySQL storage, MongoDB storage).
The third experiment was conducted using a query that traces the air temperature information during the lifecycle of an entity with an EPC. The number of recorded PEs in the database varied from 69880 to 1985247. The query times of MySQL and MongoDB are shown in Figure 7. The tracing time of MongoDB grows faster than that of MySQL as the number of tuples increases, because MongoDB has to execute two queries. However, compared with the storage time, the time cost of sensor data queries in MongoDB is short.

Hence, when applications face a large amount of sensor data storage, it is better to store sensor data in nonrelational databases, whereas when environment information discovery queries outnumber storage operations, relational databases perform better.
Figure 7: Query time with various numbers of tuples (x-axis: Number of tuples in database; y-axis: Query time (ms); series: MySQL, MongoDB).

6. Conclusion

In this paper, we present a new IoT sensor data processing (SDP) system to process sensor data dynamically in the context of the Internet of Things. First, heterogeneous sensor data are captured and transformed into a unified data format. Second, the Particle Swarm Optimization algorithm, extended with a crossbreeding operation, is employed for data compression, which avoids redundancy and helps to reduce the load on the database. The proposed SDP system detects exception situations by setting the standard phenomenon rules of containers. Meanwhile, this paper seeks and analyzes a type of database suitable for sensor data storage in the IoT. The experimental results show that the proposed compression method can find a threshold that achieves a high compression rate while keeping accuracy, and that the NoSQL database performs better in sensor data storage while the relational database does better when executing environment information discovery queries.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L Atzori A Iera and G Morabito ldquoThe internet of things asurveyrdquoComputer Networks vol 54 no 15 pp 2787ndash2805 2010
[2] M C Domingo ldquoAn overview of the internet of things forpeople with disabilitiesrdquo Journal of Network and ComputerApplications vol 35 no 2 pp 584ndash596 2012
[3] L Zheng H Zhang W Han et al ldquoTechnologies applicationsand governance in the internet of thingsrdquo in Internet ofThingsmdashGlobal Technological and Societal Trends from Smart Environ-ments and Spaces to Green ICT River Publishers 2011
[4] J Yick B Mukherjee and D Ghosal ldquoWireless sensor networksurveyrdquoComputerNetworks vol 52 no 12 pp 2292ndash2330 2008
[5] M Botts and A Robin Sensor Model Language (SensorML)Implementation Specification OpenGIS Implementation Spec-ification 2007
[6] M Botts and A Robin ldquoSensor Model Language (SensorML)Implementation Specification OpenGIS Implementation peci-fication 2007rdquo IEEE Instrumentation and Measurement Soci-ety IEEE Standard for a Smart Transducer Interface for Sensorsand Actuators-Common Functions Communication Proto-cols and Transducer Electronic Data Sheet (TEDS) FormatsIEEE Std 14510 2007
[7] K Aberer M Hauswirth and A Salehi ldquoInfrastructure for dataprocessing in large-scale interconnected sensor networksrdquo inProceedings of the 8th International Conference on Mobile DataManagement (MDM rsquo07) pp 198ndash205 May 2007
[8] M Sgroi A Wolisz A Sangiovanni-Vincentelli and J MRabaey ldquoA service-based universal application interface forad hoc wireless sensor and actuator networksrdquo in AmbientIntelligence pp 149ndash172 Springer Berlin Germany 2005
[9] Y Yao and J Gehrke ldquoThe cougar approach to in-network queryprocessing in sensor networksrdquoACMSigmod Record vol 31 no3 pp 9ndash18 2002
[10] C-C Shen C Srisathapornphat and C Jaikaeo ldquoSensorinformation networking architecture and applicationsrdquo IEEEPersonal Communications vol 8 no 4 pp 52ndash59 2001
[11] M Kim J W Lee Y J Lee and J-C Ryou ldquoCosmos amiddleware for integrated data processing over heterogeneoussensor networksrdquo ETRI Journal vol 30 no 5 pp 696ndash7062008
[12] G Venter and J Sobieszczanski-Sobieski ldquoParticle swarm opti-mizationrdquo AIAA Journal vol 41 no 8 pp 1583ndash1589 2003
[13] J Kennedy ldquoParticle swarm optimizationrdquo in Encyclopedia ofMachine Learning pp 760ndash766 Springer New York NY USA2010
[14] F Van den Bergh and A P Engelbrecht ldquoA study of particleswarm optimization particle trajectoriesrdquo Information Sciencesvol 176 no 8 pp 937ndash971 2006
[15] Y Shi and R C Eberhart ldquoFuzzy adaptive particle swarmoptimizationrdquo in Proceedings of the Congress on EvolutionaryComputation vol 1 pp 101ndash106 IEEE Seoul Republic of KoreaMay 2001
[16] L Gang F Jing L Ling and Y Qilian ldquoFast realization of theLADT ECG data compression methodrdquo IEEE Engineering inMedicine and BiologyMagazine vol 13 no 2 pp 255ndash258 1994
[17] S M S Jalaleddine C G Hutchens R D Strattan and WA Coberly ldquoECG data compression techniquesmdasha unifiedapproachrdquo IEEETransactions on Biomedical Engineering vol 37no 4 pp 329ndash343 1990
[18] P T Gonciari B M Al-Hashimi and N Nicolici ldquoImprovingcompression ratio area overhead and test application timefor system-on-a-chip test data compressiondecompressionrdquo inProceedings of the Design Automation and Test in Europe Con-ference and Exhibition pp 604ndash611 IEEE Computer SocietyParis France 2002
[19] F Wang and P Liu ldquoTemporal management of RFID datardquo inProceedings of the 31st International Conference on Very LargeData Bases (VLDB rsquo05) pp 1128ndash1139 September 2005
[20] E Welbourne L Battle G Cole et al ldquoBuilding the internetof things using RFID the RFID ecosystem experiencerdquo IEEEInternet Computing vol 13 no 3 pp 48ndash55 2009
[21] I Groslashnbaeligk ldquoArchitecture for the Internet of Things (IoT)API and interconnectrdquo in Proceedings of the 2nd InternationalConference on Sensor Technologies and Applications (SENSOR-COMM rsquo08) pp 802ndash807 Cap Esterel France August 2008
[22] R Want ldquoAn introduction to RFID technologyrdquo IEEE PervasiveComputing vol 5 no 1 pp 25ndash33 2006
10 International Journal of Distributed Sensor Networks
[23] Global Location Numbers (GLN) GS1 httpwwwgs1orgdocsidkeysGS1 Global Location Numberspdf
International Journal of Distributed Sensor Networks 5
Once a PEList comes DC executes the following steps toget a compressed PE list CPEList
Step 1 Initialize two particle populations of size m withrandom positions and velocities on = 119909Min 119909Max where119909Min represents the minimum value of the searching spaceand 119909Max refers to the maximum value of searching spaceOne population CRP = CRP1CRP2 CRP119898 adoptingCR as fitness function is used for finding the threshold in119863 to get minimum CR The other population called AEP =AEP1AEP2 AEP119898 employing AE as fitness function isassigned to search the threshold in 119863 to achieve minimumAE
Step 2 Update the velocity and position of all particles inCRP and AEP with (1) and (2)
Step 3 For each 119894 isin (1 119898) let CRP119894and AEP
Step 5 Generate compressed PE lists for all particles inCRP with their positions as thresholds Evaluate CRs for theparticles in CRP using their compressed PEList with (4)
Step 6 Generate compressed PE lists for all particles inAEP with their positions as thresholds For each particlereconstruct time-value line with the PEs in its CPEList usingthe process described in Section 423 and calculate AEbetween PEList and CPEList
Step 7 For each particle in CRP compare its current CR valuewith the CR of its 119901119887119890119904119905 position If the current value is lessthen update 119901119887119890119904119905 and its current position
Step 8 For each particle inAEP compare its current AE valuewith the AE of its 119901119887119890119904119905 position If the current value is lessthen update 119901119887119890119904119905 and its current position
Step 9 Determine the global best particle gbCRP in CRPwith the smallest CR value
Step 10 Determine the global best particle gbAEP in AEPwith the smallest AE value
Step 11 Use the position of gbCRP as threshold to generatea compressed list CRCPEList from PEList and calculate theAE value between CRCPEList and PEList
Step 12 Compare the AE value from Step 11 to the AE valueof gbAEP If the AE value of gbAEP is bigger then set theposition of gbAEP to gbCRPrsquos positon
249
25
251
252
253
254
255
256
257
258
259
203
824
210
712
213
600
220
448
223
336
230
224
233
112
Time
Temperature
Air
tem
pera
ture
(∘C)
Figure 2 Example of time-value reconstruction
Step 13 Repeat Step 2 to Step 12 until a sufficiently good AEor a maximum number of iterations are met
The position of gbAEP after the above 13 steps is theoptimal threshold that can help us to compress PEList intoCPEList balancing accuracy and compression rate
423 Time-Value Reconstruction All of the PEs in a PE listare of the same type two key attributes in a PE are timeand value So reconstruction of a PE is to recalculate thephenomenon value with a time The time-value pairs in thecompressed PE list are discrete points For reconstructionwe make Time attribute as 119883 coordinate and Value attributeas 119884 coordinate For all PEs in a compressed PE list mark(Time Value) points one by one and draw lines between twoadjacent points The 119884 values in the connected line are thereconstructed values of PEs for a certain time
For example given a compressed Phenomenon EventList CPEList
The points on the line in Figure 2 are the reconstructedPhenomenon Events
For example the value at time 2013-06-19 215024on the line chart is 255 So the reconstructed PE isltldquoair temperaturerdquo 192168012 0x001 255 C 2013-06-19215024gt
43 Real-Time Exception Monitor As explained sensor datareflects the phenomenon status in the physical space Hence
6 International Journal of Distributed Sensor Networks
sensor data are closely associated with the containers Themonitored containers require that the phenomenon statesare stable in certain intervals For example a warehousein a cold train loaded with meats requires air temperaturemaintained at [minus18∘C 0∘C] Once the air temperature is outof this interval especially beyond the limits warnings arerequired to be pushed out to the applications
Real-Time Exception Monitor (REM) receives com-pressed Phenomenon Event List CPEList from DynamicCompression (DC) to detect exception situations Althoughthe elements in CPEList are not the complete capturedPhenomenon Events they are still key elements collectedwith a threshold Therefore the Phenomenon Events inCPEList can represent the actual exception situations
REM maintains a Rule Repository and a core executioncomponent Real-Time Exception Monitor Core (REMC forshort) Rule Repository stores a set of rules defining thestandard phenomenon interval Every rule is formatted as aRL
RL = ltGLN Ph Type Min Max Unitgt
Here GLN is a thirteen-digit number used to identifyparties and physical locations Ph Type denotes the type ofa phenomenon Min andMax describe a standard range withUnit of Ph TypeThe following example shows a standard airtemperature interval [minus18∘C 0∘C] of a warehouse with GLN6901404000029 in meat cold chain
REMC focuses on the exception monitoring execution Oncea compressed PE list comes REMC first uses the commonsensor address of elements in CPEList to get the respectiveGLN number Then REMC retrieves RL for the containerfrom Rule Repository with the GLN and the commonPh Type of PEs From all hit RLs REMC looks for themaximum Min and minimum Max to form the minimumstandard range for all types of entities For each PE in thecompressed PE list if the value is not in the range REMCwill send out a warning
44 Sensor Data Storage As described in Section 3 aninformation system manages several containers and at leastone sensor is deployed in a container Meanwhile the sensorscollect information of various phenomena in seconds As aresult the number of PhenomenonEvents generated each dayis large Although PEs have been compressed in DynamicCompression the stored number still increases fast To finda better way to store and retrieve Phenomenon Eventsefficiently supporting environment information discovery forentities we attempt to maintain sensor data in two ways SQLdatabase and NoSQL database
4.4.1. SQL Database. The structure of sensor data in the SQL database is designed as in Figure 3. A container contains one or more sensors, and sensors generate several types of events. Because insertion and selection operations focus on a single type of phenomenon, we split events into different tables according to their types to reduce the size of each Phenomenon Event table. Consequently, the more phenomena are monitored, the more tables are created. To improve data retrieval efficiency and reduce storage space, we define fields as variable-length characters instead of fixed-length characters, float instead of decimal, and timestamp instead of datetime.
Insertion. PEs in the compressed PE list are stored directly into the table that corresponds to their Ph_Type.

Query. The most common query on sensor data, like the following example, takes a GLN number and a time slot to search the records of a specific phenomenon. GLNCode in the Container table and time in the Phenomenon Event tables therefore appear often in query conditions.
SELECT *
FROM AirTemperatureEvent a
  JOIN Sensor s ON a.SensorID = s.SensorID
  JOIN Container c ON s.ContainerID = c.ContainerID
WHERE c.GLNCode = '1'
  AND a.time BETWEEN '2013-06-19 20:00:00' AND '2013-06-20 16:59:59'
To decrease the event retrieval time, we add indices on the GLNCode field and the time field.
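The relational design can be tried out with a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column names other than SensorID, ContainerID, GLNCode, and time are assumptions, since Figure 3 is not reproduced here.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Container (
    ContainerID INTEGER PRIMARY KEY,
    GLNCode     VARCHAR(13)
);
CREATE TABLE Sensor (
    SensorID    INTEGER PRIMARY KEY,
    ContainerID INTEGER REFERENCES Container(ContainerID)
);
-- One event table per phenomenon type, as described in the text.
CREATE TABLE AirTemperatureEvent (
    EventID  INTEGER PRIMARY KEY,
    SensorID INTEGER REFERENCES Sensor(SensorID),
    value    FLOAT,
    time     TIMESTAMP
);
-- Indices on the two fields that appear most often in queries.
CREATE INDEX idx_container_gln ON Container (GLNCode);
CREATE INDEX idx_airtemp_time ON AirTemperatureEvent (time);
""")

conn.execute("INSERT INTO Container VALUES (1, '6901404000029')")
conn.execute("INSERT INTO Sensor VALUES (10, 1)")
conn.executemany(
    "INSERT INTO AirTemperatureEvent (SensorID, value, time) VALUES (?, ?, ?)",
    [
        (10, -5.0, "2013-06-19 21:00:00"),  # inside the time slot
        (10, -4.5, "2013-06-20 18:00:00"),  # outside the time slot
    ],
)

# The common query: one GLN, one phenomenon type, one time slot.
rows = conn.execute(
    """
    SELECT a.value, a.time
    FROM AirTemperatureEvent a
      JOIN Sensor s ON a.SensorID = s.SensorID
      JOIN Container c ON s.ContainerID = c.ContainerID
    WHERE c.GLNCode = ?
      AND a.time BETWEEN ? AND ?
    """,
    ("6901404000029", "2013-06-19 20:00:00", "2013-06-20 16:59:59"),
).fetchall()
```

Timestamps are stored as ISO-formatted strings in this sketch, so the BETWEEN comparison is lexicographic; a MySQL TIMESTAMP column would compare them as time values.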
4.4.2. NoSQL Database. The NoSQL database maintains a sensor-GLN collection and a Phenomenon Event collection. Algorithm 2 shows an example of the tuples stored in the NoSQL database. Since a NoSQL database stores data in collections, this paper divides the Phenomenon Events and the basic information of sensors and containers into two collections. The sensor-GLN collection maps each sensor address to the GLN number of its physical container, managing [S_addr, GLN] tuples. The Phenomenon Event collection stores [Ph_Type, S_addr, Ph_Value, Ph_Unit, Time] tuples. S_addr, which represents the sensor deployed in a container and the data source of Phenomenon Events, is the key that links physical containers to the actual phenomenon values, while the physical containers themselves are identified by GLN code.
Insertion. PEs in the compressed PE list are stored directly into the Phenomenon Event collection without extra operations, since every tuple in this collection records its own Ph_Type.

Query. The most common query again finds the Phenomenon Events that happened in a container during a time slot. This query requires two query operations. First, the S_addr list for the GLN is fetched from the sensor-GLN collection. Second, for each S_addr, all event records generated during the period are retrieved.

To reduce the number of queries, PEs can be stored in another way: look up the GLN number by S_addr before inserting a PE, and store [Ph_Type, GLN, Ph_Value, Ph_Unit, Time] tuples in the Phenomenon Event collection. However, in the current situation, insertions of sensor data happen more frequently than queries. Hence, we adopt the first design for the NoSQL database.
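The two-step lookup of the adopted design can be sketched with plain Python lists of dictionaries standing in for the two collections. No MongoDB driver is used, and the sensor addresses, the second GLN value, and the helper name `find_events` are hypothetical.

```python
from typing import Dict, List

# Stand-in for the sensor-GLN collection: [S_addr, GLN] tuples.
sensor_gln: List[Dict[str, str]] = [
    {"S_addr": "sensor-01", "GLN": "6901404000029"},
    {"S_addr": "sensor-02", "GLN": "6901404000029"},
    {"S_addr": "sensor-03", "GLN": "hypothetical-other-gln"},
]

# Stand-in for the Phenomenon Event collection:
# [Ph_Type, S_addr, Ph_Value, Ph_Unit, Time] tuples.
phenomenon_events: List[Dict[str, object]] = [
    {"Ph_Type": "AirTemperature", "S_addr": "sensor-01",
     "Ph_Value": -5.0, "Ph_Unit": "C", "Time": "2013-06-19 21:00:00"},
    {"Ph_Type": "AirTemperature", "S_addr": "sensor-02",
     "Ph_Value": -4.0, "Ph_Unit": "C", "Time": "2013-06-20 18:00:00"},
    {"Ph_Type": "AirTemperature", "S_addr": "sensor-03",
     "Ph_Value": -3.0, "Ph_Unit": "C", "Time": "2013-06-19 22:00:00"},
    {"Ph_Type": "Pressure", "S_addr": "sensor-01",
     "Ph_Value": 1004.4, "Ph_Unit": "hPa", "Time": "2013-06-19 21:00:00"},
]


def find_events(gln: str, ph_type: str, start: str, end: str) -> List[Dict[str, object]]:
    """Find the events of one phenomenon type in one container and time slot."""
    # Query 1: resolve the GLN to the addresses of its sensors.
    addrs = {doc["S_addr"] for doc in sensor_gln if doc["GLN"] == gln}
    # Query 2: fetch the events of those sensors within the time slot.
    # ISO-formatted timestamps compare correctly as strings.
    return [
        ev for ev in phenomenon_events
        if ev["S_addr"] in addrs
        and ev["Ph_Type"] == ph_type
        and start <= str(ev["Time"]) <= end
    ]


matches = find_events("6901404000029", "AirTemperature",
                      "2013-06-19 20:00:00", "2013-06-20 16:59:59")
```

Insertion in this design is a plain append to `phenomenon_events`, which reflects why the design favors write-heavy workloads: the GLN lookup is paid at query time rather than on every insert.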
Table 1: Parameters of PSO.

w     c1    c2    xMin   xMax   VMax
0.4   1.8   1.8   0.0    0.5    0.2
5. Evaluation

5.1. Experiment Setup. We implemented a prototype of the proposed sensor data processing (SDP) system. Our experimental platform consists of a PC running Windows 7 Professional with 4.00 GB of memory and an Intel(R) Core(TM) i3-3220 CPU at 3.30 GHz. The parameters of PSO are set as in Table 1. The selected SQL database is MySQL, and the representative NoSQL database is MongoDB.
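To make the role of the PSO parameters concrete, the following sketch applies one textbook PSO update to a single one-dimensional particle. It assumes the Table 1 values read as w = 0.4, c1 = c2 = 1.8, xMin = 0.0, xMax = 0.5, and VMax = 0.2, and it assumes the particle position is a candidate compression threshold. It is the standard update rule only: the crossbreeding operation and the fitness function of the proposed system are not modeled, and the fixed best positions are hypothetical.

```python
import random

# Table 1 values, assuming the decimal points read as below.
W, C1, C2 = 0.4, 1.8, 1.8
X_MIN, X_MAX, V_MAX = 0.0, 0.5, 0.2


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def pso_step(x: float, v: float, pbest: float, gbest: float,
             rng: random.Random) -> tuple:
    """One textbook PSO update for a single one-dimensional particle."""
    r1, r2 = rng.random(), rng.random()
    # Inertia term plus cognitive and social attraction terms.
    v_new = W * v + C1 * r1 * (pbest - x) + C2 * r2 * (gbest - x)
    v_new = clamp(v_new, -V_MAX, V_MAX)     # velocity limit VMax
    x_new = clamp(x + v_new, X_MIN, X_MAX)  # position limits xMin, xMax
    return x_new, v_new


rng = random.Random(0)
x, v = 0.10, 0.0
# Hypothetical fixed best positions; a real run would update them
# from the fitness of each particle after every step.
for _ in range(5):
    x, v = pso_step(x, v, pbest=0.30, gbest=0.35, rng=rng)
```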
5.2. Performance Evaluation. To evaluate the performance of our SDP system, three experiments were conducted.
Figure 4: Compression result of pressure data (pressure thinning; series: true pressure, recorded pressure; x-axis: time, 2013-06-19 20:38:24 to 2013-06-20 00:00:00; y-axis: pressure value (hPa)).
The first experiment shows whether DC can compress data dynamically according to the characteristics of the data sequences. The second experiment shows how the number of PEs affects the compression speed and how the choice of database affects the storage speed. The third experiment shows how the setting of the database affects the tracing speed.

The testing data came from the Wuxi Institute of Fudan University. These data were recorded every 5 seconds by the sensors in a greenhouse. For each experiment, we set the window size to 30 seconds. To test the performance of our framework, all PEs were submitted to the sliding window in 30 seconds.

Many factors affect the performance of the proposed SDP system, such as the characteristics of the PEs, the number of PEs, and the settings of the database. In our evaluation experiments, we focused on how the number of PEs and the choice of database affect performance. Meanwhile, we evaluated whether DC can compress data dynamically according to the characteristics of the data sequences. The initial parameters of PSO were set as in Table 1 before the experiments.
The first experiment was conducted with 1867 air temperature records and 1868 pressure records as PEs. We measured the accuracy AE and the compression rate CR when DC outputs a compressed PE list for a phenomenon type. The experimental results are shown in Figures 4 and 5. For pressure (Figure 4), DC keeps every PE whose value differs from the previously recorded value, whereas for air temperature (Figure 5), DC finds the trend and records only 5 points. DC can thus find a threshold that balances accuracy and compression rate. For pressure, AE is approximately 0.00321 and CR is near 0.5133. For air temperature, AE is 1.28197 and CR is 0.00268. This experiment shows that the proposed dynamic data compression can identify the trends in the data and balance accuracy against compression rate, so that redundancy in sensor data can be avoided.
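The reported compression rate for air temperature can be checked with a short calculation. The sketch assumes that CR is the ratio of recorded PEs to captured PEs; this section does not restate the definition, but the assumption is consistent with 5 recorded points out of 1867 records giving a CR of about 0.00268. The function name is illustrative.

```python
def compression_rate(recorded: int, captured: int) -> float:
    """Ratio of PEs kept after compression to PEs captured (assumed definition)."""
    if captured <= 0:
        raise ValueError("captured must be positive")
    return recorded / captured


# Air temperature: 5 recorded points out of 1867 captured records.
cr_air_temperature = compression_rate(5, 1867)
```

Under the same assumption, the pressure CR of about 0.5133 would mean that roughly half of the 1868 pressure records were kept.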
The second experiment was conducted with the number of sensor data varying from 500 to 1000000. We recorded three timestamps: when DC begins compressing, when DC finishes compressing, and when storage finishes. The difference between the second and the first was taken as the compression time, and the difference between the third and the second was taken as the storage time. The result is shown in Figure 6. The compression time of DC grows linearly with the number of PEs, and the time to store PEs into MySQL increases faster than the time to store them into MongoDB.

Figure 5: Compression result of air temperature data (air temperature thinning; series: true values, recorded values; x-axis: time, 2013-06-19 20:38:24 to 2013-06-20 00:00:00; y-axis: temperature value (°C)).

Figure 6: Compression and storage time with various numbers of PEs (series: compression, MySQL storage, MongoDB storage; x-axis: number of PEs, 500 to 10000; y-axis: time (ms)).
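The timing procedure of the second experiment can be sketched as follows. The compression and storage steps are passed in as callables; the `drop_repeats` function and the in-memory list used here are simple stand-ins, not the DC component or the databases used in the experiment.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_pipeline(pes: Sequence[float],
                  compress: Callable[[Sequence[float]], List[float]],
                  store: Callable[[List[float]], None]) -> Tuple[float, float]:
    """Return (compression time, storage time) in milliseconds."""
    t_begin = time.perf_counter()       # DC begins compressing
    compressed = compress(pes)
    t_compressed = time.perf_counter()  # DC has finished compressing
    store(compressed)
    t_stored = time.perf_counter()      # storage has finished
    compression_ms = (t_compressed - t_begin) * 1000.0
    storage_ms = (t_stored - t_compressed) * 1000.0
    return compression_ms, storage_ms


def drop_repeats(values: Sequence[float]) -> List[float]:
    """Stand-in compressor: keep a value only if it differs from the last kept one."""
    kept: List[float] = []
    for value in values:
        if not kept or value != kept[-1]:
            kept.append(value)
    return kept


stored: List[float] = []  # stand-in for a database table or collection
compression_ms, storage_ms = time_pipeline(
    [1.0, 1.0, 1.2, 1.2, 1.1], drop_repeats, stored.extend
)
```

Using a monotonic clock such as `time.perf_counter` avoids negative intervals if the system clock is adjusted during a run.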
The third experiment used a query to trace the air temperature information during the lifecycle of an entity identified by an EPC. The number of PEs recorded in the database varied from 69880 to 1985247. The query times of MySQL and MongoDB are shown in Figure 7. The tracing time of MongoDB grows faster than that of MySQL as the number of tuples increases, because MongoDB has to execute two queries. However, compared with the storage time, the time cost of sensor data queries in MongoDB is short.

Hence, when applications face a large volume of sensor data to store, it is better to store the data in nonrelational databases, whereas when environment information discovery queries outnumber storage operations, relational databases perform better.
Figure 7: Query time with various numbers of tuples (series: MySQL, MongoDB; x-axis: number of tuples in database, 69880 to 1985247; y-axis: query time (ms), 0 to 8000).

6. Conclusion

In this paper, we present a new IoT sensor data processing (SDP) system that processes sensor data dynamically in the context of the Internet of Things. First, heterogeneous sensor data are captured and transformed into a unified data format. Second, a Particle Swarm Optimization algorithm extended with a crossbreeding operation is employed for data compression, which avoids redundancy and helps to reduce the load on the database. The proposed SDP system detects exception situations by means of standard phenomenon rules set for containers. Meanwhile, this paper seeks and analyzes a type of database suitable for sensor data storage in the IoT. The experimental results show that the proposed compression method can find a threshold that achieves a high compression rate while keeping accuracy, and that a NoSQL database performs better in sensor data storage, while a relational database does better when executing environment information discovery queries.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L Atzori A Iera and G Morabito ldquoThe internet of things asurveyrdquoComputer Networks vol 54 no 15 pp 2787ndash2805 2010
[2] M C Domingo ldquoAn overview of the internet of things forpeople with disabilitiesrdquo Journal of Network and ComputerApplications vol 35 no 2 pp 584ndash596 2012
[3] L Zheng H Zhang W Han et al ldquoTechnologies applicationsand governance in the internet of thingsrdquo in Internet ofThingsmdashGlobal Technological and Societal Trends from Smart Environ-ments and Spaces to Green ICT River Publishers 2011
[4] J Yick B Mukherjee and D Ghosal ldquoWireless sensor networksurveyrdquoComputerNetworks vol 52 no 12 pp 2292ndash2330 2008
[5] M Botts and A Robin Sensor Model Language (SensorML)Implementation Specification OpenGIS Implementation Spec-ification 2007
[6] M Botts and A Robin ldquoSensor Model Language (SensorML)Implementation Specification OpenGIS Implementation peci-fication 2007rdquo IEEE Instrumentation and Measurement Soci-ety IEEE Standard for a Smart Transducer Interface for Sensorsand Actuators-Common Functions Communication Proto-cols and Transducer Electronic Data Sheet (TEDS) FormatsIEEE Std 14510 2007
[7] K Aberer M Hauswirth and A Salehi ldquoInfrastructure for dataprocessing in large-scale interconnected sensor networksrdquo inProceedings of the 8th International Conference on Mobile DataManagement (MDM rsquo07) pp 198ndash205 May 2007
[8] M Sgroi A Wolisz A Sangiovanni-Vincentelli and J MRabaey ldquoA service-based universal application interface forad hoc wireless sensor and actuator networksrdquo in AmbientIntelligence pp 149ndash172 Springer Berlin Germany 2005
[9] Y Yao and J Gehrke ldquoThe cougar approach to in-network queryprocessing in sensor networksrdquoACMSigmod Record vol 31 no3 pp 9ndash18 2002
[10] C-C Shen C Srisathapornphat and C Jaikaeo ldquoSensorinformation networking architecture and applicationsrdquo IEEEPersonal Communications vol 8 no 4 pp 52ndash59 2001
[11] M Kim J W Lee Y J Lee and J-C Ryou ldquoCosmos amiddleware for integrated data processing over heterogeneoussensor networksrdquo ETRI Journal vol 30 no 5 pp 696ndash7062008
[12] G Venter and J Sobieszczanski-Sobieski ldquoParticle swarm opti-mizationrdquo AIAA Journal vol 41 no 8 pp 1583ndash1589 2003
[13] J Kennedy ldquoParticle swarm optimizationrdquo in Encyclopedia ofMachine Learning pp 760ndash766 Springer New York NY USA2010
[14] F Van den Bergh and A P Engelbrecht ldquoA study of particleswarm optimization particle trajectoriesrdquo Information Sciencesvol 176 no 8 pp 937ndash971 2006
[15] Y Shi and R C Eberhart ldquoFuzzy adaptive particle swarmoptimizationrdquo in Proceedings of the Congress on EvolutionaryComputation vol 1 pp 101ndash106 IEEE Seoul Republic of KoreaMay 2001
[16] L Gang F Jing L Ling and Y Qilian ldquoFast realization of theLADT ECG data compression methodrdquo IEEE Engineering inMedicine and BiologyMagazine vol 13 no 2 pp 255ndash258 1994
[17] S M S Jalaleddine C G Hutchens R D Strattan and WA Coberly ldquoECG data compression techniquesmdasha unifiedapproachrdquo IEEETransactions on Biomedical Engineering vol 37no 4 pp 329ndash343 1990
[18] P T Gonciari B M Al-Hashimi and N Nicolici ldquoImprovingcompression ratio area overhead and test application timefor system-on-a-chip test data compressiondecompressionrdquo inProceedings of the Design Automation and Test in Europe Con-ference and Exhibition pp 604ndash611 IEEE Computer SocietyParis France 2002
[19] F Wang and P Liu ldquoTemporal management of RFID datardquo inProceedings of the 31st International Conference on Very LargeData Bases (VLDB rsquo05) pp 1128ndash1139 September 2005
[20] E Welbourne L Battle G Cole et al ldquoBuilding the internetof things using RFID the RFID ecosystem experiencerdquo IEEEInternet Computing vol 13 no 3 pp 48ndash55 2009
[21] I Groslashnbaeligk ldquoArchitecture for the Internet of Things (IoT)API and interconnectrdquo in Proceedings of the 2nd InternationalConference on Sensor Technologies and Applications (SENSOR-COMM rsquo08) pp 802ndash807 Cap Esterel France August 2008
[22] R Want ldquoAn introduction to RFID technologyrdquo IEEE PervasiveComputing vol 5 no 1 pp 25ndash33 2006
10 International Journal of Distributed Sensor Networks
[23] Global Location Numbers (GLN) GS1 httpwwwgs1orgdocsidkeysGS1 Global Location Numberspdf
sensor data are closely associated with the containers. The monitored containers require that the phenomenon states remain stable within certain intervals. For example, a warehouse in a cold chain loaded with meat requires the air temperature to be maintained within [−18°C, 0°C]. Once the air temperature is out of this interval, warnings must be pushed out to the applications.
Real-Time Exception Monitor (REM) receives com-pressed Phenomenon Event List CPEList from DynamicCompression (DC) to detect exception situations Althoughthe elements in CPEList are not the complete capturedPhenomenon Events they are still key elements collectedwith a threshold Therefore the Phenomenon Events inCPEList can represent the actual exception situations
REM maintains a Rule Repository and a core executioncomponent Real-Time Exception Monitor Core (REMC forshort) Rule Repository stores a set of rules defining thestandard phenomenon interval Every rule is formatted as aRL
RL = ltGLN Ph Type Min Max Unitgt
Here GLN is a thirteen-digit number used to identifyparties and physical locations Ph Type denotes the type ofa phenomenon Min andMax describe a standard range withUnit of Ph TypeThe following example shows a standard airtemperature interval [minus18∘C 0∘C] of a warehouse with GLN6901404000029 in meat cold chain
REMC focuses on the exception monitoring execution Oncea compressed PE list comes REMC first uses the commonsensor address of elements in CPEList to get the respectiveGLN number Then REMC retrieves RL for the containerfrom Rule Repository with the GLN and the commonPh Type of PEs From all hit RLs REMC looks for themaximum Min and minimum Max to form the minimumstandard range for all types of entities For each PE in thecompressed PE list if the value is not in the range REMCwill send out a warning
44 Sensor Data Storage As described in Section 3 aninformation system manages several containers and at leastone sensor is deployed in a container Meanwhile the sensorscollect information of various phenomena in seconds As aresult the number of PhenomenonEvents generated each dayis large Although PEs have been compressed in DynamicCompression the stored number still increases fast To finda better way to store and retrieve Phenomenon Eventsefficiently supporting environment information discovery forentities we attempt to maintain sensor data in two ways SQLdatabase and NoSQL database
441 SQL Database The structure of sensor data in SQLdatabase is designed as Figure 3 A container contains oneor more sensors and sensors generate several types ofevents Considering the insertion and selection operations
focus on single type of phenomenon we classify events intodifferent tables according to their types to reduce the size ofPhenomenon Event table In this way if more phenomenaare monitored more tables will be created To improvedata retrieving efficiency and reduce the storage space wedefine fields using variable characters instead of charactersfloat replacing decimal number and timestamp rather thandatetime
Insertion PEs in the compressed PE list are stored directlyinto corresponding table grouped by their Ph Type
Query The most common query on sensor data like thefollowing example takes a GLN number and a time slot tosearch the records of a specific phenomenon GLNCode inContainer table and time in Phenomenon Event tables appearoften
Select lowast from AirTemperatureEvent a join sensorson aSensorID = sSensorID join Container c onsContainerID = cContainerID where cGLNCode = lsquo1rsquoand atime between lsquo2013-06-19 200000rsquo AND lsquo2013-06-20 165959rsquo
For dereasing the event retrieving time we add indices onGLNCode field and time field
442 NOSQL Database NoSQL database maintains sen-sor-GLN collection and Phenomenon Event collectionAlgorithm 2 shows the storage example tuple in NoSQLdatabase Since NoSQL database stores data in collectionsthis paper divides the Phenomenon Event and basic infor-mation of sensor and containers into two collections Sensor-GLN collectionmaps the relationship between sensor addressand GLN number of physical container managing [S addrGLN] tuples Phenomenon Event collection stores [Ph TypeS addr Ph Value Ph Unit Time] tuples S addr whichrepresents the sensor deployed in container and the datasource of Phenomenon Events is the key value to buildrelationships between physical containers and the actualvalues of phenomena while physical containers are identifiedwith GLN code
International Journal of Distributed Sensor Networks 7
Insertion PEs in the compressed PE list are stored directlyinto Phenomenon Event collection without extra operationssince all tuples in this collection record their Ph Type
Query The most common query is also finding the Phe-nomenon Event that happened in a container during a timeslot This query would require executing querying operationtwo times Firstly get the S addr list with GLN from sensor-GLN collection Secondly for each S addr find all eventsrecords generated over this period
To reduce the query times there is another way to storePEs It is to find theGLNnumberwith S addr before insertinga PE and store [Ph Type GLN Ph Value Ph Unit Time]tuples in Phenomenon Event collection However as to thecurrent situation insertion operations of sensor data happenmore frequently than queries Hence we adopt the firstdesign on NoSQL DB
Table 1 Parameters of PSO
119908 1198881
1198882
119909Min 119909Max VMax04 18 18 00 05 02
5 Evaluation
51 Experiment Setup We implemented a prototype of theproposed sensor data processing systemSPSOur experimen-tal platform consists of a PC runningWindows 7 professionalwith 400GB memory and Intel(R) Core(TM) i3-3220 CPU 330GHz processor The parameters of PSO are set asin Table 1 The selected SQL database is MySQL and therepresentative of NoSQL database is MongoDB
52 Performance Evaluation In order to evaluate perfor-mance of our SDP system three experiments are conducted
8 International Journal of Distributed Sensor Networks
10040100441004810052
Pres
sure
val
ue (h
Pa) Pressure thinning
True pressureRecord pressure
2013
-06-
1920
38
24
2013
-06-
1921
07
12
2013
-06-
1921
36
00
2013
-06-
1922
04
48
2013
-06-
1922
33
36
2013
-06-
1923
02
24
2013
-06-
1923
31
12
2013
-06-
2000
00
00
Time
Figure 4 Compression result of pressure data
The first experiment is to show whether DC can compressdata dynamically according to the characteristics of datasequences The second experiment is to show how thenumber of PEs affects the compression speed and how theselections of database affect the storage speed The thirdexperiment is to show how the setting of database affects thetracing speed
Testing data came from Wuxi Institute of Fudan Univer-sity These data were recorded every 5 seconds by the sensorsin a greenhouse For each experiment we set the window sizeto 30 seconds To test the performance of our framework allPEs were submitted to the sliding window in 30 seconds
There are many factors affecting the performance of theproposed IoT sensor data processing SDP system such as thecharacteristics of the PEs the number of PEs and the settingsof database In our evaluation experiments we focused onhow the number of PEs and the selections of database affectperformance Meanwhile we try to evaluate whether the datacompression (DC) can compress data dynamically accordingto the characteristics of data sequencesThe initial parametersof PSO are set as in Table 1 before experiment
The first experiment was conducted with the 1867 piecesof air temperature records and 1868 pieces of pressure recordsas PEs We measured the accuracy AE and the compressionrate CR when DC outputs a compressed PE list for aphenomenon type The experimental results are shown inFigures 4 and 5 We can see that DC keeps all PEs whosevalue of pressure differs from their previous recorded valuewhile DC finds the trend of air temperature in Figure 5 andrecords only 5 points DC can find a threshold to balance theaccuracy and compression rate Meanwhile the value of AEindex approximates 000321 and the value of CR is near 05133in Figure 4 And the data compression of air temperature inFigure 5 can achieve AE value of 128197 and CR value of000268 This part of experiment shows that the proposeddynamic data compression can identify the trends of dataand balance the accuracy and compression rate Redundancyfrom sensor data can be avoided
The second experiment was conducted with the numberof sensor data varying from 500 to 1000000 We measuredthe time when DC begins compressing the time of DC aftercompression and the time after storage The second onesubtracted by the first one was considered as the compressiontime and the last one subtracted by the second one was
248250252254256258260
Time
True values
Air temperature thinning
Recorded values
2013
-06-
1920
38
24
2013
-06-
1921
07
12
2013
-06-
1921
36
00
2013
-06-
1922
04
48
2013
-06-
1922
33
36
2013
-06-
1923
02
24
2013
-06-
1923
31
12
2013
-06-
2000
00
00
Tem
pera
ture
val
ue (∘
C)
Figure 5 Compression result of air temperature data
020000400006000080000
100000120000140000
500 1000 2000 4000 6000 8000 10000
Tim
e (m
s)
Numbers of PEs
CompressionMySQL storageMongoDB storage
Figure 6 Compression and storage time with various numbers ofPEs
taken as the storage time The result is shown in Figure 6and it reflected that the compression time of DC has linearrelationship with the number of PEs The time storing PEsinto MySQL database increases faster than the time storingdata into MongoDB
The third experiment was conducted using a query totrace the air temperature information during the lifecycleof an entity with an EPC The number of recorded PEs indatabase varied from 69880 to 1985247 The query time ofMySQL and MongoDB is shown in Figure 7 We learnedthat the tracing time of MongoDB grows faster than thetime of MySQL The reason is that MongoDB had to executetwo queries when the number of tuples increases Howevercomparing to the storage time the time cost by the sensordata queries of MongoDB is short
Hence when applications are faced with a large amountof sensor data storage it is better to store sensor datainto nonrelational databases while when the number ofenvironment information discovery queries is larger thanstorage rational databases have better performance
6 Conclusion
In this paper we present a new IoT sensor data processing(SDP) system to process sensor data dynamically in thecontext of the Internet ofThings First heterogeneous sensor
International Journal of Distributed Sensor Networks 9
0
1000
2000
3000
4000
5000
6000
7000
8000
69880 190354 1133228 1985247
Que
ry ti
me (
ms)
Number of tuples in database
MysqlMongoDB
Figure 7 Query time with various numbers of tuples
data are captured and transformed into unified data formatSecond Particle Swarm Optimization algorithm is employedto do data compression avoiding redundancy and helpingto reduce the load of database by adding crossbreedingoperation on PSO algorithm The proposed SDP systemdetects exception situations by setting the standard phe-nomenon rules of containers Meanwhile an appropriatetype of database suitable for the sensor data storage in theIoT is sought and analyzed in this paper The experimentalresults show that the proposed compression method can finda threshold achieving high compression rate and keepingaccuracy and NoSQL database has better performance insensor data storagewhile relational database does betterwhenexecuting environment information discovery queries
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] L Atzori A Iera and G Morabito ldquoThe internet of things asurveyrdquoComputer Networks vol 54 no 15 pp 2787ndash2805 2010
[2] M C Domingo ldquoAn overview of the internet of things forpeople with disabilitiesrdquo Journal of Network and ComputerApplications vol 35 no 2 pp 584ndash596 2012
[3] L Zheng H Zhang W Han et al ldquoTechnologies applicationsand governance in the internet of thingsrdquo in Internet ofThingsmdashGlobal Technological and Societal Trends from Smart Environ-ments and Spaces to Green ICT River Publishers 2011
[4] J Yick B Mukherjee and D Ghosal ldquoWireless sensor networksurveyrdquoComputerNetworks vol 52 no 12 pp 2292ndash2330 2008
[5] M Botts and A Robin Sensor Model Language (SensorML)Implementation Specification OpenGIS Implementation Spec-ification 2007
[6] M Botts and A Robin ldquoSensor Model Language (SensorML)Implementation Specification OpenGIS Implementation peci-fication 2007rdquo IEEE Instrumentation and Measurement Soci-ety IEEE Standard for a Smart Transducer Interface for Sensorsand Actuators-Common Functions Communication Proto-cols and Transducer Electronic Data Sheet (TEDS) FormatsIEEE Std 14510 2007
[7] K Aberer M Hauswirth and A Salehi ldquoInfrastructure for dataprocessing in large-scale interconnected sensor networksrdquo inProceedings of the 8th International Conference on Mobile DataManagement (MDM rsquo07) pp 198ndash205 May 2007
[8] M Sgroi A Wolisz A Sangiovanni-Vincentelli and J MRabaey ldquoA service-based universal application interface forad hoc wireless sensor and actuator networksrdquo in AmbientIntelligence pp 149ndash172 Springer Berlin Germany 2005
[9] Y Yao and J Gehrke ldquoThe cougar approach to in-network queryprocessing in sensor networksrdquoACMSigmod Record vol 31 no3 pp 9ndash18 2002
[10] C-C Shen C Srisathapornphat and C Jaikaeo ldquoSensorinformation networking architecture and applicationsrdquo IEEEPersonal Communications vol 8 no 4 pp 52ndash59 2001
[11] M Kim J W Lee Y J Lee and J-C Ryou ldquoCosmos amiddleware for integrated data processing over heterogeneoussensor networksrdquo ETRI Journal vol 30 no 5 pp 696ndash7062008
[12] G Venter and J Sobieszczanski-Sobieski ldquoParticle swarm opti-mizationrdquo AIAA Journal vol 41 no 8 pp 1583ndash1589 2003
[13] J Kennedy ldquoParticle swarm optimizationrdquo in Encyclopedia ofMachine Learning pp 760ndash766 Springer New York NY USA2010
[14] F Van den Bergh and A P Engelbrecht ldquoA study of particleswarm optimization particle trajectoriesrdquo Information Sciencesvol 176 no 8 pp 937ndash971 2006
[15] Y Shi and R C Eberhart ldquoFuzzy adaptive particle swarmoptimizationrdquo in Proceedings of the Congress on EvolutionaryComputation vol 1 pp 101ndash106 IEEE Seoul Republic of KoreaMay 2001
[16] L Gang F Jing L Ling and Y Qilian ldquoFast realization of theLADT ECG data compression methodrdquo IEEE Engineering inMedicine and BiologyMagazine vol 13 no 2 pp 255ndash258 1994
[17] S M S Jalaleddine C G Hutchens R D Strattan and WA Coberly ldquoECG data compression techniquesmdasha unifiedapproachrdquo IEEETransactions on Biomedical Engineering vol 37no 4 pp 329ndash343 1990
[18] P T Gonciari B M Al-Hashimi and N Nicolici ldquoImprovingcompression ratio area overhead and test application timefor system-on-a-chip test data compressiondecompressionrdquo inProceedings of the Design Automation and Test in Europe Con-ference and Exhibition pp 604ndash611 IEEE Computer SocietyParis France 2002
[19] F Wang and P Liu ldquoTemporal management of RFID datardquo inProceedings of the 31st International Conference on Very LargeData Bases (VLDB rsquo05) pp 1128ndash1139 September 2005
[20] E Welbourne L Battle G Cole et al ldquoBuilding the internetof things using RFID the RFID ecosystem experiencerdquo IEEEInternet Computing vol 13 no 3 pp 48ndash55 2009
[21] I Groslashnbaeligk ldquoArchitecture for the Internet of Things (IoT)API and interconnectrdquo in Proceedings of the 2nd InternationalConference on Sensor Technologies and Applications (SENSOR-COMM rsquo08) pp 802ndash807 Cap Esterel France August 2008
[22] R Want ldquoAn introduction to RFID technologyrdquo IEEE PervasiveComputing vol 5 no 1 pp 25ndash33 2006
10 International Journal of Distributed Sensor Networks
[23] Global Location Numbers (GLN) GS1 httpwwwgs1orgdocsidkeysGS1 Global Location Numberspdf
Insertion PEs in the compressed PE list are stored directlyinto Phenomenon Event collection without extra operationssince all tuples in this collection record their Ph Type
Query The most common query is also finding the Phe-nomenon Event that happened in a container during a timeslot This query would require executing querying operationtwo times Firstly get the S addr list with GLN from sensor-GLN collection Secondly for each S addr find all eventsrecords generated over this period
To reduce the query times there is another way to storePEs It is to find theGLNnumberwith S addr before insertinga PE and store [Ph Type GLN Ph Value Ph Unit Time]tuples in Phenomenon Event collection However as to thecurrent situation insertion operations of sensor data happenmore frequently than queries Hence we adopt the firstdesign on NoSQL DB
Table 1 Parameters of PSO
119908 1198881
1198882
119909Min 119909Max VMax04 18 18 00 05 02
5 Evaluation
51 Experiment Setup We implemented a prototype of theproposed sensor data processing systemSPSOur experimen-tal platform consists of a PC runningWindows 7 professionalwith 400GB memory and Intel(R) Core(TM) i3-3220 CPU 330GHz processor The parameters of PSO are set asin Table 1 The selected SQL database is MySQL and therepresentative of NoSQL database is MongoDB
52 Performance Evaluation In order to evaluate perfor-mance of our SDP system three experiments are conducted
8 International Journal of Distributed Sensor Networks
10040100441004810052
Pres
sure
val
ue (h
Pa) Pressure thinning
True pressureRecord pressure
2013
-06-
1920
38
24
2013
-06-
1921
07
12
2013
-06-
1921
36
00
2013
-06-
1922
04
48
2013
-06-
1922
33
36
2013
-06-
1923
02
24
2013
-06-
1923
31
12
2013
-06-
2000
00
00
Time
Figure 4 Compression result of pressure data
The first experiment is to show whether DC can compressdata dynamically according to the characteristics of datasequences The second experiment is to show how thenumber of PEs affects the compression speed and how theselections of database affect the storage speed The thirdexperiment is to show how the setting of database affects thetracing speed
Testing data came from Wuxi Institute of Fudan Univer-sity These data were recorded every 5 seconds by the sensorsin a greenhouse For each experiment we set the window sizeto 30 seconds To test the performance of our framework allPEs were submitted to the sliding window in 30 seconds
There are many factors affecting the performance of theproposed IoT sensor data processing SDP system such as thecharacteristics of the PEs the number of PEs and the settingsof database In our evaluation experiments we focused onhow the number of PEs and the selections of database affectperformance Meanwhile we try to evaluate whether the datacompression (DC) can compress data dynamically accordingto the characteristics of data sequencesThe initial parametersof PSO are set as in Table 1 before experiment
The first experiment was conducted with 1867 air temperature records and 1868 pressure records as PEs. We measured the accuracy AE and the compression rate CR when DC outputs a compressed PE list for a phenomenon type. The experimental results are shown in Figures 4 and 5. In Figure 4, DC keeps all PEs whose pressure value differs from the previous recorded value, while in Figure 5 DC finds the trend of air temperature and records only 5 points. DC can thus find a threshold that balances accuracy and compression rate. For the pressure data in Figure 4, AE is approximately 0.00321 and CR is near 0.5133. For the air temperature data in Figure 5, AE is 1.28197 and CR is 0.00268. This experiment shows that the proposed dynamic data compression can identify the trends of data and balance accuracy against compression rate, so redundancy in sensor data can be avoided.
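The trade-off between accuracy and compression rate can be illustrated with a simple threshold-based thinning. This is a sketch under stated assumptions, not the authors' DC component. CR is taken as the number of kept records divided by the total, which agrees with 5 of 1867 records giving about 0.00268. The error metric (mean absolute deviation when the last kept value is held) is an assumption standing in for AE, and the function names and sample values are hypothetical.

```python
def thin(values, threshold):
    """Return indices of kept values.

    The first value is always kept; a later value is kept only if it
    differs from the last kept value by more than `threshold`.
    """
    if not values:
        return []
    kept = [0]
    for i in range(1, len(values)):
        if abs(values[i] - values[kept[-1]]) > threshold:
            kept.append(i)
    return kept


def compression_rate(kept, values):
    """Fraction of records that survive thinning (kept / total)."""
    return len(kept) / len(values)


def mean_abs_error(kept, values):
    """Assumed error metric: hold the last kept value and average the
    absolute deviation from the true values."""
    kept_set = set(kept)
    last = values[0]
    total = 0.0
    for i, value in enumerate(values):
        if i in kept_set:
            last = value
        total += abs(value - last)
    return total / len(values)


# Hypothetical air temperature readings in degrees Celsius.
values = [25.0, 25.0, 25.1, 25.1, 25.4, 25.4, 25.5, 25.9]
kept_all_changes = thin(values, 0.0)   # keeps every change in value
kept_coarse = thin(values, 0.25)       # keeps only larger jumps
```

With a threshold of 0 every change is recorded, which gives zero error but a high CR; raising the threshold lowers CR at the cost of a larger error. A trained threshold sits somewhere between these two extremes.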
Figure 5: Compression result of air temperature data (air temperature thinning; x-axis: time, y-axis: temperature value in °C; series: true values and recorded values).
The second experiment was conducted with the number of sensor data varying from 500 to 1000000. We measured the time when DC begins compressing, the time when DC finishes compressing, and the time after storage. The difference between the second and the first was taken as the compression time, and the difference between the last and the second was taken as the storage time. The result is shown in Figure 6: the compression time of DC grows linearly with the number of PEs, and the time to store PEs into MySQL increases faster than the time to store them into MongoDB.
Figure 6: Compression and storage time with various numbers of PEs (x-axis: number of PEs, from 500 to 10000; y-axis: time in ms; series: compression, MySQL storage, MongoDB storage).
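The three-timestamp measurement described for the second experiment can be sketched as follows. This is an illustration only: `compress` and `store` are hypothetical stand-ins (no real DC component or database is involved), and the input sizes are taken from the axis of Figure 6.

```python
import time


def compress(events):
    """Hypothetical stand-in for DC: keep every second event."""
    return events[::2]


def store(events, sink):
    """Hypothetical stand-in for a database write: append to a list."""
    sink.extend(events)


def measure(events):
    """Take three timestamps and derive compression and storage time."""
    sink = []
    t_start = time.perf_counter()        # DC begins compressing
    compressed = compress(events)
    t_compressed = time.perf_counter()   # DC has finished compressing
    store(compressed, sink)
    t_stored = time.perf_counter()       # storage has finished
    return {
        "compression_ms": (t_compressed - t_start) * 1000.0,
        "storage_ms": (t_stored - t_compressed) * 1000.0,
        "stored": len(sink),
    }


# Input sizes as on the x-axis of Figure 6.
for n in (500, 1000, 2000, 4000, 6000, 8000, 10000):
    print(n, measure(list(range(n))))
```

Using a monotonic clock such as `time.perf_counter` avoids negative intervals if the system clock is adjusted during a run.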
The third experiment was conducted using a query to trace the air temperature information during the lifecycle of an entity with an EPC. The number of recorded PEs in the database varied from 69880 to 1985247. The query times of MySQL and MongoDB are shown in Figure 7. The tracing time of MongoDB grows faster than that of MySQL, because MongoDB has to execute two queries as the number of tuples increases. However, compared with the storage time, the time cost of sensor data queries in MongoDB is short.
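The difference between a single relational query and a two-step lookup can be sketched as below. The queries used in the experiment are not shown in the text, so everything here is an assumption: the schema (a binding between EPC and sensor, plus a readings table) is hypothetical, SQLite from the Python standard library stands in for MySQL, and plain Python lists stand in for document collections.

```python
import sqlite3

# Hypothetical data: which sensor was attached to which entity (by EPC),
# and the air temperature readings of each sensor.
BINDINGS = [("epc-001", "sensor-A"), ("epc-002", "sensor-B")]
READINGS = [("sensor-A", 0, 24.8), ("sensor-A", 5, 25.0), ("sensor-B", 0, 26.0)]


def trace_relational(epc):
    """One query: a join resolves the EPC and fetches the readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE binding (epc TEXT, sensor_id TEXT)")
    conn.execute("CREATE TABLE reading (sensor_id TEXT, ts INTEGER, value REAL)")
    conn.executemany("INSERT INTO binding VALUES (?, ?)", BINDINGS)
    conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", READINGS)
    rows = conn.execute(
        "SELECT r.ts, r.value FROM binding b "
        "JOIN reading r ON r.sensor_id = b.sensor_id "
        "WHERE b.epc = ? ORDER BY r.ts",
        (epc,),
    ).fetchall()
    conn.close()
    return rows


def trace_two_step(epc):
    """Two lookups, as a store without joins would need."""
    # Query 1: find the sensors bound to the entity.
    sensor_ids = {sensor for bound_epc, sensor in BINDINGS if bound_epc == epc}
    # Query 2: fetch the readings of those sensors.
    rows = [(ts, value) for sensor, ts, value in READINGS if sensor in sensor_ids]
    return sorted(rows)
```

Both functions return the same trace; the second pays for an extra round trip, which is one plausible reading of why tracing time grows faster when two queries are required.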
Hence, when applications face a large volume of sensor data to store, it is better to store the data in nonrelational databases, whereas when environment information discovery queries outnumber storage operations, relational databases perform better.
Figure 7: Query time with various numbers of tuples (x-axis: number of tuples in database, from 69880 to 1985247; y-axis: query time in ms; series: MySQL and MongoDB).
6. Conclusion
In this paper, we present a new IoT sensor data processing (SDP) system to process sensor data dynamically in the context of the Internet of Things. First, heterogeneous sensor data are captured and transformed into a unified data format. Second, the Particle Swarm Optimization (PSO) algorithm, extended with a crossbreeding operation, is employed for data compression, which avoids redundancy and helps to reduce the load on the database. The proposed SDP system detects exception situations by setting the standard phenomenon rules of containers. Meanwhile, an appropriate type of database for sensor data storage in the IoT is sought and analyzed. The experimental results show that the proposed compression method can find a threshold that achieves a high compression rate while keeping accuracy, and that a NoSQL database performs better in sensor data storage, while a relational database does better when executing environment information discovery queries.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] L. Atzori, A. Iera, and G. Morabito, "The internet of things: a survey," Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[2] M. C. Domingo, "An overview of the internet of things for people with disabilities," Journal of Network and Computer Applications, vol. 35, no. 2, pp. 584–596, 2012.
[3] L. Zheng, H. Zhang, W. Han et al., "Technologies, applications, and governance in the internet of things," in Internet of Things - Global Technological and Societal Trends from Smart Environments and Spaces to Green ICT, River Publishers, 2011.
[4] J. Yick, B. Mukherjee, and D. Ghosal, "Wireless sensor network survey," Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.
[5] M. Botts and A. Robin, Sensor Model Language (SensorML) Implementation Specification, OpenGIS Implementation Specification, 2007.
[6] M. Botts and A. Robin, "Sensor Model Language (SensorML) Implementation Specification, OpenGIS Implementation Specification, 2007," IEEE Instrumentation and Measurement Society, IEEE Standard for a Smart Transducer Interface for Sensors and Actuators-Common Functions, Communication Protocols, and Transducer Electronic Data Sheet (TEDS) Formats, IEEE Std 1451.0, 2007.
[7] K. Aberer, M. Hauswirth, and A. Salehi, "Infrastructure for data processing in large-scale interconnected sensor networks," in Proceedings of the 8th International Conference on Mobile Data Management (MDM '07), pp. 198–205, May 2007.
[8] M. Sgroi, A. Wolisz, A. Sangiovanni-Vincentelli, and J. M. Rabaey, "A service-based universal application interface for ad hoc wireless sensor and actuator networks," in Ambient Intelligence, pp. 149–172, Springer, Berlin, Germany, 2005.
[9] Y. Yao and J. Gehrke, "The cougar approach to in-network query processing in sensor networks," ACM SIGMOD Record, vol. 31, no. 3, pp. 9–18, 2002.
[10] C.-C. Shen, C. Srisathapornphat, and C. Jaikaeo, "Sensor information networking architecture and applications," IEEE Personal Communications, vol. 8, no. 4, pp. 52–59, 2001.
[11] M. Kim, J. W. Lee, Y. J. Lee, and J.-C. Ryou, "Cosmos: a middleware for integrated data processing over heterogeneous sensor networks," ETRI Journal, vol. 30, no. 5, pp. 696–706, 2008.
[12] G. Venter and J. Sobieszczanski-Sobieski, "Particle swarm optimization," AIAA Journal, vol. 41, no. 8, pp. 1583–1589, 2003.
[13] J. Kennedy, "Particle swarm optimization," in Encyclopedia of Machine Learning, pp. 760–766, Springer, New York, NY, USA, 2010.
[14] F. Van den Bergh and A. P. Engelbrecht, "A study of particle swarm optimization particle trajectories," Information Sciences, vol. 176, no. 8, pp. 937–971, 2006.
[15] Y. Shi and R. C. Eberhart, "Fuzzy adaptive particle swarm optimization," in Proceedings of the Congress on Evolutionary Computation, vol. 1, pp. 101–106, IEEE, Seoul, Republic of Korea, May 2001.
[16] L. Gang, F. Jing, L. Ling, and Y. Qilian, "Fast realization of the LADT ECG data compression method," IEEE Engineering in Medicine and Biology Magazine, vol. 13, no. 2, pp. 255–258, 1994.
[17] S. M. S. Jalaleddine, C. G. Hutchens, R. D. Strattan, and W. A. Coberly, "ECG data compression techniques-a unified approach," IEEE Transactions on Biomedical Engineering, vol. 37, no. 4, pp. 329–343, 1990.
[18] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici, "Improving compression ratio, area overhead, and test application time for system-on-a-chip test data compression/decompression," in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 604–611, IEEE Computer Society, Paris, France, 2002.
[19] F. Wang and P. Liu, "Temporal management of RFID data," in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB '05), pp. 1128–1139, September 2005.
[20] E. Welbourne, L. Battle, G. Cole et al., "Building the internet of things using RFID: the RFID ecosystem experience," IEEE Internet Computing, vol. 13, no. 3, pp. 48–55, 2009.
[21] I. Grønbæk, "Architecture for the Internet of Things (IoT): API and interconnect," in Proceedings of the 2nd International Conference on Sensor Technologies and Applications (SENSORCOMM '08), pp. 802–807, Cap Esterel, France, August 2008.
[22] R. Want, "An introduction to RFID technology," IEEE Pervasive Computing, vol. 5, no. 1, pp. 25–33, 2006.
[23] Global Location Numbers (GLN), GS1, http://www.gs1.org/docs/idkeys/GS1_Global_Location_Numbers.pdf.