•BICRITERIA OPTIMIZATION OF ENERGY EFFICIENT PLACEMENT AND ROUTING IN HETEROGENEOUS WIRELESS SENSOR NETWORKS
•TIME SERIES DATA MINING
Mustafa Gokce Baydogan
Singapore Management University, 5/10/2012
Mustafa Gokce Baydogan
PhD Candidate
School of Computing, Informatics and Decision Systems Engineering
Arizona State University (ASU) Tempe, AZ, USA
BICRITERIA OPTIMIZATION OF ENERGY EFFICIENT PLACEMENT AND ROUTING IN HETEROGENEOUS WIRELESS SENSOR NETWORKS
Mustafa Gökçe Baydoğan
School of Computing, Informatics and Decision Systems Engineering
Arizona State University (ASU) Tempe, AZ, USA
Nur Evin Özdemirel, PhD
Department of Industrial Engineering
Middle East Technical University (METU) Ankara, Turkey
MOTIVATION
SOCIOECONOMIC
– Environmental monitoring
– Air, soil or water monitoring
– Habitat monitoring
– Seismic detection
– Military surveillance
– Battlefield monitoring
– Sniper localization
– Nuclear, biological or chemical attack detection
– Disaster area monitoring
RESEARCH
DESIGN ISSUES IN WSNs
Deployment
random vs deterministic; one-time vs iterative
Mobility
mobile vs immobile
Heterogeneity
homogeneous vs heterogeneous
Communication modality
radio vs light vs sound
Infrastructure
infrastructure vs ad hoc
Network Topology
single-hop vs star vs tree vs mesh
Römer and Mattern, 2004, The Design Space of Wireless Sensor Networks, IEEE Wireless Communications, 11:6, 54-6
PROBLEM CHARACTERISTICS
There are some events (targets) to be sensed in the monitoring area
Place battery-powered sensors at possible locations so that events are sensed (detected) with a given probability
Determine the rate of data flow between the sensors and the sink node (base station)
[Figure labels: Sink; TOTAL SENSOR COST; NETWORK LIFETIME]
PROBLEM DEFINITION
OBJECTIVES
– Minimize total cost of sensors deployed
– Maximize lifetime of the network
DECISIONS
– Location of heterogeneous sensors
– Data routing
CONSTRAINTS
– Connectivity
– Node (sensor) and channel (link) capacity
– Coverage
– Battery power
LITERATURE
PROBLEM DEFINITION
CONNECTIVITY
A sensor of type k located at location i can communicate with a sensor of type k located at location j if
$\mathrm{dist}_{ij} \le \min\left(cr_k^i, cr_k^j\right)$
[Figure: panels (a) and (b) showing locations i and j, the ranges $cr_k^i$ and $cr_k^j$, and the distance $\mathrm{dist}_{ij}$]
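The connectivity condition can be checked directly from coordinates. The following is a minimal sketch under stated assumptions: the function name `can_communicate`, the use of Euclidean distance, and the sample ranges are illustrative and do not come from the slides.

```python
import math

def can_communicate(loc_i, loc_j, cr_i, cr_j):
    """Return True if dist_ij <= min(cr_i, cr_j).

    loc_i, loc_j: (x, y) coordinates (Euclidean distance is an assumption).
    cr_i, cr_j: the two range values that appear in the condition.
    """
    dist_ij = math.dist(loc_i, loc_j)
    return dist_ij <= min(cr_i, cr_j)

# Hypothetical example: two locations exactly 5 units apart.
link_ok = can_communicate((0, 0), (3, 4), cr_i=6.0, cr_j=5.0)
link_too_far = can_communicate((0, 0), (3, 4), cr_i=6.0, cr_j=4.0)
```

Because the minimum of the two ranges is used, a link exists only when each sensor can reach the other.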
PROBLEM DEFINITION
COVERAGE
$pr_{ikp}$ denotes the detection probability of a target at point p by a sensor of type k located at location i:
$$pr_{ikp} = \begin{cases} e^{-\beta_k \, \mathrm{dist}_{ip}}, & \text{if } \mathrm{dist}_{ip} \le sr_k \\ 0, & \text{otherwise} \end{cases}$$
Strength of the sensor signal decreases as distance increases†
Detection probability of a target at point p:
$$pr_p = 1 - \prod_{(i,k) \in B_p} \left(1 - pr_{ikp}\right)^{x_{ik}}$$
† Zou and Chakrabarty, 2005, A Distributed Coverage- and Connectivity-Centric Technique for Selecting Active Nodes in Wireless Sensor Networks, IEEE Transactions on Computers
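To make the coverage model concrete, here is a small sketch of both formulas. The function names and the numeric values are hypothetical, and the product is taken only over sensors that are actually placed (that is, those with $x_{ik} = 1$).

```python
import math

def detection_probability(dist_ip, beta_k, sr_k):
    """pr_ikp: exponential decay with distance inside the range sr_k, zero outside."""
    return math.exp(-beta_k * dist_ip) if dist_ip <= sr_k else 0.0

def joint_detection_probability(single_probs):
    """pr_p = 1 - product of (1 - pr_ikp) over the placed sensors."""
    miss = 1.0
    for pr in single_probs:
        miss *= 1.0 - pr
    return 1.0 - miss

# Hypothetical values: one sensor within range of point p, one outside it.
probs = [
    detection_probability(1.0, beta_k=0.5, sr_k=2.0),
    detection_probability(3.0, beta_k=0.5, sr_k=2.0),
]
pr_p = joint_detection_probability(probs)
```

A coverage constraint would then compare `pr_p` with the threshold required for the target.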
PROBLEM DEFINITION
ENERGY CONSUMPTION MODEL †
Sources of energy consumption in a sensor
– Generating data: $eg_k = \gamma$
– Receiving data: $er_{ij} = \beta$
– Transmitting data: $et_{ij} = \delta + \lambda \cdot \mathrm{dist}_{ij}^{m}$
δ is a distance-independent constant term
λ is a coefficient associated with the distance-dependent term
$\mathrm{dist}_{ij}$ is the distance between two locations
m is the path loss index
† J. Tang, B. Hao, and A. Sen, 2006, Relay node placement in large scale wireless sensor networks, Computer Communications, 29:4, 490-501
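The energy terms can be combined into a per-sensor energy rate. The sketch below is illustrative only: how the per-unit energies are weighted by data rates, and the "battery divided by energy rate" lifetime, are assumptions made here because the formulation itself is not reproduced in the transcript. All parameter values are hypothetical.

```python
def transmit_energy(dist_ij, delta, lam, m):
    """et_ij = delta + lam * dist_ij**m, per unit of data sent."""
    return delta + lam * dist_ij ** m

def node_energy_rate(gen_rate, in_flows, out_flows, gamma, beta, delta, lam, m):
    """Energy drawn per unit time by one sensor (assumed rate-weighted sum).

    in_flows: list of incoming data rates.
    out_flows: list of (rate, distance) pairs for outgoing links.
    """
    energy = gamma * gen_rate                      # generating data
    energy += beta * sum(in_flows)                 # receiving data
    energy += sum(rate * transmit_energy(d, delta, lam, m)
                  for rate, d in out_flows)        # transmitting data
    return energy

# Hypothetical sensor: generates 2 units, relays 3 units, sends 5 units over distance 2.
rate = node_energy_rate(2, [3], [(5, 2.0)], gamma=1, beta=2, delta=1, lam=0.5, m=2)
lifetime = 230 / rate   # assumed: battery capacity divided by energy rate
```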
PROBLEM FORMULATION
total cost of sensors located
lifetime of the network
one sensor can be located at each location
PROBLEM FORMULATION
data flow balance at a sensor
all data is routed to sink node
sensor capacity
channel (link) capacity
coverage
PROBLEM FORMULATION
battery power
location decision; data flow decision
THE BICRITERIA PROBLEM
DOMINATION
$(z_1', z_2')$ dominates $(z_1'', z_2'')$ if
$z_1' \le z_1''$ and $z_2' > z_2''$, or $z_1' < z_1''$ and $z_2' \ge z_2''$
[Figure: Network Lifetime vs. Sensor Cost]
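The domination rule translates directly into code. In this sketch each solution is a (cost, lifetime) pair, with cost minimized and lifetime maximized; the sample points are hypothetical.

```python
def dominates(a, b):
    """True if a = (cost, lifetime) dominates b under the rule above."""
    cost_a, life_a = a
    cost_b, life_b = b
    return ((cost_a <= cost_b and life_a > life_b)
            or (cost_a < cost_b and life_a >= life_b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cost, lifetime) pairs.
points = [(10, 8), (12, 8), (14, 12), (16, 11), (20, 18)]
front = pareto_front(points)
```

Here (12, 8) is dominated by (10, 8) and (16, 11) by (14, 12), so three points remain on the front.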
FINDING PARETO OPTIMAL SOLUTIONS
A BICRITERIA PROBLEM
Solve for $z_1$ to find lower bound on cost
Solve for $z_2$ s.t. cost $\le z_1$ to find lower bound on lifetime
Solve for $z_2$ to find upper bound on lifetime
Solve for $z_1$ s.t. lifetime $\ge z_2$ to find upper bound on cost
For all integral values of $z_1$, solve for $z_2$
[Figure: Network Lifetime vs. Sensor Cost]
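The steps above can be sketched as a loop over integral cost budgets. This is a toy illustration, not the exact method: the hypothetical helper `max_lifetime_under_budget` stands in for solving the optimization model, and it simply searches a small list of made-up (cost, lifetime) designs with integer costs.

```python
def max_lifetime_under_budget(designs, budget):
    """Stand-in for the exact model: best lifetime with cost <= budget, or None."""
    feasible = [life for cost, life in designs if cost <= budget]
    return max(feasible) if feasible else None

def epsilon_constraint(designs):
    z1_low = min(cost for cost, _ in designs)       # lower bound on cost
    z2_up = max(life for _, life in designs)        # upper bound on lifetime
    z1_up = min(cost for cost, life in designs
                if life >= z2_up)                   # upper bound on cost
    front = []
    for budget in range(z1_low, z1_up + 1):         # all integral cost values
        life = max_lifetime_under_budget(designs, budget)
        # Keep a budget only when it buys strictly more lifetime.
        if not front or life > front[-1][1]:
            front.append((budget, life))
    return front

designs = [(10, 8), (12, 8), (14, 12), (16, 11), (20, 18)]
front = epsilon_constraint(designs)
```

Each iteration corresponds to one single-objective solve, which is why the number of solves grows with the width of the cost range.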
GENETIC ALGORITHM
Why evolutionary algorithms?
• Classical search and optimization methods
– find a single solution in every iteration
– need repetitive use of a single objective optimization method
– rely on assumptions such as linearity and continuity
• Evolutionary algorithms
– use a population of solutions in every generation
– make no such assumptions
– find and maintain multiple good solutions
  • Emphasize all nondominated solutions in a population equally
  • Preserve a diverse set of multiple nondominated solutions
→ Near optimal, uniformly distributed, well extended set of solutions for MO problems
Nondominated sorting approach (Goldberg, 1989)
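Nondominated sorting ranks a population by repeatedly peeling off the current nondominated set. The sketch below is a simple version for two objectives (cost minimized, lifetime maximized) on hypothetical points; it is meant to illustrate the idea rather than reproduce the implementation used in the talk.

```python
def dominates(a, b):
    """a, b are (cost, lifetime) pairs; cost is minimized, lifetime maximized."""
    return ((a[0] <= b[0] and a[1] > b[1])
            or (a[0] < b[0] and a[1] >= b[1]))

def nondominated_sort(points):
    """Split points into fronts: front 0 is nondominated, front 1 is
    nondominated once front 0 is removed, and so on."""
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

population = [(10, 8), (12, 8), (14, 12), (16, 11), (20, 18)]
fronts = nondominated_sort(population)
```

All members of the same front receive the same rank, which is how nondominated solutions are emphasized equally.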
GENETIC ALGORITHM
Convergence to Pareto optimal front
Diverse set of solutions along Pareto optimal front
GENETIC ALGORITHM
REPRESENTATION
Each gene holds the type of the sensor located on the corresponding location
Location: 1 2 3 ... i i+1 ... n−2 n−1 n
Type:     0 1 2 ... 1 3   ... 0   3   0
Disadvantages
– Flow allocation is not stored
– Lifetime cannot be determined
– Finding feasible solutions after mutation and crossover operators is very hard
Advantages
– Problem reduces to LP with given sensor locations
– By solving LP, maximum lifetime and constraint violations can be determined
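A chromosome of this kind can be decoded with a few lines of code. The sketch below assumes that type 0 means "no sensor at this location" and uses made-up costs per sensor type; neither is stated in the slides. Only the cost objective is computed here, since lifetime requires solving the LP for the given locations.

```python
def placed_sensors(chromosome):
    """Return {location: type} for occupied locations (numbered from 1).
    Assumption: gene value 0 means the location is empty."""
    return {loc: t for loc, t in enumerate(chromosome, start=1) if t != 0}

def total_cost(chromosome, type_cost):
    """Sum the cost of the sensor type placed at each occupied location."""
    return sum(type_cost[t] for t in chromosome if t != 0)

type_cost = {1: 1, 2: 2, 3: 4}           # hypothetical cost per sensor type
chromosome = [0, 1, 2, 1, 3, 0, 3, 0]    # one gene per possible location
layout = placed_sensors(chromosome)
cost = total_cost(chromosome, type_cost)
```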
GENETIC ALGORITHM
FITNESS
Based on nondominated sorting idea considering three objectives
– Total sensor cost
– Network lifetime
– Overall constraint violation
  • Connectivity
  • Coverage
  • Capacity violations (channel and sensor)
GENETIC ALGORITHM
INITIAL POPULATION GENERATION
– Two phase approach
– Sensor location
– Location according to target coordinates
– Relay location
– Location according to sensor coordinates
MUTATION
– Repair and improve
– Repair coverage constraints
– Improve cost and lifetime objectives
– Repair connectivity constraints
– Improve cost objective
GENETIC ALGORITHM
TEST PROBLEMS
Small problems
– Problems with 24 possible locations (PS24)
– Problems with 40 possible locations (PS40)
[Figure: two panels with both axes from 0 to 10 and the base station (BS) marked: PS24, 25 possible locations; PS40, 41 possible locations]
50 targets are dispersed across the monitoring area
Each target has a random coverage threshold uniformly distributed between 0.7 and 1
The rate of data generated for each target is a random integer between 1 Kbps and 3 Kbps
COMPUTATIONAL RESULTS
PERFORMANCE MEASURES
Proximity Indicator (PI)
For each solution found, find the Pareto optimal solution with the closest normalized Tchebychev distance
Reverse Proximity Indicator (RPI)
For each Pareto optimal solution, find the solution with the closest normalized Tchebychev distance
Hypervolume Indicator (HI)
Find the ratio of the area bounded by the nadir point that cannot be covered
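The two proximity indicators can be sketched as follows. Normalizing each objective by its range and averaging the closest distances are assumptions made for this illustration; the slides state only that the closest normalized Tchebychev distance is found for each solution. The points are hypothetical.

```python
def tchebychev(a, b, ranges):
    """Normalized Tchebychev distance: the largest per-objective gap,
    each gap divided by that objective's range (assumed normalization)."""
    return max(abs(x - y) / r for x, y, r in zip(a, b, ranges))

def proximity(found, pareto, ranges):
    """PI: for each found solution, distance to the closest Pareto optimal
    solution, averaged over the found solutions (averaging is an assumption)."""
    return sum(min(tchebychev(s, p, ranges) for p in pareto)
               for s in found) / len(found)

def reverse_proximity(found, pareto, ranges):
    """RPI: same computation with the roles of the two sets swapped."""
    return proximity(pareto, found, ranges)

pareto = [(10, 8), (14, 12), (20, 18)]   # hypothetical exact front
found = [(10, 8), (15, 12)]              # hypothetical GA solutions
ranges = (10, 10)                        # cost range 20-10, lifetime range 18-8
pi = proximity(found, pareto, ranges)
rpi = reverse_proximity(found, pareto, ranges)
```

In this example PI is small because both found solutions lie near the front, while RPI is larger because no found solution is close to (20, 18).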
COMPUTATIONAL RESULTS
Smaller Problems
Problem size | Constraint tightness | # of feasible problems | RPI | PI | HI | ε-constraint | GA | GA=Exact | ε-constraint | GA
Time series classification
• A supervised learning problem aimed at labeling temporally structured univariate (or multivariate) sequences of fixed (or variable) length.
Datasets
• Datasets are from different domains
– Word recognition
– Medicine
– Energy
– Biology
– Face recognition
– Image and video classification
– Robotics
– Gesture recognition
– Astronomy
A Bag-of-Features Framework to classify time series (TSBF)
• Bag of features
– a common method used for image classification
– also referred to as
  • Bag of words in document analysis
  • Bag of frames in audio and speech recognition
• Accurate even with simple shape-based features
• TSBF provides a framework for time series classification; alternative algorithms for the following tasks may provide better solutions
– Local feature extraction
– Codebook generation
– Classification
The details and the code of TSBF and the datasets are provided in http://www.mustafabaydogan.com/research/time-series-classification.html
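To illustrate the general bag-of-features idea (not TSBF itself), the sketch below extracts simple shape-based features from sliding windows, maps each window to a codeword, and counts codewords in a histogram. The equal-width binning used as a codebook, the window settings, and the value range are all assumptions chosen for the example.

```python
def subsequences(series, width, step):
    """Sliding windows of a fixed width."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

def local_features(window):
    """Simple shape-based features: mean and average slope of the window."""
    mean = sum(window) / len(window)
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return mean, slope

def codeword(features, n_bins, lo, hi):
    """Assumed codebook: equal-width bin of the mean, paired with the slope sign."""
    mean, slope = features
    b = int((mean - lo) / (hi - lo) * n_bins)
    b = max(0, min(b, n_bins - 1))          # clamp values outside [lo, hi]
    return 2 * b + (1 if slope > 0 else 0)

def bag_of_features(series, width=4, step=2, n_bins=2, lo=0.0, hi=10.0):
    """Histogram of codewords; this vector would be fed to a classifier."""
    hist = [0] * (2 * n_bins)
    for w in subsequences(series, width, step):
        hist[codeword(local_features(w), n_bins, lo, hi)] += 1
    return hist

hist = bag_of_features([0, 1, 2, 3, 8, 9, 8, 7])
```

The histogram discards where in the series each pattern occurred, which is what makes the representation a "bag".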
Supervised Time Series Pattern Discovery through Local Importance (TS-PD)
• TS-PD aims at finding patterns for interpretability
• TS-PD identifies regions of interest
• Provides a visualization tool for understanding underlying relations
• Fast approach to detect the local information related to the classification
The details and the code of TS-PD and the datasets will be provided in
• Classify gestures (8 different types of gestures)
TS-PD Example
Using DM as a tool
• Decision makers are interested in knowledge that permits them to do their jobs better by taking some specific actions in response to the newly discovered knowledge.
• Usually a data mining algorithm is executed first and then profitable actions are determined based on the results from the data mining
– Example: Market basket analysis
  • Association rule mining to decide location of items in the supermarket
Using DM as a tool
Market-Basket transactions
TID | Items
1 | Bread, Milk
2 | Bread, Diaper, Beer, Eggs
3 | Milk, Diaper, Beer, Coke
4 | Bread, Milk, Diaper, Beer
5 | Bread, Milk, Diaper, Coke
Example of Association Rules
{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk},
Implication means co-occurrence, not causality!
• Put diapers and beer on the same shelf???
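The strength of each example rule can be measured on the five transactions with support and confidence. These two measures are the standard ones for association rules and are introduced here for illustration; the slide lists the rules without values.

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)

def support(lhs, rhs):
    """Fraction of all transactions containing both sides of the rule."""
    return count(lhs | rhs) / len(transactions)

def confidence(lhs, rhs):
    """Among transactions containing lhs, the fraction also containing rhs."""
    return count(lhs | rhs) / count(lhs)

rules = [
    ({"Diaper"}, {"Beer"}),
    ({"Milk", "Bread"}, {"Eggs", "Coke"}),
    ({"Beer", "Bread"}, {"Milk"}),
]
scores = [(support(l, r), confidence(l, r)) for l, r in rules]
```

Both measures count co-occurrence only, which is exactly why a high-confidence rule does not establish causality.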
Using DM as a tool
• Root cause analysis in networks
– Supply chain networks
– Identify corrupt nodes and their relations
– Why are my deliveries late?
Using DM as a tool
• Transaction data
• Several factors affecting the network
id | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Weather status at stage 1 | … | Road status between stage 1 and stage 2 | … | Transportation vehicle between stage 1 and stage 2 | Weight | Delayed?
1 | S2 | P1 | D2 | C2 | Sunny | … | good | … | Plane | 30 lbs | Yes
2 | S5 | P3 | D4 | C1 | Rainy | … | bad | … | Truck | 40 lbs | No
3 | . | . | . | . | . | . | . | . | . | . | .
… | . | . | . | . | . | . | . | . | . | . | .
N | . | . | . | . | . | . | . | . | . | . | .
Using DM as a tool
• DM is required since
– Data is high dimensional
– There may be missing values in the data
– Not all indicators are numerical
• Identify interaction between the network nodes to find out the causes of delay
– What decisions are causing delay?
• Take actions
– Modification of the optimization algorithm
  • Introduce constraints based on the learning (data mining result)
– Simulation to generate more data
  • Further analysis of simulated data
Future directions
• Reinforcement learning
– The decision-maker recognizes her state within the environment, and reacts by initiating an action.
– Consequently she obtains a reward signal, and enters another state
– Example application: dynamic pricing
Future directions
• The mechanism that generates reward signals and introduces new states is referred to as the dynamics of the environment.
• The agent is unfamiliar with the dynamics of the environment and therefore initially it cannot correctly predict the outcome of actions.
• As the agent interacts with the environment and observes the actual consequences of its decisions, it can gradually adapt its behavior accordingly
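The interaction loop described above can be illustrated with a deliberately small example. The sketch below is an ε-greedy value-learning agent on a single-state (bandit-style) pricing problem, a simplification chosen here for brevity: there are no state transitions, and the linear demand function, prices, and learning parameters are all hypothetical.

```python
import random

def learn_prices(prices, reward_fn, steps=500, alpha=0.5, epsilon=0.2, seed=0):
    """Learn the value of each price by trial and error.

    The agent never sees reward_fn directly; it only observes the reward
    returned after each action, mirroring unknown environment dynamics.
    """
    rng = random.Random(seed)
    q = {p: 0.0 for p in prices}              # current estimate per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(prices)       # explore a random price
        else:
            action = max(q, key=q.get)        # exploit the best estimate
        reward = reward_fn(action)            # consequence of the decision
        q[action] += alpha * (reward - q[action])   # adapt the estimate
    return q

# Hypothetical environment: demand falls linearly with price, revenue = p * (4 - p).
q = learn_prices([1, 2, 3], reward_fn=lambda p: p * (4 - p))
best_price = max(q, key=q.get)
```

The estimates start at zero, so early choices are poor; they improve only as the agent observes the rewards its actions actually produce.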
Future directions
• Dynamic programming is widely used to solve this problem
– However, the environment can be highly unpredictable
• Modeling the environment efficiently is important†
– provides knowledge about the domain that produced the data
• Revisiting dynamic pricing problem
– In a game-theoretic setting, all players are assumed to be rational, but is that true?
– predicting the opponent’s proposed price in advance (reduce uncertainty in the environment)
• Another example
– If we know that a certain pattern observed in the stock price led to high profit under certain conditions in the past, this may be important in taking actions
† L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multi-agent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156-172, Mar. 2008.
Thanks for your patience!
Questions and comments?
Supplemental material
DM and OR
• Using OR for DM
– Optimization algorithms used for DM
  • Data visualization
  • Attribute selection
  • Classification
  • Unsupervised learning
• Using DM as a tool for decision making
– Data mining can be used to complement traditional OR methods in many areas