Near-Optimal Sensor Placements in Gaussian Processes
Carlos Guestrin, Andreas Krause, Ajit Singh
Carnegie Mellon University
Jan 15, 2016
Sensor placement applications
Monitoring of spatial phenomena: temperature, precipitation, drilling oil wells, ...
Active learning, experimental design, ...
Results today are not limited to two dimensions
Precipitation data from Pacific NW
[Figure: lab floor plan with 54 sensor locations]
Temperature data from sensor network
Deploying sensors
[Figure: lab floor plan with 54 sensor locations]
This deployment: evenly distributed sensors
But what are the optimal placements? I.e., solving a combinatorial (non-myopic) optimization
Chicken-and-egg problem: no data or assumptions about the distribution →
don't know where to place sensors
Considered in:
Computer science (c.f., [Hochbaum & Maass '85])
Spatial statistics (c.f., [Cressie '91])
Strong assumption – sensing radius:
Node predicts values of positions within some radius
Becomes a covering problem
Problem is NP-complete, but there are good algorithms with (1+ε)-approximation guarantees (PTAS) [Hochbaum & Maass '85]
Unfortunately, the approach is usually not useful – the assumption is wrong on real data!
For example…
Spatial correlation
[Figure: lab floor plan with 54 sensor locations – temperature data]
Precipitation data from Pacific NW
Non-local, Non-circular correlations
[Contour plot: correlation between one sensor location and the rest of the space, values 0.5–1.0]
Complex positive and negative correlations
[Contour plots: precipitation correlations over the Pacific NW (latitude 43–48, longitude −124 to −117), values −0.15 to 0.4]
Complex, noisy correlations
Complex, uneven sensing “region”
Actually, noisy correlations, rather than sensing region
Combining multiple sources of information
Individually, sensors are bad predictors
Combined information is more reliable
How do we combine information?
Focus of spatial statistics
Temp here?
Gaussian process (GP) - Intuition
[Figure: GP regression over position; x – position, y – temperature]
GP – non-parametric; represents uncertainty; complex correlation functions (kernels)
less sure here
more sure here
Uncertainty after observations are made
Gaussian processes
[Figures: posterior mean temperature and posterior variance over the lab floor plan]
Kernel (covariance) function K(x, y). Prediction after observing values x_A at a set of sensors A:
posterior mean μ_{y|A} = Σ_{yA} Σ_{AA}⁻¹ x_A, posterior variance σ²_{y|A} = K(y, y) − Σ_{yA} Σ_{AA}⁻¹ Σ_{Ay}
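A minimal NumPy sketch of these posterior equations; the squared-exponential kernel, noise level, and any data fed to it are illustrative assumptions, not the kernels learned from the deployment:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel K(x, y); any positive-definite kernel works here."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X_A, y_A, X_star, noise=1e-2):
    """Posterior mean and variance at locations X_star given readings y_A at sensors X_A."""
    K_AA = rbf_kernel(X_A, X_A) + noise * np.eye(len(X_A))
    K_sA = rbf_kernel(X_star, X_A)
    K_ss = rbf_kernel(X_star, X_star)
    mean = K_sA @ np.linalg.solve(K_AA, y_A)            # posterior mean
    cov = K_ss - K_sA @ np.linalg.solve(K_AA, K_sA.T)   # posterior covariance
    return mean, np.diag(cov)                            # per-location variance
```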
Gaussian processes for sensor placement
[Figures: posterior mean temperature and posterior variance over the lab floor plan]
Goal: Find sensor placement with least uncertainty after observations
Problem is still NP-complete – need approximation
Consider myopically selecting the most uncertain location: X_i = argmax_X H(X | A)
This can be seen as an attempt to non-myopically maximize the joint entropy H(A):
Non-myopic placements
H(A1) + H(A2 | {A1}) + ... + H(Ak | {A1 ... Ak-1})
most uncertain
most uncertaingiven A1
most uncertaingiven A1 ... Ak-1
This is exactly the joint entropy H(A) = H({A1, ..., Ak})
Entropy criterion (c.f., [Cressie ’91])
A ← ∅; For i = 1 to k:
Add location Xi to A, s.t. Xi = argmax_X H(X | A)
“Wasted” information, observed by [O'Hagan '78]
Entropy: high uncertainty given current set A – X is different
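A minimal sketch of this greedy entropy rule, assuming a precomputed covariance matrix K over the discretized candidate locations; for a GP, H(X | A) is monotone in the posterior variance of X given A, so candidates can be ranked by variance (the noise level is an illustrative assumption):

```python
import numpy as np

def greedy_entropy_placement(K, k, noise=1e-2):
    """Pick k sensor indices; at each step add the location with the highest
    conditional entropy H(X_i | A), i.e. the largest posterior variance given A."""
    n = K.shape[0]
    A = []
    for _ in range(k):
        best, best_var = None, -np.inf
        for i in range(n):
            if i in A:
                continue
            if A:
                K_AA = K[np.ix_(A, A)] + noise * np.eye(len(A))
                K_iA = K[i, A]
                var = K[i, i] - K_iA @ np.linalg.solve(K_AA, K_iA)
            else:
                var = K[i, i]
            if var > best_var:
                best, best_var = i, var
        A.append(best)
    return A
```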
Temperature data placements: Entropy
Uncertainty (entropy) plot
Entropy places sensors along borders
Entropy criterion wastes information [O'Hagan '78]; indirect – doesn't consider the sensing region; no formal non-myopic guarantees
Proposed objective function: mutual information
Locations of interest V. Find locations A ⊆ V maximizing mutual information:
MI(A) = H(V\A) − H(V\A | A)
(uncertainty of uninstrumented locations before sensing minus after sensing)
Intuitive greedy rule: add X maximizing H(X | A) − H(X | V \ (A ∪ {X}))
High uncertainty given A – X is different
Low uncertainty given rest – X is informative
Intuitive criterion – locations that are both different and informative
We give formal non-myopic guarantees
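A naive sketch of this greedy mutual-information rule, again assuming a precomputed covariance matrix K over the discretization V. For Gaussians, H(X | S) is monotone in the posterior variance of X given S, so the rule reduces to maximizing the variance ratio var(X|A) / var(X|Ā); this is the slow per-candidate version whose cost the later slides address:

```python
import numpy as np

def conditional_variance(K, i, S, noise=1e-2):
    """Posterior variance of location i given observations at index set S."""
    if not S:
        return K[i, i]
    K_SS = K[np.ix_(S, S)] + noise * np.eye(len(S))
    K_iS = K[i, S]
    return K[i, i] - K_iS @ np.linalg.solve(K_SS, K_iS)

def greedy_mi_placement(K, k, noise=1e-2):
    """Greedy MI placement: at each step add the location X maximizing
    H(X|A) - H(X|Abar), i.e. the ratio var(X|A) / var(X|Abar)."""
    n = K.shape[0]
    A = []
    for _ in range(k):
        rest = [j for j in range(n) if j not in A]
        best, best_score = None, -np.inf
        for i in rest:
            Abar = [j for j in rest if j != i]   # V \ (A ∪ {X})
            score = conditional_variance(K, i, A, noise) / \
                    conditional_variance(K, i, Abar, noise)
            if score > best_score:
                best, best_score = i, score
        A.append(best)
    return A
```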
Temperature data placements: Entropy vs. Mutual information
An important observation
[Diagram: candidate sensor locations T1–T5]
Selecting T1 tells something about T2 and T5
Selecting T3 tells something about T2 and T4
Now adding T2 would not help much
In many cases, new information is worth less if we know more
(diminishing returns)!
Submodular set functions
Submodular set functions are a natural formalism for this idea:
f(A ∪ {X}) − f(A) ≥ f(B ∪ {X}) − f(B)   for all A ⊆ B
Maximization of SFs is NP-hard. But…
Theorem [Nemhauser et al. '78]: The greedy algorithm guarantees a (1 − 1/e) OPT approximation for monotone SFs, i.e. F(A_greedy) ≥ (1 − 1/e) max_{|A|≤k} F(A)
~ 63%
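A generic sketch of the greedy algorithm in this theorem, for an arbitrary monotone submodular set-function oracle f (the oracle, ground set, and budget are placeholders):

```python
def greedy_submodular(f, V, k):
    """Greedy maximization of a set function f over ground set V with budget k.
    For monotone submodular f, Nemhauser et al. '78 guarantee
    f(A_greedy) >= (1 - 1/e) * max_{|A| <= k} f(A), about 63% of optimum."""
    A = set()
    for _ in range(k):
        x = max((v for v in V if v not in A), key=lambda v: f(A | {v}) - f(A))
        A.add(x)
    return A
```

With F(A) = MI(A) as the oracle, this is the selection loop analyzed in the talk – modulo the monotonicity caveat discussed next.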
How can we leverage submodularity?
Mutual information and submodularity
Mutual information is submodular: F(A) = I(A; V\A)
So, we should be able to use Nemhauser et al.
Mutual information is not monotone!!!
Initially, adding a sensor increases MI; later, adding sensors decreases MI
F(∅) = I(∅; V) = 0, F(V) = I(V; ∅) = 0, F(A) ≥ 0
[Plot: mutual information vs. number of sensors, rising from A = ∅ and falling back to 0 at A = V]
Even though MI is submodular, can't apply Nemhauser et al.
Or can we…
Approximate monotonicity of mutual information
If H(X|A) − H(X|V\A) ≥ 0, then MI is monotonic
Solution: add a grid Z of unobservable locations
If H(X|A) − H(X|Z ∪ V\A) ≥ 0, then MI is monotonic
[Diagram: location X, observed set A, unobserved rest V\A]
H(X|A) << H(X|V\A): MI not monotonic
For sufficiently fine Z: H(X|A) > H(X|Z ∪ V\A) – MI approximately monotonic
Z – unobservable
Theorem: Mutual information sensor placement
Greedy MI algorithm provides a constant-factor approximation: placing k sensors, ∀ε > 0:
MI(A_greedy) ≥ (1 − 1/e)(OPT − kε)
(OPT – optimal non-myopic solution; A_greedy – result of our algorithm; (1 − 1/e) – constant factor)
Approximate monotonicity holds for a sufficiently fine discretization – poly(1/ε, k, σ, L, M) unobservable locations
(σ – sensor noise, L – Lipschitz constant of the kernels, M – max_X K(X,X))
Different costs for different placements
Theorem 1: Constant-factor approximation of optimal locations – select k sensors
Theorem 2 (Cost-sensitive placements): In practice, different locations may have different costs
Corridor versus inside wall
Have a budget B to spend on placing sensors
Constant-factor approximation – same constant (1 − 1/e)
Slightly more complicated than the greedy algorithm [Sviridenko / Krause, Guestrin]
Deployment results
“True” temp. prediction
“True” temp. variance
Used initial deployment to select 22 new sensors
Learned new GP on test data using just these sensors
Posterior mean
Posterior variance
Entropy criterion vs. mutual information criterion
Mutual information criterion yields 3 times lower variance than the entropy criterion
Model learned from 54 sensors
Comparing to other heuristics
[Plot: mutual information achieved vs. number of sensors placed; higher is better]
Greedy – algorithm we analyze
Random placements
Pairwise exchange (PE): start with some placement, swap locations while the solution improves
Our bound enables a posteriori analysis for any heuristic
Assume an algorithm TUAFSPGP gives results that are 10% better than the results obtained from the greedy algorithm. Then we immediately know TUAFSPGP is within 70% of optimum!
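(The arithmetic: the greedy value G satisfies G ≥ (1 − 1/e)·OPT ≈ 0.632·OPT, so a placement worth 1.1·G achieves at least 1.1 × 0.632 ≈ 0.70·OPT.)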
Precipitation data
[Plots: placements and objective values for the entropy criterion vs. the mutual information criterion; higher is better]
Computing the greedy rule
Exploit sparsity in kernel matrix
At each iteration, for each candidate position i ∈ {1,…,N}, must compute the greedy MI score (conditional variances given A and given Ā)
Requires inversion of an N×N matrix – about O(N³)
Total running time for k sensors: O(kN⁴). Polynomial! But very slow in practice
Local kernels
Covariance matrix may have many zeros!
Each sensor location is correlated with only a small number of other locations
Exploiting locality: if each location is correlated with at most d others, a sparse representation and a priority-queue trick (sketched below) reduce the complexity from O(kN⁴) to only about O(N log N)
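One common reading of the priority-queue trick is lazy greedy evaluation: because marginal gains of a submodular objective only shrink as the selected set grows, stale cached gains stored in a max-heap are valid upper bounds, and most re-evaluations can be skipped. A minimal sketch (the set-function oracle f is a placeholder; combining it with a sparse kernel representation gives the speedup above):

```python
import heapq

def lazy_greedy(f, V, k):
    """Lazy ('priority queue') greedy selection over integer-indexed ground set V.
    A cached gain that is still the largest after re-evaluation is the true argmax."""
    A = set()
    f_A = f(A)
    # heap entries: (-cached_gain, element, iteration at which the gain was computed)
    heap = [(-(f({v}) - f_A), v, 0) for v in V]
    heapq.heapify(heap)
    for it in range(1, k + 1):
        while True:
            neg_gain, v, stamp = heapq.heappop(heap)
            if stamp == it:              # cached gain is fresh: v is the best element
                A.add(v)
                f_A = f(A)
                break
            gain = f(A | {v}) - f_A      # re-evaluate the stale gain
            heapq.heappush(heap, (-gain, v, it))
    return A
```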
Usually, matrix is only almost sparse
Approximately local kernels
Covariance matrix may have many elements close to zero
E.g., Gaussian kernel – matrix not sparse
What if we set them to zero? Sparse matrix, approximate solution
Theorem: Truncate small entries → small effect on solution quality
If |K(x,y)| ≤ ε, set it to 0; then the quality of placements is only O(ε) worse
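A minimal sketch of this truncation, assuming the dense covariance matrix K is available as a NumPy array (eps plays the role of ε above):

```python
import numpy as np
from scipy import sparse

def truncate_kernel(K, eps):
    """Zero out entries with |K(x, y)| <= eps and store the result sparsely.
    Per the truncation theorem, placements computed on the truncated kernel
    are only O(eps) worse in mutual information."""
    K_trunc = np.where(np.abs(K) > eps, K, 0.0)
    return sparse.csr_matrix(K_trunc)
```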
Effect of truncated kernels on solution – Rain data
[Plots: improvement in running time and effect on solution quality; higher is better]
About 3 times faster, minimal effect on solution quality
Summary
Mutual information criterion for sensor placement in general GPs
Efficient algorithms with strong approximation guarantees: (1 − 1/e) OPT − ε
Exploiting local structure improves efficiency
Superior prediction accuracy for several real-world problems
Related ideas in discrete settings presented at UAI and IJCAI this year
Effective algorithm for sensor placement and experimental design; basis for active learning
A note on maximizing entropy
Entropy is submodular [Ko et al. '95], but…
Function F is monotonic iff adding X cannot hurt: F(A ∪ X) ≥ F(A)
Remark: Entropy in GPs is not monotonic (not even approximately)
H(A ∪ X) − H(A) = H(X|A); as the discretization becomes finer, H(X|A) → −∞
Nemhauser et al.'s analysis for submodular functions is not directly applicable to entropy
How do we predict temperatures at unsensed locations?
[Figure: temperature readings vs. position]
Interpolation? Overfits
Far away points?
How do we predict temperatures at unsensed locations?
[Figure: temperature vs. position; x – position, y – temperature]
Regression: y = a + bx + cx² + dx³ – few parameters, less overfitting
But, regression function has no notion of uncertainty!!!
How sure are we about the prediction?
less sure here
more sure here
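A minimal sketch of the contrast, with made-up positions and readings: the cubic fit gives a point prediction at an unsensed location but no error bars, unlike the GP posterior variance shown earlier:

```python
import numpy as np

# Hypothetical sensor positions and temperature readings (illustrative only).
x_obs = np.array([0.0, 1.0, 2.5, 4.0, 5.5])
y_obs = np.array([20.1, 21.3, 22.8, 21.9, 20.5])

# Fit the cubic y = a + b*x + c*x^2 + d*x^3.
coeffs = np.polyfit(x_obs, y_obs, deg=3)

# Point prediction at an unsensed location -- no variance to say how sure we are.
print(np.polyval(coeffs, 3.2))
```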