NONPARAMETRIC AND PARTIALLY NONPARAMETRIC STATISTICAL INFERENCE IN WIRELESS SENSOR NETWORKS

A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Ting He

August 2007
List of Figures

1.1 Reported alarmed sensors (red) in two collections.
1.2 Is S communicating with D through R?
1.3 Transmission patterns of S, R, and D suggest a communication between S and D through R.
1.4 In a wireless network, eavesdroppers are deployed to report the transmission activities of nodes A and B to a detector at the fusion center, which in turn decides whether there are information flows through these nodes. Si (i = 1, 2): a sequence of transmission epochs of node A (or B).
2.1 Members of HD and HCD; ◦: sample point in S1, •: sample point in S2.
2.2 The set {s1, s2, s3, s4} is shatterable by axis-aligned rectangles, but the set {s1, s2, s3, s4, s5} is not.
2.3 Members of HR; ◦: sample point in S1, •: sample point in S2.
2.4 The set {s1, s2, s3, s4} is shatterable by A ∪ B.
2.5 Members of HV and HH; ◦: sample point in S1, •: sample point in S2.
2.6 Members of HDR; ◦: sample point in S1, •: sample point in S2.
2.7 Detection threshold as a function of the sample size for different VC-dimensions.
2.8 Detection threshold as a function of the detector size for different sample sizes.
2.9 Miss detection probability of δdA as a function of the sample size: simulation results. Here p = 0.98, q = 0.02, r = s/12; 1000 Monte Carlo runs.
2.10 Miss detection probability of δφA as a function of the sample size: simulation results. Here p = 0.98, q = 0.02, r = s/12; 10000 Monte Carlo runs.
2.11 Detection probability of δdA as a function of detector size; 1000 Monte Carlo runs.
2.12 Detection probability of δφA as a function of detector size; 10000 Monte Carlo runs.
3.1 Detecting information flows through nodes A and B by analyzing their transmission activities S1 and S2.
3.2 Both the solid and the dotted lines denote matchings that are causal and bounded in delay, but the dotted lines also preserve the order of incoming packets.
3.3 Finding the match of s1(1): there are three candidates in the ∆-length interval following s1(1).
3.4 (a) The cumulative counting functions ni(w) (i = 1, 2); (b) the cumulative difference d(w) and the maximum variation v(w).
3.5 The statistic of DA is no larger than that of DMV.
3.6 PF(δDA), PF(δDMV), and their bounds; M = 40 packets, 100000 Monte Carlo runs.
3.7 PF(δDM) under various rates; ∆ = 10 seconds, 100000 Monte Carlo runs.
4.2 An information flow along the path R1 → … → Rn.
4.3 BGM: a sequential greedy match algorithm.
4.4 Example: •: sk ∈ S1; ◦: sk ∈ S2; M1(k): the statistics calculated by BGM. Initially, M1(0) = 0, indicating that the memory is empty. The first packet is a departure, and it is assigned as chaff because otherwise the memory will be underflowed. The second packet is an arrival, and thus the memory size is increased by one. Such updating occurs at each arrival or departure.
4.5 Example: (a) The scheduling obtained by repeatedly using BGM. (b) Another scheduling. It shows that repeatedly using BGM is suboptimal.
… ⊕ · · · ⊕ s4): monitor the memory sizes of the relay nodes and assign a chaff packet if the memory of any node will be underflowed or overflowed. Initially, Mi(0) = 0 (i = 1, 2, 3); at the end of this realization (after the 10th packet), (M1(10), M2(10), M3(10)) = (1, 1, 0).
4.11 The level of undetectability βMn and its bounds as functions of n.
4.22 Inserting virtual packets to calculate the delays of chaff packets.
4.23 The Markov chain formed by d′(w); p = λ1/(λ1 + λ2), q = 1 − p.
4.24 Every relay sequence in P∗ corresponds to a relay sequence in P; solid line: sequences in P; dashed line: sequences in P∗.
4.25 Solid lines denote the original relay sequences; dashed lines denote the reorganized relay sequences which preserve the order of packets.
4.26 A “batched” arrival process generated from a Poisson process. ■: arrival epochs; ◦: points in the underlying Poisson process; M = 2, period = 5.
4.27 The Markov chain of (M1(k), M2(k))∞k=0. All straight lines have transition probability 1/3. All the states are marked with their limiting probabilities, e.g., π(0, 2) = 1/15.
5.1 In a wireless network, nodes A and B may serve on one or multiple routes. Eavesdroppers are deployed to collect their transmission activities Si (i = 1, 2), which are then sent to a detector at the fusion center.
5.2 A distributed detection system. This system consists of two quantizers …
5.9 Construct f1: ◦: original epochs; •: constructed epochs.
5.10 Construct (f1, f2) from (x′n, y′n) (T ≥ ∆). The matching found by IC-SE guarantees that x′j = y′j,2 + y′j+1,1.
Chapter 1
Introduction
1.1 Nonparametric Statistical Inference in Wireless Sensor Networks
Wireless sensor networks have become increasingly popular in the past few years.
The development of such networks was originally motivated by military applica-
tions such as battlefield surveillance. Now the use of wireless sensor networks is ex-
tended to many civilian applications, including environment and habitat monitor-
ing, healthcare applications, home automation, process monitoring, traffic control,
etc. These applications all require the collaborative inference of certain physical
or environmental conditions based on information collected by the sensors.
In classical statistical inference, the conditions to be inferred are assumed to be characterized by a known parametric family, and the problem reduces to finding the correct index in this family. This approach corresponds to the case when
there is thorough understanding of the conditions and their influence on sensor
measurements so that it is possible to formulate a parametric model properly.
Unlike in classical inference, the phenomena to be monitored by wireless sensor networks are often not known at the time of inference, or are too diverse to fit into specific parametric models. It is therefore desirable to consider nonparametric statistical inference in applications of wireless sensor networks.
The need for nonparametric inference also arises from concerns about network security. Wireless sensors may be deployed in an open environment and are thus subject to tampering by malicious intruders. In this case, it has been proposed to use statistical inference methods to identify misbehaving sensors, but knowledge of how compromised sensors will behave is very limited. Research interest has recently grown in defending against intelligent adversaries, where an intruder controls compromised sensors to collaboratively disrupt the inference in an intelligent manner. In the presence of intelligent adversaries, it is highly desirable that inference methods guarantee certain performance even in the worst case.
It is generally impossible to design a single inference method that is optimal for all underlying distributions; the use of nonparametric methods thus inevitably entails some loss of performance for any specific distribution. In
the presence of intelligent adversaries, there may even be scenarios in which reliable
inference is impossible. Therefore, it is crucial to investigate the performance of
nonparametric inference techniques and their fundamental limits.
1.2 Dissertation Outline
This thesis studies nonparametric statistical inference in wireless sensor
networks from the perspectives of both theoretical analysis and practical algorithm
design. The thesis addresses two problems. The first problem is the nonparamet-
ric detection and estimation of changes in the geographical distribution of alarmed
sensors, where detectors with exponentially decaying error probabilities and consis-
tent estimators are developed. The second problem is the detection of information
flows by timing analysis. The problem is further divided into three subproblems,
which deal with the detection of information flows without chaff noise, detection in the presence of chaff noise, and distributed detection. Various detectors are
developed under the assumption of intelligent adversaries, and their asymptotic
performance is evaluated by error exponents (in the case of no chaff) or the maxi-
mum amount of chaff noise to guarantee vanishing error probabilities (otherwise).
In Chapter 2, we consider nonparametric change detection and estimation in
planar random sensor fields. Sensors are deployed to measure a certain underlying phenomenon and make binary decisions (i.e., alarmed or normal). Given samples
of the locations of alarmed sensors from two data collections, we want to know
whether and where the underlying phenomenon has changed. Assuming that in each collection the samples are drawn i.i.d. from an unknown geographical distribution, we formulate the problem as detecting changes in this distribution between two collection periods and estimating the location of the maximum change
if changes do occur. Our main contributions include a threshold detector based on
the distance between empirical distributions and uniform upper bounds on its error
probabilities under arbitrary distributions. Polynomial-time algorithms are devel-
oped to implement the detector for several types of empirical distances. Solutions
to the detection problem also give an estimate of the set with the largest change.
We show that under certain regularity conditions, such estimation is consistent.
In Chapter 3, we consider the detection of information flows. Given a wireless
or wired ad hoc network, we want to know if there are flows of information-carrying
packets through the nodes of interest by measuring the transmission activities of
these nodes in timing. Timing analysis is robust against encryption and padding, and its measurements are easy to obtain (especially in wireless networks). Its challenges include timing perturbations (delays, permutations, etc.) and chaff noise, which consists of dummy traffic and unrelated traffic multiplexed at intermediate nodes. In this chapter, we decompose the detection into pairwise detection over every two hops of the information flows and consider only timing perturbations. Assuming that the perturbations are bounded in delay or
memory, and there is no chaff noise, we develop linear-time detectors which have no
miss detection. We show that the proposed detectors outperform existing detectors
in false alarm probability.
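The matching idea can be sketched as follows. This is an illustrative reconstruction, not the exact detector of Chapter 3 (the function and variable names are ours): each arrival epoch is greedily matched to the earliest unused departure epoch within the delay bound ∆, and the traces are declared consistent with a flow only if every packet is matched.

```python
def bounded_delay_match(s1, s2, delta):
    """Greedy causal matching: pair each arrival epoch in s1 with the
    earliest unused departure epoch in s2 inside (s, s + delta].
    Returns True iff a complete causal, delay-bounded matching exists.
    Assumes s1 and s2 are sorted and of equal length (no chaff)."""
    j = 0
    for s in s1:
        # departures at or before the arrival violate causality; skip them
        while j < len(s2) and s2[j] <= s:
            j += 1
        # the next unused departure must fall within the delay bound
        if j == len(s2) or s2[j] > s + delta:
            return False
        j += 1  # consume this departure
    return True

print(bounded_delay_match([0.0, 2.0, 5.0], [0.5, 2.4, 5.9], 1.0))  # True
print(bounded_delay_match([0.0, 2.0, 5.0], [0.5, 2.4, 7.0], 1.0))  # False
```

Matching the earliest feasible departure first is the standard greedy rule for this kind of interval matching; if any packet cannot be matched, the traces are inconsistent with a bounded-delay flow, which is why such a detector incurs no miss detection under the no-chaff assumption.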
In Chapter 4, we generalize the detection of information flows to allow the
insertion of chaff noise. Assuming that nodes can collaboratively perturb timing
and insert chaff noise to evade detection, we show that there exists a threshold
on the fraction of chaff noise, beyond which Chernoff-consistent detection is im-
possible. The threshold is characterized as the minimum chaff noise required for
an information flow to mimic the distribution under the null hypothesis. Optimal
chaff-inserting algorithms are developed to compute the threshold, and closed-
form expressions are obtained under the assumption that traffic under the null
hypothesis can be modelled as independent Poisson processes. Furthermore, we
develop a threshold detector based on the optimal chaff-inserting algorithms, which
can achieve Chernoff-consistent detection in the presence of chaff noise arbitrar-
ily close to the threshold. Therefore, we obtain a tight bound on the fraction of
chaff noise, within which the proposed detector is Chernoff-consistent, and beyond
which there exists an information flow embedded in chaff noise that is statistically identical to traffic under the null hypothesis, so that no detector can be Chernoff-consistent. We use this bound to characterize the level of detectability of information flows in chaff noise. Finally, we show that joint detection over multiple hops can greatly increase the level of detectability.
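As a simplified illustration of how chaff is identified under the bounded-memory model (a sketch of the idea behind the chaff-inserting algorithms of Chapter 4; the function name and the explicit capacity parameter are our assumptions), one can walk through the merged arrival/departure sequence while tracking the relay's memory: a departure from an empty memory, or an arrival to a full one, cannot belong to the flow and must be chaff.

```python
def min_chaff_bounded_memory(events, capacity):
    """Count packets that must be chaff for `events` to be consistent
    with a bounded-memory relay. `events` is a time-ordered list of
    'a' (arrival) / 'd' (departure); `capacity` bounds the relay's
    memory. Illustrative sketch only, not the algorithm of Chapter 4."""
    memory, chaff = 0, 0
    for e in events:
        if e == 'a':
            if memory == capacity:
                chaff += 1   # arrival with full memory: overflow, chaff
            else:
                memory += 1
        else:
            if memory == 0:
                chaff += 1   # departure with empty memory: underflow, chaff
            else:
                memory -= 1
    return chaff

# The first event is a departure, so it must be chaff (cf. Fig. 4.4)
events = ['d', 'a', 'd', 'a', 'a', 'd']
print(min_chaff_bounded_memory(events, capacity=2))  # 1
```

A threshold detector in this spirit would declare an information flow whenever the resulting chaff fraction falls below a predetermined threshold.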
Chapter 5 addresses distributed detection of information flows. We focus on
pairwise detection of bounded-delay flows in chaff noise. In distributed detection, the collection of measurements is subject to capacity constraints on the communication channels; this setting is particularly relevant to wireless sensor networks, where wide deployment and limited power supply make it necessary to limit communication rates. We derive theoretical upper and lower bounds on the level of detectability as functions of the capacity constraints, and then turn to the design of practical detection systems. These systems consist of simple slot-based quantizers and threshold detectors based on optimal chaff-inserting algorithms, which compute the minimum chaff noise required to generate the received (compressed) measurements. The performance of the proposed detection systems is analyzed and compared to draw heuristics for system design.
The rest of this chapter elaborates on each of the problems introduced above, including our initial motivation for the problem, a summary of results, and a brief overview of related work.
1.3 Nonparametric Change Detection and Estimation in 2D Random Sensor Fields
We consider the detection of changes in an underlying phenomenon in a large-scale, randomly deployed sensor field. For example, sensors may be designed to detect certain chemical components; when a sensor's measurement exceeds a certain threshold, the sensor is “alarmed”. The state of a sensor depends on where it resides; sensors in some areas are more likely to be in the alarmed state than others. We are not interested in the event that particular sensors are alarmed, but rather in whether there is a change in the geographical distribution of alarmed sensors between data collections at two different times. Such a change in distribution could be an indication of abnormality.
Figure 1.1: Reported alarmed sensors (red) in two collections (left: first data collection; right: second data collection).
We assume that some (not necessarily all) of the alarmed sensors are reported to a fusion center, either through the use of a mobile access point (SENMA [36]) or via some in-network routing scheme. Suppose that the fusion center obtains reports of the locations of alarmed sensors, as illustrated in Fig. 1.1, from two separate data collections. In the ith collection, let the locations of alarmed sensors have some unknown distribution Pi, and let the sample Si be a set of locations drawn independently according to Pi. The change detection problem is one of testing whether P1 = P2 without making prior assumptions about the data-generating distributions Pi. Note that Pi only specifies the geographical distribution of alarmed sensors; the joint distribution of alarmed and non-alarmed sensors is not completely specified. A change in Pi may be caused by a change in the actual phenomenon or a change in the sensor layout.
Such a general nonparametric assumption comes at the cost of usually requiring a large sample size, which makes the solution most applicable to large-scale sensor networks, where a large amount of sensor data can be obtained.

There is also a related estimation problem: assuming that a change has been detected, we would like to know where in the sensor field the change has occurred, or where the change is most significant (in a sense that will be made precise later).
1.3.1 Summary of Results
We present a number of nonparametric change detection and estimation algorithms based on an application of Vapnik-Chervonenkis Theory [40]. The basis of this approach was outlined in [3], where we provided a mathematical characterization of changes in distribution. Our focus here is on the algorithmic side, aiming at practical algorithms that scale with the sample size while retaining a certain level of performance guarantee.
We first present results that establish a theoretical guarantee of performance.
The nonparametric detection problem considered here depends on the choice of the
distance measure between two probability distributions, and the choice is usually
subjective. We consider two distance measures. The first is the so-called A-
distance (also used in [3]) that measures the maximum change in probability on
A—a collection of measurable sets. The second is called relative A-distance—a
variation from that in [3]—for cases when the change in probability is concentrated
in areas of small probability. With these two distance measures, we apply the
Vapnik-Chervonenkis Theory to obtain exponential bounds1 on detection error
probabilities and establish the consistency results for the proposed detector and
estimator.
Next we derive a number of practical algorithms. The complexity of applying
the Vapnik-Chervonenkis Theory comes from the search among a (possibly infinite)
collection of measurable sets. In particular, given data S being the union of the
1Here we mean the error probabilities decay exponentially with the increase of sample size.
7
samples from the two collections, i.e., S = S1
⋃
S2, the key is to reduce the search
in an infinite collection of sets (e.g.,planer disks) to a search in a finite collection
H(S) (a function of S). Here we need a constraint on H(S) such that this reduction
does not affect the performance.
We consider three commonly used geometrical shapes—disks, rectangles, and stripes—as our choices of measurable sets A. For the A-distance measure, if M = |S| is the total number of data points in the two collections, we show that a direct implementation of exhaustive search over the collection of all planar disks has complexity O(M^4). We present a suboptimal algorithm, the Search in sample-Centered Disks (SCD), with complexity O(M^2 log M). Under mild assumptions on Pi, the loss of performance of SCD diminishes as the sample size increases. For the class of axis-aligned rectangles, we show that the optimal Search in Axis-aligned Rectangles (SAR) has complexity O(M^3). A suboptimal approach, the Search in Diagonal-defined axis-aligned Rectangles (SDR), reduces the complexity to O(M^2), again with diminishing loss of performance under mild assumptions. For the collection of stripes, we present two algorithms, Search in Axis-aligned Stripes (SAS) and Search in Random Stripes (SRS), both of complexity O(M log M). Similar analysis has also been carried out for the relative distance metric. See Table 2.1.
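To make the statistic concrete, the empirical distance over diagonal-defined axis-aligned rectangles can be computed by the following brute-force sketch (ours, with hypothetical names; it runs in O(M^3) time rather than the O(M^2) achieved by SDR). It also returns the maximizing rectangle, which serves as the estimate of the region of largest change.

```python
from itertools import combinations

def empirical_a_distance_rectangles(S1, S2):
    """Naive search over rectangles whose diagonals are defined by
    pairs of sample points: returns the maximum difference between
    the two empirical measures and the maximizing rectangle.
    S1, S2: lists of (x, y) sample locations."""
    S = S1 + S2

    def frac_inside(pts, xlo, xhi, ylo, yhi):
        # empirical probability mass of a point set inside the rectangle
        inside = sum(1 for (x, y) in pts
                     if xlo <= x <= xhi and ylo <= y <= yhi)
        return inside / len(pts)

    best, best_rect = 0.0, None
    for (p, q) in combinations(S, 2):
        xlo, xhi = sorted((p[0], q[0]))
        ylo, yhi = sorted((p[1], q[1]))
        d = abs(frac_inside(S1, xlo, xhi, ylo, yhi)
                - frac_inside(S2, xlo, xhi, ylo, yhi))
        if d > best:
            best, best_rect = d, (xlo, xhi, ylo, yhi)
    return best, best_rect

S1 = [(0.1, 0.1), (0.2, 0.3), (0.8, 0.9)]
S2 = [(0.7, 0.8), (0.8, 0.9), (0.9, 0.7)]
d, rect = empirical_a_distance_rectangles(S1, S2)
print(round(d, 3), rect)  # 0.667 (0.1, 0.2, 0.1, 0.3)
```

A threshold detector would then declare a change when the returned distance exceeds the chosen threshold, with the returned rectangle as the location estimate.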
We implement several algorithms and verify their performance through simulation. We also answer some practical questions arising in the implementation of the detector, e.g., how to choose the detection threshold and how to estimate the minimum sample size.
1.3.2 Related Work
The problem of change detection in sensor fields has been considered in different (mostly parametric) settings [24, 27]. The underlying statistical problem belongs to the category of two-sample nonparametric change detection. A classical approach is the Kolmogorov-Smirnov two-sample test [13], in which the empirical cumulative distributions are compared, and the maximum difference between the empirical cumulative distribution functions is used as the test statistic. In a way, the proposed methods generalize the idea of the Kolmogorov-Smirnov test to a more general collection of measurable sets using general forms of distance measures. Indeed, the Kolmogorov-Smirnov two-sample test becomes a special case of the SAR (Search in Axis-aligned Rectangles) algorithm presented in Section 2.4.1.
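In one dimension, the statistic reduces to the familiar two-sample Kolmogorov-Smirnov statistic, which can be computed in a few lines (a standard construction, not code from this dissertation):

```python
import bisect

def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum difference
    between the two empirical CDFs, evaluated at the pooled sample
    points (the maximum is always attained at a sample point)."""
    x, y = sorted(x), sorted(y)
    stat = 0.0
    for t in x + y:
        f1 = bisect.bisect_right(x, t) / len(x)  # empirical CDF of x at t
        f2 = bisect.bisect_right(y, t) / len(y)  # empirical CDF of y at t
        stat = max(stat, abs(f1 - f2))
    return stat

print(ks_two_sample([1, 2, 3], [4, 5, 6]))  # 1.0
```

The SAR algorithm generalizes this one-dimensional scan to maximizing the empirical difference over two-dimensional axis-aligned rectangles.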
There is a wealth of nonparametric change detection techniques for one-dimensional data sets, in which the data are completely ordered. Examples include testing the number of runs (successive sample points from the same collection), as in the Wald-Wolfowitz runs test, or testing the relative order of the sample points, e.g., the median test, the control median test, the Mann-Whitney U test, and linear rank statistic tests [13, 22, 33]. Such techniques, however, do not have natural generalizations to the two-dimensional sensor network applications.
Vapnik-Chervonenkis Theory (VC Theory) is a statistical theory of computational learning processes developed by Vapnik and Chervonenkis [38–40]. The theory lays the foundation for learning-based inference methods. The parts of the theory most relevant to our problem are the theory of consistency in learning and the nonasymptotic theory of convergence rates. The theory has been substantially developed since the original study of Vapnik and Chervonenkis; see the book chapter by Bousquet et al. in [6] and the references therein.
1.4 Information Flow Detection by Timing Analysis
Consider a wireless ad hoc network illustrated in Fig. 1.2. We want to know if there
is information flowing between nodes S, R, and D. Suppose that we can observe
their transmission epochs2, as shown in Fig. 1.3. Then from the transmission
patterns, we can probably infer that S is communicating with D, and R is the
relay node.
Figure 1.2: Is S communicating with D through R?

Figure 1.3: Transmission patterns of S, R, and D suggest a communication between S and D through R.
This example belongs to the problem of detecting information flows by timing
analysis. Generally, in a wireless ad hoc network illustrated in Fig. 1.4, there
may be information flows along multiple potential routes. We want to decide
whether a particular information flow is present by eavesdropping on the traffic along the route. Suppose that we cannot rely on nodes in the network to report the
information flows, and all the packets are encrypted and padded at every hop,
leaving only timing information observable. Given eavesdroppers deployed to record the transmission epochs of the nodes of interest, the problem is how to correlate these transmission epochs to detect whether the corresponding nodes are transmitting an information flow.

2 For example, if the nodes use transmitter-directed signaling to communicate, and we know the transmitters' codes, then we can deploy eavesdroppers tuned to the transmitters' codes to detect transmission activities.
Figure 1.4: In a wireless network, eavesdroppers are deployed to report the transmission activities of nodes A and B to a detector at the fusion center, which in turn decides whether there are information flows through these nodes. Si (i = 1, 2): a sequence of transmission epochs of node A (or B).
Timing measurements are subject to a number of sources of perturbations. For
example, a relay node can hold the incoming packets for random periods of time,
reshuffle them, relay them in batches, etc. Furthermore, traffic on different routes
will multiplex at the intersecting nodes, and relay nodes may selectively drop
certain packets or insert dummy packets. Both traffic multiplexing and packet
dropping/insertion cause our measurements to contain packets that do not belong
to the information flow of interest. We will refer to such packets as chaff noise.
The presence of chaff noise significantly increases the difficulty of the problem.
Another challenge comes from capacity constraints in the uplink channels. In
wide-area networks such as wireless sensor networks, eavesdroppers are often powered by batteries and must report to the fusion center with limited power. Therefore, the uplink channels are subject to capacity constraints. The direct consequence is that the measurements received at the fusion center will not be identical to the raw measurements of the eavesdroppers, but will be distorted to a certain extent. An exception is when the detector is located at one of the eavesdroppers, in which case the detector knows the raw measurements of that eavesdropper perfectly; we refer to this as the case of full side-information.
1.4.1 Summary of Results
We consider the detection of information flows through certain nodes of interest by measuring the timing of their transmission activities. We first consider detecting information flows with exact timing measurements (i.e., centralized detection) and then add capacity constraints in data collection (i.e., distributed detection). With transmission activities modelled as point processes, the problem is formulated as a hypothesis test against point processes conforming to certain flow models. We consider two types of flow models derived from constraints in reliable communications: the bounded-delay flow and the bounded-memory flow. Chaff noise need not satisfy either of these constraints.
For centralized detection of information flows without chaff noise, we develop
pairwise linear-time detection algorithms by packet matching or counting schemes.
We show that these algorithms have no miss detection and exponentially decaying
false alarm probabilities if traffic under the null hypothesis can be modelled as
independent Poisson processes. We compare our algorithms with existing detection algorithms through both error-exponent analysis and numerical simulations; the comparison shows that our algorithms outperform the existing ones.
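The counting idea can be sketched as follows (an illustrative simplification under the bounded-memory model with no chaff; the explicit capacity parameter and the names are ours): the difference of cumulative counts n1(w) − n2(w) equals the relay's buffer occupancy, and must remain within [0, capacity] at every event.

```python
def counting_detector(s1, s2, capacity):
    """Declare a possible flow (True) iff the cumulative count
    difference n1(w) - n2(w) stays in [0, capacity] at all times.
    s1: arrival epochs, s2: departure epochs. A sketch of the
    counting scheme; the detectors of Chapter 3 refine this idea."""
    # merge events in time order; departures break ties (processed first)
    events = sorted([(t, 1) for t in s1] + [(t, -1) for t in s2])
    diff = 0
    for _, step in events:
        diff += step
        if diff < 0 or diff > capacity:
            return False  # buffer underflow/overflow: not a valid flow
    return True

print(counting_detector([0.0, 1.0, 2.0], [0.5, 1.5, 2.5], capacity=1))  # True
print(counting_detector([0.0, 0.1, 0.2], [0.5, 1.5, 2.5], capacity=1))  # False
```

Because independent traffic will eventually violate the buffer constraint, such a detector has a false alarm probability that decays with the number of observed packets, while a true bounded-memory flow without chaff is never missed.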
For centralized detection of information flows with chaff noise, we give an exact characterization of the level of detectability of information flows, defined as
the maximum fraction of chaff noise allowed for Chernoff-consistent detection (de-
tailed definition is in Chapter 4). Our contributions include a converse result
and an achievability result. For the converse, we show that there is a bound on
the fraction of chaff noise beyond which Chernoff-consistent detection is impos-
sible. Specifically, the bound is characterized as the minimum fraction of chaff
noise needed to make an information flow statistically identical with traffic under
the null hypothesis. This bound is used to establish a level of undetectability for
information flows. Optimal chaff-inserting algorithms are proposed to calculate
the level of undetectability, and closed-form expressions are derived under the as-
sumption that traffic under the null hypothesis can be modelled as independent
Poisson processes. For the achievability, we develop a detector based on the opti-
mal chaff-inserting algorithms, which claims detection if the fraction of chaff noise
in the measurements computed by these algorithms is bounded by a predeter-
mined threshold. Under Poisson null hypothesis, the proposed detector is proved
to be Chernoff-consistent for all the information flows with fractions of chaff noise
bounded by the level of undetectability. Therefore, the level of detectability is
equal to the level of undetectability, and the proposed detector is optimal. We
show that the level of detectability increases to one as the number of hops in the
information flow increases, indicating that it is impossible to hide information flows
over arbitrarily long paths.
For distributed detection of information flows (with chaff noise), we focus on
the bounded delay flow model and pairwise detection. Our results have both
theoretical and algorithmic elements. In the theoretical aspect, we extend the
notions of detectability and undetectability to the context of distributed detection. Theoretical upper and lower bounds on the level of detectability are derived.
In the algorithmic aspect, we propose a three-stage detection procedure consisting of quantization, data transmission, and detection. Quantization is performed based on a fixed slot partition. We propose a slotted quantizer and a one-bit quantizer, which compress each slot into the number of epochs and an indicator of a nonempty slot, respectively. Under each quantization, we develop a threshold
detector based on the optimal chaff-inserting algorithm analogous to those in cen-
tralized detection except that its input is quantized. With the performance of a
detection system measured by the maximum fraction of chaff noise such that the
system remains Chernoff-consistent, we compare the proposed detection systems
together with their analytical upper bounds. The comparison shows that slotted
quantization outperforms one-bit quantization for heavy traffic, and the detector
under slotted quantization and full side-information is near optimal. Performance
of the proposed detection systems gives lower bounds on the level of detectability
as a function of capacity constraints.
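The two quantizers can be sketched as follows (a minimal reconstruction; the slot length, slot count, and function names are our assumptions, with the actual design parameters given in Chapter 5):

```python
def slotted_quantizer(epochs, slot_len, num_slots):
    """Slotted quantizer: report the number of epochs in each slot."""
    counts = [0] * num_slots
    for t in epochs:
        k = int(t // slot_len)      # index of the slot containing t
        if 0 <= k < num_slots:
            counts[k] += 1
    return counts

def one_bit_quantizer(epochs, slot_len, num_slots):
    """One-bit quantizer: report only a nonempty-slot indicator."""
    counts = slotted_quantizer(epochs, slot_len, num_slots)
    return [1 if c > 0 else 0 for c in counts]

epochs = [0.2, 0.3, 1.7, 3.1]
print(slotted_quantizer(epochs, 1.0, 4))  # [2, 1, 0, 1]
print(one_bit_quantizer(epochs, 1.0, 4))  # [1, 1, 0, 1]
```

The slotted output requires more uplink bits per slot but preserves packet counts, which is consistent with the observation that slotted quantization outperforms one-bit quantization for heavy traffic.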
1.4.2 Related Work
Information flow detection is a special case of timing analysis, which in turn belongs
to the family of traffic analysis problems [12]. For wireless networks, the idea of
traffic analysis is especially promising because the shared wireless medium is open
to interception. Most of the existing work on traffic analysis in a wireless context
is experiment-oriented, e.g., [26, 35, 50].
The problem of detecting information flows has mainly been addressed in the
framework of intrusion detection in wired networks, especially the Internet. In 1995, Staniford and Heberlein [34] first considered the problem of stepping-stone detection. The key problem in stepping-stone detection is to reconstruct the intrusion
path by analyzing various characteristics of the attacking traffic. Related work in
the literature only considers pairwise detection.
Early detection techniques are based on the content of the traffic; see, e.g.,
[34, 46]. To deal with encrypted traffic, timing characteristics are used in detec-
tion, such as the On-Off detection by Zhang and Paxson [51], the deviation-based
detection by Yoda and Etoh [48], and the packet interarrival-based detection by
Wang et al. [45]. The drawback of these approaches is that they are vulnerable to
active timing perturbations by the attacker.
Donoho et al. [11] were the first to consider bounded delay perturbation. They
showed that if packet delays are bounded by a maximum amount, then it is possible
to distinguish traffic containing information flows from independent traffic. Their
work was followed by several practical detectors, including the watermark-based
detector by Wang and Reeves [44] and the counting-based detector by Blum et
al. [5].
The problem becomes much more challenging when chaff can be inserted, with
only incomplete solutions in the literature, e.g., [5, 11, 29, 49]. Donoho et al. [11]
showed that there will be distinguishable difference between information flows in
chaff and independent traffic if chaff noise is independent of the information flows.
Peng et al. [29] and Zhang et al. [49] separately proposed active and passive packet-
matching schemes which can detect information flows with bounded delay in chaff
if chaff packets only appear in the outgoing traffic of the relay node. Blum et al. [5]
modified their counting-based detector to handle a limited number of chaff packets
at the cost of an increased false alarm probability. These techniques can only deal
with a fixed number of chaff packets if nodes can insert chaff noise intelligently.
The dual problem of information flow detection is how to randomize transmis-
sion activities to maximally conceal information flows. This is a critical problem
in protecting anonymous communications against timing analysis attacks. In the
context of wireless ad hoc networks, Hong et al. in [23] proposed to add random
delays to prevent correlation of specific packets, and Deng et al. in [10] proposed to
randomize sensor transmission epochs within each data collection period to thwart
the detection of routes. At flow level, however, transmission activities of nodes on
the same information flow are still correlated. Zhu et al. in [52] proposed to make
traffic on all the outgoing links from a certain node identical in timing by inserting
chaff noise. Although this approach completely hides the information flow, it is
inefficient in terms of the required amount of chaff noise. More efficient methods
to hide information flows are developed based on the chaff-inserting algorithms
developed in Chapter 4; see [41].
Distributed detection of information flows belongs to hypothesis testing un-
der multiterminal data compression [16]. Solutions in this field can model spatial
correlation across nodes, generalizing the conditional i.i.d. assumption made
in classical distributed detection [37], but they cannot deal with temporal correlation. Specifically, existing work only deals with temporally i.i.d. data, i.e., the
observations (xi, yi) (i = 1, 2, . . .) are drawn i.i.d. from a distribution P , where
P = P0 under H0, and P = P1 under H1. The best error exponent (or its lower
bounds) is derived as a function of data compression rates under the Neyman-Pearson framework; see [16] and the references therein. Our problem is fundamentally
different because the timing measurements of information flows are not i.i.d., and
our hypotheses do not have a single-letter characterization.
The problem of compressing Poisson processes has been studied previously
in [31, 42]. Rubin in [31] derived the rate distortion function and practical com-
pression schemes under the absolute-error distortion measure. Verdu in [42] derived
a closed-form expression for the rate distortion function under an asymmetric dis-
tortion measure on interarrival times. One of our quantization schemes, slotted
quantization, is the same as the quantization scheme proposed by Rubin. Although
slotted quantization is near optimal in Rubin’s problem, it is not necessarily near
optimal in our problem since we want to optimize the overall detection performance
whereas Rubin just wanted to reconstruct the processes.
Chapter 2
Nonparametric Change Detection and
Estimation
2.1 Outline
In this chapter, statistical learning based techniques are proposed to detect changes
in a 2D random field without statistical knowledge about the underlying distribu-
tions. Section 2.2 specifies the model and defines the detector and the estimator.
Section 2.3 states the main theorems about the exponential bounds on error prob-
abilities of the detector and the consistency of the estimator. Section 2.4 presents
the detection and estimation algorithms, and Section 2.5 provides simulation re-
sults. The chapter is concluded with comments about the strengths and weaknesses
of the proposed approach.
2.2 The Problem Statement
2.2.1 The Model
Let set Ω denote the sensor field and F the σ-field on Ω. We assume that in each
data collection, we draw i.i.d. samples from the locations of alarmed sensors. Let
Pi (i = 1, 2) be a probability measure on (Ω, F) modelling the drawing in the ith
collection. Drawings in different collections are independent. Let Si denote the set
of locations collected in the ith collection and S = S1 ∪ S2 the set that contains
data from the two collections.
We point out that the joint distribution of sensor location and report, which
is influenced by sensor layout, readings, and sampling strategy, is not completely
specified. Note that although the i.i.d. assumption implies that the decisions of
alarm occur independently, the decisions are not necessarily identically distributed,
and the probability of alarm may vary at different locations. Moreover, the prob-
ability that an alarmed sensor reports to the fusion center may also be different
across sensors. Both of these probabilities can be incorporated into Pi. Note that
how unalarmed sensors are distributed is not specified; we can model arbitrary
correlations among them, and they will not have any impact on our result. This
allows us to model certain types of correlated sensor readings.
The probability measures Pi (i = 1, 2) are not known. Instead of making specific
assumptions on the form of Pi, we introduce a collection A ⊆ F of measurable sets
to model the geographical areas of practical interest and only look for changes in
the probabilities of sets in A. The collection A represents our prior knowledge of
what changes are expected. It does not have to be finite or even countable, and
is part of the algorithm design. For example, if we expect changes in the mean
of a symmetric distribution with monotone decreasing density from a center, it
may be good to choose A as the collection of disks. The choice of A is subjective,
and it depends on the application at hand. We will focus in this chapter on
regular geometrical shapes: disks, rectangles, and stripes. Intuitively, disks and
rectangles are suitable for changes in the location or spread of the probability mass,
and stripes (a special type of rectangles) are better for changes in correlation or
marginal distributions. We point out that although a parametric model for Pi is not
needed, prior knowledge helps detection by allowing us to choose A which “fits”
the changes best, as discussed after Theorem 1.
Given a pair of samples S1, S2 drawn i.i.d. from distributions P1, P2, and a
collection A ⊆ F , we are interested in whether there is a change in probability
measure on A and, if there is a change, which set in A has the maximum change
of probability. Specifically, the detection problem considered in this chapter is the
test of the following hypotheses on A:
H0 : P1 = P2  vs.  H1 : P1 ≠ P2.¹
The estimation problem, conditioned on there being a change, is to estimate
the set A∗ ∈ A that has the maximum change. For example, using the absolute
difference, we want to estimate
A∗ = arg max_{A∈A} |P1(A) − P2(A)|.
We will also consider a normalized difference measure in Section 2.2.2.
2.2.2 Distance Measures
To measure changes, we need some notion of distance between two probability
distributions. In this chapter, we will consider two distance measures: A-distance
and relative A-distance.
Definition 1 (A-distance and empirical A-distance [3]) Given probability spaces
(Ω,F , Pi) and a collection A ⊆ F , the A-distance between P1 and P2 is defined as
dA(P1, P2) = sup_{A∈A} |P1(A) − P2(A)|. (2.1)
¹Here H0 means P1(A) = P2(A) for all A ∈ A, and H1 means ∃A ∈ A s.t. P1(A) ≠ P2(A).
The empirical A-distance dA(S1, S2) is similarly defined by replacing Pi(A) by the
empirical measure
Si(A) ≜ |Si ∩ A| / |Si|, (2.2)
where |Si ∩ A| is the number of points in both Si and set A.
This notion of empirical A-distance dA(S1, S2) is related to the Kolmogorov-
Smirnov two-sample statistic. For the case where the domain set is the real line,
the Kolmogorov-Smirnov test considers
supx
|F1(x) − F2(x)|, Fi(x)∆= Pi(y : y ≤ x)
as the measure of difference between two distributions. By setting A to be the
set of all the one-sided intervals (−∞, x), dA(S1, S2) is the Kolmogorov-Smirnov
statistic.
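For A the class of one-sided intervals, the empirical A-distance can be computed exactly by evaluating the two empirical CDFs at the pooled sample points, since the supremum is attained at a sample point; this is precisely the two-sample Kolmogorov-Smirnov statistic. A minimal sketch (function names are illustrative, not from the thesis):

```python
from bisect import bisect_right

def empirical_ks_distance(s1, s2):
    """Empirical A-distance for A = {(-inf, x]}: the two-sample
    Kolmogorov-Smirnov statistic. The supremum over x is attained at a
    sample point, so scanning the pooled sample suffices."""
    s1, s2 = sorted(s1), sorted(s2)
    best = 0.0
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / len(s1)  # empirical CDF of S1 at x
        f2 = bisect_right(s2, x) / len(s2)  # empirical CDF of S2 at x
        best = max(best, abs(f1 - f2))
    return best
```

For fully separated samples the statistic reaches its maximum value of 1, as one would expect.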
The A-distance does not take into account the relative significance of the
change. For example, one could argue that changing the probability of a set from
0.99 to 0.999 is less significant than a change from 0.001 to 0.01 because the latter
is a ten-fold increase whereas the former is just an increase by less than 1%. For
applications in which small probability sets are of interest, we introduce the fol-
lowing notion of relative A-distance that takes the relative magnitude of a change
into account.
Definition 2 (Relative and Empirical Relative A-distance) Given probability spaces (Ω, F, Pi) and a collection A ⊆ F, the relative A-distance between
P1 and P2 is defined as
φA(P1, P2) = sup_{A∈A} fφ(P1(A), P2(A)), (2.3)
where fφ : [0, 1] × [0, 1] → [0, √2] is defined as
fφ(x, y) = 0 if x = y = 0, and fφ(x, y) = |x − y| / √((x + y)/2) otherwise. (2.4)
The empirical relative A-distance is defined similarly by replacing Pi(A) with
the empirical measure defined in (2.2).
The above definition is slightly different from that used in [3]. It is obvious
that |P1(A) − P2(A)| is a metric. The proof that |P1(A) − P2(A)| / √((P1(A) + P2(A))/2)
is a metric follows
from [2]. Note that in general dA(P1, P2) = 0 or φA(P1, P2) = 0 does not imply
P1 = P2, but implies P1(A) = P2(A) for any A ∈ A. If we only care about sets in
A, then dA and φA defined above are pseudo-metrics.
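The function fφ of (2.4) is straightforward to evaluate; the short sketch below (illustrative name) shows why the relative distance ranks a change from 0.001 to 0.01 above a change from 0.99 to 0.999, unlike the absolute difference alone:

```python
def f_phi(x, y):
    """f_phi of Eq. (2.4): change magnitude relative to the mean mass."""
    if x == 0 and y == 0:
        return 0.0
    return abs(x - y) / ((x + y) / 2) ** 0.5

# A ten-fold increase of a small probability scores higher than a
# sub-1% increase of a large one, although |x - y| = 0.009 in both cases:
small_sets = f_phi(0.001, 0.01)   # ~0.121
large_sets = f_phi(0.99, 0.999)   # ~0.009
```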
2.2.3 Detection and Estimation
With the distance measure defined, we can now specify the class of detectors and
estimators considered in this chapter.
Definition 3 (Detector δ(S1, S2; ǫ)) Given two collections of sample points S1
and S2, drawn i.i.d. from probability distributions P1 and P2 respectively, and threshold ǫ ∈ (0, 1), for hypotheses H0 vs. H1, the detector using the A-distance is defined as²
δdA(S1, S2; ǫ) = 1 if dA(S1, S2) > ǫ, and 0 otherwise. (2.5)
The detector δφA(S1, S2; ǫ) using the relative A-distance is defined the same way by
replacing dA(S1, S2) by φA(S1, S2) and letting ǫ ∈ (0, √2).
Assuming that a change of probability distribution has occurred, we define the
estimator for the event that gives the maximum change in probabilities.
Definition 4 (Estimator A∗(S1, S2)) Given two collections of sample points S1
and S2, drawn i.i.d. from probability distributions P1 and P2 respectively, the estimator for the event that gives the maximum change of probability is defined as³
A∗_dA(S1, S2) = arg max_{A∈A} |S1(A) − S2(A)|.
The estimator A∗_φA(S1, S2) using the relative A-distance is defined similarly.
The definitions given above require searching in a possibly infinite collection
of sets. At the moment, we only specify what the outcome should be without
addressing the algorithmic procedure generating it. We will address that issue in
Section 2.4.
²We use the convention that the detector gives the value 1 for H1 and 0 for H0. Note that if nǫ is an integer, then dA(S1, S2) will be equal to ǫ with positive probability. Ideally, in the Neyman-Pearson framework, randomization is used when dA(S1, S2) = ǫ to achieve a lower miss probability, although it complicates the analysis. Instead, we stick to such a deterministic detector and derive an explicit expression for the threshold by Vapnik-Chervonenkis inequalities. See the discussions following Theorem 1.
³In the case of a tie, choose any one of the sets achieving the maximum change of empirical probability.
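Over a finite search collection H, the detector (2.5) and the estimator of Definition 4 reduce to a single maximization of the empirical change. A sketch, assuming H is given as a list of membership predicates (all names are illustrative):

```python
def empirical_measure(sample, A):
    """S_i(A) = |S_i intersect A| / |S_i| (Eq. 2.2); A is a predicate."""
    return sum(1 for s in sample if A(s)) / len(sample)

def detect_and_estimate(s1, s2, H, eps):
    """Detector (2.5) and estimator of Definition 4 over a finite H.

    Returns (decision, A_star): decision is 1 if the empirical A-distance
    exceeds eps; A_star is a set achieving the maximum empirical change.
    """
    best, a_star = -1.0, None
    for A in H:
        d = abs(empirical_measure(s1, A) - empirical_measure(s2, A))
        if d > best:
            best, a_star = d, A
    return (1 if best > eps else 0), a_star
```

With two well-separated samples and H containing a separating half-line, the detector fires and returns that set as the estimate.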
2.3 Performance Guarantee
We present in this section consistency results for the detector and estimator pre-
sented earlier. The results are given in the forms of error exponents.
First let us look at some technical preliminaries from [39]. For a measurable space
(Ω, F), let A ⊆ F. We say a set S ⊂ Ω is shatterable by A if for all B ⊆ S,
∃A ∈ A s.t. B = A ∩ S.
Definition 5 (VC-Dimension) The Vapnik-Chervonenkis dimension of a collection A of sets is
VC-d(A) = sup{n : ∃S s.t. |S| = n and S is shatterable by A}.
The VC-dimension of a class of sets quantifies its ability to separate sets of points.
Intuitively, the VC-dimension of a class A is the maximum number of free parameters needed to specify a set in A. For example, if A is the class of 2D disks, then
at most 3 free parameters are needed (the x- and y-coordinates of the center and a
radius), and it is shown in [47] that the VC-dimension of A is indeed 3.
Note that the VC-dimension of a class may be infinite; e.g., the VC-dimension of the
entire σ-field F is ∞ because any set is shatterable by F.
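Shatterability can be checked by brute force on small point sets. The sketch below does this for closed intervals on the real line (VC-dimension 2), a one-dimensional analogue of the disk example; all names are illustrative:

```python
from itertools import combinations

def shatterable(points, collection_traces):
    """Check whether every subset B of `points` equals A ∩ points for
    some A; collection_traces(points) yields the traces as frozensets."""
    traces = set(collection_traces(points))
    return all(frozenset(b) in traces
               for r in range(len(points) + 1)
               for b in combinations(points, r))

def interval_traces(points):
    """Traces of closed intervals [a, b] on a finite set of reals:
    the empty set plus every contiguous run of the sorted points."""
    pts = sorted(points)
    yield frozenset()
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            yield frozenset(pts[i:j + 1])

# Intervals shatter any 2 points but no 3: the middle of three points
# cannot be excluded while the two outer points are included.
```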
Theorem 1 (Detector Error Exponents) Given probability spaces (Ω, F, Pi)
and a collection A ⊆ F with finite VC-dimension d, let Si ⊂ Ω be a set of n sample
points drawn according to Pi. The false alarm probabilities for the detectors defined
in (2.5) are bounded by
PF(δdA) ≤ 8(2n + 1)^d exp(−nǫ²/32), (2.6)
PF(δφA) ≤ 2(2n + 1)^d exp(−nǫ²/4). (2.7)
Furthermore, if dA(P1, P2) > ǫ and φA(P1, P2) > ǫ, the miss detection probabilities are bounded by
PM(δdA) ≤ 8(2n + 1)^d exp(−n[dA(P1, P2) − ǫ]²/32), (2.8)
PM(δφA) ≤ 2(2n + 1)^d exp(−n[φA(P1, P2) − ǫ]²/16). (2.9)
A few remarks are in order. First, if the maximum change between P1 and P2
on A exceeds ǫ, the detector detects the change with probability arbitrarily close
to 1 as the sample size goes to infinity. Similarly, if there is no change in Pi on A,
then the probability of false alarm also goes to zero. Notice that the decay rates
of the error probabilities are different when the two different distance measures
are used; from (2.6,2.7), the decay rate of false alarm probabilities for the detector
using φA is eight times that using dA.
Second, the above theorem provides a way of deciding the detection threshold
ǫ for a particular detection criterion. For example, the threshold (not necessarily
optimal) of the Neyman-Pearson detection for a given size α can be obtained from
the bounds on false alarm probabilities. Theorem 1 suggests that we should choose
(n, ǫ) such that
8(2n + 1)^d exp(−nǫ²/32) ≤ α for δdA, (2.10)
2(2n + 1)^d exp(−nǫ²/4) ≤ α for δφA. (2.11)
Taking ǫ(n) to make the inequalities hold with equality gives a threshold⁴
ǫ(n) = √((32/n) log(8(2n + 1)^d/α)) for δdA, and ǫ(n) = √((4/n) log(2(2n + 1)^d/α)) for δφA. (2.12)
We shall think of ǫ(n) as a measure of detector sensitivity. From (2.8,2.9)
in Theorem 1, we see that miss detection probability starts to drop exponen-
tially when ǫ(n) < dA(P1, P2) or ǫ(n) < φA(P1, P2). Thus, roughly, ǫ(n) is a
lower bound on the amount of changes in order for the change to be detected
with high probability. Furthermore, the smaller the ǫ(n), the larger the values of
[dA(P1, P2)− ǫ(n)]2/32 and [φA(P1, P2)− ǫ(n)]2/16, and the lower the upper bound
on miss detection probability. One should be cautioned that although the error
probabilities decay exponentially, the error exponents could be small, and thus a
large sample size may be required. For example, for d = 2 and ǫ = 0.1, 10^5 sample
points are required to guarantee a false alarm probability bounded by 5% for the
A-distance based detector. We can reduce the sample size to 10^4 by using the
detector based on the relative A-distance.
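The threshold (2.12) and the bounds (2.10, 2.11) are easy to evaluate numerically; the sketch below reproduces the sample-size figures quoted above for d = 2, ǫ = 0.1, α = 0.05 (function names are ours, not from the thesis):

```python
import math

def threshold(n, d, alpha, detector="dA"):
    """Detection threshold eps(n) from Eq. (2.12); log is natural."""
    if detector == "dA":
        return math.sqrt(32 / n * math.log(8 * (2 * n + 1) ** d / alpha))
    return math.sqrt(4 / n * math.log(2 * (2 * n + 1) ** d / alpha))

def false_alarm_bound(n, d, eps, detector="dA"):
    """Upper bounds (2.6)/(2.7) on the false alarm probability."""
    if detector == "dA":
        return 8 * (2 * n + 1) ** d * math.exp(-n * eps ** 2 / 32)
    return 2 * (2 * n + 1) ** d * math.exp(-n * eps ** 2 / 4)

# For d = 2, eps = 0.1, alpha = 0.05: the dA bound needs on the order
# of 1e5 samples, while the phiA bound already holds at 1e4.
```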
Third, note that the VC-dimension d of A has diminishing effects on the rate
of decay of error probabilities. The selection of A, however, may affect the error
exponent through dA or φA. Furthermore, the selection of A has a significant
impact on the complexity of practically implementable algorithms.
Finally, we should also note that, while we have stated the above theorem
⁴All logarithms in this thesis are natural logarithms.
under |Si| = n, the results generalize easily to the case when the two collections have
different sizes.
The consistency of the estimator is implied by the following theorem.
Theorem 2 Given probability spaces (Ω, F, Pi) (i = 1, 2) and a collection A ⊆ F
with finite VC-dimension, if A∗_dA ≜ arg max_{A∈A} |P1(A) − P2(A)| is separated from the
rest of A in the sense that⁵
|P1(A∗_dA) − P2(A∗_dA)| − sup_{B∈A\{A∗_dA}} |P1(B) − P2(B)| > 0,
then A∗_dA(S1, S2) converges to A∗_dA in probability. A similar result holds for A∗_φA.
Proof: See Appendix 2.A.
2.4 Algorithms
We now turn our attention to practically implementable algorithms and their com-
plexities. The key step is to obtain test statistics within a finite number of oper-
ations, preferably with a complexity that scales well with the total number of
data points M = |S1 ∪ S2|.
Given sample points S = S1 ∪ S2 and a possibly infinite collection of sets A,
we need to reduce the search in A to a search in a finite collection H(S) ⊂ A, and
replace dA(S1, S2) by dH(S1, S2). If H is not chosen properly, such a reduction of
⁵If the Pi's are continuous and A∗_dA can be approximated arbitrarily closely by other sets in A, then this condition will not be satisfied. In that case, we have results on the estimation performance evaluated by the amount of change in the estimated set [20].
the search domain may lead to a loss of performance. Thus we need the notion of
completeness when choosing the search domain.
Definition 6 (Completeness) Let A be a collection of measurable subsets
of a space Ω, and let S ⊂ Ω be a set of points in Ω. Let H(S) ⊂ A be a finite sub-
collection of measurable sets which is a function of S. We call the collection H(S)
complete for S with respect to A if ∀A ∈ A, there exists a B ∈ H(S) such that
S ∩ A = S ∩ B.
The significance of completeness is that, if H(S1 ∪ S2) is complete w.r.t. A,
then dA(S1, S2) = dH(S1, S2) and φA(S1, S2) = φH(S1, S2).
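Completeness can be verified directly on small examples. The sketch below (illustrative names) checks that, for the class of one-sided intervals (−∞, x], the finite sub-collection {(−∞, s] : s ∈ S} together with the empty set is complete for S:

```python
def complete_for(sample, sub_collection, full_traces):
    """Check Definition 6: every trace A ∩ S induced by the full
    collection is matched by some member of the finite H(S)."""
    sub = {frozenset(s for s in sample if B(s)) for B in sub_collection}
    return all(t in sub for t in full_traces(sample))

def one_sided_traces(sample):
    """All distinct traces (-inf, x] ∩ S as x sweeps the real line."""
    pts = sorted(sample)
    yield frozenset()
    for i in range(len(pts)):
        yield frozenset(pts[:i + 1])

# H(S) = {(-inf, s] : s in S} plus the empty set is complete for the
# uncountable class of one-sided intervals:
sample = [0.3, 0.1, 0.7]
H = [lambda x, c=c: x <= c for c in sample] + [lambda x: False]
```

Dropping members of H breaks completeness, which is exactly the loss of performance the text warns about.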
For the choice of A, we consider regular geometric areas, e.g., disks, rectangles,
and stripes. We next present six algorithms for different choices of A and sub-
collection H. We first present complete algorithms, i.e., the sub-collection H is
complete with respect to A. Next we give a couple of heuristic algorithms which
simplify the computation at the cost of a loss in completeness.
Hereinafter all sets defined are closed sets unless otherwise stated.
2.4.1 Complete Algorithms
Search in Planar Disks (SPD)
Let A be the collection of two-dimensional disks, and let VC-d denote the VC-
dimension of a class. The following result is proved in [15]:
Proposition 1 VC-d(A) = 3.
For the set of sample points S ⊆ Ω, consider the finite sub-collection of A
defined by
HD(S) ≜ ∪_{(si,sj,sk)∈T} D(si, sj, sk), (2.13)
where D(si, sj, sk) is the disk determined by si, sj, sk, and
T ≜ {(si, sj, sk) ∈ S³ : si, sj, sk are not collinear}.
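Each non-collinear triple determines its disk via the circumcircle, so HD(S) can be enumerated by iterating over triples; counting the points inside each of the O(M³) disks is what makes the exhaustive SPD search O(M⁴). A sketch with illustrative names:

```python
from itertools import combinations

def circumcircle(p, q, r, tol=1e-12):
    """Center and radius of the disk through three non-collinear points,
    or None if the points are (nearly) collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < tol:
        return None  # collinear triple: excluded from T
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), ((ux - ax)**2 + (uy - ay)**2) ** 0.5

def H_D(points):
    """Enumerate the finite disk collection H_D(S) of Eq. (2.13)."""
    for triple in combinations(points, 3):
        disk = circumcircle(*triple)
        if disk is not None:
            yield disk
```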
Figure 2.7: Detection threshold as a function of the sample size for different VC-dimensions.
Fig. 2.8 shows that the detection threshold is not sensitive to the maximum
false alarm probability α. We see that, given a certain sample size, a detector with a larger size
would not have a much smaller detection threshold. Hence increasing the sample
size is usually the only way to improve the accuracy of the detector.
Figure 2.8: Detection threshold as a function of the detector size for different sample sizes (VC-dimension = 2; curves for dA and φA at n = 1000, 2000, 3000).
2.5.3 Performance
We focus on miss detection in our Monte Carlo simulations. Fig. 2.9 and Fig. 2.10
show the miss detection probability vs. sample size. We observe that there is a
threshold sample size beyond which the miss detection probability drops sharply.
This can be explained using Theorem 1, which states that the upper bound on
miss detection probability begins to drop when ǫ(n) < dA(P1, P2) for δdA or ǫ(n) <
φA(P1, P2) for δφA, and once it starts to drop, it drops exponentially. A heuristic
argument on the minimum sample size would be that the sample size n should be
s.t.
ǫ(n) = √((32/n) log(8(2n + 1)^d/α)) ≤ dA(P1, P2) for δdA, (2.36)
ǫ(n) = √((4/n) log(2(2n + 1)^d/α)) ≤ φA(P1, P2) for δφA. (2.37)
If we know P1 and P2, we can calculate dA(P1, P2) and φA(P1, P2) to obtain a
lower bound on n by solving the inequalities (2.36) and (2.37). An observation is
that this estimation is close to the minimum sample size required in the simulation.
For example, in our simulation setup, the estimated minimum sample sizes for
Algorithms SAS and SCD using the A-distance metric are both 2725, and that for
SCD using the relative A-distance metric is 53. As indicated in Fig. 2.9 and Fig. 2.10,
they all agree well with the sharp drop in miss detection probabilities.
Figure 2.9: Miss detection probability of δdA as a function of the sample size: simulation results for Algorithms SAS, SRS, SCD, and SDR with α = 0.05. Here p = 0.98, q = 0.02, r = s/12; 1000 Monte Carlo runs.
Figure 2.10: Miss detection probability of δφA as a function of the sample size: simulation results for Algorithms SAS, SRS, SCD, and SDR with α = 0.05. Here p = 0.98, q = 0.02, r = s/12; 10000 Monte Carlo runs.
As expected, both the threshold and the miss detection probability are decreasing functions of the sample size, which reflects a trade-off between detection precision and
sampling time, energy consumption, and data processing expense.
We also plot the detection probability w.r.t. the size of the detector. See
Fig. 2.11 and Fig. 2.12. The plots show that the detection probability does not increase
significantly with the detector size, which is expected because the
size affects the detection probability only through the threshold, and the threshold is
not sensitive to the change of size (see Fig. 2.8).
Figure 2.11: Detection probability of δdA as a function of detector size (sample size = 3000).
Figure 2.12: Detection probability of δφA as a function of detector size, 10000 Monte Carlo runs.
Note that by choosing the threshold from the upper bound in (2.38) and (2.41),
we only guarantee that the false alarm probability is upper bounded by α. Our simulation shows that
the actual false alarm probability can be much less than the size of the detector⁸,
which implies that the theoretical threshold is a loose upper bound of the actual
minimum threshold needed to guarantee the required detector size. This is be-
cause of the nonparametric nature of the theoretical threshold. This threshold is
proved to satisfy the size constraint under arbitrary distributions by the Vapnik-
Chervonenkis Theory. Therefore for a given distribution, this threshold may be
loose.
For comparison among the algorithms, an obvious observation is that δφA outperforms δdA in detection probability. This is because, on one hand, given n and α,
using (2.36, 2.37) to choose the threshold yields that ǫ(n) for φA is smaller than
that for dA by a factor of 2√2; on the other hand, we have φA(S1, S2) ≥ dA(S1, S2). Therefore in our
simulation it is easier for algorithms using the statistic φA(S1, S2) to detect a change.
However, this is caused by the specific way the detection threshold is decided, and
does not imply that δφA is uniformly better than δdA.
An intuitive guideline in algorithm design is that the better the sets in A separate
the probability mass of P1 and P2, and the simpler A is, the better the detector
performance; e.g., Algorithm SCD performs better than Algorithms SAS and
SRS. Moreover, we can introduce random factors into an algorithm to make it
more robust; e.g., we randomize SAS into SRS so as to make it independent of
the direction in which the change occurs.
⁸For example, in our simulation of Algorithms SAS and SRS, for sample sizes up to 10,000 using 1000 Monte Carlo runs, we encountered no false alarms at all.
2.6 Extension to Finite-level Sensor Measurements
We have presented our results based on collecting the locations of sensors with
the same report (i.e., “alarm”). Extensions can be made to applications with finite-
level sensor measurements.
Without loss of generality, let each sensor report either that it is alarmed (say,
measurement level 1) or that it is not alarmed (level 0). In such a case, the ith data
collection is modelled by the probability space (Ω × {0, 1}, F, Pi), where F is a σ-field on
Ω × {0, 1}. Let the random variable x ∈ Ω denote the sensor location and L ∈ {0, 1}
denote the sensor report. In the ith collection, (x, L) has joint distribution Pi, and
the location of alarmed sensors has conditional distribution Pi|L=1. It is easy to
see that there are cases when Pi changes but Pi|L=1 does not. Hence by collecting
both types of sensor reports, we are able to detect a wider range of changes.
To apply the algorithms presented previously, choose the class A′ to be the collection of sets from A in either the 0-plane or the 1-plane, i.e., A′ = A × {0, 1}. For instance,
the collection of planar disks becomes the collection of planar disks with either
measurement 0 or measurement 1. Algorithms should be applied to both the 0-plane
and the 1-plane, and we choose the larger value as the test statistic dA(S1, S2) or φA(S1, S2).
The detection and estimation performance guarantee still holds, but note that the
sample size now becomes the total number of sensor reports collected (rather than
the number of alarms collected). Note that the VC-dimension of such a class A′
remains the same as that of A:
Proposition 9 For a class A of planar sets,
VC-d(A × {0, 1}) = VC-d(A).
Proof:
It is easy to see that VC-d(A × {0, 1}) ≥ VC-d(A).
For any set S, if S contains points from different planes, S is not shatterable,
because no set in A × {0, 1} contains points from different planes. If S only
contains points in one plane, it is shatterable only if |S| ≤ VC-d(A). Therefore,
VC-d(A × {0, 1}) ≤ VC-d(A). □
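The two-plane extension can be sketched as follows: reports are (location, level) pairs, the statistic is evaluated on each plane, and the larger value is kept. Per Section 2.6, empirical measures are normalized by the total number of reports; all names below are illustrative (A-distance only):

```python
def two_plane_distance(reports1, reports2, H):
    """Empirical A'-distance for A' = A x {0,1}.

    reports1, reports2: lists of (location, level) pairs, level in {0, 1}.
    H: finite list of membership predicates on locations.
    Empirical measures of A x {level} are normalized by the total
    number of reports in each collection.
    """
    best = 0.0
    n1, n2 = len(reports1), len(reports2)
    for level in (0, 1):
        for A in H:
            p1 = sum(1 for (x, l) in reports1 if l == level and A(x)) / n1
            p2 = sum(1 for (x, l) in reports2 if l == level and A(x)) / n2
            best = max(best, abs(p1 - p2))
    return best
```

A change that swaps which region is alarmed leaves the marginal location distribution intact but is visible on both planes, illustrating the wider range of detectable changes.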
2.7 Summary
We have presented in this chapter a nonparametric approach to the detection of
changes in the distribution of alarmed sensors. We have provided exponential
bounds for the miss detection and false alarm probabilities. The error exponents
of these probabilities provide a useful guideline for determining the number of sample
points required.
We have also proposed several nonparametric change detection and estimation
algorithms. Here we have aimed at reducing the computational complexity while
preserving the theoretical performance guarantee by using recursive search strategies that reuse earlier computations, which gives us two near linear-complexity
algorithms, SAS and SRS. The more expensive algorithms SCD and SDR also have
their roles, despite their near-quadratic cost, especially in detecting changes of highly
clustered distributions. This is because the search classes in Algorithms SCD and
SDR may yield a larger distance than the more simplified classes, which in turn gives
larger error exponents, as indicated in Theorem 1. Moreover, Algorithm SCD is
much more efficient than the exhaustive algorithm SPD with complexity O(M^4),
and Algorithm SDR also improves significantly on the complexity of its exhaustive counterpart,
Algorithm SAR. Complexities of the different algorithms presented so far
are summarized in Table 2.1.

Table 2.1: Time Complexity Comparison

            dA              φA
    SPD     O(M^4)          O(M^4)
    SCD     O(M^2 log M)    O(M^2 log M)
    SAR     O(M^3)          O(M^4)
    SDR     O(M^2)          O(M^2)
    SAS     O(M log M)      O(M^2)
    SRS     O(M log M)      O(M^2)
Besides running time, one may also care about the amount of storage used for
executing the algorithms. Obviously O(M) space is needed to store S1 and S2,
In this chapter, we address the detection of information flows mixed with chaff
noise. The main contribution is a tight characterization of flow detectability as
the maximum amount of chaff noise allowed for consistent detection. The rest
of the chapter is organized as follows. Section 4.2 defines the problem. Section
4.3 summarizes our results on the detectability of information flows. Sections 4.4
and 4.5 present chaff-inserting algorithms for the optimal embedding. Section 4.6
presents the detector and analyzes its performance. The analysis is supported by
simulation results in Section 4.8. Section 4.7 comments on the generalization of the
Poisson assumption. Then Section 4.9 concludes the chapter with remarks on its
contributions. Appendix 4.A includes all the proofs, and Appendix 4.C contains
pseudo code implementations of all the proposed algorithms.
4.2 Problem Formulation
We use the same notational conventions as in Chapter 3.
4.2.1 Multi-hop Flow Models
The two-hop flow models in Section 3.2.2 can be extended, in a natural way, to
flows over multiple hops. Suppose that we are interested in detecting information
flows through n (n ≥ 2) nodes, as illustrated in Fig. 4.1. Let Si (i = 1, . . . , n) be
the process of transmission epochs of node Ri, i.e.,
Si = (Si(1), Si(2), Si(3), . . .), i = 1, 2, . . . , n,
where Si(k) (k ≥ 1) is the kth transmission epoch of Ri.
Figure 4.1: Detecting information flows through nodes R1, R2, . . . , Rn by measuring their transmission activities; dotted lines denote a potential route.
Figure 4.2: An information flow along the path R1 → . . . → Rn.
If (Si)_{i=1}^n contains an information flow, then it can be decomposed into an
information-carrying part (Fi)_{i=1}^n and a chaff part (Wi)_{i=1}^n:
Si = Fi ⊕ Wi, i = 1, . . . , n, (4.1)
where the information-carrying part consists of packets sent by R1 and relayed
sequentially by Ri (i = 2, . . . , n), as illustrated in Fig. 4.2. Note that chaff noise
is not subject to the constraints imposed on information flows and can be correlated with
the information flows.
We extend the definition of information flows from two hops to arbitrary hops
as follows.
Definition 9 A sequence of processes (F1, . . . , Fn) is an information flow if for
every realization fi (i = 1, . . . , n), there exist bijections gi : Fi → Fi+1 (i =
1, . . . , n − 1) such that gi(s) − s ≥ 0 for all s ∈ Fi. For an information flow
with bounded delay ∆, gi(s)− s ≤ ∆ for all s ∈ Fi; for an information flow with
bounded memory M , gi satisfies
0 ≤ |Fi ∩ [0, t]| − |Fi+1 ∩ [0, t]| ≤ M (4.2)
for any t ≥ 0.
The bijection gi is a mapping between the transmission epochs of the same
packets at nodes Ri and Ri+1. For an explanation of this definition, we refer to the
comments after Definition 7. Although in this definition we have assumed an equal
delay or memory constraint at every relay node, it can easily be generalized to
unequal constraints. Again, the constants ∆ and M are assumed to be known.
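For two hops, whether given realizations can form an information flow with delay bounded by ∆ can be checked by matching epochs in sorted order: a standard exchange argument shows that a valid bijection exists iff the order-preserving matching satisfies the delay constraint. A sketch with illustrative names:

```python
def bounded_delay_flow(f1, f2, delta):
    """Check whether (f1, f2) can form a two-hop information flow with
    delay bounded by delta (Definition 9, n = 2).

    A bijection g with 0 <= g(s) - s <= delta exists iff matching the
    k-th smallest epoch of f1 to the k-th smallest epoch of f2 satisfies
    the constraint for every k (exchange argument)."""
    if len(f1) != len(f2):
        return False  # a bijection requires equal packet counts
    f1, f2 = sorted(f1), sorted(f2)
    return all(0 <= t2 - t1 <= delta for t1, t2 in zip(f1, f2))
```

For longer paths the same check can be applied hop by hop along (F1, . . . , Fn).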
4.2.2 Problem Statement
We are interested in testing the following hypotheses:
H0 : S1, S2, . . . , Sn are jointly independent;
H1 : (Si)ni=1 contains an information flow,
by observing Si (i = 1, . . . , n) for some time t (t > 0). No statistical assumptions
are made for Fi and Wi (i = 1, . . . , n) under H1, but the distributions of Si
(i = 1, . . . , n) are assumed to be known under H0 (they are assumed to be Poisson
processes in our analysis). We point out that although the Poisson assumption is
needed to obtain explicit expressions, the idea of detection is applicable to general
point processes.
Remark: The above is a test of independent traffic against end-to-end informa-
tion flows. Since the complement of H0 is not H1, one should view this test as part
of an overall detection scheme. For example, if we observe realizations s1, . . . , sN ,
and we want to find out whether a subset of the processes contains an information
flow, we can first apply the above hypothesis testing to every pair of realizations
(si, sj) (i, j ∈ 1, . . . , N) to test if this pair contains an information flow, and
then if there is no detection on pairs, we extend the scope to every triple, etc. That
is, we can sequentially test H0 versus H1 on every subset (si)i∈I (I ⊆ 1, . . . , N)
for |I| = 2, . . . , N . This procedure helps us to simplify the detection of partial
information flows which may only go through a subset of the monitored nodes to
the detection of end-to-end flows.
To characterize the amount of chaff noise, we introduce the following definition.
Definition 10 Given realizations of an information flow (fi)_{i=1}^n and chaff noise
(wi)_{i=1}^n, the chaff-to-traffic ratio (CTR) is defined as
CTR(t) ≜ (Σ_{i=1}^n |Wi ∩ [0, t]|) / (Σ_{i=1}^n |Si ∩ [0, t]|), CTR ≜ lim sup_{t→∞} CTR(t). (4.3)
In words, CTR(t) is the fraction of chaff packets in the first t units of time,
and CTR is its asymptotic value. We are interested in the asymptotic detection
performance with respect to CTR.
Since we consider a nonparametric alternative hypothesis in which distributions
of Fi and Wi (i = 1, . . . , n) are unknown, we borrow the notion of Chernoff-
consistency in [32] to introduce the following performance measure.
Definition 11 A detector $\delta_t$ is called r-consistent ($r \in [0, 1]$) if it is Chernoff-consistent for all information flows with CTR bounded by $r$ a.s.¹, that is, the false alarm probability $P_F(\delta_t)$ and the miss probability $P_M(\delta_t)$ satisfy

1. $\lim_{t \to \infty} P_F(\delta_t) = 0$ for any $(S_i)_{i=1}^n$ under H0;

2. $\sup_{(S_i)_{i=1}^n \in \mathcal{P}}\ \lim_{t \to \infty} P_M(\delta_t) = 0$, where
$$\mathcal{P} = \bigl\{ (S_i)_{i=1}^n : (S_i)_{i=1}^n \text{ contains an information flow, and } \limsup_{t \to \infty} \mathrm{CTR}(t) \le r \text{ a.s.} \bigr\}.$$
The consistency of a detector is defined as the supremum of r such that the detector
is r-consistent.
4.3 Flow Detectability
We first give the general detectability result, starting with the following definitions.
Definition 12 For n-hop information flows with bounded delay $\Delta$, the level of weak detectability, denoted by $\bar{\alpha}_n^\Delta$, is defined as
$$\bar{\alpha}_n^\Delta \triangleq \sup\bigl\{ r : \forall (S_i)_{i=1}^n \text{ containing an information flow with bounded delay } \Delta, \text{ if } \limsup_{t \to \infty} \mathrm{CTR}(t) \le r \text{ a.s., then } \exists \text{ a Chernoff-consistent detector for } (S_i)_{i=1}^n \bigr\}.$$
1Here a.s. means “almost surely”.
The level of strong detectability, denoted by $\underline{\alpha}_n^\Delta$, is defined as
$$\underline{\alpha}_n^\Delta \triangleq \sup\{ r : \exists\, \delta_t \text{ s.t. } \delta_t \text{ is } r\text{-consistent} \}.$$
For information flows with bounded memory, the levels of weak and strong detectability, denoted by $\bar{\alpha}_n^M$ and $\underline{\alpha}_n^M$, are defined similarly.
By definition, weak detectability allows the detector to depend on the distribution of the information flows, whereas strong detectability does not. Thus the level of weak detectability is no lower than that of strong detectability, i.e., $\underline{\alpha}_n^j \le \bar{\alpha}_n^j$ ($j = \Delta, M$).
With a sufficient amount of chaff noise, the nodes can make traffic containing
an information flow mimic arbitrary traffic patterns, including the traffic patterns
under H0. Therefore, there must be some limits on the amount of chaff noise be-
yond which information flows are no longer detectable. A basic limit is the amount
of chaff noise sufficient to make an information flow statistically identical with in-
dependent traffic. Specifically, we define a notion of the level of undetectability as
follows.
Given H0, define the level of undetectability as²
$$\beta_n^\Delta \triangleq \inf\bigl\{ r \in [0, 1] : \exists\, (F_i)_{i=1}^n, (W_i)_{i=1}^n \text{ satisfying: } 1)\ (F_i \oplus W_i)_{i=1}^n \stackrel{d}{=} (S_i)_{i=1}^n \text{ for some } (S_i)_{i=1}^n \text{ under } H_0;\ 2)\ (F_i)_{i=1}^n \text{ is an information flow with bounded delay } \Delta;\ 3)\ \limsup_{t \to \infty} \mathrm{CTR}(t) \le r \text{ a.s.} \bigr\}. \tag{4.4}$$
²Here "$\stackrel{d}{=}$" means equal in distribution.
That is, $\beta_n^\Delta$ is the minimum CTR for an n-hop information flow with bounded delay $\Delta$ to be equal in distribution to traffic under H0. The corresponding quantity $\beta_n^M$ for bounded memory flows is defined similarly.
Our main results are the following relationships among the levels of weak and strong detectability and the level of undetectability.

Theorem 5 If $S_i$ ($i = 1, \ldots, n$) are Poisson processes of bounded rates under H0, then
$$\bar{\alpha}_n^j = \underline{\alpha}_n^j = \beta_n^j, \qquad j = \Delta, M.$$
Remark: This theorem states that for a Poisson null hypothesis, the levels of weak and strong detectability are equal, and both equal the minimum fraction of chaff needed to mimic the null hypothesis. For CTR less than $\beta_n^j$ ($j = \Delta, M$), any information flow can be detected consistently by the same detector; for CTR at or above $\beta_n^j$, there is a method to hide the information flow among chaff noise such that consistent detection is impossible. We will give explicit expressions for $\beta_n^j$ or its bounds later.
Proof: The proof contains a converse part and an achievability part. For the converse part, we need to show that $\bar{\alpha}_n^j \le \beta_n^j$ ($j = \Delta, M$). By the definition of $\beta_n^j$, there exists $(S_i)_{i=1}^n$ that contains an information flow with a $\beta_n^j$ fraction of chaff while $S_1, \ldots, S_n$ are truly independent Poisson processes. Thus, it is impossible to have a Chernoff-consistent detector for this information flow, which implies that $\beta_n^j$ is an upper bound on the level of weak detectability.

For the achievability part, we need to show that $\underline{\alpha}_n^j \ge \beta_n^j$ ($j = \Delta, M$). The approach is to design a detector which is r-consistent for $r$ arbitrarily close to $\beta_n^j$. The detector is presented later in Definition 13, and the analysis of its consistency in Theorems 11 and 12. Combining the converse and achievability results with the fact that $\underline{\alpha}_n^j \le \bar{\alpha}_n^j$ ($j = \Delta, M$) gives Theorem 5. □
In the following sections, we explain how to compute $\beta_n^j$ ($j = \Delta, M$) and how to perform the detection.
4.4 Detectability of Two-hop Flows
In this section, we consider 2-hop information flows (i.e., $n = 2$). Given the distribution of $(S_1, S_2)$ under H0, we aim to characterize the value of $\beta_2^j$ ($j = \Delta, M$).
Our approach is to first find the algorithms that optimally partition $S_i$ ($i = 1, 2$) into $F_i$ and $W_i$ such that $(F_1, F_2)$ is an information flow and the CTR is minimized, and then calculate $\beta_2^j$ by analyzing the CTR of these algorithms under H0. Such algorithms are called chaff-inserting algorithms, and the CTR of such an algorithm is defined as the CTR of the partitioned traffic.
4.4.1 Two-hop Flows with Bounded Delay
Suppose that nodes R1 and R2 want to send a 2-hop information flow with bounded delay $\Delta$, and they are allowed to design the insertion of chaff noise. The question is how to insert the minimum amount of chaff noise such that $S_1$ and $S_2$ become statistically independent.
To answer this question, Blum et al. [5] proposed a greedy algorithm called "Bounded-Greedy-Match" (BGM), which works as follows: given a realization $(s_1, s_2)$,

1. match every packet transmitted at time $s$ in the first process $s_1$ with the first unmatched packet transmitted in $[s, s + \Delta]$ in the second process $s_2$;

2. label all the unmatched packets in $s_1$ and $s_2$ as chaff.

See Fig. 4.3 for an illustration of BGM. It is easy to see that BGM has complexity $O(|S_1| + |S_2|)$. For a pseudo-code implementation of BGM, see Appendix 4.C.
Figure 4.3: BGM: a sequential greedy match algorithm.
Algorithm BGM has been shown in [5] to be the optimal chaff-inserting algo-
rithm for 2-hop information flows with bounded delay, as stated in the following
proposition.
Proposition 12 ( [5]) For any realization (s1, s2), BGM inserts the minimum
number of chaff packets in transmitting an information flow with bounded delay ∆.
The optimality of BGM allows us to characterize the minimum chaff needed to mimic completely independent traffic by analyzing the CTR of BGM. If, in particular, the independent traffic can be modelled as Poisson processes, then we can prove the following results.
Theorem 6 If $S_1$ and $S_2$ are independent Poisson processes of rates $\lambda_1$ and $\lambda_2$, respectively, then with probability one, the CTR of BGM satisfies
$$\lim_{t \to \infty} \mathrm{CTR}_{\mathrm{BGM}}(t) = \begin{cases} \dfrac{(\lambda_2 - \lambda_1)\left(1 + \frac{\lambda_1}{\lambda_2} e^{\Delta(\lambda_1 - \lambda_2)}\right)}{(\lambda_1 + \lambda_2)\left(1 - \frac{\lambda_1}{\lambda_2} e^{\Delta(\lambda_1 - \lambda_2)}\right)} & \text{if } \lambda_1 \ne \lambda_2, \\[2ex] \dfrac{1}{1 + \lambda_1 \Delta} & \text{if } \lambda_1 = \lambda_2. \end{cases}$$
Proof: See Appendix 4.A. □
It is easy to show that if λi ≤ λ (i = 1, 2), then the CTR of BGM is lower
bounded by 1/(1 + λ∆). By the optimality of BGM, we see that the following
result holds.
Corollary 1 If under H0, $S_1$ and $S_2$ are independent Poisson processes with maximum rate $\lambda$, then the level of undetectability $\beta_2^\Delta = 1/(1 + \lambda\Delta)$.
With a $1/(1 + \lambda\Delta)$ fraction of chaff noise, 2-hop traffic containing an information flow with bounded delay can be made identical to traffic under H0, so that no detector can detect this flow consistently. Note that as $\lambda\Delta \to \infty$, the value of $\beta_2^\Delta$ decreases to zero, implying that it is easy to mimic H0 if the traffic load is heavy (large $\lambda$) or the delay bound is loose (large $\Delta$).
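The closed-form limit in Theorem 6 and the bound of Corollary 1 can be checked numerically; a small illustrative sketch (function names are our own):

```python
import math

def bgm_ctr_limit(lam1, lam2, delta):
    """A.s. limit of the CTR of BGM for independent Poisson processes of
    rates lam1 and lam2 with delay bound delta (Theorem 6)."""
    if lam1 == lam2:
        return 1.0 / (1.0 + lam1 * delta)
    r = (lam1 / lam2) * math.exp(delta * (lam1 - lam2))
    return (lam2 - lam1) * (1 + r) / ((lam1 + lam2) * (1 - r))
```

One can check that the $\lambda_1 \ne \lambda_2$ branch tends to $1/(1 + \lambda\Delta)$ as $\lambda_1 \to \lambda_2$, and that for rates bounded by $\lambda$ the limit never falls below $1/(1 + \lambda\Delta)$, consistent with Corollary 1.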
4.4.2 Two-hop Flows with Bounded Memory
Consider the transmission of a 2-hop information flow with bounded memory M .
We want to find a method that schedules transmissions according to independent
traffic while inserting the minimum amount of chaff noise.
The bounded memory constraint requires that the memory used at the relay node to store relay packets always stays between 0 and $M$. Thus, a feasible schedule is to keep updating the memory size at each arrival (i.e., a packet in $S_1$) or departure (i.e., a packet in $S_2$), and to assign a packet as chaff if it would overflow or underflow the memory. Based on this idea, we develop a chaff-inserting algorithm called "Bounded-Memory-Relay" (BMR). Given a realization $(s_1, s_2)$, let $(s^k)_{k=1}^\infty \triangleq s_1 \oplus s_2$, and let $M_1(k)$ be the memory size after the transmission of the $k$th packet in $s_1 \oplus s_2$. Algorithm BMR does the following: for $k = 1, 2, \ldots$,

1. label packet $s^k$ as chaff if and only if it would cause a memory overflow, i.e., $s^k \in S_1$ and $M_1(k-1) = M$, or underflow, i.e., $s^k \in S_2$ and $M_1(k-1) = 0$; initially, $M_1(0) = 0$;

2. compute $M_1(k)$ by³
$$M_1(k) = \begin{cases} M_1(k-1) & \text{if } s^k \text{ is chaff}, \\ M_1(k-1) + I_{\{s^k \in S_1\}} - I_{\{s^k \in S_2\}} & \text{otherwise}. \end{cases}$$

A sample path of $M_1(k)$ ($k \ge 1$) is shown in Fig. 4.4.
The complexity of BMR is $O(|S_1| + |S_2|)$. See Appendix 4.C for an implementation of BMR. Note that unlike BGM, BMR does not specify the mapping
³Here $I_{\{\cdot\}}$ is the indicator function.
Figure 4.4: Example ($M = 2$): •: $s^k \in S_1$; ◦: $s^k \in S_2$; $M_1(k)$: the statistic calculated by BMR. Initially, $M_1(0) = 0$, indicating that the memory is empty. The first packet is a departure, and it is assigned as chaff because otherwise the memory would underflow. The second packet is an arrival, and thus the memory size is increased by one. Such updating occurs at each arrival or departure.
between packets in the two processes because as long as the memory constraint is
satisfied, the order of transmission is irrelevant.
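The update rule above can be sketched in Python (a minimal illustrative sketch; as in the text, the packet-to-packet mapping is left unspecified):

```python
def bmr(merged, M):
    """Bounded-Memory-Relay: walk the merged sequence s1 ⊕ s2 in time
    order and label a packet as chaff iff relaying it would overflow or
    underflow the relay buffer. Each entry of `merged` is True for an
    arrival (a packet of S1) and False for a departure (a packet of S2)."""
    mem = 0        # M1(k): current buffer occupancy, with M1(0) = 0
    chaff = []     # indices k of packets labeled as chaff
    for k, is_arrival in enumerate(merged):
        if is_arrival:
            if mem == M:
                chaff.append(k)   # overflow: arrival cannot be stored
            else:
                mem += 1
        else:
            if mem == 0:
                chaff.append(k)   # underflow: nothing to relay
            else:
                mem -= 1
    return chaff
```

As on the sample path of Fig. 4.4, an initial departure is always chaff because the buffer starts empty.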
The optimality of BMR is guaranteed by the following proposition.
Proposition 13 For any realization (s1, s2), BMR inserts the minimum number
of chaff packets in transmitting an information flow with bounded memory M .
Proof: See Appendix 4.A. □
Since BMR is optimal, we can characterize $\beta_2^M$ by the CTR of BMR, as stated in the following theorem.
Theorem 7 If $S_1$ and $S_2$ are independent Poisson processes of rates $\lambda_1$ and $\lambda_2$, respectively, then with probability one, the CTR of BMR satisfies
$$\lim_{t \to \infty} \mathrm{CTR}_{\mathrm{BMR}}(t) = \begin{cases} \dfrac{(\lambda_2 - \lambda_1)\left(1 + \left(\frac{\lambda_1}{\lambda_2}\right)^{M+1}\right)}{(\lambda_1 + \lambda_2)\left(1 - \left(\frac{\lambda_1}{\lambda_2}\right)^{M+1}\right)} & \text{if } \lambda_1 \ne \lambda_2, \\[2ex] \dfrac{1}{1 + M} & \text{if } \lambda_1 = \lambda_2. \end{cases}$$
Proof: See Appendix 4.A. □
It can be shown that the CTR is minimized when $\lambda_1 = \lambda_2$, based on which we have the following result.

Corollary 2 If under H0, $S_1$ and $S_2$ are independent Poisson processes, then the level of undetectability $\beta_2^M = 1/(1 + M)$.
If the nodes can insert at least a $1/(1 + M)$ fraction of chaff noise, then BMR gives a feasible transmission schedule for an information flow with bounded memory such that the overall traffic is statistically the same as traffic under H0. Therefore, $1/(1 + M)$ is the maximum amount of chaff noise under which Chernoff-consistent detection remains possible. If $M \gg 1$, then very little chaff noise suffices to hide the information flows.
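The limit in Theorem 7 can likewise be evaluated numerically; an illustrative sketch (function name is our own):

```python
def bmr_ctr_limit(lam1, lam2, M):
    """A.s. limit of the CTR of BMR for independent Poisson processes of
    rates lam1 and lam2 and memory bound M (Theorem 7)."""
    if lam1 == lam2:
        return 1.0 / (1.0 + M)
    r = (lam1 / lam2) ** (M + 1)
    return (lam2 - lam1) * (1 + r) / ((lam1 + lam2) * (1 - r))
```

The limit is minimized at $\lambda_1 = \lambda_2$, where it equals $1/(1 + M)$, matching Corollary 2.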
4.5 Detectability of Multi-hop Flows
The results in Section 4.4 suggest that pairwise detection of information flows
is vulnerable to chaff noise because a relatively small amount of chaff noise can
make the information flow undetectable. These results indeed reveal the weakness
of pairwise detection. As the number of hops increases, however, we see that
the constraints imposed on information-carrying packets become tighter because
only the packets satisfying the constraints at every hop can successfully reach the
destination. This observation motivates us to extend the results in Section 4.4 to
information flows over multiple hops. Specifically, we will show that the fraction of
chaff noise needed to make a multi-hop information flow mimic jointly independent
traffic increases to one as the number of hops increases, which implies that joint
detection may significantly improve the performance against chaff noise.
4.5.1 Multi-hop Flows with Bounded Delay
Consider the transmission of an n-hop ($n \ge 2$) information flow with bounded delay $\Delta$ according to certain processes. Given a sequence of processes $(S_i)_{i=1}^n$, we want to decompose $S_i$ ($i = 1, \ldots, n$) into $F_i$ and $W_i$ such that $(F_i)_{i=1}^n$ is an information flow with bounded delay, and the CTR is minimized.
Given the 2-hop chaff-inserting algorithm BGM, one might think that we could sequentially apply BGM to every pair of adjacent processes to obtain $(F_i)_{i=1}^n$. Such an approach, however, does not give the optimal decomposition. For example, consider the realizations shown in Fig. 4.5. If we use BGM to match packets in $s_1$ and $s_2$, and then repeat BGM to match the matched packets in $s_2$ with $s_3$, we find only one sequence of matched packets (as shown in (a)). There is, however, another way of matching that gives two sequences of matched packets (as shown in (b)). The implication is that for $n > 2$, a hop-by-hop greedy match is not sufficient; we have to jointly consider all the subsequent hops to find the optimal packet matching.
Figure 4.5: Example: (a) the scheduling obtained by repeatedly using BGM; (b) another scheduling, showing that repeatedly using BGM is suboptimal.
To solve this problem, we develop an algorithm called "Multi-Bounded-Delay-Relay" (MBDR). The idea of MBDR is that a packet at time $t_1$ in $s_1$ can be matched with a packet at $t_2 \in [t_1, t_1 + \Delta]$ in $s_2$ only if $t_2$ has matched packets in $s_i$ for all $i = 3, \ldots, n$. The matching of $t_2$ and its matched packets is done recursively. Such recursions allow us to consider all the processes simultaneously and achieve a smaller CTR than repeatedly applying BGM. Specifically, MBDR works as follows: given a realization $(s_i)_{i=1}^n$,

1. match every packet at time $t_1$ in $s_1$ with the first unmatched packet $t_2$ in $[t_1, t_1 + \Delta]$ in $s_2$, provided that $t_2$ has a match in $s_3$;

2. for $i = 2, \ldots, n-1$, match the packet $t_i$ in $s_i$ with the first unmatched packet $t_{i+1}$ in $[t_i, t_i + \Delta]$ in $s_{i+1}$, provided that $t_{i+1}$ has a match in $s_{i+2}$ (assume every packet in $s_n$ has a match);

3. after trying to match all the packets in $s_1$, label all the unmatched packets as chaff.
For example, consider the 3-hop information flow illustrated in Fig. 4.6. To match $t_1 \in S_1$, MBDR first tries to find a match for $t_2$. Since $t_2$ can be matched with $t_3 \in S_3$, $t_1$ is matched with $t_2$. If $t_2$ did not have a match in $s_3$, MBDR would try to match $t_1$ with the next unmatched packet in $[t_1, t_1 + \Delta]$ in $s_2$. If no such packet is left, MBDR labels $t_1$ as chaff.
Figure 4.6: MBDR: a recursive greedy match algorithm.
A direct implementation of MBDR has complexity $O((\lambda\Delta)^n |S_1|)$, where $\lambda$ is the maximum rate of $S_1, \ldots, S_n$. The complexity can be reduced to $O(n^2 |S_1|)$ by expanding the recursions (see Appendix 4.C). Note that MBDR reduces to BGM when $n = 2$.
It is easy to verify that if we transmit information-carrying packets according
to the matching found by MBDR, the transmissions will satisfy the bounded delay
constraint at every hop. Moreover, such a transmission schedule preserves the order
of incoming packets. The following proposition states that MBDR is optimal.
Proposition 14 For any realization $(s_i)_{i=1}^n$, MBDR inserts the minimum number of chaff packets in transmitting an n-hop information flow with bounded delay $\Delta$.
Proof: See Appendix 4.A. □
By arguments similar to those in the proof of Theorem 6, one can show that the CTR of MBDR converges a.s. The exact limit, however, is difficult to compute⁴. Instead, we give the following bound.
Theorem 8 If $S_i$ ($i = 1, \ldots, n$) are independent Poisson processes of maximum rate $\lambda$, then
$$\lim_{t \to \infty} \mathrm{CTR}_{\mathrm{MBDR}}(t) \ge 1 - \kappa_n \quad \text{a.s.},$$
where
$$\kappa_n = \min\left( (\lambda\Delta)^{n-2}\bigl(1 - e^{-\lambda\Delta}\bigr),\ \prod_{i=1}^{n-1}\bigl(1 - e^{-i\lambda\Delta}\bigr) \right).$$
Proof: See Appendix 4.A. □
By Theorem 8, we see that the CTR of MBDR goes to one exponentially as $n$ increases if $\lambda\Delta < 1$. It can be shown that if we instead repeatedly apply BGM, the CTR is lower bounded by $1 - (1 - e^{-\lambda\Delta})^{n-1}$ a.s., which always converges to one exponentially.
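The bound $\kappa_n$ of Theorem 8 is straightforward to evaluate numerically; a small illustrative sketch:

```python
import math

def kappa(n, lam, delta):
    """kappa_n of Theorem 8: the minimum of the two expressions;
    1 - kappa(n, lam, delta) lower-bounds the asymptotic CTR of MBDR."""
    a = (lam * delta) ** (n - 2) * (1 - math.exp(-lam * delta))
    b = 1.0
    for i in range(1, n):
        b *= 1 - math.exp(-i * lam * delta)
    return min(a, b)
```

For $\lambda\Delta < 1$ the first term decays geometrically in $n$, so the chaff fraction $1 - \kappa_n$ needed to mimic H0 approaches one.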
Although in Definition 9 we have assumed identical delay bounds at all the relay nodes, MBDR can be easily extended to different delay bounds, and $\kappa_n$ in Theorem 8 becomes
$$\min\left( \bigl(1 - e^{-\lambda\Delta_{n-1}}\bigr) \prod_{i=1}^{n-2} (\lambda\Delta_i),\ \prod_{i=1}^{n-1} \bigl(1 - e^{-i\lambda\Delta_i}\bigr) \right),$$
where $\Delta_i$ is the maximum delay at the $i$th relay node.
⁴For example, for independent Poisson processes, computing the CTR of MBDR involves computing the limiting distribution of an $(n-1)$-dimensional continuous-state-space Markov process.
The optimality of MBDR gives us the following result.

Corollary 3 If under H0, $S_1, \ldots, S_n$ are independent Poisson processes of rates bounded by $\lambda$, then $\beta_n^\Delta \ge 1 - \kappa_n$.
By this result, we see that for sufficiently light traffic or a small delay bound (i.e., $\lambda\Delta < 1$), $\beta_n^\Delta$ converges to one exponentially fast as $n$ increases. Numerical calculation shows that $\beta_n^\Delta$ still converges to one for $\lambda\Delta > 1$, but the convergence is slower than exponential. If we compute the maximum rate of the information flow as $\lambda(1 - \beta_n^\Delta)$, then this rate goes to zero as $n$ increases, implying that it is almost impossible to hide information flows over arbitrarily long paths. See Figs. 4.7–4.9 for the numerically computed information rate $1 - \beta_n^\Delta$ as a function of $n$. From the plots, it is clear that the information rate decays exponentially for $\lambda\Delta < 1$ (Fig. 4.7) and subexponentially for $\lambda\Delta > 1$ (Figs. 4.8, 4.9).
4.5.2 Multi-hop Flows with Bounded Memory
Suppose that we want to transmit an n-hop information flow with bounded memory
M according to certain processes. We generalize BMR to an algorithm called
“Multi-Bounded-Memory-Relay” (MBMR) to insert chaff noise in this case.
Algorithm MBMR borrows the idea of monitoring memory sizes from BMR. Specifically, let $M_i(k)$ ($i = 1, \ldots, n-1$) denote the memory size of $R_{i+1}$ after the $k$th packet in the total traffic. Algorithm MBMR keeps updating $(M_i(k))_{i=1}^{n-1}$ for $k = 1, 2, \ldots$ and assigns chaff packets whenever a memory underflow or overflow would occur. Given a realization $(s_i)_{i=1}^n$ and $(s^k)_{k=1}^\infty \triangleq s_1 \oplus \cdots \oplus s_n$, MBMR works as follows: for $k = 1, 2, \ldots$,
Figures 4.7–4.9: The normalized rate of information flow $1 - \beta_n^\Delta$ as a function of $n$ ($\Delta = 1$); solid line: $1 - \beta_n^\Delta$ computed for 1000 packets per process; dashed line: $\kappa_n$. Figure 4.7: $\lambda = 0.9$; Figure 4.8: $\lambda = 2$; Figure 4.9: $\lambda = 4$.
1. label $s^k \in S_i$ as chaff if and only if $M_{i-1}(k-1) = 0$ or $M_i(k-1) = M$; in either case the memory sizes of the other relays are unchanged, $M_j(k) = M_j(k-1)$ for $j = 1, \ldots, i-2, i+1, \ldots, n-1$.
See Fig. 4.10 for an example of MBMR.
Figure 4.10: MBMR for $n = 4$ and $M = 3$ ($s = s_1 \oplus \cdots \oplus s_4$): monitor the memory sizes of the relay nodes and assign a chaff packet if the memory of any node would underflow or overflow. Initially, $M_i(0) = 0$ ($i = 1, 2, 3$); at the end of this realization (after the 10th packet), $(M_1(10), M_2(10), M_3(10)) = (1, 1, 0)$.
Algorithm MBMR has complexity $O\bigl(\sum_{i=1}^n |S_i|\bigr)$. See Appendix 4.C for its implementation. Note that MBMR reduces to BMR when $n = 2$. If we sequentially match the non-chaff packets found by MBMR, then we obtain a transmission schedule that satisfies the bounded memory constraint. The optimality of MBMR is provided by the following proposition.
Proposition 15 For any realization $(s_i)_{i=1}^n$, MBMR inserts the minimum number of chaff packets to schedule the transmission of an n-hop information flow with bounded memory $M$.
Proof: The proof follows the same arguments as in the proof of Proposition 13. □
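The MBMR buffer-tracking rule can be sketched in Python (an illustrative sketch of the description above: a packet of $S_i$ is treated as leaving relay $R_i$'s buffer and entering $R_{i+1}$'s buffer; hop indices are 1-based, and the names are our own):

```python
def mbmr(merged, n, M):
    """Multi-Bounded-Memory-Relay: `merged` lists, in time order, the hop
    index i (1..n) of each packet of s1 ⊕ ... ⊕ sn. A packet of S_i is
    chaff iff relaying it would underflow buffer M_{i-1} or overflow
    buffer M_i; otherwise the two adjacent buffers are updated."""
    mem = [0] * (n + 1)   # mem[i] = M_i(k); mem[0] and mem[n] are unused
    chaff = []
    for k, i in enumerate(merged):
        underflow = i > 1 and mem[i - 1] == 0   # upstream buffer empty
        overflow = i < n and mem[i] == M        # downstream buffer full
        if underflow or overflow:
            chaff.append(k)
            continue
        if i > 1:
            mem[i - 1] -= 1   # packet leaves R_i's buffer
        if i < n:
            mem[i] += 1       # packet enters R_{i+1}'s buffer
    return chaff
```

For $n = 2$ this reduces to BMR.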
We can now characterize $\beta_n^M$ by the CTR of MBMR. If $S_1, \ldots, S_n$ are independent Poisson processes, then the CTR of MBMR converges almost surely, and the limit can be calculated from the limiting distribution of a Markov chain, as shown in Appendix 4.B.
It is difficult to give a closed-form expression for the exact CTR of MBMR. Instead, we derive the following upper and lower bounds. Let
$$\mathcal{A} \triangleq \bigl\{ (S_i)_{i=1}^n : S_1, \ldots, S_n \text{ are independent Poisson processes} \bigr\}.$$
We have the following theorem.
Theorem 9 For any $(S_i)_{i=1}^n \in \mathcal{A}$,
$$\lim_{t \to \infty} \mathrm{CTR}_{\mathrm{MBMR}}(t) \ge 1 - u_n \quad \text{a.s.}$$
Furthermore,
$$\inf_{\mathcal{A}}\, \lim_{t \to \infty} \mathrm{CTR}_{\mathrm{MBMR}}(t) \le 1 - l_n \quad \text{a.s.}$$
Here $l_n$ and $u_n$ are given by
$$l_{n+1} = \frac{l_n\,(1 - l_n^{M})}{1 - l_n^{M+1}}$$
and
$$u_{n+1} = u_n \left( 1 - \frac{1}{M+1}\, 2^{-M/u_n} \right)$$
for $n \ge 2$, and $l_2 = u_2 = M/(M+1)$.
Proof: See Appendix 4.A. □
Although identical memory constraints have been assumed in Definition 9, MBMR can be easily modified to allow different memory constraints, and it can be shown that the CTR is bounded between $1 - u'_n$ and $1 - l'_n$, where
$$l'_{n+1} = \frac{l'_n \bigl(1 - (l'_n)^{K_n}\bigr)}{1 - (l'_n)^{K_n + 1}}$$
and
$$u'_{n+1} = u'_n \left( 1 - \frac{1}{K_n + 1}\, 2^{-K_n / u'_n} \right)$$
for $n \ge 2$, and $l'_2 = u'_2 = K_1/(K_1 + 1)$. Here $K_i$ ($i = 1, \ldots, n-1$) is the memory constraint at the $i$th relay node.
Based on Theorem 9 and the optimality of MBMR, we have the following result.

Corollary 4 If under H0, $S_1, \ldots, S_n$ are independent Poisson processes, then
$$1 - u_n \le \beta_n^M \le 1 - l_n.$$
The bounds in Corollary 4 are not far from the actual value of $\beta_n^M$ at small $n$; see the numerical results in Fig. 4.11.
Figure 4.11: The level of undetectability $\beta_n^M$ and its bounds $1 - l_n$ and $1 - u_n$ as functions of $n$ ($M = 4$; $\beta_n^M$ computed on 10000 packets).
Another interpretation of Corollary 4 is that the normalized maximum rate of information flow, $1 - \beta_n^M$, is bounded between $l_n$ and $u_n$. Numerical calculation shows that $l_n$ and $u_n$ both decay polynomially; specifically, $l_n$ decays at approximately $\Theta(n^{-1/M})$ and $u_n$ at $\Theta(n^{-1/(2M-2)})$. Furthermore, numerical comparison shows that if $\lambda\Delta = M$, then $\beta_n^M$ increases more slowly than $\beta_n^\Delta$ as $n \to \infty$, suggesting that it is relatively easier to hide information flows with bounded memory.
4.6 Detector
In Sections 4.4 and 4.5, we have characterized the levels of undetectability for information flows with bounded delay or bounded memory. The results are summarized in Table 4.1. These results provide upper bounds on the levels of detectability.
Table 4.1: Levels of undetectability (Poisson null hypothesis).

    $\beta_2^\Delta = \frac{1}{1 + \lambda\Delta}$          $\beta_2^M = \frac{1}{1 + M}$
    $\beta_n^\Delta \ge 1 - \kappa_n$                       $1 - u_n \le \beta_n^M \le 1 - l_n$
In this section, we present an explicit detector whose consistency can approximate $\beta_n^j$ ($j = \Delta, M$) arbitrarily closely. Our main theorem is stated as follows.

Theorem 10 For any $\epsilon > 0$, there exists a detector whose consistency is no smaller than $\beta_n^j - \epsilon$ ($j = \Delta, M$).
Remark: The theorem states that as $\epsilon \to 0$, there exists a sequence of detectors with consistency approaching $\beta_n^j$ ($j = \Delta, M$). Therefore, the level of strong detectability is no smaller than $\beta_n^j$, i.e., $\underline{\alpha}_n^j \ge \beta_n^j$ ($j = \Delta, M$).
The proof of Theorem 10 proceeds by constructing a detector and showing that its consistency approximates $\beta_n^j$ ($j = \Delta, M$) arbitrarily closely. Ideally, we would like to know
what strategy is used to perturb timing and insert chaff noise so that we can design
a detector accordingly. The difficulty here is that we do not know what strategy
is going to be used when information flows are transmitted, and therefore our goal
is to design a single detector which has good performance for a wide variety of
information flows.
The key idea is to design the detector based on the amount of chaff noise needed
by the optimal chaff-inserting algorithms. If the detector is designed to guarantee
that even the optimal algorithms need a sufficiently large amount of chaff to evade
detection, then any other chaff-inserting algorithm would have to insert no less
chaff noise to evade detection. Therefore, we can make sure that the detector is
r-consistent against fractions of chaff up to a certain level. Specifically, we propose
the following detector.
Definition 13 Given observations5 (si)ni=1 (n ≥ 2), the detector is defined as
δt((si)ni=1; τn) =
1 if CTR(t) ≤ τn,
0 o.w.,
where τn is a predetermined threshold, and CTR(t) is the minimum fraction of
chaff in the measurements.
Remark: The statistic $\widehat{\mathrm{CTR}}(t)$ is computed by the optimal chaff-inserting algorithm followed by certain adjustments. Specifically, it is calculated by the following procedure:
⁵To be precise, the detector is only given the part of $s_i$ ($i = 1, \ldots, n$) that falls into the length-$t$ observation interval.
1. compute $\mathscr{C}$, the set of chaff packets found by the optimal chaff-inserting algorithm (MBDR for bounded delay flows or MBMR for bounded memory flows);

2. calculate a number $C$ by
$$C = \left| \mathscr{C} \setminus \left( \bigcup_{i=1}^n S_i \cap [0, (i-1)\Delta) \right) \right|$$
for bounded delay flows, or
$$C = |\mathscr{C}| + \min_{0 \le k \le w^*} d(k)$$
for bounded memory flows, where $d(k)$ is the cumulative difference defined as
$$d(k) \triangleq \sum_{j=1}^{k} \left( I_{\{s^j \in S_1\}} - I_{\{s^j \in S_2\}} \right), \tag{4.5}$$
$d(0) = 0$, and $w^*$ is the first time that $d(k)$ varies by $M$, i.e., $w^* \triangleq \inf\{ w : \max_{0 \le k \le w} d(k) - \min_{0 \le k \le w} d(k) = M \}$;

3. compute $\widehat{\mathrm{CTR}}(t) = C/N$, where $N = \sum_{i=1}^n |S_i|$.
For implementation details, we refer to Appendix 4.C. We point out that for large $N$, the influence of the adjustment in step (2) on $\widehat{\mathrm{CTR}}(t)$ is negligible.
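For the 2-hop bounded-delay case, the detector of Definition 13 can be sketched as follows (an illustrative sketch: the greedy match plays the role of the optimal chaff-inserting algorithm, and the step-(2) boundary adjustment is omitted since its effect is negligible for large N; names are our own):

```python
def min_chaff_fraction_2hop(s1, s2, delta):
    """Minimum chaff fraction of a 2-hop bounded-delay flow, computed by
    the greedy match (BGM): the fraction of packets left unmatched."""
    j, matched = 0, 0
    for t1 in s1:
        while j < len(s2) and s2[j] < t1:
            j += 1
        if j < len(s2) and s2[j] <= t1 + delta:
            matched += 1
            j += 1
    total = len(s1) + len(s2)
    return (total - 2 * matched) / total

def detector(s1, s2, delta, tau):
    """Definition 13 for n = 2: declare an information flow (output 1)
    iff the minimum chaff fraction is at most the threshold tau."""
    return 1 if min_chaff_fraction_2hop(s1, s2, delta) <= tau else 0
```

A well-aligned flow with few unmatched packets is declared (output 1), whereas independent-looking traffic requiring a large chaff fraction is not.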
The reason that $\widehat{\mathrm{CTR}}(t)$ is the minimum fraction of chaff in the measurements hinges on two facts. The first is the optimality of the chaff-inserting algorithm used to find $\mathscr{C}$. The second is the adjustment in step (2). The adjustment is needed because the detector may not observe the beginning of the information flow. At the time the detector starts, there may already be packets stored at the relay nodes, and when these packets are relayed, they may appear to be chaff noise from the detector's perspective since they do not correspond to any observed packets. We solve this problem by ignoring certain chaff packets found at the beginning of the measurements. For bounded delay flows, these are the packets in $[0, (i-1)\Delta)$ in $s_i$ ($i = 1, \ldots, n$). For bounded memory flows, these are the packets which may be relays of packets stored in the memory initially. Detailed explanations can be found in Appendix 4.C.
Now that $\widehat{\mathrm{CTR}}(t)$ is the minimum CTR in the measurements, we can guarantee detection as follows.
Theorem 11 The detector in Definition 13 has vanishing miss probability for all
the information flows with CTR bounded by τn a.s.
Theorem 11 is a direct implication of the fact that $\widehat{\mathrm{CTR}}(t)$ is the minimum fraction of chaff packets in the measurements. Actually, a stronger statement holds: the detector misses no realization of an information flow with at most a $\tau_n$ fraction of chaff packets.
The threshold value needs to be carefully chosen such that the detector satisfies a given false alarm constraint. Specifically, under the assumption that the $S_i$'s are independent Poisson processes of maximum rate $\lambda$ under H0, we have the following theorem on the false alarm probability.
Theorem 12 If $\tau_n < \beta_n^j$ ($j = \Delta, M$), then the false alarm probability satisfies
$$\lim_{N \to \infty} \frac{1}{N} \log P_F(\delta_t) \le -\Gamma_n(\tau_n; \lambda, \Delta) < 0$$
for bounded delay flows, and
$$\lim_{N \to \infty} \frac{1}{N} \log P_F(\delta_t) \le -\Gamma_n(\tau_n; M) < 0$$
for bounded memory flows, where $N = \sum_{i=1}^n |S_i|$.
Proof: See Appendix 4.A. □
The theorem states that the false alarm probability of the proposed detector decays exponentially as long as the threshold is less than $\beta_n^j$ ($j = \Delta, M$). The functions $\Gamma_n(\tau_n; \lambda, \Delta)$ and $\Gamma_n(\tau_n; M)$ give lower bounds on the error exponents; see the proof for their definitions. We point out that $\Gamma_n(\tau_n; \lambda, \Delta)$ and $\Gamma_n(\tau_n; M)$ are positive for all $\tau_n < \beta_n^j$ ($j = \Delta, M$), and both are decreasing functions of $\tau_n$.
Combining Theorems 11 and 12 yields the following result.

Corollary 5 If $\tau_n < \beta_n^j$ ($j = \Delta, M$), then the proposed detector is $\tau_n$-consistent.
Remark: As $\tau_n \to \beta_n^j$, the consistency of the proposed detector converges to $\beta_n^j$, which proves that the level of strong detectability is lower bounded by $\beta_n^j$. From Corollary 5, we see that the proposed detector is optimal in terms of consistency. In particular, since $\beta_n^j \to 1$ as $n$ increases, the proposed detector can detect almost all long-lasting information flows over sufficiently long paths.
The threshold τn represents a tradeoff between the consistency and the false
alarm probability. A larger τn enables consistent detection against more chaff
noise at the cost of a higher false alarm probability, whereas a smaller τn leads to
a smaller false alarm probability but less consistency against chaff noise.
4.7 Generalization of Poisson Assumption
We have assumed that the node transmission epochs can be modelled as independent Poisson processes under H0. The Poisson assumption allows us to obtain clean analytical results, but it is known that wide-area traffic such as Internet traffic does not fit the Poisson model. It can be argued, however, that Poisson processes are less bursty than real-world traffic, and therefore our results provide lower bounds on the levels of detectability of actual information flows.

Specifically, suppose that traffic under H0 can be modelled as independent renewal processes with Pareto interarrival distributions [28]. It was shown in [28] that the Pareto distribution fits experimental data over many time scales. We show that such processes are more difficult to mimic than independent Poisson processes, as stated in the following theorem.
Theorem 13 Let $\mathrm{CTR}'_{\mathrm{BMR}}(t)$ denote the chaff-to-traffic ratio found by BMR in independent renewal processes with Pareto interarrival distributions, and $\mathrm{CTR}_{\mathrm{BMR}}(t)$ that in independent Poisson processes of the same rates. Then
$$\liminf_{t \to \infty} \mathrm{CTR}'_{\mathrm{BMR}}(t) \ge \lim_{t \to \infty} \mathrm{CTR}_{\mathrm{BMR}}(t) \quad \text{a.s.}$$
A similar statement holds for the CTR of BGM.
Proof: See Appendix 4.A. □
By this theorem, we see that it requires more chaff noise to mimic the null
hypothesis under Pareto interarrival distributions. The results can be generalized
to MBDR and MBMR.
If traffic under H0 has Pareto interarrival distributions, the idea in proving Theorem 5 is still applicable. Specifically, let $\mathrm{CTR}'(t)$ be the fraction of chaff packets inserted by the optimal chaff-inserting algorithm (i.e., MBDR or MBMR) in the interval $[0, t]$ under the Pareto interarrival distribution. Then the upper bound on the level of weak detectability is the minimum $r$ ($r \in [0, 1]$) such that $\limsup_{t \to \infty} \mathrm{CTR}'(t) \le r$ a.s., and the lower bound on the level of strong detectability is the maximum $r$ such that $\liminf_{t \to \infty} \mathrm{CTR}'(t) \ge r$ a.s.⁶ We see that the levels of detectability under the Pareto distribution are no smaller than those under the exponential distribution.
To verify the claim that the Poisson assumption provides lower bounds on the actual detection performance, we simulate BGM and BMR on the LBL-PKT-4 traces, which contain an hour's worth of all wide-area traffic between the Lawrence Berkeley Laboratory and the rest of the world⁷. We compute the CTR of pairs of different traces⁸, and then compare the empirical cumulative distribution function (c.d.f.) of the computed CTR with the c.d.f. of the CTR predicted by Theorems 6 and 7 for independent Poisson processes with the same rates as the empirical rates of the traces. See Figs. 4.12 and 4.13. From these plots, it is clear that at the same threshold, the traces have much lower false alarm probabilities than Poisson processes.
We point out that the results in Theorem 13 also apply to renewal processes
with other interarrival distributions which have the heavy-tailed property [21].
⁶Note that for Pareto interarrival distributions, the upper and the lower bounds may not meet.
⁷The traces were made by Paxson and were first used in his paper [28].
⁸We extract 134 TCP traces from the data, each of which is truncated to 1000 packets.
Figure 4.12: The c.d.f. of the CTR of BGM for $\Delta = 5$: CTR on traces vs. CTR on Poisson processes.
Figure 4.13: The c.d.f. of the CTR of BMR for $M = 20$: CTR on traces vs. CTR on Poisson processes.
On the other hand, if the interarrival distributions are light-tailed, such as the uniform distribution, it can be shown that the opposite results hold. In terms of tail weight, we have analyzed a popular medium-tailed distribution, the exponential distribution, and our results should be viewed as a benchmark for other tail weights.
4.8 Simulations
In this section, we simulate the proposed detectors on both synthetic Poisson traffic
and internet traces. The simulations on Poisson traffic are meant to verify our
analysis and examine properties of the proposed detectors, whereas the simulations
on traces are mainly used to verify the performance on actual traffic and show the
relative advantages of our detectors compared with existing flow detectors.
4.8.1 Synthetic Data
For synthetic data, (S1, . . . , Sn) is a sequence of independent Poisson processes of
rate λ under H0. Under H1, it is the mixture of an information flow (F1, . . . , Fn)
of rate (1− fc)λ (for some fc ∈ (0, 1)) and chaff traffic (W1, . . . , Wn), where Wi
(i = 1, . . . , n) are independent Poisson processes of rate fcλ. Here the parameter
fc is the CTR. The process F1 is a Poisson process of rate (1− fc)λ, and its relays
Fi (i > 1) are generated as follows. For information flows with bounded delay,
Fi = sort(Fi−1(1) + D1, Fi−1(2) + D2, . . .), i > 1,
where Fi−1 = (Fi−1(1), Fi−1(2), . . .), and D1, D2, . . . are i.i.d. delays uniformly
distributed in [0, ∆]. For information flows with bounded memory, we partition
the epochs of Fi−1 into groups of size ⌊M/2⌋, where the jth group is
(Fi−1((j − 1)⌊M/2⌋), . . . , Fi−1(j⌊M/2⌋ − 1)).
Then Fi is generated by selecting ⌊M/2⌋ epochs independently and uniformly from
the interval [Fi−1((j − 1)⌊M/2⌋), Fi−1(j⌊M/2⌋)) for each j ≥ 2. As illustrated in
Fig. 4.14, if we match epochs in the generated realizations fi−1 and fi (i ≥ 2)
sequentially, then the matching satisfies the bounded memory constraint.
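The bounded-delay construction above (each relayed packet receives an i.i.d. Uniform[0, ∆] delay, and the next-hop epochs are sorted) can be sketched in Python; the function names are illustrative, not from the dissertation's code.

```python
import random

def gen_poisson(rate, n_epochs, seed=None):
    """Epochs of a Poisson process: cumulative sums of Exp(rate) interarrivals."""
    rng = random.Random(seed)
    t, epochs = 0.0, []
    for _ in range(n_epochs):
        t += rng.expovariate(rate)
        epochs.append(t)
    return epochs

def relay_bounded_delay(prev_hop, delta, seed=None):
    """Next-hop epochs: add i.i.d. Uniform[0, delta] delays, then sort."""
    rng = random.Random(seed)
    return sorted(t + rng.uniform(0.0, delta) for t in prev_hop)

f1 = gen_poisson(rate=4.0, n_epochs=50, seed=1)
f2 = relay_bounded_delay(f1, delta=0.5, seed=2)
```

Because both sequences are sorted, matching them sequentially keeps every per-packet delay within [0, ∆], so the output is a valid bounded-delay flow.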
[Figure 4.14 diagram: epochs of f1 and f2 on parallel time lines, with groups of ⌊M/2⌋ packets.]
Figure 4.14: Generating information flows with bounded memory (⌊M/2⌋ = 3): f2 is generated by storing ⌊M/2⌋ packets from f1 and randomly releasing these packets during the arrival of the next ⌊M/2⌋ packets.
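The bounded-memory construction (buffer ⌊M/2⌋ packets, then release them at uniform times while the next ⌊M/2⌋ packets arrive) can be sketched as follows; the function name and the toy unit-spaced input are illustrative assumptions, not the dissertation's code.

```python
import random

def relay_bounded_memory(prev_hop, mem, seed=None):
    """Next-hop epochs under a memory bound: for each group j >= 2 of size
    g = mem // 2, draw g release times i.i.d. uniformly over the group's span."""
    rng = random.Random(seed)
    g = mem // 2
    out = []
    j = 2  # groups are 1-indexed; releases start with the second group
    while j * g < len(prev_hop):
        lo, hi = prev_hop[(j - 1) * g], prev_hop[j * g]
        out.extend(sorted(rng.uniform(lo, hi) for _ in range(g)))
        j += 1
    return out

# unit-spaced toy epochs stand in for a Poisson process here
f1 = [float(k) for k in range(30)]
f2 = relay_bounded_memory(f1, mem=4, seed=3)
```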
Explanations of the parameters used in this simulation are summarized in Ta-
ble 4.2. We are mainly interested in the influence of changing n on the detection
performance. Since it can be shown that increasing n has opposite effects on the
false alarm and the miss probabilities, we plot the receiver operating characteristics
(ROCs) [30] for different n.
Table 4.2: Parameters for Simulations on Synthetic Data.
n: the number of processes
λ: the rate of Si (i = 1, . . . , n)
∆: maximum delay
M: maximum memory size
fc: CTR
We first fix the sample size per process and vary the threshold to plot the ROCs
for bounded delay flows and bounded memory flows; see Fig. 4.15, 4.16. From
the plots, we see that the ROCs approach the upper left corner (i.e., zero error
probabilities) as n increases, implying that the detector has better performance
as the number of processes increases. This is as expected because as n increases,
the detector has more observations, and thus the detection performance should be
improved.
We then fix the total sample size and plot the ROCs for different n; see Fig. 4.17,
Figure 4.18: The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, 100 packets in total over all processes, 10000 Monte Carlo runs.
Note that the levels of detectability at n = 2 for bounded delay flows and
bounded memory flows are both equal to 0.2. By the discussions following Corol-
laries 1 and 2, we know that with 20% chaff noise, we can generate information
flows which make the detection no better than random guessing. In the simulation
(Fig. 4.15–4.18), however, the detection is clearly much better than random guess-
ing. This observation shows that the flow-generating models used in the simulation
are not optimal. If we compare the ROCs for bounded delay flows with those for
bounded memory flows^9 (i.e., Fig. 4.15 vs. Fig. 4.16 and Fig. 4.17 vs. Fig. 4.18),
we see that the detector of bounded memory flows outperforms that of bounded
delay flows for these flow-generating models.
4.8.2 Traces
For simulation on traces, we use the TCP traces in LBL-PKT-4 referenced in
Section 4.7. We extract 134 flows from the TCP packets in LBL-PKT-4. Each
flow has at least 1000 packets, and 4 of them have at least 10000 packets. Only
pairwise detection is simulated due to the limited data. Under H0, (S1, S2) is a
pair of different traces of size 1000. Under H1, Si = Fi ⊕ Wi (i = 1, 2), where
Wi consists of Nc packets i.i.d. uniformly distributed on the range of Fi. The
process F1 is a trace of size 10000, and F2 is generated by bounded delay or
bounded memory perturbations as those in Section 4.8.1. Parameters used in this
simulation are explained in Table 4.3.
Table 4.3: Parameters for Simulations on Traces.
N: total number of epochs
∆: maximum delay
M: maximum memory size
Nc: number of chaff packets per process
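The H1 construction described above (Si = Fi ⊕ Wi, with Wi consisting of Nc chaff packets i.i.d. uniform on the range of Fi) can be sketched as follows; `add_uniform_chaff` is an illustrative name, not from the dissertation.

```python
import random

def add_uniform_chaff(flow, n_chaff, seed=None):
    """Superpose n_chaff chaff epochs, drawn i.i.d. uniformly over the span of
    the flow, onto the flow's epochs (the superposition merges and sorts)."""
    rng = random.Random(seed)
    chaff = [rng.uniform(flow[0], flow[-1]) for _ in range(n_chaff)]
    return sorted(flow + chaff)

# toy flow of 100 unit-spaced epochs with Nc = 10 chaff packets
s1 = add_uniform_chaff([float(k) for k in range(100)], n_chaff=10, seed=7)
```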
We compare the proposed detectors for bounded delay flows (denoted by δBD)
^9 We have made M = λ∆ for a fair comparison.
and bounded memory flows (denoted by δBM) with the detector δDAC using the algorithm
“Detect-Attacks-Chaff” (DAC) for bounded memory flows^10 (by Blum et al. in [5])
and the detector δS-III using algorithm S-III for bounded delay flows (by Zhang et
al. in [49]). We first simulate the false alarm probabilities; see Fig. 4.19. We
choose the thresholds of δBD and δBM such that their false alarm probabilities are
comparable with that of δDAC. The false alarm probabilities of these three detectors
level off after the sample size 1000; the false alarm probability of δS-III, however,
keeps decreasing to a much smaller value. From the plot, we see that the false
alarm probabilities of δBM, δBD, and δDAC do not decay exponentially. It is possible
that the false alarm probability of δS-III decays exponentially, but we do not have
enough data in these traces to verify that.
[Figure 4.19 plot: false alarm probability PF (δ) on a log scale (10^−5 to 10^0) vs. sample size N for δBM, δBD, δDAC, and δS-III.]
Figure 4.19: PF (δBM), PF (δDAC), PF (δBD), and PF (δS-III) on LBL-PKT-4: M = 20, ∆ = 5, threshold for δBD = 1/14, threshold for δBM = 1/21, tested on 134 × 133 trace pairs.
We then simulate the miss probabilities of δBM and δDAC; see Fig. 4.20. For
each of the 4 traces of size 10000, we generate 1000 bounded memory flows inde-
pendently. The simulation shows that δBM has much lower miss probability than
δDAC. In fact, δBM detects all the information flows, whereas δDAC has up to 27.58%
^10 Originally, DAC was proposed for information flows with bounded delay and bounded peak rate, but it is applicable to bounded memory flows as discussed in Section 3.5.1.
misses by sample size 22000. The plot also shows that the miss probability of δDAC
increases with the sample size. This is because as the sample size increases, the
average number of chaff packets also increases, and δDAC can only handle a fixed
number of chaff packets. Note that although our analysis says that δBM is only
consistent for CTR up to 0.0476, δBM survives CTR = 0.1 in the simulation, which
implies that the uniform chaff insertion is not optimal for bounded memory flows.
[Figure 4.20 plot: miss probability PM (δ) (0 to 0.35) vs. sample size N (up to 2.5 × 10^4) for δBM and δDAC.]
Figure 4.20: PM(δBM) and PM(δDAC): M = 20, Nc = 1000, threshold for δBM = 1/21, tested on 4000 bounded memory flows.
Next we simulate the miss probabilities of δBD and δS-III; see Fig. 4.21. We
generate 1000 bounded delay flows independently from each of the traces of size
10000. The plot confirms that δBD has a much smaller miss probability than δS-III;
actually, in the simulation, δBD has no miss for almost all the sample sizes.^11 This
is because δBD can tolerate a certain fraction of chaff packets no matter where
they are inserted, whereas δS-III is vulnerable to chaff packets in S1. As in the case
of bounded memory flows, we see that δBD handles much more chaff noise than
predicted by the analysis, which shows that uniform chaff insertion is also not
optimal for bounded delay flows. Moreover, comparing Fig. 4.20 and Fig. 4.21, we
see that δDAC is more robust to chaff noise than δS-III.
^11 The exception is at the sample size 3000, where we have 6 misses out of 4000 information flows.
[Figure 4.21 plot: miss probability PM (δ) (0 to 1) vs. sample size N (up to 2.5 × 10^4) for δBD and δS-III.]
Figure 4.21: PM(δBD) and PM(δS-III): ∆ = 5, Nc = 1000, threshold for δBD = 1/14, tested on 4000 bounded delay flows.
4.9 Summary
This chapter addresses timing-based detection of information flows in the pres-
ence of active perturbations and chaff noise. It characterizes the detectability
of information flows in terms of the maximum amount of chaff noise that allows
consistent detection and shows how to design the detector to achieve consistent de-
tection based on knowledge of the null hypothesis. The Poisson assumption under
the null hypothesis makes our results lower bounds on the detection performance
of practical information flows. The proposed detector coupled with capacity con-
straints between neighbor nodes can capture all the long-lived information flows
with positive rates and sufficiently long paths.
APPENDIX 4.A
PROOFS FOR CHAPTER 4
Proof of Theorem 6
Let Yj be the jth packet delay, i.e., Yj = S2(j) − S1(j). Define Zj = Yj − Yj−1.
We see that the Zj’s are i.i.d. random variables; each Zj is the difference between two
independent exponential random variables with mean 1/λ2 and 1/λ1, respectively.
The process {Yj}_{j=1}^∞ is a general random walk with step Zj. Define Y0 = 0.
Now for every chaff packet inserted at t in S2, we insert a virtual packet at t
in S1; for every chaff packet at s in S1, we insert a virtual packet at s + ∆ in S2,
as illustrated in Fig. 4.22. Let the new packet delays after the insertion of virtual
packets be {Y′j}_{j=0}^∞. It can be shown that {Y′j}_{j=0}^∞ is also a random walk with step
Zj, but it has two reflecting barriers at 0 and ∆, i.e.,

Y′j = min(max(Y′j−1 + Zj, 0), ∆).
Since it is almost surely impossible for Y′j−1 + Zj to be exactly equal to 0 or
∆, each time Y′j = 0 corresponds to a chaff packet in S2, and Y′j = ∆ corresponds
to a chaff packet in S1. Thus, the limiting probability for a packet to be chaff
is h∆/(1 − h0) in S1 and h0/(1 − h∆) in S2, where h0 = lim_{j→∞} Pr{Y′j = 0} and
h∆ = lim_{j→∞} Pr{Y′j = ∆}. The overall probability for a packet in S1 ⊕ S2 to be chaff
is the weighted sum

λ1h∆/((λ1 + λ2)(1 − h0)) + λ2h0/((λ1 + λ2)(1 − h∆)).    (4.6)
[Figure 4.22 diagram: a chaff packet at s in S1 is paired with a virtual packet at s + ∆ in S2, and a chaff packet at t in S2 with a virtual packet at t in S1.]
Figure 4.22: Inserting virtual packets to calculate the delays of chaff packets.
By ergodicity of {Y′j}_{j=0}^∞, the CTR of BGM converges to the limiting probability
in (4.6) almost surely.
Now we calculate h0 and h∆. Let the equilibrium distribution function of Y′j
be H(x), i.e., H(x) = lim_{j→∞} Pr{Y′j ≤ x}. It is shown in Example 2.16 in [8] that

h0 = H(0) = (1 − λ1/λ2)/(1 − (λ1/λ2)² e^{∆(λ1−λ2)})  if λ1 ≠ λ2,  and  h0 = 1/(2 + λ1∆)  otherwise,
and

h∆ = 1 − H(∆−) = (λ1/λ2) e^{∆(λ1−λ2)} (1 − λ1/λ2)/(1 − (λ1/λ2)² e^{∆(λ1−λ2)})  if λ1 ≠ λ2,  and  h∆ = 1/(2 + λ1∆)  otherwise.
Therefore, by (4.6), we have that the CTR of BGM satisfies

lim_{t→∞} CTR_BGM(t) = (λ2 − λ1)(1 + (λ1/λ2) e^{∆(λ1−λ2)}) / ((λ1 + λ2)(1 − (λ1/λ2) e^{∆(λ1−λ2)}))  if λ1 ≠ λ2,

lim_{t→∞} CTR_BGM(t) = 1/(1 + λ1∆)  if λ1 = λ2,
almost surely.
∎
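As a sanity check on this derivation, one can simulate the reflected random walk {Y′j} directly, estimate h0 and h∆ empirically, and plug them into the weighted sum (4.6); for λ1 = λ2 = λ the result should approach 1/(1 + λ∆). A minimal Monte Carlo sketch (illustrative code, not from the dissertation):

```python
import random

def bgm_ctr_reflected_walk(lam1, lam2, delta, n_steps, seed=0):
    """Estimate the asymptotic CTR of BGM on independent Poisson processes by
    simulating the delay walk Y'_j with reflecting barriers at 0 and delta."""
    rng = random.Random(seed)
    y, n0, nd = 0.0, 0, 0
    for _ in range(n_steps):
        # step Z_j: difference of the interarrival times of S2 and S1
        z = rng.expovariate(lam2) - rng.expovariate(lam1)
        y = min(max(y + z, 0.0), delta)
        if y == 0.0:
            n0 += 1
        elif y == delta:
            nd += 1
    h0, hd = n0 / n_steps, nd / n_steps
    # plug the empirical h0, h_delta into the weighted sum (4.6)
    return (lam1 * hd / ((lam1 + lam2) * (1.0 - h0))
            + lam2 * h0 / ((lam1 + lam2) * (1.0 - hd)))

ctr = bgm_ctr_reflected_walk(lam1=1.0, lam2=1.0, delta=1.0, n_steps=200_000)
```

With λ1 = λ2 = 1 and ∆ = 1, the estimate should be close to 1/(1 + λ∆) = 1/2.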
Proof of Proposition 13
Algorithm BMR is feasible since the non-chaff part (f1, f2) satisfies the bounded
memory constraint. It remains to show the optimality.
Assume that C∗ is an optimal chaff-inserting algorithm. If M1(k − 1) = M
and sk ∈ S1, then node R2 has an arrival when the memory is full, and C∗ has to
drop at least one arriving packet at or before sk to prevent memory overflow. If
M1(k − 1) = 0 and sk ∈ S2, then R2 has a departure when the memory is empty,
so C∗ has to insert at least one dummy packet at or before sk in s2 to prevent
memory underflow. Therefore, BMR inserts no more chaff than C∗.
∎
Proof of Theorem 7
If S1 and S2 are independent Poisson processes of rates λ1 and λ2 respectively,
then it is known that the cumulative differences {d(w)} defined in (4.5) form
a simple random walk. Algorithm BMR assigns chaff such that the cumulative
differences {d′(w)} of the processes F1 and F2 satisfy 0 ≤ d′(w) ≤ M for all w.
By the memoryless property of exponential interarrival times, it is easy to see that
{d′(w)} is a random walk with reflecting barriers at 0 and M (i.e., a Markov chain
with state space {0, . . . , M}). Its transition probabilities are shown in Fig. 4.23.
[Figure 4.23 diagram: birth-death chain on states 0, 1, . . . , M with rightward probability p, leftward probability q, and self-loops at 0 and M.]

Figure 4.23: The Markov chain formed by {d′(w)}; p = λ1/(λ1 + λ2), q = 1 − p.
It is easy to see that {d′(w)} is an irreducible, aperiodic, and positive recurrent
Markov chain, and thus has a limit distribution (π0, . . . , πM). Since the limit
distribution satisfies

πi = (λ1/λ2) πi−1, i = 1, . . . , M,

we have

π0 = (1 − λ1/λ2)/(1 − (λ1/λ2)^{M+1})  if λ1 ≠ λ2,  and  π0 = 1/(1 + M)  otherwise,

πM = (λ1/λ2)^M (1 − λ1/λ2)/(1 − (λ1/λ2)^{M+1})  if λ1 ≠ λ2,  and  πM = 1/(1 + M)  otherwise.
The physical meaning of d′(w) is the memory size after the transmission of the
wth packet in S1 ⊕ S2. The self-loop at state 0 corresponds to chaff packets in
S2 because these transmissions occur when the memory is empty (so they have to
be dummy packets); the self-loop at state M corresponds to chaff in S1 because
the transmissions occur when the memory is full (so the packets will be dropped).
By ergodicity of {d′(w)}, as w → ∞, the CTR of BMR converges to the limiting
probability of self-loops almost surely. The limiting probability is the weighted sum
π0q + πMp, which is equal to

(λ2 − λ1)(1 + (λ1/λ2)^{M+1}) / ((λ1 + λ2)(1 − (λ1/λ2)^{M+1}))  if λ1 ≠ λ2,

1/(1 + M)  if λ1 = λ2.
∎
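The closed form can be checked numerically: power-iterating the reflected chain of Fig. 4.23 and weighting the self-loops (q at state 0, p at state M) reproduces the formula. A sketch (illustrative code, not the dissertation's):

```python
def bmr_ctr(lam1, lam2, M, iters=20_000):
    """CTR of BMR via the stationary law of the reflected walk d'(w) on
    {0,...,M}: power-iterate the chain, then weight the self-loops."""
    p = lam1 / (lam1 + lam2)   # next packet belongs to S1
    q = 1.0 - p                # next packet belongs to S2
    pi = [1.0 / (M + 1)] * (M + 1)
    for _ in range(iters):
        new = [0.0] * (M + 1)
        for i, mass in enumerate(pi):
            new[i + 1 if i < M else i] += p * mass   # step right, or self-loop at M
            new[i - 1 if i > 0 else i] += q * mass   # step left, or self-loop at 0
        pi = new
    return pi[0] * q + pi[M] * p

ctr = bmr_ctr(lam1=1.0, lam2=2.0, M=3)
```

For λ1 = 1, λ2 = 2, M = 3 the closed form gives (λ2 − λ1)(1 + (1/2)^4)/((λ1 + λ2)(1 − (1/2)^4)) = 17/45, and for equal rates 1/(1 + M) = 1/4.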
Proof of Proposition 14
By expanding the recursions of MBDR, it can be shown that MBDR is equivalent
to an algorithm which finds the earliest sequence of relay epochs for each packet
in s1. That is, for s ∈ S1, if p = (s, t2, . . . , tn) (ti ∈ Si) is a sequence of relay
epochs for s, then MBDR finds the sequence p̄ = (s, t̄2, . . . , t̄n) such that

1. p̄ satisfies the causality and the bounded delay constraints;

2. t̄i ≤ ti (i = 2, . . . , n) for any other sequence of relay epochs that satisfies
these constraints.
We will refer to a sequence of relay epochs as a relay sequence.
A set of relay sequences preserves the order of packets if for any two sequences
(ti)_{i=1}^n and (t′i)_{i=1}^n in the set, t1 ≤ t′1 implies ti ≤ t′i for all i = 2, . . . , n. We will use
the following result.
the following result.
Lemma 3 Among all sets of relay sequences satisfying the constraints of causality,
packet-conservation, and bounded delay, there always exists a set which has the
largest size and preserves the order of packets.
By this lemma, it suffices to search among order-preserving sets of relay se-
quences. It remains to show that it is optimal to find the earliest relay sequences.
Let P be the set of relay sequences found by MBDR, and P∗ be the largest
and order-preserving set of relay sequences. Suppose s1 ∈ S1 has a relay sequence
p∗1 ∈ P∗ but not in P, as illustrated in Fig. 4.24. Then there must be relay
sequences in P which start earlier than s1 and partly overlap with p∗1 (otherwise,
MBDR would have chosen p∗1 or a sequence earlier than p∗1 for s1). Let the earliest
of these sequences be p1, with starting epoch s2 ∈ S1. For j = 2, 3, . . ., do the
following.
i) If sj does not have a relay sequence in P∗, we stop searching; otherwise,
suppose that sj has a relay sequence p∗j ∈ P∗.
ii) The sequence p∗j is at least partly earlier than pj−1 because p∗j is earlier than
p∗j−1 and p∗j−1 partly overlaps with pj−1. Since MBDR has not chosen the
earlier part of p∗j , it implies that there must be sequences in P earlier than
pj−1, which partly overlap with p∗j . Let the earliest of these sequences be pj,
with starting epoch sj+1 ∈ S1. Continue with i).
[Figure 4.24 diagram: epochs s1, s2, . . . , sm+1 in S1 with relay sequences p1, . . . , pm ∈ P (solid) partly overlapping p∗1, . . . , p∗m ∈ P∗ (dashed).]
Figure 4.24: Every relay sequence in P∗ corresponds to a relay sequence in P; solid line: sequences in P; dashed line: sequences in P∗.
When we stop searching, we will either find an epoch in s1 which has a relay
sequence in P but not in P∗, or reach a relay sequence pm ∈ P which starts before
all the relay sequences in P∗. Therefore, for every relay sequence in P∗, we can
find a different sequence in P, which implies that the size of P is no smaller than
that of P∗.
∎
Proof of Lemma 3
The proof is by induction. As illustrated in Fig. 4.25, suppose that
(s1(1), s2(2), s3(1)) and (s1(2), s2(1), s3(2)) are relay sequences satisfying causal-
ity, packet-conservation, and bounded delay. By switching the intersected part, we
obtain two sequences (s1(1), s2(1), s3(1)) and (s1(2), s2(2), s3(2)) which satisfy
these constraints and also preserve the packet order. By repeatedly applying such
switching, we can reorganize any set of relay sequences into an order-preserving
set and maintain satisfaction of the constraints.
[Figure 4.25 diagram: epochs s1(1), s1(2) in S1; s2(1), s2(2) in S2; s3(1), s3(2) in S3, with the original (solid) and reorganized (dashed) relay sequences.]
Figure 4.25: Solid lines denote the original relay sequences; dashed lines denote the reorganized relay sequences, which preserve the order of packets.
∎
Proof of Theorem 8
We bound the CTR of MBDR by deriving upper bounds on the probability for an
arbitrary packet in S1 to have a match. Then the result of Theorem 8 holds by
ergodicity. Compared with the first packet, subsequent packets are more difficult
to match because some of their relay epochs may have been used to relay previous
packets. Thus, it suffices to upper bound the probability for the first packet to
have a match. Denote this probability by Pn.
First, note that a necessary condition for the first packet at time t to have a
match is that the corresponding intervals [t, t + (i − 1)∆] in Si (i = 2, . . . , n)
in which the packet can be relayed are all nonempty. The probability of this
event is at most ∏_{i=1}^{n−1} (1 − e^{−iλ∆}) (achievable if all the processes have rate λ). Thus,

Pn ≤ ∏_{i=1}^{n−1} (1 − e^{−iλ∆}).
Next, we prove by induction that Pn is also upper bounded by (λ∆)^{n−2}(1 − e^{−λ∆}).
For n = 2, this bound is the same as the upper bound derived above.
Assume that the result holds for Pn−1 (n ≥ 3). By writing Pn in parts with
respect to the number of epochs within delay ∆ in S2, we have

Pn ≤ ∑_{k=1}^∞ ((λ∆)^k / k!) e^{−λ∆} · Pr{at least one of the k epochs has a match}
   ≤ ∑_{k=1}^∞ ((λ∆)^k / k!) e^{−λ∆} k Pn−1    (4.7)
   = λ∆ Pn−1,

where the union bound is used to obtain (4.7). Hence, we have shown that Pn ≤ (λ∆)^{n−2}(1 − e^{−λ∆}).

Combining these two bounds, we have that the CTR of MBDR is lower bounded
by 1 − min( ∏_{i=1}^{n−1} (1 − e^{−iλ∆}), (λ∆)^{n−2}(1 − e^{−λ∆}) ) a.s.
∎
Proof of Theorem 9
We prove the theorem by induction.
For n = 2, we have seen from Theorem 7 that the minimum CTR of MBMR is
1/(1 + M).
Assume the result holds up to n (n ≥ 2). For (n + 1)-hop flows, it suffices to
show that 1 − un+1 ≤ lim_{t→∞} CTR_MBMR(t) ≤ 1 − ln+1 a.s. when the Si’s have equal rate.
This is because equal rate is the case that minimizes the CTR (which can be shown
by arguments similar to Theorem 7). We prove the result by showing that the
asymptotic fraction of non-chaff packets (i.e., 1 − CTR) is bounded between ln+1
and un+1.
Note that the output of a relay node is no longer a Poisson process. This
is because the probability of finding another information-carrying packet af-
ter an information-carrying packet is greater than the probability of finding an
information-carrying packet after a chaff packet. The precise model to decide
whether a packet is chaff or not is the Markov chain shown in Appendix 4.B. As a
result, the arrival process at node Rn+1 is more regular than a Poisson process of
the same rate.
For the lower bound, assuming Si’s all have rate λ, we substitute the arrival
process at node Rn+1 with a Poisson process of rate λln. Since we destroy the
regularity and may also reduce the rate (because λln is a lower bound on the
rate), this substitution gives us a lower bound on the fraction of non-chaff packets.
For this arrival process and an independent Poisson process of rate λ which is
the departing process of Rn+1, we know from the proof of Theorem 7 that the
asymptotic fraction of chaff packets in the departing process is

π0 = (1 − λ1/λ2)/(1 − (λ1/λ2)^{M+1}),

where λ1 = λln and λ2 = λ. Therefore, we have that the asymptotic fraction of
non-chaff packets is lower bounded by

1 − π0 = 1 − (1 − ln)/(1 − ln^{M+1}),

which is equal to ln+1.
For the upper bound, we consider the following arrival process at node Rn+1.
The process is generated by dividing points in a Poisson process of rate λ into
consecutive groups of size M/un and selecting M consecutive points from the
beginning of each group. Analogous to conventional batched processes, we refer to
the group size M/un as the period, and M as the batch size. A realization of such
a process is drawn in Fig. 4.26.
[Figure 4.26 diagram: a batched arrival process on a time line, showing one batch of M arrivals within each period.]

Figure 4.26: A “batched” arrival process generated from a Poisson process. Filled marks: arrival epochs; open marks: points in the underlying Poisson process; M = 2, period = 5.
We consider such a batched process because it maximizes the time between the
(kM)th arrival and the (kM + 1)th arrival (k ∈ N) so that it is least likely for
the memory to be overflowed. Moreover, we choose the period to make the arrival
rate equal to λun (which may be higher than the actual rate). Therefore, using
this arrival process allows us to obtain an upper bound on the fraction of non-chaff
packets.
Consider such an arrival process and an independent Poisson process of rate
λ. After the first arrival in a period, with probability 2^{−M/un}, there will be no
departure until the first arrival in the next period. In this case, there are M + 1
consecutive arrivals, and thus at least 1 packet will be dropped. Hence, the fraction
of dropped packets at node Rn+1 is lower bounded by 2^{−M/un}/(M + 1), i.e., at most
a 1 − 2^{−M/un}/(M + 1) fraction of the information-carrying packets arriving at Rn+1
can be successfully relayed. Since at most un fraction of the incoming packets of
Rn+1 is carrying information, the overall fraction of information-carrying packets
relayed by Rn+1 is upper bounded by

un (1 − (1/(M + 1)) 2^{−M/un}),

which is equal to un+1.
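The two recursions can be iterated numerically. The sketch below assumes the two-hop base case l2 = u2 = M/(M + 1), i.e., the non-chaff fraction implied by the minimum CTR 1/(1 + M) of Theorem 7; names are illustrative.

```python
def detectability_bounds(M, n_max):
    """Iterate l_n <= (asymptotic non-chaff fraction) <= u_n for n-hop
    bounded-memory flows, from the base case l_2 = u_2 = M/(M + 1)."""
    l = u = M / (M + 1.0)
    bounds = {2: (l, u)}
    for n in range(3, n_max + 1):
        l = 1.0 - (1.0 - l) / (1.0 - l ** (M + 1))   # lower-bound recursion
        u = u * (1.0 - 2.0 ** (-M / u) / (M + 1))    # upper-bound recursion
        bounds[n] = (l, u)
    return bounds

b = detectability_bounds(M=2, n_max=5)
```

For M = 2 this gives l3 = 10/19 and u3 = 23/36, with both sequences decreasing in n as expected.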
∎
Proof of Theorem 12
We prove the theorem for bounded delay flows and bounded memory flows separately.
Here we present the proof for n = 2; the proof for n > 2 is analogous.
Proof for Bounded Memory Flows
By Theorem 7, we know that the false alarm probability is maximized when λ1 =
λ2, where λi (i = 1, 2) is the rate of Si. Consider this equal rate case.
Define T1 to be the number of packets in S1 ⊕ S2 up to and including the first
chaff packet, and Ti (i > 1) the number of packets between the
(i − 1)th and ith chaff packets, excluding the (i − 1)th chaff packet but including
the ith. Let C be the number of chaff packets found by BMR. Then the false alarm
probability can be written as

PF (δt) = Pr{C ≤ τ2N}
        = Pr{ ∑_{i=1}^{τ2N} Ti ≥ N }
        = Pr{ (1/(τ2N)) ∑_{i=1}^{τ2N} Ti ≥ 1/τ2 }.    (4.8)
It is known that for Poisson processes, the cumulative differences {d(k)}_{k=1,2,...}
defined in (4.5) form a simple random walk with Pr{d(k) = d + 1 | d(k − 1) = d} =
1/2. The Markovian property implies that T1, T2, . . . are independent, and for
i ≥ 2, Ti has the same distribution as N_{−1,M+1} defined by

N_{−1,M+1} ≜ inf{k : d(k) = −1 or M + 1 | d(0) = 0}.    (4.9)

By Theorem 7, we know that the ratio C/N will almost surely converge to
1/(1 + M) as N → ∞, i.e., lim_{c→∞} c / ∑_{i=1}^c Ti = 1/(1 + M) almost surely. It implies
that lim_{c→∞} (1/c) ∑_{i=1}^c Ti = 1 + M almost surely, and thus E[Ti] = 1 + M (i ≥ 2).
Now that the Ti’s (i ≥ 2) are i.i.d., by Sanov’s Theorem in [7], we have

lim_{N′→∞} (1/N′) log Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 } = − min_{W : E[W] ≥ 1/τ2} D(W ||T2),

where N′ = τ2N. By (4.8), we obtain that

lim_{N→∞} (1/N) log PF (δt) = −τ2 min_{W : E[W] ≥ 1/τ2} D(W ||T2) ≜ −Γ2(τ2; M).
It is difficult to compute Γ2(τ2; M) directly, but the computation can be reduced
to an optimization over a single variable by Cramér’s Theorem [9]. Nevertheless, as
long as 1/τ2 > 1 + M, we have that E[W] > E[T2], and thus Γ2(τ2; M) is positive.
By the definition of Γ2(τ2; M), it is easy to see that it is a decreasing function of
τ2.
Proof for Bounded Delay Flows
The proof for bounded delay flows is similar to that for bounded memory flows.
By Theorem 6, we see that the false alarm probability is maximized when S1 and
S2 both have the maximum rate λ. Consider this case.
Let Ti (i ≥ 1) be defined the same as in the proof for bounded memory flows.
Then the false alarm probability can be written as

PF (δt) = Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 },    (4.10)

where N′ = τ2N.

Let Yj be defined as in the proof of Theorem 6. We have shown that the
process {Yj}_{j=1,2,...} is a general random walk. For i ≥ 2, the Ti’s are i.i.d. with the
same distribution as

2 · inf{j : Yj < 0 or Yj > ∆ | Y0 = 0} − 1.    (4.11)

Let C be the number of chaff packets found by BGM. By Theorem 6, we have
lim_{N→∞} C/N = 1/(1 + λ∆) almost surely. Thus, lim_{c→∞} (1/c) ∑_{i=1}^c Ti = 1 + λ∆ almost surely,
which implies that E[T2] = 1 + λ∆.
By Sanov’s Theorem [7], we have

lim_{N′→∞} (1/N′) log Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 } = − min_{W : E[W] ≥ 1/τ2} D(W ||T2).

Plugging in (4.10) yields that

lim_{N→∞} (1/N) log PF (δt) = −τ2 min_{W : E[W] ≥ 1/τ2} D(W ||T2) ≜ −Γ2(τ2; λ, ∆).
For 1/τ2 > 1 + λ∆, we have that E[W ] > E[T2], and therefore Γ2(τ2; λ, ∆) >
0. As τ2 increases, the minimization is over a larger set, and thus Γ2(τ2; λ, ∆)
decreases. This completes the proof.
∎
Proof of Theorem 13
The proof utilizes the ideas in the proofs of Theorems 6 and 7.

The classical Pareto distribution (see [28]) with shape parameter β (β > 0)
and location parameter α (α > 0) has the probability density function

p(x) = βα^β x^{−β−1}, x ≥ α.

This distribution has the property that the conditional expectation E[X − x | X ≥ x]
is an increasing function of x.
For information flows with bounded memory, consider the cumulative differences
{d′(w)}_{w=0}^∞ between the processes of matched epochs found by BMR, as
defined in the proof of Theorem 7. The CTR of BMR is the frequency of self-loops
in {d′(w)}_{w=0}^∞, as illustrated in Fig. 4.23. Unlike the exponential distribution, the Pareto
distribution has memory, and the resulting {d′(w)}_{w=0}^∞ is not Markovian. Note,
however, that the memory of interarrival times makes it easier to reach the states
0 and M and generate self-loops. This is because whenever d′(w) increases by 1,
the next arrival is more likely to be in S1 (since the arrival in S2 has waited for
some time and is likely to wait even longer), and thus d′(w) is likely to keep
increasing. Hence the average time to reach {0, M} is shorter than that for Poisson
processes. At the state 0 (or M), the same argument implies that it is more
likely to take more self-loops after a self-loop. Therefore, BMR inserts more chaff
noise in independent renewal processes with Pareto interarrival distributions than
in independent Poisson processes.
For information flows with bounded delay, similar arguments hold. The process
{Y′j}_{j=0}^∞ defined in the proof of Theorem 6 is no longer a Markov process under
Pareto interarrival distributions, but we can show that the endpoints 0 and ∆ are
visited more frequently and therefore produce more chaff noise.
∎
APPENDIX 4.B
ASYMPTOTIC CTR OF MBMR
Here we show how to calculate the CTR of MBMR by a Markov chain. In
particular, we are interested in computing βn^M (n ≥ 2). Assume the processes are
independent and Poisson under H0.
If S1, . . . , Sn are independent Poisson processes, then the vectors
{(Mi(k))_{i=1}^{n−1}}_{k=0}^∞ computed by MBMR form an (n − 1)-dimensional homogeneous
Markov chain. By arguments similar to those in the proof of Theorem 7, it can be
shown that the CTR is minimized when all the Si’s have equal rate, in which case the
CTR of MBMR is βn^M. We will focus on the equal rate case, although the method
is easily generalizable to arbitrary rates.
If the Si (i = 1, . . . , n) have equal rate, then the transition probabilities
of (Mi(k))_{i=1}^{n−1} are as follows. Denote the transition probability by
Pr{m_1^{n−1} | m′_1^{n−1}}, where m_1^{n−1}, m′_1^{n−1} ∈ {0, . . . , M}^{n−1}, and write m_i^j
for (mi, . . . , mj) (i ≤ j). For 2 ≤ i ≤ n − 1, mi−1 > 0, and mi < M,

Pr{(m_1^{i−2}, mi−1 − 1, mi + 1, m_{i+1}^{n−1}) | m_1^{n−1}} = 1/n;

for m1 < M,

Pr{(m1 + 1, m_2^{n−1}) | m_1^{n−1}} = 1/n;

for mn−1 > 0,

Pr{(m_1^{n−2}, mn−1 − 1) | m_1^{n−1}} = 1/n;

moreover,

Pr{m_1^{n−1} | m_1^{n−1}} = (1/n) ( I{m1 = M} + ∑_{i=2}^{n−1} I{mi−1 = 0 ∨ mi = M} + I{mn−1 = 0} ).
According to MBMR, each self-loop corresponds to a chaff packet, and therefore
the CTR is equal to the probability of self-loops in the equilibrium distribution.
That is, if π is the equilibrium distribution of (Mi(k))_{i=1}^{n−1}, then the CTR of
MBMR converges to the limiting probability of self-loops, denoted by ηn, almost
surely, where

ηn = ∑_{m_1^{n−1} ∈ {0,...,M}^{n−1}} π(m_1^{n−1}) Pr{m_1^{n−1} | m_1^{n−1}}.
For example, for n = 3 and M = 2, (M1(k), M2(k)) (k ≥ 0) follows the Markov
chain in Fig. 4.27. Here

η3 = (1/3)(1/15 + 2 × 4/45 + 2 × 1/9) + (2/3)(2 × 4/45 + 2/9) = 19/45.

This is the CTR of MBMR for 3-hop information flows with memory sizes bounded by 2, i.e.,
β3^2 = 19/45.
[Figure 4.27 diagram: the nine states (m1, m2) ∈ {0, 1, 2}² with transitions of probability 1/3 or 2/3 and their limiting probabilities.]

Figure 4.27: The Markov chain of {(M1(k), M2(k))}_{k=0}^∞. All straight lines have transition probability 1/3. All the states are marked with their limiting probabilities, e.g., π(0, 2) = 1/15.
APPENDIX 4.C
ALGORITHMS OF CHAPTER 4
Chaff-inserting Algorithm for Two-hop Bounded Delay
Flows
For the algorithm BGM presented in Section 4.4.1, we combine the insertion of
chaff and the matching of information-carrying packets into the implementation
presented in Table 4.4.
Table 4.4: Bounded-Greedy-Match (BGM).
Bounded-Greedy-Match(s1, s2, ∆):
    m = n = 1;
    while m ≤ |S1| and n ≤ |S2|
        if s2(n) − s1(m) < 0
            s2(n) = chaff; n = n + 1;
        else if s2(n) − s1(m) > ∆
            s1(m) = chaff; m = m + 1;
        else
            match s1(m) with s2(n);
            m = m + 1; n = n + 1;
        end
    end
This implementation of BGM uses two pointers m and n to record the current
epochs examined in s1 and s2, and keeps updating m and n depending on whether
the match is successful or not. Its complexity is O(|S1| + |S2|).
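A runnable Python version of Table 4.4 is sketched below. It follows the pseudocode, with the added convention (an assumption on our part, consistent with the CTR definition) that epochs left over after one sequence is exhausted are also counted as chaff.

```python
def bounded_greedy_match(s1, s2, delta):
    """Bounded-Greedy-Match: greedily match epochs of s1 to epochs of s2 with
    delays in [0, delta]; unmatched epochs are chaff. O(|s1| + |s2|)."""
    m = n = 0                      # 0-based pointers into s1 and s2
    matches, chaff = [], 0
    while m < len(s1) and n < len(s2):
        d = s2[n] - s1[m]
        if d < 0:                  # s2 epoch too early: chaff in s2
            chaff += 1
            n += 1
        elif d > delta:            # s1 epoch can no longer be relayed: chaff in s1
            chaff += 1
            m += 1
        else:                      # feasible delay: match the pair
            matches.append((s1[m], s2[n]))
            m += 1
            n += 1
    chaff += (len(s1) - m) + (len(s2) - n)   # leftover epochs are chaff
    return matches, chaff

matches, chaff = bounded_greedy_match([0.0, 1.0, 2.0], [0.5, 1.5, 5.0], 1.0)
```

On this toy input, the first two pairs are matched with delay 0.5, while the last epoch of each process becomes chaff.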
Chaff-inserting Algorithm for Multi-hop Bounded Delay
Flows
Implementation of the algorithm MBDR presented in Section 4.5.1 is presented in
Table 4.5. The complexity of such a direct implementation is O((λ∆)^n |S1|) (λ
match of s1(k) = MBDR1(s1(k), 1, s1, . . . , sn, ∆);
if match of s1(k) = ∅
    s1(k) = chaff;
end

MBDR1(s, i, s1, . . . , sn, ∆):
    for t ∈ Si+1 ∩ [s, s + ∆]
        match of t = MBDR1(t, i + 1, s1, . . . , sn, ∆);
        if match of t = ∅
            t = chaff;
        else
            return t;
        end
    end
    return ∅;
Performance of recursive algorithms can often be improved by expanding recursions.
An implementation of the expanded MBDR is shown in Table 4.6. The
complexity of this implementation is^12 O(n²|S1|).

^12 The dominating step is the recursive computation of the Ci,j’s. Suppose that the maximum
rate of S1, . . . , Sn is λ; then there are at most (i − 1)λ∆ points in Ci,j on average.
The selection of these points takes (2i − 3)λ∆ steps. The total complexity can be calculated by

    Mi−1 = Mi−1 − 1; Mi = Mi + 1;
    Li−1 = min(Li−1, Mi−1);
    Ui = max(Ui, Mi);
else
    C = C + 1;
end
return H1 if C/N ≤ τn, H0 o.w.;
Chapter 5
Distributed Detection of Information
Flows
5.1 Outline
In the previous chapters, precise timing measurements have been used for de-
tection. In wide-area networks (e.g., wireless sensor networks), there are usually
constraints on the communication rates between the points of measurements and
the detector. This chapter addresses this issue in the framework of distributed
detection. The rest of the chapter is organized as follows. Section 5.2 formulates
the problem. Section 5.3 defines the performance criteria and gives some theo-
retical results on the performance of general detection systems. Sections 5.4–5.6
are dedicated to practical detection systems, where Section 5.4 defines two simple
quantizers, Section 5.5 presents optimal chaff-inserting and detection algorithms
for each quantizer, and Section 5.6 analyzes and compares the performance of the
proposed detection systems. Then Section 5.7 concludes the chapter with a few
remarks.
5.2 The Problem Formulation
5.2.1 Problem Statement
In a wireless ad hoc network as illustrated in Fig. 5.1, nodes A and B may be car-
rying an information flow. If the nodes are transmitting an information flow, then
their transmission activities Si (i = 1, 2) can be decomposed into an information
flow (F1, F2) and chaff noise Wi, i.e., Si = Fi ⊕ Wi (referred to as containing an
information flow). As in Chapter 4, we allow Wi to be any process, and it may
be correlated with Fi. In this chapter, we only consider information flows with
bounded delay ∆ defined in Definition 7.
Figure 5.1: In a wireless network, nodes A and B may serve on one or multiple routes. Eavesdroppers are deployed to collect their transmission activities Si (i = 1, 2), which are then sent over uplink channels to a detector at the fusion center.
We are interested in testing the following hypotheses:

    H0: S1, S2 are independent,
    H1: (S1, S2) contains an information flow,        (5.1)

by observing measurements compressed from Si (i = 1, 2). Assume that the
maximum delay ∆ is known. Moreover, assume that the marginal distributions of Si (i = 1, 2) are known and are the same under both hypotheses (detailed analysis is done for Poisson processes). This is a partially nonparametric hypothesis-testing problem because no statistical assumptions are imposed on the correlation of S1 and S2 under H1.
We point out that the assumption that Si (i = 1, 2) have the same distributions under both hypotheses is not limiting: otherwise, an eavesdropper could independently make a decision based on its own measurements (e.g., by the Anderson–Darling test [28]) and send the result (a 1-bit message) to the fusion center, and the error probabilities could be made arbitrarily small given enough measurements.
5.2.2 System Architecture
The capacity constraints in the uplink channels make it necessary to incorporate quantizers q_i^{(t)} (i = 1, 2) at the eavesdroppers, where t is the duration of the observation. As illustrated in Fig. 5.2, the processes Si (i = 1, 2) are compressed into q_i^{(t)}(Si), which are delivered to the fusion center; the detector then makes a decision of the form

    θt = δt(q_1^{(t)}(S1), q_2^{(t)}(S2)),

where¹ θt ∈ {0, 1}. The capacity constraints are expressed as²

    ||q_i^{(t)}|| ≤ e^{tRi},  i = 1, 2,        (5.2)

¹The value 0 denotes H0, and 1 denotes H1.
²The unit of Ri (i = 1, 2) is nats per unit time.
for sufficiently large t, where ||q_i^{(t)}|| is the alphabet size of the output of q_i^{(t)}. Generally, R1, R2 < ∞, but if the detector is located at one of the eavesdroppers, e.g., the eavesdropper of node B, then R2 = ∞; this is called the case of full side-information.
Figure 5.2: A distributed detection system, consisting of two quantizers q_1^{(t)} and q_2^{(t)} and a detector δt; the quantizer outputs are U ∈ {1, . . . , e^{tR1}} and V ∈ {1, . . . , e^{tR2}}.
Given (R1, R2), the problem is to design q_i^{(t)} (i = 1, 2) and δt such that the overall detection performance is optimized.
5.3 Performance Criteria
In this section, we define the criteria for evaluating detection performance and
present theoretical results on the optimal performance under the proposed criteria.
5.3.1 Level of detectability
The detection performance in classical multiterminal hypothesis testing is usually
evaluated by the error exponents [16]. In our problem, the alternative hypothesis
is nonparametric, which makes it improper to adopt the error exponent criterion.
Instead, we measure the performance by the notion of consistency defined in Defini-
tion 11. The optimal performance establishes a level of detectability of information
flows as follows.
Given capacity constraints (R1, R2), we characterize the extent to which information flows are detectable by a notion called the level of detectability, denoted by α(R1, R2) and defined as

    α(R1, R2) ≜ sup{ r ∈ [0, 1] : ∃ (q_1^{(t)}, q_2^{(t)}, δt) such that
        1) δt is r-consistent;
        2) lim sup_{t→∞} (1/t) log ||q_i^{(t)}|| ≤ Ri, i = 1, 2 }.        (5.3)
That is, α(R1, R2) is the maximum consistency over all detection systems satisfying the capacity constraints (R1, R2). If Ri = ∞ (i = 1, 2), this definition reduces to the level of strong detectability in centralized detection (see Definition 12). Our goal is to design quantizers and detectors to achieve α(R1, R2).
Before concluding the introduction to our performance measure, we would like
to show an example which explains why our approach deviates from the classical
approaches.
Example. Consider an alternative formulation where we assume Si (i = 1, 2) are renewal processes under both hypotheses, i.e., the interarrival times Kj ≜ S1(j + 1) − S1(j) and Lj ≜ S2(j + 1) − S2(j) (j = 1, 2, . . .) are each i.i.d. Moreover, assume that the process of epoch pairs {(S1(j), S2(j))}_{j=1}^∞ is also renewal, i.e., the pairs (Kj, Lj) (j ≥ 1) are i.i.d. with some distribution PKL. The testing hypotheses are

    H0: PKL = PK × PL,    H1: PKL ≠ PK × PL,        (5.4)

where PK × PL is the product distribution with the same marginals as PKL, defined by (PK × PL)(k, l) = PK(k)PL(l) (i.e., Kj and Lj are independent). This is a testing-against-dependence problem under multiterminal data compression. By techniques similar to those for the testing-against-independence problem in [1], one can develop the optimal test of (5.4) to minimize the error probabilities. The problem, however, is that this is not the problem we want to solve in information flow detection. Specifically, there are simple strategies to manipulate the information flows such that the optimal test of (5.4) fails. For example, consider the scenario in Fig. 5.3, where S2 is an identical copy of S1 except that a chaff packet is inserted at the beginning. Then the subsequent observations of interarrival times are misaligned: for j ≥ 3, the jth pair of interarrival times becomes (Kj, Lj) = (Kj, Kj−1). Since the Kj's are independent, the test of (5.4) will fail to detect such an obvious information flow.
Figure 5.3: Inserting one chaff packet can destroy the alignment of measurements.
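The misalignment in this example is easy to verify numerically. A minimal sketch (the epochs of S1 and the chaff time are hypothetical values chosen for illustration):

```python
# Hypothetical epochs of S1.
s1 = [1.0, 3.0, 4.5, 5.0, 8.0]

# Interarrival times K_j = S1(j+1) - S1(j).
K = [s1[j + 1] - s1[j] for j in range(len(s1) - 1)]

# S2: an identical copy of S1 with one chaff packet inserted at the beginning.
s2 = [0.5] + s1

# Interarrival times L_j = S2(j+1) - S2(j).
L = [s2[j + 1] - s2[j] for j in range(len(s2) - 1)]

# After the chaff packet the pairs are misaligned: the later interarrival
# times of S2 repeat those of S1 shifted by one (0-indexed: L[j] == K[j-1]).
for j in range(1, len(L)):
    assert abs(L[j] - K[j - 1]) < 1e-12
```

Since the Kj's are i.i.d., a test that compares the jth interarrival times pairwise sees only independent pairs after the shift.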
The notion of consistency prevents obvious mistakes as in the above example
by guaranteeing that it is possible to have non-vanishing miss probability only if
a sufficient amount of chaff noise is inserted.
5.3.2 Level of Undetectability
Since the eavesdroppers cannot distinguish chaff noise from information flows, there
is a limit on the amount of chaff noise beyond which an information flow can be
made statistically identical with traffic under H0. We use this limit to measure
the level of undetectability. For centralized detection, the level of undetectability
is defined as the minimum CTR for an information flow to mimic the distribu-
tions under H0; see (4.4). For distributed detection, the distributions seen by the
detector depend on the quantizers, and so does the level of undetectability.
For deterministic quantizers³ qi (i = 1, 2), the level of undetectability is defined as the minimum CTR required to mimic H0 after quantization, i.e.,

    φ(H0; q1, q2) ≜ inf{ r ∈ [0, 1] : ∃ Fi, Wi (i = 1, 2) such that
        1) Fi ⊕ Wi =_d Si, i = 1, 2, and (qi(Fi ⊕ Wi))_{i=1}^2 =_d (qi(Si))_{i=1}^2
           for some (Si)_{i=1}^2 under H0;
        2) (F1, F2) is an information flow;
        3) lim sup_{t→∞} CTR(t) ≤ r a.s. }.        (5.5)
With proper perturbations and φ(H0; q1, q2) fraction of chaff noise, an information
flow can appear to be the same as traffic under H0 to both the eavesdroppers and
the detector. Therefore, the maximum consistency under quantizers (q1, q2) is
upper bounded by φ(H0; q1, q2).
Generally, the quantization schemes may involve randomization. A randomized quantizer of S1 is a set of conditional distributions Q1(x|s1), where s1 is a realization of S1, x ∈ X^∞ for a finite or countable alphabet X, and Q1(x|s1) is the probability of quantizing s1 to x. A randomized quantizer Q2(y|s2) of S2 is

³Each qi can be viewed as the limit of a sequence of deterministic quantizers {q_i^{(t)}}_{t≥0} as t increases.
defined similarly. Given (Q1, Q2), the level of undetectability is defined as

    φ(H0; Q1, Q2) ≜ inf{ r ∈ [0, 1] : ∃ Fi, Wi (i = 1, 2) such that
        1) Fi ⊕ Wi =_d Si, i = 1, 2, and (X, Y) | (Fi ⊕ Wi)_{i=1}^2 =_d (X, Y) | (Si)_{i=1}^2
           for some (Si)_{i=1}^2 under H0;
        2) (F1, F2) is an information flow;
        3) lim sup_{t→∞} CTR(t) ≤ r a.s. },        (5.6)
where (X, Y) | (Si)_{i=1}^2 denotes the marginal distribution of (X, Y) in (X, Y, S1, S2) specified by the distribution of (S1, S2) and the conditional distribution Q(X, Y | S1, S2) = Q1(X | S1)Q2(Y | S2). The conditional distribution factors into this product form because the two processes are quantized independently, i.e., X → S1 → S2 → Y forms a Markov chain. As with φ(H0; q1, q2), φ(H0; Q1, Q2) gives an upper bound on the maximum consistency under (Q1, Q2).
5.3.3 General Converse and Achievability
Given capacity constraints (R1, R2), we are interested in finding the value of
α(R1, R2) and designing detection systems to achieve it. In Chapter 4, we have
answered these questions for infinite capacities. Now we provide high-level answers
for finite capacities.
Theorem 14 For any Ri ≥ 0 (i = 1, 2),

    α(R1, R2) ≤ max_{P1} φ(H0; Q1, Q2),        (5.7)
where⁴

    P1 = { (Q1(X|S1), Q2(Y|S2)) : lim sup_{t→∞} (1/t) I(S1; X) ≤ R1,
                                   lim sup_{t→∞} (1/t) I(S2; Y) ≤ R2 }.

Furthermore, let Q1* and Q2* achieve the maximum in (5.7), and let (Fi*, Wi*) (i = 1, 2) achieve φ(H0; Q1*, Q2*) as defined in (5.6) without the requirement that Fi ⊕ Wi =_d Si (i = 1, 2). If Q1* and Q2* are deterministic, and the CTR of (Fi* ⊕ Wi*)_{i=1}^2 converges a.s. to some value α*(R1, R2), then

    α(R1, R2) ≥ α*(R1, R2).

Proof: See Appendix 5.A.  □
Remark: The theorem contains a converse result and an achievability result.
The converse result states that the level of detectability under certain capacity
constraints is no more than the maximum level of undetectability over all the
quantizers satisfying the capacity constraints. The achievability result gives a lower
bound on the level of detectability by constructing a specific detection system with
consistency equal to α∗(R1, R2).
It can be shown that solving the maximization in (5.7) is equivalent to computing a distortion-rate function with distortion measure

    φ(H0) − φ(H0; Q1, Q2),

⁴Note that P1 is well-defined because Si (i = 1, 2) have the same distributions under both hypotheses.
which characterizes the performance loss due to quantization by Qi (i = 1, 2). How
to compute this distortion rate function is an open problem because the distortion
measure is not single-letter (and it is a function of distributions). Instead, we will
develop practical detection systems and analyze their performance to give lower
bounds on α(R1, R2).
5.4 Quantizer Design
The design of quantizers q(t)i (i = 1, 2) is complicated by the dependency on t.
To simplify design, we partition the observation into n slots of equal length T
(T = t/n) and perform independent and identical quantization in each slot. We
propose the following quantizers based on the counting measure.
Definition 14 Given a point process S, a slotted quantizer with slot length T is defined as γ(S) ≜ (Z1, Z2, . . .), where Zj (j ≥ 1) is the number of points of S in the jth slot, i.e., the interval [(j − 1)T, jT).
The slotted quantizer was first used to compress Poisson processes by Rubin in [31], where, combined with proper reconstruction methods, it was shown to achieve compression performance close to the optimum predicted by the rate-distortion function under the single-letter absolute-error fidelity criterion. This does not imply that the slotted quantizer is optimal or near-optimal in our problem, because our fidelity criterion is different. We refer to quantization by a slotted quantizer as slotted quantization. It is easy to see that the above definition is equivalent to applying the point-wise map γ(t) = ⌊t/T⌋ to the epochs, where t ∈ R+.
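A minimal sketch of the slotted quantizer, using the point-wise map γ(t) = ⌊t/T⌋ (the truncation to a fixed number of slots is for illustration only):

```python
import math

def slotted_quantize(epochs, T, n_slots):
    """Slotted quantizer: count the epochs of a point process falling in
    each slot [(j-1)T, jT), via the point-wise map floor(t/T)."""
    z = [0] * n_slots
    for s in epochs:
        j = int(math.floor(s / T))
        if j < n_slots:
            z[j] += 1
    return z

# Epochs 0.2 and 0.9 land in slot 1; epoch 2.5 lands in slot 3.
print(slotted_quantize([0.2, 0.9, 2.5], T=1.0, n_slots=3))  # [2, 0, 1]
```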
For applications requiring extremely low rate, it may be desirable to further
compress the results of slotted quantization. To this end, we propose the following
quantizer.
Definition 15 Given a point process S, a one-bit quantizer is a binary quantization of the output of a slotted quantizer, defined as

    γ̄(S) ≜ (I{Zj > 0})_{j=1}^∞,

where Z = γ(S), and I{·} is the indicator function.

Quantization by a one-bit quantizer is called one-bit quantization. The rate of the one-bit quantizer decays as O(1/T) as T → ∞.
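The one-bit quantizer is then simply a thresholding of the slot counts; a minimal sketch:

```python
def one_bit_quantize(z):
    """One-bit quantizer: indicator of a nonempty slot, applied to the
    output z of the slotted quantizer."""
    return [1 if zj > 0 else 0 for zj in z]

print(one_bit_quantize([2, 0, 1]))  # [1, 0, 1]
```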
Hereafter, we will denote the quantization results of S1 and S2 by X^n = (Xj)_{j=1}^n and Y^n = (Yj)_{j=1}^n, respectively; their meaning will depend on the quantizers used. For the full side-information case (i.e., R2 = ∞), we will use Y(s, t) to denote the number of epochs of S2 in the interval [s, t).

If Si (i = 1, 2) are Poisson processes, then the Xj's and Yj's are i.i.d., and it is known that they can be delivered almost perfectly under the capacity constraints in (5.2) if and only if

    H(X1)/T ≤ R1,    H(Y1)/T ≤ R2.        (5.8)
5.5 Detection Algorithms
In this section, we present detectors for each of the quantization schemes proposed in Section 5.4 and analyze their consistency. The detectors borrow the idea from centralized detection: the detector computes the minimum fraction of chaff noise needed to generate the received measurements and declares a detection if this fraction is suspiciously small. Optimal chaff-inserting algorithms are developed to compute the minimum fraction of chaff.
In the rest of this section, we will discuss the following four cases: I) q1 is
a slotted quantizer, q2 is an identity function (full side-information); II) q1 and
q2 are both slotted quantizers; III) q1 is a one-bit quantizer, q2 is an identity
function; IV) q1, q2 are both one-bit quantizers. In Cases II and IV, equal capacity constraints (R1 = R2) are considered for simplicity, although the idea of detection generalizes to unequal constraints. Since the level of detectability in the high-capacity regime is already known, our analysis focuses on the low-capacity (i.e., large slot length) regime.
5.5.1 Case I: Slotted Quantization, Full Side-Information
Consider the case when q1 is a slotted quantizer and q2 is an identity function.
Assume that the capacities are sufficient to permit reliable delivery of quantized
measurements. Then the detector needs to make a decision based on the measure-
ments xn and s2.
We want to insert the minimum chaff noise to mimic a given (x^n, s2), i.e., we want to find realizations of an information flow (fi)_{i=1}^2 and chaff noise wi (i = 1, 2) such that i) x^n = γ(f1 ⊕ w1) and s2 = f2 ⊕ w2, and ii) the CTR is minimized. If both s1 and s2 were given, the optimal chaff-inserting algorithm would be BGM, presented in Section 4.4.1. Since we only know x^n and s2, the idea is to reconstruct s1 from x^n and apply BGM to the reconstructed processes. Based on this idea, we develop
a chaff-inserting algorithm called "Insert Chaff: Slotted, Full side-information" (IC-SF). Given (x^n, s2), IC-SF does the following:

1. construct a point process s1 as bursts of xj simultaneous epochs at (j − 1)T (j ≥ 1), as illustrated in Fig. 5.4;

2. run BGM on (s1, s2) with delay bound T + ∆.
Figure 5.4: IC-SF: match s1 with s2 subject to delay bound T + ∆. ◦: directly observed epoch in s2; •: reconstructed epoch in s1.
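The two steps can be sketched in code. The BGM subroutine below is an assumption: a single greedy pass that matches each reconstructed epoch of s1 to the earliest unmatched epoch of s2 whose delay lies in [0, delay_bound], counting every unmatched epoch as chaff; the actual BGM of Section 4.4.1 may differ in detail.

```python
def bgm(s1, s2, delay_bound):
    """Sketch of bounded greedy matching: pair epochs of s1 with epochs of
    s2 at delay in [0, delay_bound]; return the number of chaff packets."""
    s1, s2 = sorted(s1), sorted(s2)
    i = j = matched = 0
    while i < len(s1) and j < len(s2):
        d = s2[j] - s1[i]
        if d < 0:
            j += 1            # s2 epoch precedes all remaining s1 epochs: chaff in s2
        elif d > delay_bound:
            i += 1            # no feasible relay for this s1 epoch: chaff in s1
        else:
            matched += 1      # feasible pair: part of the information flow
            i += 1
            j += 1
    return (len(s1) - matched) + (len(s2) - matched)

def ic_sf(x, s2, T, Delta):
    """IC-SF: reconstruct s1 as bursts of x_j epochs at (j-1)T (here j is
    0-indexed, so slot j contributes epochs at j*T), then run BGM with
    delay bound T + Delta."""
    s1 = [j * T for j, xj in enumerate(x) for _ in range(xj)]
    return bgm(s1, s2, T + Delta)
```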
The optimality of IC-SF is provided by the following proposition.
Proposition 16 Algorithm IC-SF inserts the minimum number of chaff packets
to make an information flow mimic any (xn, s2) under the quantization in Case I.
Proof: See Appendix 5.A.  □
Since IC-SF is optimal, we can compute the minimum number of chaff packets
to mimic the measurements (xn, s2) using IC-SF. This idea leads to the following
detector.
Given (x^n, s2), define a detector δI as

    δI(x^n, s2) = 1 if CI/N ≤ τI, and 0 otherwise,

where N = Σ_{j=1}^n xj + |S2|, and CI is the number of chaff packets found by IC-SF in (x^n, s2), excluding chaff packets in⁵ S2 ∩ [0, ∆). The implementation of δI can be found in Appendix 5.B.
The actual number of chaff packets has to be at least CI, and therefore, δI has
vanishing miss probability for all the information flows with CTR bounded by τI
a.s.
The false alarm probability of δI is guaranteed by the following theorem.

Theorem 15 If, under H0, S1 and S2 are independent Poisson processes with rates bounded by λ, and T is large, then for any τI < 1/√(πλT) − ∆/(4T), the false alarm probability of δI decays exponentially with n.

Proof: See Appendix 5.A.  □
Theorem 15 tells us how to choose τI to obtain an exponentially decaying false alarm probability. Combining the theorem with our discussion of the miss probability, we see that a proper choice of threshold enables δI to be r-consistent for r arbitrarily close to

    αI(T) ≜ 1/√(πλT) − ∆/(4T) ≈ 1/√(πλT)        (5.9)

fraction of chaff noise, i.e., the consistency of δI is lower bounded by αI(T). As expected, αI(T) is a decreasing function of T.

⁵This is because packets in this interval may be relays of packets transmitted before the detector starts taking observations.
If we fix the quantization scheme as slotted quantization, then the capacity constraints affect detection only through T. It is known that for Poisson processes of maximum rate λ, the rate⁶

    RI(T) ≜ H(Poi(λT))/T        (5.10)

suffices to reliably deliver X^n for large n. By the Gaussian approximation, H(Poi(λT))/T ≈ log(2πeλT)/(2T) for large T, i.e., the required rate under slotted quantization decreases as O(log T / T).
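The rate R_I(T) in (5.10) and its Gaussian approximation can be compared numerically; a small sketch (entropy in nats, by direct summation of the Poisson pmf, with an ad hoc truncation tolerance):

```python
import math

def poisson_entropy(mean, tol=1e-12):
    """Entropy (nats) of a Poisson(mean) distribution by direct summation;
    the tail is truncated once the pmf falls below tol past the mean."""
    h, k, p = 0.0, 0, math.exp(-mean)   # p = P(Z = k), starting at k = 0
    while True:
        if p > 0.0:
            h -= p * math.log(p)
        k += 1
        p *= mean / k                   # recursion p_k = p_{k-1} * mean / k
        if k > mean and p < tol:
            break
    return h

lam, T = 1.0, 50.0
exact = poisson_entropy(lam * T) / T                      # R_I(T)
approx = math.log(2 * math.pi * math.e * lam * T) / (2 * T)
print(exact, approx)   # the two agree closely for large T
```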
Combining Theorem 15 and (5.10) gives an achievable rate-consistency pair for each T, denoted by (RI(T), αI(T)). Given a capacity constraint R, the achievable consistency-rate function is obtained as αI(RI^{−1}(R)). The consistency-rate functions for the other cases (Cases II–IV) are characterized similarly.
5.5.2 Case II: Slotted Quantization, Equal Capacity Constraints

Consider equal capacity constraints (R1 = R2 = R < ∞), where qi (i = 1, 2) are both slotted quantizers with the same slot length T. We follow the procedure of Case I to develop a detector for this scenario.

We develop an optimal chaff-inserting algorithm called "Insert Chaff: Slotted, Equal capacities" (IC-SE) based on ideas similar to IC-SF. Given (x^n, y^n), IC-SE works as follows:

⁶Here H(Poi(λT)) is the entropy of the Poisson distribution with mean λT.
1. construct point processes si (i = 1, 2) as bursts of xj (resp. yj) simultaneous epochs at (j − 1)T for j ≥ 1;

2. run BGM on (s1, s2) with delay bound ⌈∆/T⌉T.
Algorithm IC-SE is optimal in minimizing the number of chaff packets, as stated
in the following proposition.
Proposition 17 For any (xn, yn), IC-SE inserts the minimum number of chaff
packets to make an information flow mimic these observations under the quanti-
zation in Case II.
Proof: See Appendix 5.A.  □
Algorithm IC-SE provides a method to compute the minimum amount of chaff
noise in the measurements, based on which we design a detector as follows.
Given (x^n, y^n), define a detector δII as

    δII(x^n, y^n) = 1 if CII/N ≤ τII, and 0 otherwise,

where N = Σ_{j=1}^n (xj + yj), and CII is the number of chaff packets found by IC-SE in (x^n, y^n), excluding chaff packets in⁷ S2 ∩ [0, ⌈∆/T⌉T). See Appendix 5.B for an implementation of δII.

⁷As in the computation of CI, this adjustment is needed because packets at the beginning of s2 may be relays of packets transmitted before the detector starts.
Under H1, the optimality of IC-SE implies that the actual number of chaff
packets in the measurements is no smaller than CII. Therefore, a CTR larger
than τII is required to evade δII, i.e., δII has vanishing miss probability for all the
information flows with CTR bounded by τII a.s.
Under H0, the following theorem guarantees the false alarm probability of δII.
Theorem 16 If S1 and S2 are independent Poisson processes of maximum rate λ, and T is large, then for any τII < (c1/(2√(λT))) e^{−λT/6}, where c1 = 0.0014, the false alarm probability of δII decays exponentially with n.

Proof: See Appendix 5.A.  □
By Theorem 16, δII can achieve Chernoff-consistent detection for arbitrarily close to

    αII(T) ≜ (c1/(2√(λT))) e^{−λT/6}        (5.11)

fraction of chaff noise, and thus its consistency is at least αII(T). Note that as T increases, αII(T) decays exponentially at rate O(e^{−λT/6}); compared with the O(1/√T) decay of αI(T), this suggests that the consistency in Case II decays much faster than that in Case I due to the quantization of S2. The pair (RII(T), αII(T)), where RII(T) = RI(T), gives an achievable rate-consistency pair.
5.5.3 Case III: One-Bit Quantization, Full Side-Information
Consider the scenario when S1 is compressed by one-bit quantization, and S2 is
fully available.
This case is similar to Case I in Section 5.5.1, except that the observations are indicators instead of exact counts. Clearly, more information is lost after one-bit quantization: when xj = 1, there can be one or more epochs in slot j of s1. To overcome this difficulty, we use backward matching, i.e., we match epochs in s2 with nonempty slots in s1. Specifically, we develop a chaff-inserting algorithm called "Insert Chaff: One-bit, Full side-information" (IC-OF), which works as follows. Given (x^n, s2), IC-OF:
1. match every epoch in s2 with the earliest unmatched nonempty slot within
delay ∆, as illustrated in Fig. 5.5;
2. unmatched epochs become chaff; each unmatched nonempty slot contains a
chaff packet.
Figure 5.5: IC-OF: backward greedy matching. Each epoch is matched to the first unmatched nonempty slot that is no more than ∆ earlier.
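A sketch of IC-OF under one reading of the delay constraint: slot j (0-indexed here, covering [jT, (j+1)T)) is taken to be a feasible match for an epoch s of s2 iff some point of the slot lies in [s − ∆, s]; the exact feasibility condition in the dissertation may differ in boundary details.

```python
def ic_of_chaff(x, s2, T, Delta):
    """Sketch of IC-OF: match each epoch of s2 (in time order) to the
    earliest unmatched nonempty slot within delay Delta; unmatched epochs
    and unmatched nonempty slots each contribute one chaff packet."""
    used = [False] * len(x)
    chaff_s2 = 0
    for s in sorted(s2):
        for j in range(len(x)):
            # slot j is feasible iff it is nonempty, unmatched, starts no
            # later than s, and ends after s - Delta (assumed condition)
            if x[j] == 1 and not used[j] and j * T <= s and (j + 1) * T > s - Delta:
                used[j] = True
                break
        else:
            chaff_s2 += 1                 # unmatched epoch in s2 is chaff
    # each unmatched nonempty slot must contain a chaff packet
    chaff_s1 = sum(1 for j in range(len(x)) if x[j] == 1 and not used[j])
    return chaff_s1 + chaff_s2
```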
Algorithm IC-OF is the optimal chaff-inserting algorithm in Case III, as stated
in the following proposition.
Proposition 18 Algorithm IC-OF inserts the minimum number of chaff packets
to make an information flow generate any given observations (xn, s2) under the
quantization in Case III.
Proof: See Appendix 5.A.  □
Based on IC-OF, we develop the following detector. Given (x^n, s2), the detector δIII is defined as

    δIII(x^n, s2) = 1 if CIII/(nN1 + |S2|) ≤ τIII, and 0 otherwise,

where CIII is the number of chaff packets found by IC-OF in (x^n, s2), excluding chaff packets in S2 ∩ [0, ∆), and N1 = −log(1 − x̄) for x̄ = (1/n) Σ_{j=1}^n xj. Here N1 is the maximum likelihood estimate of the mean number of epochs per slot in S1 under the assumption that S1 is Poisson. See Appendix 5.B for an implementation of δIII.
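The estimate N1 follows from the Poisson assumption: the probability that a slot is nonempty is 1 − e^{−λT}, so the empirical frequency x̄ of nonempty slots estimates that probability and −log(1 − x̄) estimates λT. A minimal sketch:

```python
import math

def mean_epochs_per_slot(x):
    """ML estimate of the mean number of epochs per slot (lambda*T) from
    one-bit measurements x, assuming the underlying process is Poisson:
    P(slot nonempty) = 1 - exp(-lambda*T), estimated by the sample mean."""
    xbar = sum(x) / len(x)
    return -math.log(1.0 - xbar)
```

For example, if half the slots are nonempty, the estimate is −log(1/2) = log 2 ≈ 0.693 epochs per slot.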
Under H1, Proposition 18 guarantees that the actual number of chaff packets
is no smaller than CIII. Moreover, under the Poisson assumption, N1 converges to
the average traffic size per slot in S1 a.s. Thus the statistic CIII/(nN1 + |S2|) is
upper bounded by the actual CTR a.s. as n → ∞, implying that δIII has vanishing
miss probability for CTR bounded by τIII a.s.
Under H0, the performance of δIII is guaranteed by the following theorem.
Theorem 17 If S1 and S2 are independent Poisson processes of maximum rate λ, and T is large, then for any τIII < (1/2)e^{−λT}, the false alarm probability of δIII decays exponentially with n.

Proof: See Appendix 5.A.  □
By this theorem, we see that the consistency of δIII is lower bounded by

    αIII(T) ≜ (1/2) e^{−λT}.        (5.12)

For the S1 considered in Theorem 17, a rate of

    RIII(T) ≜ log 2 / T          if λT ≥ log 2,
              h(e^{−λT}) / T     otherwise,        (5.13)

suffices to deliver X^n reliably for large n, where h(p) is the binary entropy function defined as h(p) = −p log p − (1 − p) log(1 − p). Therefore, (RIII(T), αIII(T)) is an achievable rate-consistency pair. As T increases, αIII(T) decays exponentially with exponent λ. Note that this decay is much faster than the O(1/√T) decay of αI(T), indicating that for the same slot length, one-bit quantization significantly reduces consistency compared with slotted quantization. This does not, however, imply that slotted quantization is better, because one-bit quantizers can use a slot length much smaller than that of slotted quantizers under the same capacity constraints.
5.5.4 Case IV: One-Bit Quantization, Equal Capacity Constraints

Suppose that one-bit quantizers with the same slot length T are used for both S1 and S2. This case is similar to Case II in Section 5.5.2, except that the measurements (x^n, y^n) are binary vectors instead of exact packet counts.

To match the epochs, we can still use the idea of IC-SE, but since the number of epochs in a nonempty slot can be one or more, we take it to be the number that minimizes the total number of chaff packets over all positive integers. The amount of chaff noise can be computed by an algorithm called "Insert Chaff: One-bit, Equal capacities" (IC-OE) as follows. Given (x^n, y^n), IC-OE inserts a chaff packet in slot j if
    xj > Σ_{k=j}^{j+⌈∆/T⌉} yk,   or   yj > Σ_{k=j−⌈∆/T⌉}^{j} xk,

for j = 1, . . . , n.
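The condition translates directly into code; a sketch that follows the stated condition literally, counting one chaff packet per offending slot and truncating the sums at the array boundaries (an assumption about edge handling):

```python
import math

def ic_oe_chaff(x, y, T, Delta):
    """Sketch of IC-OE on binary slot vectors x, y (0-indexed here):
    a chaff packet is charged to slot j whenever
      x[j] > sum of y over slots j .. j+ceil(Delta/T), or
      y[j] > sum of x over slots j-ceil(Delta/T) .. j,
    with the sums truncated at the vector boundaries."""
    d = math.ceil(Delta / T)
    n = len(x)
    chaff = 0
    for j in range(n):
        ahead = sum(y[j:min(j + d + 1, n)])     # y-slots reachable from x[j]
        behind = sum(x[max(j - d, 0):j + 1])    # x-slots that can feed y[j]
        if x[j] > ahead or y[j] > behind:
            chaff += 1
    return chaff
```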
Algorithm IC-OE computes the minimum amount of chaff noise as stated in
the following proposition.
Proposition 19 Algorithm IC-OE inserts the minimum number of chaff pack-
ets to make an information flow mimic given binary vectors (xn, yn) under the
quantization in Case IV.
Proof: See Appendix 5.A.  □
Based on IC-OE, we develop a detector δIV as follows. Given (x^n, y^n), the detector is defined as

    δIV(x^n, y^n) = 1 if CIV/[n(N1 + N2)] ≤ τIV, and 0 otherwise,

where CIV is the number of chaff packets inserted by IC-OE in (x^n, y^n), excluding chaff packets in S2 ∩ [0, ⌈∆/T⌉T), and Ni (i = 1, 2) are defined as in δIII as functions of x^n and y^n, respectively. See Appendix 5.B for its implementation.
Under H1, δIV has vanishing miss probability as long as the CTR is bounded
by τIV a.s. because of Proposition 19 and arguments similar to those in Section
5.5.3.
Under H0, the following theorem tells us how to choose the threshold to guar-
antee vanishing false alarm probability.
Theorem 18 If S1 and S2 are independent Poisson processes of maximum rate λ, and T is large, then for any τIV < ((1 − e^{−λT})/(2λT)) e^{−2λT}, the false alarm probability of δIV decays exponentially with n.

Proof: See Appendix 5.A.  □
By this theorem, the consistency of δIV is lower bounded by

    αIV(T) ≜ ((1 − e^{−λT})/(2λT)) e^{−2λT}.        (5.14)

Thus we have an achievable rate-consistency pair (RIV(T), αIV(T)), where RIV(T) = RIII(T). The value of αIV(T) decays exponentially in T with exponent 2λ. Comparing this with the λ/6 exponent in the decay of αII(T), the analysis suggests that for the same T, the consistency under one-bit quantization decays 12 times faster than that under slotted quantization. Again, this does not mean that slotted quantization is better, because the slot lengths under different quantization schemes differ.
5.6 Analysis and Comparison
Recall that we have taken a separated approach by breaking the distributed detection process into three steps: quantization, data transmission, and detection. In this section, we analyze the consistency of the proposed detectors and then compare their consistency to gain insights into quantizer design.
5.6.1 Performance Analysis
Assume that S1 and S2 are independent Poisson processes of maximum rate λ
under H0. We will analyze the consistency of the detectors proposed in Section
5.5 and give bounds on the maximum consistency in each of the four cases.
Conceptually, we can calculate the exact consistency of the proposed detectors
as follows.
Theorem 19 There exist functions αi*(T) (i = I, . . . , IV) such that δi has vanishing false alarm probability if and only if τi < αi*(T).

Proof: See Appendix 5.A.  □
The theorem implies that the consistency of δi (i = I, . . . , IV) is equal to αi*(T). The definition of αi*(T) can be found in the proof. Their computation is rather involved; instead, we resort to closed-form lower bounds that also guarantee vanishing false alarm probabilities, which leads to the αi(T) in (5.9), (5.11), (5.12), and (5.14).
Fixing quantization schemes as in Case i (i = I, . . . , IV), we provide a converse
result in the following theorem.
Theorem 20 The level of undetectability in Case i (i = I, . . . , IV) is bounded by

    φ(H0; q1, q2) ≤ E[|X − Y|]/(2λT),

where qj (j = 1, 2) are the quantizers in Case i, and X and Y are independent Poisson random variables with mean λT.

Proof: See Appendix 5.A.  □
Note that the detectors δi (i = I, . . . , IV) are not necessarily optimal, because the chaff-inserting algorithms used in these detectors only make an information flow mimic the joint distribution of the quantized processes under H0. The marginal distributions differ from those under H0 (e.g., the process constructed by IC-SF is not Poisson) and can still be used to distinguish the two hypotheses. In the proof of Theorem 20, we give a method to mimic both the marginal and the joint distributions under H0 and analyze the CTR of that method to obtain the upper bound on the level of undetectability.
Combining Theorems 19 and 20 yields the following result.

Corollary 6 The maximum consistency in Case i (i = I, . . . , IV) is lower bounded by αi*(T) and upper bounded by E[|X − Y|]/(2λT), where X and Y are defined as in Theorem 20.

The relationship among the quantities discussed so far regarding the consistency in Case i (i = I, . . . , IV) can be summarized as follows:

    αi(T) ≤ αi*(T) ≤ maximum consistency in Case i ≤ φ(H0; q1, q2) ≤ E[|X − Y|]/(2λT),

where qj (j = 1, 2) are the quantizers in Case i.
5.6.2 Numerical Comparison
We now give some heuristics on quantizer design by comparing the consistency of δi (i = I, . . . , IV) as functions of the capacity constraints. Specifically, let the capacity constraints be (R, ∞) in Cases I and III, and (R, R) in Cases II and IV. The consistency-rate functions are computed as αi*(Ri^{−1}(R)) (i = I, . . . , IV). Since the form of αi*(T) is not explicit, we calculate it numerically as the CTR of the optimal chaff-inserting algorithms (i.e., IC-SF, -SE, -OF, -OE) on independent Poisson processes of rate λ. In addition, we compare the computed consistency-rate functions with the upper bound u(R) ≜ E[|X − Y|]/(2λT) for T = RI^{−1}(R), where X and Y are defined in Theorem 20 (it can be shown that the upper bound for T = RIII^{−1}(R) is much looser, and thus that bound is omitted). For algorithmic simplicity, we choose the range of R to guarantee that Ri^{−1}(R) ≥ ∆ (i = I, . . . , IV). See Figs. 5.6–5.8 for plots of the consistency-rate functions under different traffic rates (i.e., different λ).
The plots yield the following observations: i) for small λ (Fig. 5.6), the consistency of δI is similar to that of δIII, and the same holds for δII and δIV; as λ increases (Figs. 5.7, 5.8), the consistency of δI (or δII) becomes increasingly larger than that of δIII (or δIV); at λ = 1 (Fig. 5.8), the consistency of δII exceeds that of δIII even though δIII has full side-information; ii) at the same R, the consistency of all the detectors decreases as λ increases; iii) the consistency of δI is close to the upper bound on the maximum consistency in Case I, especially at small R.
Observation (i) clearly suggests that which quantizer to use should depend on the traffic rate. For very light traffic, we can use the simpler one-bit quantizer to achieve the same performance as the more complicated slotted quantizer, whereas we should use the slotted quantizer to obtain better performance if the traffic is not so light. The intuition behind this observation is that for very small λ, the probability that a slot contains more than one epoch is small, and thus we lose little information by further compressing the results of slotted quantization by one-bit quantization; otherwise, there is a nonnegligible probability that a nonempty slot contains multiple epochs, and this information is lost after one-bit quantization, making it more difficult to distinguish the two hypotheses.
Observation (ii) says that it is more difficult to detect information flows in heavy traffic. The intuition is that if we normalize the maximum delay by the average interarrival time, the normalized maximum delay constraint λ∆ will be
Figures 5.6–5.8: The consistency-rate functions αi* of δI, . . . , δIV versus R for various traffic rates, together with the upper bound u; ∆ = 1; αi* is computed over 10⁴ slots. (Figure 5.6: λ = 0.1; Figure 5.7: λ = 0.5; Figure 5.8: λ = 1.)
relatively loose for large λ, making the detection more difficult (see parallel results in Section 3.5.2). Observation (iii) implies that the detector δI is close to optimal in Case I; its consistency and the upper bound jointly specify the maximum consistency under slotted quantization and full side-information.
5.7 Summary
In this chapter, we consider distributed detection of bounded delay flows in chaff
noise. We give a theoretical characterization of the optimal performance and then
focus on the development of practical detection systems. We are especially inter-
ested in the detection performance at extremely low rates. Our results suggest that
slotted quantization coupled with the proposed detector gives satisfactory
performance in terms of the consistency-rate tradeoff. What remains for future
work includes tighter converse results, better lower bounds, and ultimately, the
optimal quantizers and detectors.
APPENDIX 5.A
PROOF OF CHAPTER 5
Proof of Theorem 14
For the converse result, since for any quantizers Qi (i = 1, 2), the consistency is
upper bounded by φ(H0; Q1, Q2), the largest of such upper bounds under the given
capacity constraints gives an upper bound on the maximum consistency, i.e., the
level of detectability.
For the achievability result, consider the detection system8 (Q∗(n)1, Q∗(n)2, δ∗n),
where δ∗n is a threshold detector defined as follows. Given (xn, yn),

    δ∗n(xn, yn) = 1 if CTR∗(xn, yn) ≤ τ, and 0 otherwise,

where

    CTR∗(xn, yn) ≜ min over P2 of CTR(nT),

and the minimization is over all

    P2 = (fi, wi) (i = 1, 2) such that:
    1) Q∗(n)1(xn | f1 ⊕ w1) > 0 and Q∗(n)2(yn | f2 ⊕ w2) > 0;
    2) (f1, f2) is a realization of an information flow.

That is, CTR∗(xn, yn) is the minimum CTR over all realizations of information
flows and chaff noise which can generate (xn, yn) after being quantized by
(Q∗(n)1, Q∗(n)2). The detector δ∗n declares detection if this minimum CTR is
upper bounded by a predetermined threshold τ.

8Quantizers Q∗(n)i (i = 1, 2) are the marginalizations of Q∗i on [0, nT].
Since the statistic CTR∗(xn, yn) is a lower bound on the actual CTR in the
measurements, it is easy to see that δ∗n has vanishing miss probability as long as
the CTR is upper bounded by τ a.s.
Generally, the statistic is smaller than the CTR required to mimic H0. If,
however, Q∗i (i = 1, 2) are deterministic, then CTR∗(Xn, Yn) is the minimum CTR
for an information flow to mimic the distribution of (Xn, Yn) after quantization,
and the minimum CTR∗(Xn, Yn) under H0 is the minimum CTR to mimic the joint
distribution of the quantization results of some (S1, S2) under H0. By definition,
this is the CTR of (F∗i ⊕ W∗i) (i = 1, 2). Note that the processes achieving
CTR∗(Xn, Yn) do not necessarily mimic the marginal distributions of Si (i = 1, 2)
under H0. By assumption, there exists a constant α∗(R1, R2) such that the CTR
of (F∗i ⊕ W∗i) (i = 1, 2) converges to α∗(R1, R2) a.s. Thus, for any τ < α∗(R1, R2),
we have that

    lim(n→∞) PF(δ∗n) = lim(n→∞) Pr{CTR∗(Xn, Yn) ≤ τ} = 0.

Combining this result with the arguments on the miss probability, we conclude
that the consistency of δ∗n is at least α∗(R1, R2). Therefore, α∗(R1, R2) is a lower
bound on α(R1, R2).

■
Proof of Proposition 16
First, we show that the matched pairs found by IC-SF indeed form a realization
of an information flow. Let x′n be the vector of the numbers of matched epochs
in s1, and f2 = (t1, t2, . . .) be the sequence of matched epochs in s2. We construct
a sequence f1 = (sj)j≥1 as follows. As illustrated in Fig. 5.9, for an epoch t1 in
f2 matched to the same slot, we construct an epoch s1 = t1; for an epoch t2 in
f2 matched to a previous slot, we construct s2 at the end of that slot. It is easy
to see that slotted quantization of f1 yields x′n, and (f1, f2) is a realization of an
information flow.
[Figure 5.9: Construct f1. ◦: original epochs; •: constructed epochs.]
Then we show that IC-SF is optimal. Since it is known that for given real-
izations and delay bound, BGM inserts the minimum number of chaff packets, it
only remains to show that our construction of s1 and choice of delay bound
minimize the need for chaff. Given xn, the xj packets in slot j can be anywhere in
[(j − 1)T, jT ). By the causality and bounded delay constraints, the maximum
interval for these packets to be relayed is [(j − 1)T, jT +∆). By putting all the xj
packets at (j−1)T and allowing delays up to T +∆, we allow the matched packets
in s2 to be anywhere in the maximum interval. Thus, any other chaff-inserting
algorithm will have to insert no fewer chaff packets than IC-SF. Therefore, IC-SF
mimics (xn, s2) by inserting the minimum number of chaff packets.
■
Proof of Theorem 15
Let Ck be the number of chaff packets inserted in the kth slot. Since T ≫ ∆, we
see that Ck (k = 1, 2, . . .) are approximately i.i.d., and Ck is equal in distribution
to

    max(Y(υ, T) − X1, 0) + max(X1 − Y(υ, T + ∆), 0),

where υ is a random variable in [0, ∆] denoting the time used in each slot to relay
packets sent in the previous slot. It is easy to see that the CTR is minimized when
the processes have equal rate because unequal rates will make υ drift towards 0 or
∆ and increase the mean of Ck. Moreover, if we prove the theorem for processes
of equal rate λ, then the result also holds for smaller rates. For example, if Si
(i = 1, 2) have rate λ′ < λ, then by the result of the theorem,
Pr{CI/N < 1/√(πλ′T) − ∆/(4T)} decays exponentially, implying that
Pr{CI/N < 1/√(πλT) − ∆/(4T)} also decays exponentially. Therefore, it suffices
to consider independent Poisson processes of equal rate λ.
We first show that Pr{(1/n) ∑(k=1..n) Ck ≤ η} decays exponentially for any
η < 2λT αI(T). By Cramer’s Theorem [9], this result holds if we show that
E[Ck] ≥ 2λT αI(T). Fix a value of υ in [0, ∆]. By Gaussian approximation of
Poisson random variables, we have that

    Y(υ, T) − X1 ∼ N(−λυ, λ(2T − υ)) ≈ N(−λυ, 2λT).
Then

    E[max(Y(υ, T) − X1, 0)]
      ≈ ∫(0 to ∞) z/√(4πλT) e^(−(z+λυ)²/(4λT)) dz
      = √(λT/π) e^(−λυ²/(4T)) − λυ Q(λυ/√(2λT))
      ≈ (√(λT/π) − (1/2)λυ) e^(−λυ²/(4T))
      ≈ √(λT/π) − (1/2)λυ.

Similarly, E[max(X1 − Y(υ, T + ∆), 0)] ≈ √(λT/π) − (1/2)λ(∆ − υ). Therefore,

    E[Ck] ≈ 2√(λT/π) − (1/2)λ∆ = 2λT αI(T).
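As a sanity check on the Gaussian-approximation step, the υ = 0 case predicts E[max(Y − X1, 0)] ≈ √(λT/π) for two independent Poisson variables with mean λT. The Monte Carlo sketch below agrees with the prediction; the value λT = 50 and the sampler are my choices for illustration, not part of the proof:

```python
import math
import random

random.seed(0)

def poisson(mu):
    """Knuth's multiplicative Poisson sampler (adequate for moderate mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

lam_T = 50.0                       # assumed value of lambda*T for the check
n = 50_000
est = sum(max(poisson(lam_T) - poisson(lam_T), 0) for _ in range(n)) / n
pred = math.sqrt(lam_T / math.pi)  # predicted E[max(Y - X1, 0)] at upsilon = 0
print(est, pred)                   # both close to 3.99
```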
Next, let β ≜ τI/αI(T) (β ∈ [0, 1)). A necessary condition for false alarm is
that (1/n) ∑(k=1..n) Ck ≤ √β 2λT αI(T) or N/n ≥ 2λT/√β. By the union bound,
we have that

    PF(δI) ≤ Pr{(1/n) ∑(k=1..n) Ck ≤ √β 2λT αI(T)} + Pr{N/n ≥ 2λT/√β}.

We have shown that the first term decays exponentially with n, and by Cramer’s
Theorem, the second term can be shown to decay exponentially as well. Therefore,
the overall false alarm probability decays exponentially. This completes the proof.

■
Proof of Proposition 17
First, we show that IC-SE indeed finds realizations of an information flow and
chaff noise such that the slotted quantization results are equal to (xn, yn). Let
(x′n, y′n) denote the vectors of matched numbers found by IC-SE. We will show
that (x′n, y′n) is the result of slotted quantization of a pair of sequences (f1, f2)
which is a realization of an information flow. As illustrated in Fig. 5.10, for T ≥ ∆,
we construct f1 as x′j (j ≥ 1) epochs at the end of slot j, and f2 as y′j,1 epochs at
the beginning and y′j,2 epochs at the end of slot j, where y′j,1 is the number of
epochs out of y′j which are matched to the (j − 1)th slot, and y′j,2 is the number
of epochs matched to the jth slot (we have y′j = y′j,1 + y′j,2). Such a construction
preserves the quantization results, and (f1, f2) forms a realization of an information
flow. For T < ∆, the construction of fi (i = 1, 2) is the same except that for f2,
y′j,1 is the number of epochs matched to slots before the jth slot, and y′j,2 is the
number of epochs matched to the jth slot.
[Figure 5.10: Construct (f1, f2) from (x′n, y′n) (T ≥ ∆). The matching found by
IC-SE guarantees that x′j = y′j,2 + y′j+1,1.]
Next, we show that IC-SE is optimal. Due to the constraints of causality
and bounded delay, a packet in slot j can only be matched to packets from slots
j, . . . , j + ⌈∆/T⌉, and IC-SE allows all such matches. Combining this argument
with the fact that BGM is optimal yields the optimality of IC-SE.
■
Proof of Theorem 16
By arguments parallel to those in the proof of Theorem 15, we only need to consider
independent Poisson processes of equal rate λ. Following the idea of that proof,
we will prove Theorem 16 if we show that Pr{CII/n ≤ η} decays exponentially for
any η < 2λT αII(T).
In δII, no matter how large T is (relative to ∆), the numbers of chaff packets
in consecutive slots are still correlated. If, however, we run δII only on every other
slot, and let C2i (i = 1, 2, . . .) be the number of chaff packets inserted in the (2i)th
slot, then C2, C4, C6, . . . will be i.i.d. Obviously9, CII ≥ ∑(i=1..n/2) C2i. Then
we have

    Pr{CII/n ≤ η} ≤ Pr{(2/n) ∑(i=1..n/2) C2i ≤ 2η}.

By Cramer’s Theorem, we can prove the exponential decay if we show that E[C2] ≥
Table 5.4: Detector for Case IV.

    δIV(xn, yn, ∆, τIV):
        CIV = 0; x0 = 1;
        for k = 1 : n
            if (xk > yk + yk+1) or (yk > xk−1 + xk)
                CIV = CIV + 1;
            end
        end
        N = −n(log(1 − (1/n) ∑(k=1..n) xk) + log(1 − (1/n) ∑(k=1..n) yk));
        return H1 if CIV/N ≤ τIV, H0 o.w.;
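The pseudocode in Table 5.4 can be turned into a short executable sketch. This is my reconstruction, assuming binary (one-bit quantized) slot sequences, the boundary convention x0 = 1 from the table, a boundary value y(n+1) = 0 that the table leaves implicit, and a hypothetical threshold:

```python
import math

def delta_iv(x, y, tau):
    """Sketch of the Case-IV detector (Table 5.4) on binary slot indicators.

    Counts slots whose contents cannot be explained without chaff,
    estimates the total number of epochs N from the empirical fraction
    of empty slots (Poisson: P(empty slot) = exp(-lambda*T)), and
    thresholds the normalized chaff count CIV / N.
    """
    n = len(x)
    xs = [1] + list(x)        # xs[k] = x_k, with the boundary x_0 = 1
    ys = [0] + list(y) + [0]  # ys[k] = y_k; y_{n+1} = 0 is an assumption
    c_iv = 0
    for k in range(1, n + 1):
        if xs[k] > ys[k] + ys[k + 1] or ys[k] > xs[k - 1] + xs[k]:
            c_iv += 1
    # requires at least one empty slot in each sequence (means < 1)
    N = -n * (math.log(1 - sum(x) / n) + math.log(1 - sum(y) / n))
    return ("H1" if c_iv / N <= tau else "H0"), c_iv

print(delta_iv([1, 0, 1, 1], [1, 1, 0, 1], tau=0.5))  # ('H1', 0)
print(delta_iv([1, 0, 0, 0], [0, 0, 0, 1], tau=0.5))  # ('H0', 2)
```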
Chapter 6
Conclusions
In this dissertation, we investigated statistical inference in sensor networks and
general ad hoc networks when there is no parametric description of the underlying
distributions, or only an incomplete one.
In Chapter 2, we considered the problem of detecting unknown changes in the
unknown distribution of alarmed sensors in a randomly deployed sensor field. We
proposed a threshold detector based on the distance between the empirical
distributions in two data collections, and estimated the set with the maximum
change by the set with the maximum change in the empirical distributions.
By applying the Vapnik-Chervonenkis Theory, we derived exponential
upper bounds on detection error probabilities and proved the consistency of the
estimator under certain regularity conditions for arbitrary distributions. We also
developed several practical algorithms to implement the detector and the estimator
efficiently. Specifically, we reduced the search over infinitely many sets to a search
over a finite number of sets defined by sample points and developed polynomial-time
algorithms for regular sets such as disks, rectangles, and stripes. Comparison of
their complexity and performance suggests that prior knowledge about the changes
allows us to design searching sets to fit the changed sets and therefore significantly
improve the performance.
In Chapter 3, we considered the problem of detecting information flows by
timing analysis when there is no chaff noise in the measurements. We modelled in-
formation flows by constraints such as causality, packet conservation, and bounded
delay or bounded memory. While the bounded delay condition is only applicable to
interactive information flows, the bounded memory condition is always satisfied in
sensor networks due to limited memory size per sensor. We proposed a matching-
based algorithm under the bounded delay model and a rank-based algorithm under
the bounded memory model. We showed that the algorithms have zero miss-
detection probability and exponentially decaying false alarm probability if independent traffic
can be modelled as Poisson processes. A comparison of error exponents and sim-
ulations both show that the proposed algorithms outperform existing algorithms.
Comparison between the proposed algorithms suggests that it is easier to detect
information flows with bounded delay than with bounded memory if the traffic
rate is sufficiently low and vice versa. Since pairwise detection already yields
sufficiently good performance, we can safely decompose the detection of multi-hop
flows into subproblems of detecting 2-hop flows for every pair.
In Chapter 4, we generalized the detection of information flows to allow chaff
noise in the measurements. The insertion of chaff noise makes it impossible to
detect information flows when the mixture of information flows and chaff noise
becomes statistically independent. We used the minimum fraction of chaff noise
required to mimic independent traffic to characterize the level of detectability of
information flows. Optimal chaff-inserting algorithms were developed to compute
the minimum fraction of chaff, and threshold detectors based on these algorithms
were proposed to achieve Chernoff-consistent detection in the presence of chaff noise.
Our analysis shows that pairwise detection can be easily defeated by a relatively
small amount of chaff noise. Thus, unlike the case of no chaff noise, pairwise
detection alone can no longer provide satisfactory performance. To solve this
problem, we extended the scope of the detector to multiple hops. Such an extension
significantly improves the robustness against chaff noise. In particular, for the
Poisson null hypothesis, the fraction of chaff noise for which Chernoff-consistent
detection can be achieved converges to one as the number of hops increases,
implying that it is almost impossible to hide arbitrarily long paths. Although the
Poisson assumption has been made under the null hypothesis to facilitate analysis,
we showed both theoretically and experimentally that independent traffic in practice is even easier to
distinguish from information flows, implying that our results for Poisson processes
provide lower bounds on the detection performance of practical information flows.
In Chapter 5, we further extended the detection of information flows to the
scenario where there are capacity constraints in data collection. In this chapter,
we focused on bounded delay information flows through a pair of nodes. Still mea-
suring performance by the maximum fraction of chaff noise for Chernoff-consistent
detection, we extended the definitions in Chapter 4 to incorporate quantization
performed at the eavesdroppers. The minimum fraction of chaff noise required
to mimic both the marginal distributions at the eavesdroppers and the joint dis-
tribution of quantized measurements at the fusion center gives an upper bound
on the level of detectability as a function of the capacity constraints. Although
the optimal performance remains unknown, we designed practical detection sys-
tems to give achievable lower bounds. The detection systems consist of simple
slot-based quantizers and threshold detectors based on the optimal chaff-inserting
algorithms for quantized measurements. Specifically, we proposed a slotted quan-
tizer which quantizes transmission epochs to numbers of epochs in each slot and
a one-bit quantizer which further compresses the results of slotted quantization
to binary indicators of empty or nonempty slots. For each quantizer, linear-time
algorithms were developed to implement the detector both with and without full
side-information. Numerical comparison of the performance of the proposed de-
tection systems for Poisson processes shows that the two types of quantization
schemes have similar performance at low traffic rate, but slotted quantization
becomes increasingly advantageous as traffic rate increases. This result combined
with previous results in [31] suggests that slotted quantization is a reasonably
good method to compress Poisson processes.
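The two quantization schemes summarized above admit a few-line sketch. This is an illustrative reconstruction from their description here; the slot width T and the helper names are mine, not the dissertation's:

```python
def slotted_quantize(epochs, T, n_slots):
    """Slotted quantizer: map transmission epochs to per-slot epoch counts."""
    counts = [0] * n_slots
    for t in epochs:
        k = int(t // T)          # slot index of epoch t
        if 0 <= k < n_slots:
            counts[k] += 1
    return counts

def one_bit_quantize(counts):
    """One-bit quantizer: compress slot counts to empty/nonempty indicators."""
    return [1 if c > 0 else 0 for c in counts]

# epochs 0.2 and 0.8 fall in slot 0; 2.5 falls in slot 2
print(slotted_quantize([0.2, 0.8, 2.5], T=1.0, n_slots=4))  # [2, 0, 1, 0]
print(one_bit_quantize([2, 0, 1, 0]))                       # [1, 0, 1, 0]
```

Note how the one-bit output discards exactly the multi-epoch information whose loss, as discussed above, matters little at low rates but grows costly as λ increases.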
The change detection and estimation problem in Chapter 2 is purely
nonparametric. The information flow detection problem in Chapters 3–5 is partially
nonparametric because no parametric assumption is made for information flows,
but distributions under the null hypothesis are assumed to be known (indepen-
dent Poisson processes in the analysis). Moreover, in Chapter 5, the processes are
assumed to have the same marginal distributions under both hypotheses.
6.1 Publications
The following is a list of journal publications/submissions that contain parts of
this thesis.
• T. He, S. Ben-David, and L. Tong, “Nonparametric Change Detection and
Estimation in Large-Scale Sensor Networks,” IEEE Transactions on Signal
Processing, vol. 54, no. 4, pp. 1204–1217, April 2006.
• T. He and L. Tong, “Detecting Encrypted Stepping-Stone Connections,”
IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 1612–1623,
May 2007.
• T. He and L. Tong, “Detection of Information Flows,” submitted to IEEE
Transactions on Information Theory, 2007.
• T. He and L. Tong, “Distributed Detection of Information Flows,” submitted
to IEEE Transactions on Information Theory, 2007.
6.2 Future Directions
The advantage of nonparametric techniques over their parametric counterparts is
that they provide reasonable performance without specific parametric knowledge
of the actual distributions. It is therefore crucial that, in partially nonparametric
techniques such as those for information flow detection, the parametric
assumptions be generally satisfied in applications of practical interest. Although
we have shown by analytical arguments and some experimental data that the pro-
posed detectors will probably have even better performance on real traffic, it is
desirable to verify the statement by more extensive study and experiments with
actual traces. Moreover, since most of the related experimental work has been
done in the context of the Internet, it is of interest to implement the detection schemes
in wireless networks, especially wireless sensor networks, to investigate the oppor-
tunities and challenges present in these contexts.
BIBLIOGRAPHY

[1] R. Ahlswede and I. Csiszar. Hypothesis testing with communication constraints. IEEE Transactions on Information Theory, 32(4):533–542, 1986.

[2] S. Ben-David, J. Gehrke, and D. Kifer. Detecting Change in Data Streams. In Proc. 2004 VLDB Conference, Toronto, Canada, 2004.

[3] S. Ben-David, T. He, and L. Tong. Non-Parametric Approach to Change Detection and Estimation in Large Scale Sensor Networks. In Proceedings of the 2004 Conference on Information Sciences and Systems, Princeton, NJ, March 2004.

[4] D. P. Bertsekas and R. Gallager. Data Networks. Prentice Hall, 1992.

[5] A. Blum, D. Song, and S. Venkataraman. Detection of Interactive Stepping Stones: Algorithms and Confidence Bounds. In Conference of Recent Advance in Intrusion Detection (RAID), Sophia Antipolis, French Riviera, France, September 2004.

[6] O. Bousquet, U. V. Luxburg, and G. Rätsch. Advanced Lectures on Machine Learning. Springer, Heidelberg, Germany, 2004.

[7] T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., 1991.

[8] D. R. Cox and H. D. Miller. The Theory of Stochastic Processes. John Wiley & Sons, Inc., New York, 1965.

[9] Frank den Hollander. Large Deviations (Fields Institute Monographs, 14). American Mathematical Society, 2000.

[10] J. Deng, R. Han, and S. Mishra. Intrusion tolerance and anti-traffic analysis strategies for wireless sensor networks. In IEEE International Conference on Dependable Systems and Networks (DSN), pages 594–603, Florence, Italy, June 2004.

[11] D. Donoho, A. G. Flesia, U. Shankar, V. Paxson, J. Coit, and S. Staniford. Multiscale stepping-stone detection: Detecting pairs of jittered interactive streams by exploiting maximum tolerable delay. In 5th International Symposium on Recent Advances in Intrusion Detection, Lecture Notes in Computer Science 2516, 2002.

[12] N. Ferguson and B. Schneier. Practical Cryptography. John Wiley & Sons, Inc., Indianapolis, IN, 2003.

[13] J. D. Gibbons and S. Chakraborti. Nonparametric Statistical Inference. Marcel Dekker, 2003.

[14] J. Giles and B. Hajek. An Information-Theoretic and Game-Theoretic Study of Timing Channels. IEEE Transactions on Information Theory, 48(9):2455–2477, September 2002.

[15] Piyush Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Trans. Inform. Theory, 46(2):388–404, March 2000.

[16] Te Sun Han and S. Amari. Statistical inference under multiterminal data compression. IEEE Trans. Inform. Theory, 44(6):2300–2324, Oct. 1998.

[17] T. He and L. Tong. On A-distance and Relative A-distance. Technical Report ACSP-TR-08-04-02, Cornell University, August 2004. http://acsp.ece.cornell.edu/pubR.html.

[18] T. He and L. Tong. An Almost Surely Complete Subset of Planar Disks. Technical Report ACSP-TR-04-05-01, Cornell University, April 2005. http://acsp.ece.cornell.edu/pubR.html.

[19] T. He and L. Tong. Detecting Encrypted Stepping-Stone Connections. IEEE Transactions on Signal Processing, 55(5):1612–1623, May 2007.

[20] T. He, L. Tong, and A. Swami. Nonparametric Change Estimation in 2D Random Fields. In Proc. of IEEE MILCOM'05, Atlantic City, NJ, October 2005.

[21] T. Hettmansperger and M. Keenan. Tailweight, Statistical Inference, and Families of Distributions - A Brief Survey. Statistical Distributions in Scientific Work, 1:161–172, 1980.

[22] Myles Hollander and Douglas A. Wolfe. Nonparametric Statistical Methods. Wiley Interscience, 1973.

[23] X. Hong, P. Wang, J. Kong, Q. Zheng, and J. Liu. Effective Probabilistic Approach Protecting Sensor Traffic. In Military Communications Conference, 2005, pages 1–7, Atlantic City, NJ, Oct. 2005.

[24] Y. Hong and A. Scaglione. Distributed change detection in large scale sensor networks through the synchronization of pulse-coupled oscillators. In Proc. Intl. Conf. Acoust., Speech, and Signal Processing, pages 869–872, Montreal, Canada, May 2004.

[25] N. Kingsbury. Approximation formulae for the Gaussian error integral Q(x). Technical Report m11067, Connexions, June 2005. http://cnx.org/content/m11067/latest/.

[26] D. Kotz and K. Essien. Analysis of a campus-wide wireless network. ACM Wireless Networks Journal, 11(1-2):115–133, Jan. 2005.

[27] N. Patwari, A. O. Hero, and B. M. Sadler. Hierarchical censoring sensors for change detection. In 2003 IEEE Workshop on Statistical Signal Processing, pages 21–24, St. Louis, MO, September 2003.

[28] V. Paxson and S. Floyd. Wide-Area Traffic: The Failure of Poisson Modeling. IEEE/ACM Transactions on Networking, 3(3):226–244, June 1995.

[29] P. Peng, P. Ning, D. S. Reeves, and X. Wang. Active Timing-Based Correlation of Perturbed Traffic Flows with Chaff Packets. In Proc. 25th IEEE International Conference on Distributed Computing Systems Workshops, pages 107–113, Columbus, OH, June 2005.

[30] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag, New York, 1994.

[31] I. Rubin. Information Rates and Data-Compression Schemes for Poisson Processes. IEEE Transactions on Information Theory, 20(2):200–210, March 1974.

[32] J. Shao. Mathematical Statistics. Springer, 1999.

[33] David J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC, 2004. 3rd Ed.

[34] S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In Proc. the 1995 IEEE Symposium on Security and Privacy, pages 39–49, Oakland, CA, May 1995.

[35] D. Tang and M. Baker. Analysis of a local-area wireless network. In MOBICOM, pages 1–10, Boston, MA, Aug. 2000.

[36] L. Tong, Q. Zhao, and S. Adireddy. Sensor Networks with Mobile Agents. In Proc. 2003 Intl. Symp. Military Communications, Boston, MA, Oct. 2003.

[37] John N. Tsitsiklis. Decentralized Detection. Advances in Statistical Signal Processing, 2:297–344, 1993.

[38] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY, 1995.

[39] V. N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, NY, 1998.

[40] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264–280, 1971.

[41] P. Venkitasubramaniam, T. He, and L. Tong. Anonymous Networking amidst Eavesdroppers. Submitted to IEEE Transactions on Information Theory: Special Issue on Information-Theoretic Security, Feb. 2007.

[42] S. Verdu. The Exponential Distribution in Information Theory. Problems of Information Transmission, 32(1):86–95, 1996.

[43] X. Wang. The loop fallacy and serialization in tracing intrusion connections through stepping stones. In Proc. of the 2004 ACM Symposium on Applied Computing, pages 404–411, Nicosia, Cyprus, March 2004.

[44] X. Wang and D. Reeves. Robust correlation of encrypted attack traffic through stepping stones by manipulation of inter-packet delays. In Proc. of the 2003 ACM Conference on Computer and Communications Security, pages 20–29, 2003.

[45] X. Wang, D. Reeves, and S. Wu. Inter-packet delay-based correlation for tracing encrypted connections through stepping stones. In 7th European Symposium on Research in Computer Security, Lecture Notes in Computer Science 2502, pages 244–263, 2002.

[46] X. Wang, D. Reeves, S. Wu, and J. Yuill. Sleepy watermark tracing: An active network-based intrusion response framework. In Proc. of the 16th International Information Security Conference, pages 369–384, 2001.

[47] R. S. Wenocur and R. M. Dudley. Some Special Vapnik-Chervonenkis Classes. Discrete Mathematics, 33:313–318, 1981.

[48] K. Yoda and H. Etoh. Finding a connection chain for tracing intruders. In 6th European Symposium on Research in Computer Security, Lecture Notes in Computer Science 1895, Toulouse, France, October 2000.

[49] L. Zhang, A. G. Persaud, A. Johson, and Y. Guan. Stepping Stone Attack Attribution in Non-cooperative IP Networks. In Proc. of the 25th IEEE International Performance Computing and Communications Conference (IPCCC 2006), Phoenix, AZ, April 2006.

[50] Y. Zhang, W. Lee, and Y. Huang. Intrusion detection techniques for mobile wireless networks. ACM Wireless Networks Journal, 9(5):545–556, Sept. 2003.

[51] Y. Zhang and V. Paxson. Detecting stepping stones. In Proc. the 9th USENIX Security Symposium, pages 171–184, August 2000.

[52] Y. Zhu, X. Fu, B. Graham, R. Bettati, and W. Zhao. On flow correlation attacks and countermeasures in mix networks. In Proceedings of Privacy Enhancing Technologies Workshop, May 26–28, 2004.