Second-order residual analysis of spatio-temporal point processes and applications in model evaluation
Jiancang Zhuang
Institute of Statistical Mathematics, Tokyo, Japan.
Summary. This paper gives first-order residual analysis for spatio-temporal point processes similar to the residual
analysis developed by Baddeley et al. (2005) for spatial point processes, and also proposes principles for second-order
residual analysis based on the viewpoint of martingales. Examples are given for both first- and second-order residuals.
In particular, residual analysis can be used as a powerful tool in model improvement. Taking a spatio-temporal
epidemic-type aftershock sequence (ETAS) model for earthquake occurrences as the baseline model, second-order residual
analysis can identify many features of the data not implied by the baseline model, providing clues on how to
formulate better models.
1. Introduction and motivations
Temporal, spatial and spatio-temporal point processes are increasingly widely used in many fields,
including epidemiology, biology, environmental sciences and geosciences. Among the associated statistical inference
techniques, such as model specification, parameter estimation, model selection, testing goodness-of-fit and model
evaluation, the tools for testing goodness-of-fit and model evaluation remain under-developed. This is one
motivation for this article.
Model selection procedures can be used in testing goodness-of-fit and model evaluation. Given several explicit
models that are fitted to the same dataset, we can use a model selection criterion such as Akaike's information
criterion (AIC; see, e.g., Akaike, 1974) or cross-validation (e.g., Stone, 1977) to find the best model among
them. To find a model better than the current best model, one can always try several possible versions of new
models, fit them to the same dataset and use the model selection procedures again to see whether one of the new
models becomes the best performing model. However, the above procedures are not always easy to implement.
Formulating a model and fitting it to the dataset may involve heavy programming and computational tasks. Moreover,
model selection procedures only give us some quantities that indicate the overall fit of each model. It is hard to
deduce from these quantities whether a model, even if it is not the best one, has some better properties than other
models ranked higher by the model selection procedure. It would be very helpful if the model improvement process
could be simplified. The residual analysis developed in this article can be used for this purpose.
To help motivate this work, we begin with a description of the dataset used in this study. The developments
here are associated with seeking answers to a series of problems in modelling the phenomena associated with
earthquake clusters. Although earthquake data come from the field of geophysics, similar problems also appear in
the epidemiological modelling of contagious diseases, in biology, in ecology and in environmental sciences.
The epicentres of earthquakes are not homogeneously distributed on the surface of the earth. Globally,
earthquakes mainly occur in subduction zones at plate boundaries. Locally, earthquakes accumulate along
active faults or in volcanic regions. Their depths range from several kilometres to 700 kilometres. Although an
earthquake can be as big as M7, resulting in huge disasters, most earthquakes are so small that they can be detected
only by sensitive seismometers.
Seismicity is clustered in both space and time. The overlapping of earthquake clusters with one another, and
also with the background seismicity, complicates our analysis. For the purpose of long-term earthquake prediction,
i.e., evaluating the risk of the occurrence of a powerful earthquake on a time scale of about 10 years, a good estimate
of the background seismicity rate is necessary. On the other hand, for short-term prediction (on a scale of an hour
or a day), a good understanding of earthquake clusters is necessary.
The earthquake catalogue consists of a list of records $(t_i, x_i, s_i)$ and other associated information, where $t_i$, $x_i$
and $s_i$ record, respectively, the occurrence time, the epicentre location and the magnitude of the $i$th event. Figure 1 shows
the shallow earthquakes (with depths less than 100 km) in the Japan Meteorological Agency (JMA) catalogue
used in this analysis. The time span of this catalogue is 01/01/1926 to 31/12/1999. In this article, we select
the data in the polygon with vertices (134.0°E, 31.9°N), (137.9°E, 33.0°N), (143.1°E, 33.2°N), (144.9°E, 35.2°N),
(147.8°E, 41.3°N), (137.8°E, 44.2°N), (137.4°E, 40.2°N), (135.1°E, 38.0°N) and (130.6°E, 35.4°N). The time
period from the 10000th day after 01/01/1926 to 31/12/1991 is used as the target range in which to estimate the
parameters by the method of maximum likelihood.
It is easy to see from Figure 1 that earthquakes are clustered. Typically, the spatio-temporal ETAS (epidemic-type
aftershock sequence) model is used to describe the behavior of earthquake clustering (Kagan, 1991; Rathbun,
1994; Musmeci and Vere-Jones, 1992; Ogata, 1988, 1998, 2004; Ogata et al., 2003; Zhuang et al., 2002, 2004;
Console and Murru, 2002; Console et al., 2003; Helmstetter and Sornette, 2003a, 2003b; Helmstetter et al., 2003).
In this model, seismicity is classified into two components, the background and the cluster. Background seismicity
is modelled as a Poisson process that is temporally stationary but not spatially homogeneous. Once an event
occurs, whether it is a background event or is generated (triggered) by a previous event, it produces
(triggers) its own children according to certain rules. Such a model is a continuous-type branching process with
immigration (background). This model can be defined completely by the conditional intensity function (hazard
rate conditional on a given history $\mathcal{F}_t$ up to the current time $t$; see Appendix A or Daley and Vere-Jones, 2003,
Chapter 7, for more details) as
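$$\lambda(t, x, s) = \lim_{\Delta t \downarrow 0,\ \Delta x \downarrow 0,\ \Delta s \downarrow 0} \frac{\Pr\{N((t, t + \Delta t] \times (x, x + \Delta x] \times (s, s + \Delta s]) \ge 1 \mid \mathcal{F}_t\}}{\Delta t\, \Delta x\, \Delta s}.$$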
The conditional intensity function of the ETAS model used in this paper takes the form given by Ogata (1998),
i.e.,
$$\lambda(t, x, s) = \gamma(s) \left[ u(x) + \sum_{i:\, t_i < t} \kappa(s_i)\, g(t - t_i)\, f(x - x_i \mid s_i) \right] \tag{1}$$
where
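$$\kappa(s) = A \exp[\alpha s]; \tag{2}$$
$$\gamma(s) = \beta \exp[-\beta s]\, H(s);$$
$$f(x \mid s) = \frac{q - 1}{\pi C e^{\alpha s}} \left( 1 + \frac{\|x\|^2}{C e^{\alpha s}} \right)^{-q}, \tag{3}$$
and
$$g(t) = \frac{p - 1}{c} \left( 1 + \frac{t}{c} \right)^{-p} H(t), \quad p > 1,$$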
H being the Heaviside function. In the above, the magnitude distribution γ(s) is based on the Gutenberg-Richter
law (Gutenberg and Richter, 1956), the expected number of children κ(s) is based on Yamanaka and Shimazaki
(1990), and the time density g(t) is based on the modified Omori formula (Omori, 1898; Utsu, 1969), all being
empirical laws in seismicity studies. The background rate and the parameters A, α, β, c, p and C in the model
can be estimated by an iterative algorithm (Zhuang et al., 2002, 2004, see Appendix C).
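As a minimal sketch of how the conditional intensity (1) with the components (2) and (3) might be evaluated numerically, the following Python code computes $\lambda(t, x, s)$ from a list of past events. The function names, the array layout of `events`, the use of planar coordinates $(x, y)$ for the location, and the convention that the Heaviside function equals 1 for $s \ge 0$ and for $t > 0$ are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def kappa(s, A, alpha):
    """Expected number of children of an event of magnitude s, as in (2)."""
    return A * np.exp(alpha * s)

def gamma(s, beta):
    """Magnitude density beta*exp(-beta*s) for s >= 0 (s is measured from the threshold)."""
    s = np.asarray(s, dtype=float)
    return np.where(s >= 0, beta * np.exp(-beta * s), 0.0)

def g(t, c, p):
    """Time density of the modified Omori type; zero for t <= 0 (assumed convention)."""
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t > 0, t, 0.0)  # avoids negative bases in the power
    return np.where(t > 0, (p - 1) / c * (1 + safe_t / c) ** (-p), 0.0)

def f(dx, dy, s, C, alpha, q):
    """Spatial density (3), evaluated at the offset (dx, dy) from the parent."""
    scale = C * np.exp(alpha * s)
    r2 = dx ** 2 + dy ** 2
    return (q - 1) / (np.pi * scale) * (1 + r2 / scale) ** (-q)

def etas_intensity(t, x, y, s, events, u, params):
    """Conditional intensity (1).

    events : array of shape (n, 4) with columns t_i, x_i, y_i, s_i (hypothetical layout)
    u      : callable u(x, y) giving the background rate
    params : tuple (A, alpha, beta, c, p, C, q)
    """
    A, alpha, beta, c, p, C, q = params
    past = events[events[:, 0] < t]          # only events strictly before t contribute
    ti, xi, yi, si = past.T
    triggered = np.sum(kappa(si, A, alpha) * g(t - ti, c, p)
                       * f(x - xi, y - yi, si, C, alpha, q))
    return gamma(s, beta) * (u(x, y) + triggered)
```

With an empty history the function returns $\gamma(s)\,u(x)$, the pure background term; each past event then adds its own triggering contribution.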
However, there are many questions about the above model formulation. For example:
1 Is the background process stationary?
2 Are background events and triggered events different, for example, in magnitude distribution or in triggering
offspring?
3 Does the magnitude distribution of triggered events depend on the magnitudes of their parent events?
4 Is it reasonable to apply the same exponential function $e^{\alpha s}$ in both $\kappa(s)$ and $f(x \mid s)$?
In previous studies, residual analysis has been carried out by transforming the point process into a standard
Poisson process (Ogata, 1988). Schoenberg (2004) uses thinned residuals to analyse the goodness-of-fit of the
ETAS model to Californian earthquake data. Baddeley et al. (2005) have made more remarkable and general
developments. However, these residual analysis methods are all of the first order and are far from sufficient for
problems that concern second-order properties such as clustering and inhibition. To answer the questions above, it is
necessary for us to generalise the concepts of residual analysis to higher orders.
Zhuang et al. (2004) developed a stochastic reconstruction method to test the above hypotheses associated with
earthquake clusters, using the ETAS model as the reference model. Their method is based purely on intuition
rather than on a strict theoretical basis. As we show in this article, their method can be validated by using the
tools of residual analysis. Providing a theoretical basis for the stochastic reconstruction method that can also be
applied to a wider range of point-process models is another motivation of this article.
In the ensuing sections, we first review the first-order residuals developed by Baddeley et al. (2005) and then
propose principles for second-order residuals. The uses and power of these residuals are illustrated via some simple
examples and also by addressing the questions raised in the formulation of the ETAS model for the purpose of model
improvement.
2. First-order residuals and examples
Baddeley et al. (2005) give the first-order residuals for a spatial point process $X = \{x_1, x_2, \cdots\}$ based on the
Nguyen-Zessin (1979) formula
$$E\left[ \sum_{x_i \in X \cap D} h(x_i; X \setminus \{x_i\}) \right] = E\left[ \int_D h(x; X)\, \lambda_p(x; X)\, \mu(dx) \right],$$
for any measurable set $D$, where $\lambda_p$ is the Papangelou conditional intensity. Thus, they define innovations with
respect to $(D, h)$ to be
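$$I(D, h, \lambda_p) = \sum_{x_i \in X \cap D} h(x_i; X \setminus \{x_i\}) - \int_D h(x; X)\, \lambda_p(x; X)\, \mu(dx),$$
and residuals corresponding to $I(D, h, \lambda_p)$
$$R(D, \hat{h}, \hat{\lambda}_p) = \sum_{x_i \in X \cap D} \hat{h}(x_i; X \setminus \{x_i\}) - \int_D \hat{h}(x; X)\, \hat{\lambda}_p(x; X)\, \mu(dx),$$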
where $\hat{\lambda}_p$ is the fitted conditional intensity. If $h$ also depends on the parameters $\theta$ in $\lambda_p$, $\hat{h}$ is obtained by
substituting the estimated parameters $\hat{\theta}$ into $h$.
The conditional intensity for temporal or spatio-temporal point processes defined in Appendix A is simpler than
the Papangelou conditional intensity for spatial point processes. In the former case, the occurrence of an event at a
certain time or spatio-temporal location depends only on the events occurring at earlier times, while in the latter
case, the occurrence of an event at a particular location depends on all the other events that occur elsewhere. In
principle, to define first-order residuals for a spatio-temporal point process, we can obtain the conditional
intensity defined in (1) and Appendix A by taking the expectation of the Papangelou conditional intensity, which is
conditional on the σ-algebra generated by all the events not at the current time and location, over the simpler
σ-algebra of the observation history up to the current time. However, it is hard to generalise first-order residuals
to second-order residuals in this way, and this generalisation is the main concern of this article. Instead, we make
use of the evolutionary features of the process from the viewpoint of martingale theory.
Let $N$ be a simple spatio-temporal point process on an interval $[0, T]$ and a $d$-dimensional region $X \subset \mathbb{R}^d$,
admitting a conditional intensity $\lambda(t, x)$ (see Daley and Vere-Jones, 1988, Chapter 13, or Appendix A for the definition
of conditional intensity). According to the martingale property of the conditional intensity, for any predictable
process $h(t, x) \ge 0$, almost everywhere (a.e.), where predictable means measurable with respect to the σ-algebra
generated by sets of the form $(s, t] \times B \times E$ with $E \in \mathcal{F}_s$ and $B \in \mathcal{X}$,
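$$E\left[ \int_D h(t, x)\, N(dt \times dx) \right] = E\left[ \int_D h(t, x)\, \lambda(t, x)\, \mu(dt \times dx) \right], \quad \forall D \in \mathcal{T} \otimes \mathcal{X} \tag{4}$$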
where $\mu$ is the Lebesgue measure $\ell \times \ell^d$ (see also Bremaud, 1981, Chap. 2; Karr, 1991, Chap. 5).
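Identity (4) can be illustrated numerically in the simplest setting. The sketch below assumes a purely temporal inhomogeneous Poisson process on $[0, 1]$ with intensity $\lambda(t) = 2 + 3t$ and the deterministic (hence predictable) test function $h(t) = t$; these choices, the thinning simulator and all function names are illustrative assumptions rather than anything specified in the paper. For this case $\int_0^1 h(t)\lambda(t)\,dt = 2$, so the difference between $\sum_i h(t_i)$ and 2 should average to roughly zero over many simulated realisations.

```python
import numpy as np

def simulate_poisson(rng, lam, lam_max, T):
    """Simulate an inhomogeneous Poisson process on [0, T] by thinning.

    lam     : vectorised intensity function, assumed bounded above by lam_max
    lam_max : dominating constant rate
    """
    n = rng.poisson(lam_max * T)
    candidates = rng.uniform(0.0, T, size=n)
    keep = rng.uniform(0.0, 1.0, size=n) < lam(candidates) / lam_max
    return np.sort(candidates[keep])

def first_order_innovation(times, h, compensator):
    """Sum of h over the observed points minus the integral of h * lambda."""
    return np.sum(h(times)) - compensator

lam = lambda t: 2.0 + 3.0 * t      # assumed intensity
h = lambda t: t                    # assumed test function
compensator = 2.0                  # integral of t * (2 + 3t) over [0, 1]

rng = np.random.default_rng(0)
innovations = np.array([
    first_order_innovation(simulate_poisson(rng, lam, 5.0, 1.0), h, compensator)
    for _ in range(4000)
])
mean_innovation = innovations.mean()   # should be close to zero
```

In a real application the compensator would be computed from a fitted conditional intensity, and a mean far from zero over suitable sets $D$ would point to a lack of fit.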
Using the terminology adopted in Baddeley et al. (2005) for spatial point processes, define first-order innovations
with respect to a predictable function h(t, x) ≥ 0, a.e., and a measurable set D ⊂ T × X by
is also a zero-mean martingale when $t$ and $t'$ are fixed, i.e.,
$$E[J(u) \mid \mathcal{F}_{t'}] = J(t') = 0, \quad u > t.$$
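i.e.,
$$E\left[ \int_{(t', u]} H^-(t, \omega, s, \omega')\, N(ds, \omega') \,\middle|\, \mathcal{F}_{t'} \right] = E\left[ \int_{(t', u]} H^-(t, \omega, s, \omega')\, \lambda(s, \omega')\, \mu(ds) \,\middle|\, \mathcal{F}_{t'} \right].$$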
Since the conditional intensity measure is diffuse, the integral on the right-hand side of the above equation is
continuous with respect to $t'$. Thus, if $u \to +\infty$, for fixed $t$ and $\omega$, $F(t, \omega, t', \omega')$ is left-continuous and adapted to
$\mathcal{F}_{t'}$. From Bremaud (1981, Theorem T5, Chapter I.3), $F(t, \omega, t', \omega')$ is $\mathcal{F}$-predictable with respect to $(t', \omega')$.
Moreover, if $H^-(t, \omega, t', \omega')$ takes the form $I_{(a,b]}(t)\, I_U(\omega)\, I_{(a', b']}(t')\, I_V(\omega')$, $U \in \mathcal{F}_a$, $V \in \mathcal{F}_{a'}$, then $F$ takes the
form $I_{(a,b]}(t)\, I_U(\omega)\, f(t', \omega')$, where $f(t', \omega')$ is $\mathcal{F}$-predictable. That is to say, $F$ is $\mathcal{F} \otimes \mathcal{F}$-predictable.
The related conclusions for $G$ can similarly be proved. □
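Proposition 2. Assume that a second-order $\mathcal{F}$-predictable process $H(t, \omega, t', \omega)$ can be similarly decomposed
into $H^-(t, \omega, t', \omega)$, $H^+(t, \omega, t', \omega)$ and $H^0(t, \omega, t', \omega)$ as in (25). Then
$$E\left[ \iint_D H^-(t, \omega, t', \omega)\, N(dt, \omega)\, N(dt', \omega) \right] = E\left[ \iint_D H^-(t, \omega, t', \omega)\, \lambda(t, \omega)\, \lambda(t', \omega)\, \mu(dt')\, \mu(dt) \right] \tag{29}$$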
if the integral on the right-hand side exists, where D is a measurable subset of T × T .
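Proof: Notice
$$I \equiv E\left[ \iint_D H^-(t, \omega, t', \omega)\, N(dt, \omega)\, N(dt', \omega) \right] = E \int_{\operatorname{dom} D} E\left[ \int_0^{+\infty} H^-(t, \omega, t', \omega)\, I_{D(t)}(t')\, N(dt', \omega) \,\middle|\, \mathcal{F}_t \right] N(dt, \omega),$$
where $\operatorname{dom} D = \{t : \exists\, t' \text{ such that } (t, t') \in D\}$ and $D(t) = \{t' : (t, t') \in D\}$. From Lemma 1,
$$F(t, \omega, t', \omega') \equiv E\left[ \int_0^{+\infty} H^-(t, \omega, s, \omega')\, I_{D(t')}(s)\, N(ds, \omega') \,\middle|\, \mathcal{F}_{t'} \right]$$
is $\mathcal{F} \otimes \mathcal{F}$-predictable, implying that
$$F(t, \omega) \equiv F(t, \omega, t, \omega) = E\left[ \int_0^{+\infty} H^-(t, \omega, t', \omega)\, I_{D(t)}(t')\, N(dt', \omega) \,\middle|\, \mathcal{F}_t \right]$$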
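is $\mathcal{F}$-predictable.
Thus,
$$I = E\left[ \int_{\operatorname{dom} D} F(t, \omega)\, N(dt, \omega) \right] = E\left[ \int_{\operatorname{dom} D} F(t, \omega)\, \lambda(t, \omega)\, \mu(dt) \right].$$
On the other hand, from Lemma 1, the right-hand side of (29)
$$\begin{aligned}
E \iint_D H^-(t, \omega, t', \omega)\, \lambda(t', \omega)\, \lambda(t, \omega)\, \mu(dt')\, \mu(dt)
&= E \int_{\operatorname{dom} D} E\left[ \int_{D(t)} H^-(t, \omega, t', \omega)\, \lambda(t', \omega)\, \mu(dt') \,\middle|\, \mathcal{F}_t \right] \lambda(t, \omega)\, \mu(dt) \\
&= E\left[ \int_{\operatorname{dom} D} F(t, \omega)\, \lambda(t, \omega)\, \mu(dt) \right].
\end{aligned}$$
□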
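Similarly, (29) holds if we replace $H^-$ by $H^+$, i.e.,
$$E\left[ \iint_D H^+(t, \omega, t', \omega)\, N(dt, \omega)\, N(dt', \omega) \right] = E\left[ \iint_D H^+(t, \omega, t', \omega)\, \lambda(t, \omega)\, \lambda(t', \omega)\, \mu(dt')\, \mu(dt) \right]. \tag{30}$$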
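When $t = t'$, $H^0(t, \omega) \equiv H^0(t, \omega, t', \omega)$ is an $\mathcal{F}$-predictable process and $N(dt, \omega)\, N(dt', \omega) = N(dt, \omega)$; when
$t \ne t'$, $H^0(t, \omega, t', \omega) = 0$. Thus
$$E\left[ \iint_D H^0(t, \omega, t', \omega)\, N(dt, \omega)\, N(dt', \omega) \right] = E\left[ \int_{\operatorname{diag} D} H^0(t, \omega, t, \omega)\, N(dt, \omega) \right] = E\left[ \int_{\operatorname{diag} D} H^0(t, \omega, t, \omega)\, \lambda(t, \omega)\, \mu(dt) \right], \tag{31}$$
where $\operatorname{diag} D = \{t : (t, t) \in D\}$.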
Combining (29), (30) and (31), we have
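Theorem 3. For any second-order predictable process $H(t, \omega, t', \omega)$, given a point process $N(\omega)$ with conditional
intensity $\lambda(t, \omega)$, the equalities
$$E\left[ \iint_D H(t, \omega, t', \omega)\, N(dt, \omega)\, N(dt', \omega) \right] = E\left[ \iint_D H(t, \omega, t', \omega)\, \lambda(t, \omega)\, \lambda(t', \omega)\, \mu(dt')\, \mu(dt) \right] + E\left[ \int_{\operatorname{diag} D} H^0(t, \omega, t, \omega)\, \lambda(t, \omega)\, \mu(dt) \right],$$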
and
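$$E\left[ \iint_{D \setminus \operatorname{diag}_2 D} H(t, \omega, t', \omega)\, N(dt, \omega)\, N(dt', \omega) \right] = E\left[ \iint_D H(t, \omega, t', \omega)\, \lambda(t, \omega)\, \lambda(t', \omega)\, \mu(dt')\, \mu(dt) \right],$$
hold if the integrals on the right-hand side exist, where $\operatorname{diag}_2 D = \{(t, t) \in D\}$.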
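The second equality of Theorem 3 can be checked by simulation in a simple special case. The sketch below assumes a homogeneous Poisson process with rate 10 on $[0, 1]$, for which $\lambda$ is deterministic, and the deterministic function $H(t, t') = 1$ if $|t - t'| < \delta$ and 0 otherwise with $\delta = 0.1$ on $D = [0, 1]^2$. These choices and the function names are illustrative assumptions. In this case the right-hand side equals $\lambda^2 (2\delta T - \delta^2) = 19$, so the sum of $H$ over ordered pairs of distinct points, minus 19, acts as a second-order residual that should average to about zero.

```python
import numpy as np

def pair_sum(times, delta):
    """Sum of H(t_i, t_j) = 1{|t_i - t_j| < delta} over ordered pairs with i != j."""
    diff = np.abs(times[:, None] - times[None, :])
    close = diff < delta
    return int(close.sum()) - len(times)   # subtract the diagonal terms i == j

rate, T, delta = 10.0, 1.0, 0.1            # assumed rate, window length and pair distance
expected_pairs = rate ** 2 * (2 * delta * T - delta ** 2)   # double integral of H * rate^2 over D

rng = np.random.default_rng(1)
n_sim = 5000
residuals = np.empty(n_sim)
for k in range(n_sim):
    n = rng.poisson(rate * T)
    times = rng.uniform(0.0, T, size=n)    # given n, Poisson points are uniform on [0, T]
    residuals[k] = pair_sum(times, delta) - expected_pairs
mean_residual = residuals.mean()           # should be close to zero
```

If the simulated points were clustered rather than Poisson, the pair count would systematically exceed the value implied by the assumed intensity, and the mean residual would be positive.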
C. Estimation procedures for the ETAS model (Zhuang et al., 2002, 2004)
(a) Set up the initial background seismicity rate, for example, let $u(x) = 1$.
(b) Set
$$\lambda(t, x, s) = \gamma(s) \left[ \nu\, u(x) + \sum_{i:\, t_i < t} \kappa(s_i)\, g(t - t_i)\, f(x - x_i \mid s_i) \right],$$
and estimate $\nu$ and the other model parameters by maximizing the log-likelihood function
$$\log L = \sum_{(t_i, x_i, s_i) \in D} \log \lambda(t_i, x_i, s_i) - \iiint_D \lambda(t, x, s)\, dt\, dx\, ds,$$
where $D$ is a specified spatio-temporal region of interest.
(c) For each event $i$, set
$$\varphi_i = \nu\, u(x_i)\, \gamma(s_i) / \lambda(t_i, x_i, s_i). \tag{32}$$
(d) Get a better estimate of the background rate by using the weighted kernel estimate
$$u(x) \propto \sum_i \varphi_i\, Z(x - x_i, h_i), \tag{33}$$
where $Z$ represents the Gaussian kernel function and the bandwidth $h_i$ is the distance from event $i$ to its $n_p$th
closest event, or a threshold bandwidth if this distance is less than the threshold bandwidth.
(e) Replace the background rate by this better estimate, and return to step (b). Repeat until the results converge.
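Steps (c) and (d) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of Zhuang et al. (2002, 2004): the function names, the planar Euclidean distance, the isotropic two-dimensional Gaussian form of $Z$, and the brute-force neighbour search are all choices made here for clarity. Because (33) only fixes $u(x)$ up to proportionality, the estimate is left unnormalised.

```python
import numpy as np

def background_probabilities(nu, u_at_events, gamma_at_events, lam_at_events):
    """Weights phi_i of (32); all inputs are evaluated at the observed events."""
    return nu * u_at_events * gamma_at_events / lam_at_events

def variable_bandwidths(xy, n_p, h_min):
    """Distance from each event to its n_p-th closest event, floored at h_min.

    xy : array of shape (n, 2) of event locations, with n > n_p (assumed)
    """
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    d_sorted = np.sort(d, axis=1)          # column 0 is the zero distance to itself
    return np.maximum(d_sorted[:, n_p], h_min)

def background_rate(grid_xy, xy, phi, h):
    """Unnormalised weighted Gaussian kernel estimate (33) at the points grid_xy."""
    r2 = np.sum((grid_xy[:, None, :] - xy[None, :, :]) ** 2, axis=2)
    kernel = np.exp(-r2 / (2.0 * h ** 2)) / (2.0 * np.pi * h ** 2)   # h broadcasts over events
    return kernel @ phi
```

A full iteration would alternate these two functions with the maximum likelihood fit of step (b), re-evaluating $u$ at the event locations each time, until the estimates stop changing.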
[Figure 1: panel (a) plots Latitude against Longitude; panel (b) plots Latitude against Time in days.]
Fig. 1. Occurrence locations and times of shallow earthquakes in the central Japan area. (a) Epicentral locations. The polygon represents the target region. (b) Epicentral latitudes against occurrence times. Black and gray circles are target events and complementary events, respectively. Different sizes of circles show the magnitude of earthquakes from 4.2 to 8.1.
[Figure 2: Triggering abilities plotted against Magnitude, with series for all events, triggered events, background events and the theoretical curve.]
Fig. 2. Reconstructed triggering abilities (average number of children triggered by events of the same magnitude) for all the events $K^{(1)}(s)$, background events $\kappa_0^{(1)}(s)$ and triggered events $\kappa_1^{(1)}(s)$, as in (14), (20) and (21), respectively (Zhuang et al., 2004). The solid line represents the initial $\kappa(s) = K^{(0)}(s) = \kappa_0^{(0)}(s) = \kappa_1^{(0)}(s)$ obtained by MLE from fitting model (1) to the JMA catalogue. Magnitude 4.2 on the horizontal axes corresponds to $s = 0$.
[Figure 3: panels (a) to (d), each plotting Scaling Factor against Magnitude.]
Fig. 3. Reconstructed results of $\sigma^{(1)}(s)$ against the corresponding magnitudes (Magnitude 4.2 corresponds to $s = 0$) of parent earthquakes (Ogata and Zhuang, 2006). Circles indicate the values of $\sigma^{(1)}(s)$. (a) shows the reconstructed results from the original model for the JMA catalogue. (b) is the same as (a), but the catalogue is a simulation of the original model. (c) Reconstructed results from the model equipped with (24). (d) is the same as (c), but the catalogue is a simulation of the model equipped with (24). In (c) and (d), the solid lines represent Ce