Second-order residual analysis of spatio-temporal point ...bemlar.ism.ac.jp/zhuang/pubs/residual3.4.pdf · Temporal, spatial and spatio-temporal point processes have been increasingly

Second-order residual analysis of spatio-temporal point processesand applications in model evaluation

Jiancang Zhuang

Institute of Statistical Mathematics, Tokyo, Japan.

Summary.This paper gives first-order residual analysis for spatio-temporal point processes similar to the residual analy-

sis developed by Baddeley et al. (2005) for spatial point process, and also proposes principles for second-order residual

analysis based on the viewpoint of martingales. Examples are given for both first- and second-order residuals. In par-

ticular, residual analysis can be used as a powerful tool in model improvement. Taking a spatio-temporal epidemic-type

aftershock sequence (ETAS) model for earthquake occurrences as the baseline model, second-order residual analysis

can be useful to identify many features of the data not implied in the baseline model, providing us with clues of how to

formulate better models.

1. Introduction and motivations

Temporal, spatial and spatio-temporal point processes have been increasingly widely used in many fields, in-

cluding epidemiology, biology, environmental sciences and geosciences. Among the associated statistical inference

techniques, such as model specification, parameter estimation, model selection, testing goodness-of-fit and model

evaluation, the tools used for testing goodness-of-fit and model evaluation are quite under-developed. This is one

motivation for this article.

Model selection procedures can be used in testing goodness-of-fit and model evaluation. Given several explicit

models that are fitted to the same dataset, we can use some model selection criterion such as Akaike’s information

criterion (AIC, see, e.g., Akaike, 1974) and cross validation (e.g., Stone, 1977) to find the best model among

them. To find a model better than the current best model, one can always try several possible versions of new

2 J. Zhuang

models, fit them to the same dataset and use the model selection procedures again to see whether one of the new

models becomes the best performing model. However, the above procedures are not always easy to implement.

Formulating a model and fitting it to the dataset may involve heavy programming and computational tasks. Finally,

model selection procedures only give us some quantities that indicate the overall fit of each model. It is hard to

deduce from these quantities whether a model, even if it is not the best one, has some better properties than other

models ranked higher by the model selection procedure. It is very helpful if the model improvement process can

be simplified. Residual analysis developed in this article can be used for this purpose.

To help motivate this work, we begin with a description of the dataset used in this study. The developments

here are associated with seeking answers to a series of problems in modelling the phenomena associated with

earthquake clusters. Although earthquake data come from the field of geophysics, similar problems also appear in

the epidemiological modelling of contagious diseases, in biology, in ecology and in environmental sciences.

The epicentres of earthquakes are not homogeneously distributed on the surface of the earth. In the globe,

earthquakes mainly occur in the subduction zone between plate boundaries. Locally, earthquakes accumulate along

active faults or in volcanic regions. Their depths range from several to 700 kilometers. Although an earthquake

can be as big as M7, resulting in huge disasters, most earthquakes are so small that they can be detected only by

sensitive seismometres.

Seismicity is clustered in both space and time. The overlapping of earthquake clusters with one another and

also with the background seismicity, complicates our analysis. For the purpose of long-term earthquake prediction,

i.e., evaluating the risk of the occurrence of a powerful earthquake in about a 10-year time scale, a good estimate

of the background seismicity rate is necessary. On the other hand, for short-term prediction (in a scale of an hour

or a day), a good understanding of earthquake clusters is necessary.

The earthquake catalogue consist of a list (ti, xi, si) and other associated information, where ti, xi and si

record, respectively, the occurrence time, the epicentre location and magnitude of the ith event. Figure 1 shows

second-order residual analysis of point processes 3

the shallow earthquakes (with depths less than 100 km) in the Japanese Meteorological Agency (JMA) catalogue

used in this analysis. The time span of this catalogue is 01/01/1926 to 31/12/1999. In this article, we select

the data in the polygon with vertices (134.0E, 31.9N), (137.9E, 33.0N), (143.1E, 33.2N), (144.9E, 35.2N),

(147.8E, 41.3N), (137.8E, 44.2N), (137.4E, 40.2N), (135.1E, 38.0N) and (130.6E, 35.4N). The time

period from the 10000th day after 01/01/1926 to 31/12/1991 is used as the target range in which to estimate the

parameters through the method of maximum likelihood.

It is easy to see from Figure 1 that earthquakes are clustered. Typically, the spatio-temporal ETAS (epidemic

type aftershock sequence) model is used to describe the behavior of earthquake clustering (Kagan, 1991; Rathbun,

1994; Musmeci and Vere-Jones, 1992; Ogata, 1988, 1998, 2004; Ogata et al 2003; Zhuang et al., 2002, 2004;

Console and Murru, 2002; Console et al., 2003; Helmstetter and Sornette, 2003a, 2003b; Helmstetter et al., 2003).

In this model, seismicity is classified into two components, the background and the cluster. Background seismicity

is modelled as a Poisson process that is temporally stationary but not spatially homogeneous. Once an event

occurs, no matter if it is a background event or if it is generated (triggered) by another previous event, it produces

(triggers) its own children according to certain rules. Such a model is a continuous-type branching processes with

immigration (background). This model can by defined completely by the conditional intensity function (hazard

rate conditional on a given history Ft up to current time t, see Appendix A or Daley and Vere-Jones, 2003, Chapter

7, for more details) as

λ(t, x, s) = lim∆t↓0, ∆x↓0,∆s↓0

PrN((t, t + ∆t] × (x, x + ∆x] × (s, s + ∆s]) ≥ 1|Ft

∆t ∆x∆s.

The conditional intensity function of the ETAS model used in this paper takes the form given by Ogata (1998),

i.e.,

λ(t, x, s) = γ(s)

[

u(x) +∑

i: ti<t

κ(si) g(t − ti) f(x − xi | si)

]

(1)

4 J. Zhuang

where

κ(s) = A exp[αs]; (2)

γ(s) = β exp[−βs] H(s);

f(x | s) =q − 1

πCeαs

(

1 +‖x‖2

Ceαs

)−q

, (3)

and

g(t) =p − 1

c

(

1 +t

c

)−p

H(t), p > 1,

H being the Heaviside function. In the above, the magnitude distribution γ(s) is based on the Gutenberg-Richter

law (Gutenberg and Richter, 1956), the expected number of children κ(s) is based on Yamanaka and Shimazaki

(1990), and the time density g(t) is based on the modified Omori formula (Omori, 1898; Utsu, 1969), all being

empirical laws in seismicity studies. The background rate and the parameters A, α, β, c, p and C in the model

can be estimated by an iterative algorithm (Zhuang et al., 2002, 2004, see Appendix C).

However, there are many questions about the above model formulation. For example:

1 Is the background process stationary?

2 Are background events and triggered events different, for example, in magnitude distribution or in triggering

offspring?

3 Does the magnitude distribution of triggered events depend on the magnitudes of their parent events?

4 Is it reasonable to apply the same exponential function eαs in both κ(s) and f(x|s)?

In previous studies, residual analysis has been carried out by transforming the point process into a standard

Poisson process (Ogata, 1988). Schoenberg (2004) uses the thinned residuals to analyse the goodness-of-fit of the

ETAS model to Californian earthquake data. Baddeley et al. (2004) have made more remarkable and general


developments. But their residual analysis methods are all of the first order and far from being sufficient for solving

problems where second-order properties such as clustering and inhibition are concerned. To answer these, it is

necessary for us to generalise the concepts of residual analysis to higher orders.

Zhuang et al. (2004) developed a stochastic reconstruction method to test the above hypotheses associated with

earthquake clusters, using the ETAS model as the reference model. Their method is based purely on intuition

rather than on a strict theoretical basis. As we show in this article, their method can be validated by using the

tools of residual analysis. Providing a theoretical basis for the stochastic reconstruction method that can also be

applied to a wider range of point-process models is another motivation of this article.

In the ensuing sections, we first review the first-order residuals developed by Baddeley et al. (2005) and then

propose principles for second-order residuals. The uses and powers of these residuals are illustrated via some simple

examples and also by solving the questions raised in the formulation of the ETAS model for the purpose of model

improvement.

2. First-order residuals and examples

Baddelay et al. (2005) give the first-order residuals for a spatial point process X = x1, x2, · · · based on the

Nguyen-Zessin (1979) formula

E

[

∑

xi∈X∩D

h(xi; X \ xi)

]

= E

[∫

D

h(x; X)λp(x; X)µ(dx)

]

,

for any measurable set D, where λp is the Papangelou conditional intensity. Thus, they define innovations with

respect to (D, h) to be

I(D, h, λp) =∑

xi∈X∩D

h(xi; X \ xi) −

∫

D

h(x; X)λp(x; X)µ(dx),

and residuals corresponding to I(D, h, λp)

R(D, h, λp) =∑

xi∈X∩D

h(xi; X \ xi) −

∫

D

h(x; X)λp(x; X)µ(dx),

6 J. Zhuang

where λp is the fitted conditional intensity. If h also depends on the parameters θ in λp, h is obtained by substituting

the estimated parameters θ in h.

The conditional intensity for temporal or spatio-temporal point processes defined in Appendix A is simpler than

the Papangelou conditional intensity for spatial point processes. In the former case, the occurrence of an event at a

certain time or a spatio-temporal location only depends on the events occurring at earlier times; while in the latter

case, the occurrence of an event at a particular location depends on all the other events that occur elsewhere. In

principle, we can simply obtain the conditional intensity defined in (1) and Appendix A through taking expectation

of the Papangelou conditional intensity, which is defined as conditional on the σ-algebra generated by all the events

not at the current time and location, over a simpler σ-algebra of the observation history up to current time, to

define first-order residuals for a spatio-temporal point process. However, it is hard to generalise first-order residuals

to second-order residuals in this way, which is the main concern of this article. We make use of the evolutionary

features of the process through the viewpoints of martingale theory.

Let N be a simple spatio-temporal point process in a interval [0, T ] and a d-dimensional region X ⊂ Rd,

admitting a conditional intensity λ(t, x) (see Delay and Vere-Jones, 1988, Chapter 13, or Appendix A for definition

of conditional intensity). According to the martingale property of the conditional intensity, for any predictable

process (measurable with respect to the σ-algebra generated by sets of the form (s, t] × B × E where E ∈ Fs and

B ∈ X ) h(t, x) ≥ 0, almost everywhere (a.e.),

E

[∫

D

h(t, x)N(dt × dx)

]

= E

[∫

D

h(t, x)λ(t, x)µ(dt × dx)

]

, ∀D ∈ T⊗

X (4)

where µ is the Lebesgue measure ℓ × ℓd (see also, Bremaud, 1981, Chap. 2; Karr, 1991, Chap. 5).

Using the terminology adopted in Baddeley et al. (2005) for spatial point processes, define first-order innovations

with respect to a predictable function h(t, x) ≥ 0, a.e., and a measurable set D ⊂ T × X by

I(D, h, λ) =

∫

D

[h(t, x)N(dt × dx) − h(t, x)λ(t, x)µ(dt × dx)] . (5)


Due to (4), E[I(D, h, λ)] = 0. The first-order residuals are defined by replacing the true model in (5) by the fitted

model, i.e.,

R(D, h, λ) =

∫

D

[

h(t, x)N(dt × dx) − h(t, x) λ(t, x)µ(dt × dx)]

.

If the fitted model is close enough to the true model, then R(D, h, λ) ≈ 0.

The following six examples give different kinds of first-order innovations. Because the residuals corresponding

to each of these innovations can be easily defined, we give only the forms of innovations.

Example 1. Raw innovations. Let h(t, x) = 1, then the corresponding innovation is

I(D, h, λ) = N(D) −

∫

D

λ(t, x)µ(dt × dx),

which is called the raw innovation. One important property of raw innovations/residuals in applications is that,

for a realization of the process (ti, xi) : i ∈ N,

τi =

∫ ti

0

∫

X

λ(t, x) ℓd(dx) ℓ(dt)

is a standard Poisson process (Meyer, 1971; Papangelou, 1972; Ogata, 1988; Vere-Jones and Schoenberg, 2004).

Example 2. Reciprocal-lambda innovations (Baddeley et al. 2005; Schoenberg, 2004) If h(t, x) = 1/λ(t, x),

I(D, h, λ) =

∫

D

N(dt × dx)

λ(t, x)− µ(D).

This is an analogue of the Stoyan-Grabarnik (1991) weights for Gibbs point processes. The residual analysis on

the space-time ETAS model given by Schoenberg (2004) was essentially based on the reciprocal-lambda innova-

tions/residuals.

Example 3. Pearson innovations (Baddeley et al., 2005). If h(t, x) = 1/√

λ(t, x),

I(D, h, λ) =

∫

D

N(dt × dx)/√

λ(t, x) −

∫

D

√

λ(t, x) µ(dt × dx).

8 J. Zhuang

Example 4. Score innovations (Baddeley et al., 2005). The score innovations are defined by

I

(

D,∂ log λ(t, x)

∂θ, λ

)

=

∫

D

∂ log λ(t, x)

∂θN(dt × dx) −

∫

D

∂λ(t, x)

∂θµ(dt × dx),

where θ is any of the regular parameters in the model. The equation on the score residuals

R

(

D,∂ log λ(t, x)

∂θ, λ

)

= 0

is the condition for maximizing the log-likelihood function in D, i.e.,

log L =

∫

D

log λ(t, x)N(dt × dx) −

∫

D

λ(t, x)µ(dt × dx).

Example 5. Weighted score innovations and localized maximum likelihood estimates. The innovation defined

by

I

(

D, w(t, x; t0, x0)∂

∂θlog λ(t, x), λ

)

=

∫

D

w(t, x; t0, x0)∂ log λ(t, x)

∂θ[ N(dt × dx) − λ(t, x)µ(dt × dx)]

is called the weighted score innovation, where w(t, x; t0, x0) is a kernel function centered at a given (t0, x0). The

equality on the weighted score residual

R

(

D, w(t, x; t0, x0)∂ log λ(t, x)

∂θ, λ

)

= 0,

is the condition for maximizing the local log-likelihood

WLL =

∫

D

w(t, x; t0, x0) log λ(t, x)N(dt × dx) −

∫

D

w(t, x; t0, x0)λ(t, x)µ(dt × dx).

Example 6. Testing stationarity of the background process in the ETAS model. Define the background inno-

vation for the ETAS model by

I(D, ϕ, λ) =∑

i

ϕ(ti, xi, si) ID(ti, xi, si)) −

∫∫∫

D

u(x) γ(s) dt dxds, (6)


where ϕ(t, x, s) = u(x)γ(s)/λ(t, x, s), λ(t, x, s) and u(x) are given in (1), D is a measurable subset of T ×X×M and

ID is the index function of D. To test the stationarity of the background process, choose D(t) = (0, t) × Bx × Bs,

Bx ⊂ X and Bs ⊂ S. From E [I(D, ϕ, λ)] = 0,

I(D, ϕ, λ)(t) =∑

i

ϕ(ti, xi, si) ID(t)(ti, xi, si) ≈ t

∫

Bx

u(x)µ(dx)

∫

Bs

γ(s)µ(ds), (7)

which is a linear function with respect to t. Applications of such tests and geophysical interpretations can be found

in Zhuang et al. (2005) and Hainzl and Ogata (2005).

3. Second-order residual analysis

Given a second-order F -predictable function H(t, x; t′, x′), i.e., a process that can be generated from processes

of the form h1(t, x)h2(t′, x′), where h1 and h2 are first-order predictable processes, by linear combinations and

monotone limits (see details in Appendix B), the second-order innovation with respect to H(t, x; t′, x′) for D ∈

(T⊗

X )⊗

(T⊗

X ) is

I2(D, H, λ) =

∫∫

D\ diag2 D

H(t, x; t′, x′)N(dt′ × dx′)N(dt × dx)

−

∫∫

D

H(t, x; t′, x′)λ(t, x)λ(t′, x′)µ(dt′ × dx′)µ(dt × dx), (8)

where diag2 D = (t, x, t, x) ∈ D. As shown in Appendix B, E[I2(D, H, λ)] = 0. The second-order residual can be

defined by R2(D, H, λ) = I2(D, H, λ).

10 J. Zhuang

Example 7. Variance of first-order innovations. For a first-order predictable process h(t, x) and any B ∈

T⊗

X ,

Var

[∫

B

h(t)N(dt)

]

= E

[

(∫

B

h(t)N(dt)

)2]

−

[

E

(∫

B

h(t)N(dt)

)]2

= E

[∫∫

B×B

h(t)h(s)N(dt)N(ds)

]

−

[

E

(∫

B

h(t)N(dt)

)]2

= E

[∫∫

B×B

h(t)h(s)λ(t)λ(s)µ(dt)µ(ds)

]

+ E

[∫

B

h2(t)λ(t)µ(dt)

]

−

[

E

(∫

B

h(t)λ(t)µ(dt)

)]2

= E

[∫

B

h2(t)λ(t)µ(dt)

]

,

or,

Var

[∫

B

h(t)N(dt)

]

= E

[∫

B

h2(t)N(dt)

]

.

The above formula leads to an approximate estimate of the standard error for first-order residuals. In Example 6, if

the ETAS model is close enough to the true one, the standard error of the background residual R(D, ϕ, λ)(t) deviat-

ing from the expected linear function of t given in (7) can be approximated by[∑

i ϕ2(ti, xi, si) ID(t)(ti, xi, si)]1/2

.

4. Applications of second-order residual to earthquake data: model improvement

Now we have tools to solve the problems associated with the earthquake clusters listed in the introduction. In this

section, before testing these hypotheses, we briefly recall the basic ideas of the stochastic reconstruction method

proposed by Zhuang et al. (2004).

Once the conditional intensity function in (1) is estimated, it provides us with a good way to evaluate the

probability of whether an event is likely to be a background event or be triggered by others (Kagan and Knoppoff,

1981; Zhuang et al 2002). Consider the contribution of the background seismicity rate relative to total seismicity

rate at the occurrence of the ith event,

ϕi =u(xi) γ(si)

λ(ti, xi, si), (9)


If we remove the ith event with probability 1 − ϕi for all the events in the process, we can realize a process with

the occurrence rate of µ(x)γ(s) (see, also, Ogata, 1981; Karr, 1991, Chap. 5 for justification). Thus it is natural to

regard ϕi as the probability that the ith event is a background event. Similarly,

ρij =γ(sj) g(tj − ti) f(xj − xi | si)

λ(tj , xj , sj)(10)

can be regarded as the probability that the jth event is directly produced by the ith event. That is to say, given i

fixed, if we keep each event j with probability ρij , we can obtain a Poisson process of intensity γ(s) g(t−ti) f(x−xi |

si), which is the process consisting of the children of the ith event in the catalogue. Based on the above ideas, Zhuang

et al. (2004) tested various hypotheses associated with the clustering features of earthquakes by building empirical

functions with the events weighted by the estimated probabilities ϕi or ρij according to the fitted conditional

intensity of the ETAS model. This reconstruction method was purely based on intuition. In the coming sections,

we show how it works.

For simplification, in the following sections, the notation of Riemann integrals is used, and, without confusion,

we suppose that the integrals are over the whole range to which the data correspond .

4.1. Improving individual functions

Formulating a point-process model with a conditional intensity function for a particular dataset is mostly to begin

with empirical results and the researchers’ experience. Once an explicit form for the model is given, the first

question is whether this form is suitable or whether it can be improved in some way. In this subsection, we will

use κ(s) in (1) as a simple example to show how to improve the individual function in the model formulation.

Consider the following form for the conditional intensity of the ETAS model

λ1(t, x, s) = γ(s)

[

u(x) +∑

i: ti<t

K(si) g(t − ti) f(x − xi|si)

]

,

where K(s) is assumed to be a more suitable function than κ(s) for the mean number of children from an event

of magnitude s. Even though such K(s) is unknown, we will show below that we can still construct a so-called

12 J. Zhuang

“ratio-unbiased” (a ratio of two unbiased estimates) estimate of K(s) based on the properties of second-order

innovations/residuals.

Let

Hij =K(si) g(tj − ti) f(xj − xi | si) γ(sj)

λ1(tj , xj , sj).

Then from (8)

E

∑

i, j

I(si ∈ [s0 − δ, s0 + δ])Hij

= E

∫∫∫∫∫∫

I(s ∈ [s0 − δ, s0 + δ])K(s) g(t′ − t) f(x′ − x|s) γ(s′)λ1(t, x, s) dt′ dx′ ds′ dt dxds

=

∫

K(s) γ(s) I(s ∈ [s0 − δ, s0 + δ]) ds × E

[∫∫

λ1(t, x) dt dx

]

≈ 2 δ K(s0) γ(s0) × E

[∫∫

λ1(t, x) dt dx

]

, (11)

where

λ1(t, x) = u(x) +∑

i: ti<t

K(si) g(t − ti) f(x − xi|si).

Using the first-order innovation formula,

E

∑

i, j

I(si ∈ [s0 − δ, s0 + δ])

= E

∫∫∫

I(s ∈ [s0 − δ, s0 + δ])λ1(t, x, s) dt dxds

=

∫

γ(s) I(s ∈ [s0 − δ, s0 + δ]) ds × E

[∫∫

λ1(t, x) dt dx

]

≈ 2 δ γ(s0) × E

[∫∫

λ1(t, x) dt dx

]

. (12)

Dividing (11) by (12) yields

K(s0) ≈

∑

i, j

Hij I(si ∈ [s0 − δ, s0 + δ])

∑

i

I(si ∈ [s0 − δ, s0 + δ]). (13)


Although K(s) is unknown, we can set K(0)(s) = κ(s) and thus H(0)ij = ρij for the first step, where κ(s) is from

the fitted conditional intensity of the ETAS model and ρij is the estimated ρij defined in (10). We can then use

(13) to get an updated K(s), i.e.,

K(1)(s0) =

∑

i, j

ρij I(si ∈ [s0 − δ, s0 + δ])

∑

i

I(si ∈ [s0 − δ, s0 + δ]). (14)

Since∑

j ρij can be interpreted as an estimate of the number of children from the ith event, the right side of (14)

can be regarded as the average number of children produced by a mother event of magnitude around s0. This

procedure can be iterated until convergence.

In this application, the second-order innovation plays a role in guaranteeing that the estimate of the quantity

in (11) is unbiasedly proportional to K(s) γ(s). The estimates from fitting the ETAS model to the earthquake data

are only used as initial values in (13). Such a procedure is similar in spirit to the expectation-maximization (EM)

algorithm, with the expectation step carried out by the properties of second-order innovations/residuals and the

maximization step replaced by the non-parametric approach of the “ratio-unbiased” estimate in (13).

Example 8. Average numbers of children produced by earthquakes in the JMA catalogue. We apply the above

reconstruction procedures to the JMA catalogue and the reconstructed results are shown in Figure 2. It is clear

that the differences between K(1)(s) and κ(s) are negligible, which implies that it is not necessary to improve κ(s)

at this stage. We will continue discussing this figure in Example 9.

4.2. Difference between background events and triggered events

The ETAS model assumes that there is no distinction between background events and triggered events. Once an

event occurs, its magnitude is from the common unique magnitude distribution and it triggers offspring in the same

manner as others. It is interesting and important to test whether background events are different from triggered

events. In this section, we will test whether they are different in the behavior of triggering children.

14 J. Zhuang

Consider a more complicated model admitting a conditional intensity

λ1(t, x, s) = λ1(t, x, s, ω) I(ω = 0) + λ1(t, x, s, ω) I(ω = 1), (15)

where

λ1(t, x, s, ω) =

u(x) γ0(s), if ω = 0,

γ1(s)∑

ti<t ξ(t, x; ti, xi, si, ωi), if ω = 1.(16)

and

ξ(t, x; ti, xi, si, ωi) =

κ0(si) g0(t − ti) f0(x − xi | si), if ωi = 0;κ1(si) g1(t − ti) f1(x − xi | si), if ω = 1.

(17)

That is to say, an event is a background event if ω = 0 or a triggered event if ω = 1. In this mixed-type model,

a background event (t, x, s) has a magnitude s from a p.d.f γ0(s) and triggers on average κ0(s) children, whose

magnitude, time and location distributions are characterised by γ1, g0 and f0, respectively. For a triggered event

(t, x, s), the magnitude is from the density γ1(s) and it triggers κ1(s) children on average, with the children’s

magnitude, time and location densities being γ1, g1 and f1, respectively.

To construct a “ratio-unbiased” estimate of κ0, let

H(t, x, s; t′, x′, s′) =γ1(s

′) ξ(t′, x′; t, x, s, ω)λ1(t, x, s, ω) I(ω = 0)

λ1(t′, x′, s′)λ1(t, x, s)

=γ1(s

′)κ0(s) g0(t′ − t) f0(x

′ − x | s)u(x) γ0(s)

λ1(t′, x′, s′)λ1(t, x, s)

and

h(t, x, s) =λ1(t, x, s, ω) I(ω = 0)

λ1(t, x, s)=

u(x) γ0(s)

λ1(t, x, s).

Then

E

∑

i, j

H(ti, xi, si; tj , xj , sj) I(si ∈ [s0 − δ, s0 + δ])

= E

[∫∫∫∫∫∫

H(t, x, s; t′, x′, s′) I(si ∈ [s0 − δ, s0 + δ]) dt′ dx′ ds′ dt dxds

]

=

∫

κ0(s) γ0(s) I(si ∈ [s0 − δ, s0 + δ]) ds ×

∫∫

u(x) dt dx

≈ 2δ κ0(s0) γ0(s0) ×

∫∫

u(x) dt dx (18)


and

E

[

∑

i

h(ti, xi, si) I(si ∈ [s0 − δ, s0 + δ])

]

= E

[∫∫∫

h(t, x, s) I(si ∈ [s0 − δ, s0 + δ]) dt dxds

]

=

∫

γ0(s) I(si ∈ [s0 − δ, s0 + δ]) ds ×

∫∫

u(x) dt dx

≈ 2δ γ0(s0) ×

∫∫

u(x) dt dx. (19)

A “ratio-unbiased” estimate of κ0(s) given by (18) and (19) is

κ0(s0) =

∑

i, j H(ti, xi, si; tj , xj , sj) I(si ∈ [s0 − δ, s0 + δ])∑

i h(ti, xi, si) I(si ∈ [s0 − δ, s0 + δ]).

If, at the initial step, we let γ(0)0 = γ

(0)1 = γ, κ

(0)0 = κ

(0)1 = κ, f

(0)0 = f

(0)1 = f and g

(0)0 = g

(0)1 = g, where γ, κ, f

and g are from the fitted conditional intensity of the ETAS model, we can obtain

κ(1)0 (s0) =

∑

i, j ρij ϕi I(si ∈ [s0 − δ, s0 + δ])∑

i ϕi I(si ∈ [s0 − δ, s0 + δ]), (20)

where ϕi and ρij are the estimates of ϕi and ρij defined in (9) and(10), respectively.

Similarly, we can find that

κ1(s0) =

∑

i, j H ′(ti, xi, si; tj, xj , sj) I(si ∈ [s0 − δ, s0 + δ])∑

i(1 − h(ti, xi, si)) I(si ∈ [s0 − δ, s0 + δ]).

where

H ′(t, x, s; t′, x′, s′) =γ1(s

′) ξ(t′, x′; t, x, s, ω)λ1(t, x, s, ω) I(ω = 1)

λ1(t′, x′, s′)λ1(t, x, s)

=γ1(s

′)κ1(s) g1(t′ − t) f1(x

′ − x | s)

λ1(t′, x′, s′)

[

1 −u(x) γ0(s)

λ1(t, x, s)

]

and

κ(1)1 (s0) =

∑

i, j ρij (1 − ϕi) I(si ∈ [s0 − δ, s0 + δ])∑

i(1 − ϕi) I(si ∈ [s0 − δ, s0 + δ]). (21)

The other functions in equations (15) to (17) such as g1, g2, f1 and f2 can be reconstructed in similar ways.

16 J. Zhuang

Example 9. Differences in average numbers of children triggered by the background events and the triggered

events in the JMA catalogue. Applying the above procedure to the JMA catalogue, the triggering abilities (average

numbers of children) from the background events and the triggered events for each magnitude are plotted in

Figure 2. It is found that background events and triggered events generate children according to different positive

exponential laws. However, a background event triggers fewer children on average than a triggered event of the

same magnitude. Such a feature plays an important role in evaluating the probability associated with foreshocks

(background events of which each has at least one larger descendant) (see a detailed discussion in Zhuang and

Ogata, 2006).

4.3. Spatial scaling factor and triggering abilities

As argued in the introduction, immediate questions regarding the use of (3) include: 1 Is the scaling factor Ceαs

necessary, i.e., can it be replaced by a constant D0? 2 Should the scaling factor Ceαs have the same exponent α

as in the triggering abilities of (2)? 3 Is the scaling factor of the form of an exponential law?

Suppose that the true conditional intensity is

λ1(t, x, s) = γ(s)

[

u(x) +∑

i: ti<t

κ(si) g(t − ti) f∗(x − xi|si)

]

,

where

f∗(x | s) = f∗(x, σ(s)) =π(q − 1)

σ(s)

[

1 +||x||2

σ(s)

]−q

,

is the p.d.f. for the locations of offspring with σ(s) being a function of s. Consider

H(t, x, s; t′, x′, s′) =γ(s′)κ(s) g(t′ − t)f∗(x′ − x, σ(s))

λ1(t′, x′, s′)

∂

∂σ(s)log f∗(x′ − x, σ(s)),


and then, for any positive number δ,

E

∑

i, j

H(ti, xi, si; tj , xj , sj) I(si ∈ [s0 − δ, s0 + δ])

= E

[∫∫∫∫∫∫

γ(s′)κ(s) g(t′ − t)∂f(x′ − x, σ(s))

∂σ(s)I(s ∈ [s0 − δ, s0 + δ])λ1(t, x, s) dt′ dx′ ds′ dt dxds

]

=

∫

κ(s) I(s ∈ [s0 − δ, s0 + δ]) γ(s) ds × E[λ1(t, x)] ×∂

∂σ(s)

∫

f∗(x′ − x, σ(s)) dx′

= 0.

Since f∗ integrates to one. Thus, for all s0, if δ is small enough,

∑

i, j

H(ti, xi, si; tj , xj , sj)I(si ∈ [s0 − δ, s0 + δ]) ≈ 0.

Further, the above equation gives

∑

i, j

γ(sj)κ(si) g(tj − ti)f∗(xj − xi, σ(si))

λ1(tj , xj , sj)I(si ∈ [s0 − δ, s0 + δ])

∂

∂σ(s)log f∗(xj − xi, σ(s))

∣

∣

∣

∣

σ(s)=σ(si)

≈ 0.

In order to update σ(s), we convert the above equation into

∑

i, j

γ(sj)κ(si) g(tj − ti)f∗(xj − xi, σ

(0)(si))

λ1(tj , xj , sj)I(si ∈ [s0 − δ, s0 + δ])

∂


∣

∣

∣

∣

σ(s)=σ(1)(si)

≈ 0.

(22)

Taking σ(0)(s) = Ceαs in the above equation, i.e., f∗(x, σ(0)(s)) = f , where f is the estimated f defined in (3),

and (22) gives

∑

i, j

ρij I(si ∈ [s0 − δ, s0 + δ])∂


∣

∣

∣

∣

σ(s)=σ(1)(s0)

= 0, (23)

The solution σ(1)(s0) for each s0 in (23) is the updated version from σ(s0) (see Zhuang et al. 2004 for how to solve

this equation iteratively). Such an estimate can be regarded as maximizing the following pseudo-log-likelihood

PLL(σ(s0)) =∑

i, j

ρij log f∗(xj − xi, σ(si)) I(si ∈ [s0 − δ, s0 + δ])

where δ is a small positive number.

18 J. Zhuang

Example 10. The scaling law for the locations of clusters in the JMA catalogue. Applying the above procedure

to the JMA catalogue by using the original model and a model equipped with

f1(x|s) =q − 1

πCeα′sexp

(

1 +||x||2

Ceα′s

)−q

, (24)

we obtain the results shown in Figure 3. For comparison, the same procedure is also applied to simulated catalogues

generated from the model. It can be seen that the scaling law for the spatial locations of offspring is still an

exponential law, but not the same as the one for triggering abilities. This conclusion results in a new revision of

the formulation of the space-time ETAS model in the form of (24) (See more detailed discussion in Ogata and

Zhuang, 2006; Zhuang et al., 2005).

5. Conclusions

The idea of residual analysis for spatio-temporal point processes can be generalised through the viewpoint of

martingale theory. For the first-order innovations/residuals, as shown in the examples, it can be almost said that

each martingale associated with the conditional intensity corresponds to a specific kind of residual. Extending the

idea to higher-order cases, the second-order residual analysis is especially useful for improving model formulation

when second order properties such as clustering or inhibition are of interest. In this article, we use the spatio-

temporal ETAS model with a non-homogeneous background rate as the initial model for describing the clustering

features in earthquake occurrences. First- and second-order residual analysis helps us to find a proper direction in

which to improve on the model formulation.

Acknowledgements

This work was carried out while the author was supported by a post-doctoral programme P04039 funded by the

Japan Society for Promotion of Science. The author thanks Prof. Yosihiko Ogata and Prof. David Vere-Jones for

introducing him to the interesting research field of point processes. Prof. Adrian Baddeley kindly sent him his


manuscript on residual analysis for spatial point processes, which stimulated this study. Discussions with Daryl

Daley, Estate Khmaladze, Frederic Schoenberg and David Vere-Jones proved very helpful as did the encouragement

and comments from the editor, Robin Henderson, as well as two anonymous referees.

References

Akaike, H. (1974), A new look at the statistical model identification. IEEE Transaction on Automatic Control,

AC-19, pp. 716-723.

Baddeley A., Turner R., Møller J. and Hazelton M. (2005) Residual analysis for spatial point processes. Journal

of the Royal Statistical Society, Series B, 67(5), 617-666. doi:10.1111/j.1467-9868.2005.00519.x.

Bremaud P. (1981), Point Processes and Queues: Martingale Dynamics, New York: Springer-Verlag.

Console, R., and M. Murru (2001), A simple and testable model for earthquake clustering, J. Geophys. Res., 106,

8699-8711.

Console, R., M. Murru, and A. M. Lombardi (2003), Refining earthquake clustering models. J. Geophys. Res.,

108(B10), 2468, doi:10.1029/2002JB002130.

Daley, D. and D. Vere-Jones (2003), An Introduction to Theory of Point Processes. Springer, New York.

Gutenberg B., Richter C. F., (1956). Earthquake Magnitude, Intensity, Energy, and Acceleration. Bull. Seism.

Soc. Am., 46, 105.

Hainzl, S. and Y. Ogata (2005), Detecting fluid signals in seismicity data through statistical earthquake modeling.

Journal of Geophysical Research, 110, B5, B05S07, doi:10.1029/2004JB003247.

Helmstetter A. and Sornette D. (2003) Foreshocks explained by cascades of triggered seismicity, J. Geophys. Res.,

108 (B10),2457 10.1029/2003JB002409.

20 J. Zhuang

Helmstetter A. and Sornette D. (2003) Predictability in the ETAS Model of Interacting Triggered Seismicity, J.

Geophys. Res., 108, 2482, 10.1029/2003JB002485.

Helmstetter A., Sornette D. and Grasso J.-R. (2003) Mainshocks are Aftershocks of Conditional Foreshocks:

How do foreshock statistical properties emerge from aftershock laws, J. Geophys. Res., 108 (B1), 2046,

doi:10.1029/2002JB001991.

Kagan, Y. (1991), Likelihood analysis of earthquake catalogues. Geophysical Journal International, 106, B7,

135-148.

Kagan, Y. Y., and L. Knopoff, (1981). Stochastic synthesis of earthquake catalogs. Journal of Geophysical

Research, 86, 2853-2862.

Karr, A.F. (1991). Point processes and their Statistical Inference, second edition. New York: Dekker.

Meyer P. (1971) Demonstration simplifiee d’un theoreme de Knight. In Seminaire de Probabilites, V, University

of Strasbourg, Lecture Notes in Math, 191, 191-195.

Musmeci, F. and D. Vere-Jones (1992). A space-time clustering model for historical earthquakes. Ann. Inst. Stat. Math.,

44, 1-11.

Nguyen X. X. and Zessin H. (1979). Integral and Differential Characterizations of the Gibbs process. Mathema-

tische Nachrichten. 88: 105-115.

Ogata Y. (1981). On Lewis’ simulation method for point processes. IEEE Transactions on Information Theory,

IT-27 (1): 23-31.

Ogata, Y. (1988), Statistical model for earthquake occurrences and residual analysis for point processes. J. Am.

Stat. Assoc., 83, 9-27.


Ogata, Y. (1998), Space-time point-process models for earthquake occurrences, Ann. Inst. Stat. Math., 50,

379-402.

Ogata, Y. (2004), Space-time model for regional seismicity and detection of crustal stress changes, J. Geo-

phys. Res., 109, B03308, doi:10.1029/2003JB002621.

Ogata, Y., K. Katsura and M. Tanemura (2003). Modelling heterogeneous space-time occurrences of earthquake

and its residual analysis, Appl. Stat. (J. Roy. Statist. Soc, Ser. C), 52, 499-509.

Ogata, Y. and J. Zhuang (2006), Space-time ETAS models and an improved extension, Tectonophysics, . Tectono-

physics. 413, 13-23.

Omori F., (1894). On after-shocks of earthquakes. J. Coll. Sci. Imp. Univ. Tokyo, 7, 111 – 200.

Papangelou F. (1972). Integrability of expected increments point processes and a related random change of scale.

Trans. Mer. Math. Soc.,165, 483-506.

Rathbun, S. L. (1993), Modelling marked spatio-temporal point patterns. Bull. Int. Stat. Inst., 55, Book 2,

379-396.

Schoenberg F. P. (2004). Multidimensional residual analysis of point process models for earthquake occurrences.

Journal of the American Statistical Association, 98, 789-795, doi: 10.1198/016214503000000710.

Stone, M. (1977), Asymptotics for and against cross-validation. Biometrika, 64, 29-35.

Stoyan D. and Grabarnik P. (1991). Second-order characteristics for stochastic structures connected with Gibbs

point processes. Mathematische Nachritchten, 151, 95-100.

Utsu T. (1969) Aftershock and earthquake statistics (I): Some para meters which characterize an aftershock

sequence and their interrelations. Journal o f the Faculty of Science, Hokkaido University, Series VII (Geo-

physics), 3: 129–19 5.

22 J. Zhuang

Vere-Jones D. and Schoenberg F. P. (2004). Rescaling marked point processes. Australian & New Zealand Journal

of Statistics, 46 (1): 133-143.

Yamanaka, Y. and K. Shimazaki (1990), Scaling relationship between the number of aftershocks and the size of

the main shock, J. Phys. Earth, 38, 305-324.

Zhuang J., Ogata Y. and Vere-Jones D. (2002). Stochastic declustering of space-time earthquake occurrences.

Journal of the American Statistical Association, 97: 369-380.

Zhuang J., Ogata Y., Vere-Jones D. (2004). Analyzing earthquake clustering features by using stochastic recon-

struction. Journal of Geophysical Research, 109, No. B5, B05301, doi:10.1029/2003JB002879.

Zhuang J., Chang C.-P., Ogata Y. and Chen Y.-I. (2005). A study on the background and clustering seismic-

ity in the Taiwan region by using a point-process model. Journal of Geophysical Research, 110: B05S18,

doi:10.1029/2004JB003157.

Zhuang J. and Ogata Y. (2006) Properties of the probability distribution associated with the largest event

in an earthquake cluster and their implications to foreshocks. Accepted by Physics Review, E. (e-print:

http://bemlar.ism.ac.jp/zhuang/publication.htm)

A. Point processes and conditional intensities

Consider a space-time point process consisting of events occurring at times ti in the interval [0, T ] and at

corresponding locations xi in a region X ⊂ Rd. Assume that the marginal temporal point process is orderly; i.e.,

the probability that more than one event occurs in [t, t + δ) × B is o(δ) for all t ≥ 0 and bounded B ⊂ X . Such

space-time point processes can be defined through random counting measures N on [0,∞) × X . Here, N(C × B)

is the number of events falling in a region B ∈ X and at times in a set C ∈ T , where X and T are the Borel

σ-algebras of subsets of X and [0,∞), respectively. Let Φ be the collection of locally bounded counting measures


N on [0,∞)×X and A be the σ-algebra on Φ generated by N((r, t]×B) : B ∈ X , 0 ≤ r ≤ t. Then a space-time

point process N is a measurable mapping of a probability space (Ω,F , P ) onto (Φ,A).

For any measurable B ∈ X , there exists an F -compensator A(t, B) such that N([0, t) × B) − A(t, B) is an

F -martingale. Let ℓ and ℓd denote Lebesgue measure in R and Rd, respectively. Suppose that, for each x ∈ X ,

there exists an integrable, non-negative, F -adapted process λ(t, x) such that, with probability 1, for all t ∈ R+

and B ∈ X ,∫

B

∫ t

0

λ(s, x) ℓ(ds) ℓd(dx) = A(t, B).

The conditions and justification for the existence of such a process can be found, for example, in Vere-Jones and

Schoenberg (2004). If it exists, λ(t, x) is called an F -conditional intensity, satisfying

∫

B

λ(t, x)ℓd(dx) = lim∆t↓0

E[N(t + ∆t, B) − N(t, B) | Ft]

∆t.

B. Second-order predictability and second-order residuals

Call the sub-σ-algebra of (T⊗

F)⊗

(T⊗

F) generated by sets of the form (s, t] × U × (s′, t′] × V , the F⊗

F-

predictable σ-algebra, denoted by ΨFN

F , where U ∈ Fs, 0 ≤ s ≤ t < ∞, V ∈ Fs, 0 ≤ s′ ≤ t′ < ∞ and T is the

Borel σ-algebra on R+. A process H(t, ω, t′, ω′) is F

⊗

F -predictable when it is ΨFN

F -measurable; that is, for

any A ∈ T , (t, ω, t′, ω) : H(t, ω, t′, ω′) ∈ A ∈ ΨFN

F . In other words, all the F⊗

F -predictable processes can

be generated from processes of the form

I(a,b](t) I(a′,b′](t′) IU (ω) IV (ω′), U ∈ Fa, V ∈ Fa′ ,

by linear combinations and monotone limits. We call the process h(t, t′, ω) = H(t, ω, t′, ω) second-order predictable

if H(t, ω, t′, ω′) is F⊗

F -predictable.

Given a F⊗

F -predictable process H(t, ω, t′, ω′), it is easy to verify that the process can be decomposed as

H(t, ω, t′, ω′) = H0(t, ω, t′, ω′) + H−(t, ω, t′, ω′) + H+(t, ω, t′, ω′) (25)

24 J. Zhuang

where H0(t, ω, t′, ω′) = H(t, ω, t′, ω′) It(t′), and H−(t, ω, t′, ω′) = H(t, ω, t′, ω′) I(t,∞](t

′) is F -predictable with

respect to t′ and ω′ when t and ω are fixed, and H+(t, ω, t′, ω′) = H(t, ω, t′, ω′) I(t′,∞](t) is F -predictable with

respect to t and ω when t′ and ω′ are fixed.

Lemma 1. Using the notation of H− and H+ in (25) for an F⊗

F-predictable process H(t, ω, t′, ω′), given an

F-adapted point process, N(t, ω) with F-conditional intensity λ(t, ω), define

F (t, ω, t′, ω′) ≡ E

[

∫

(t′, ∞)

H−(t, ω, s, ω′)N(ds, ω′)

∣

∣

∣

∣

∣

Ft′

]

(26)

and

G(t, ω, t′, ω′) ≡ E

[

∫

(t,∞)

H+(s, ω, t′, ω′)N(ds, ω)

∣

∣

∣

∣

∣

Ft

]

.

Then

F (t, ω, t′, ω′) = E

[

∫

(t′,∞)

H−(t, ω, s, ω′)λ(s, ω′) ds

∣

∣

∣

∣

∣

Ft′

]

(27)

and

G(t, ω, t′, ω′) = E

[

∫

(t,∞)

H+(s, ω, t′, ω′)λ(s, ω) ds

∣

∣

∣

∣

∣

Ft

]

, (28)

assuming that the integrals in the right sides of (27) and (28) exist. Moreover, F (t, ω, t′, ω′) and G(t, ω, t′, ω′) are

both F⊗

F-predictable processes.

Proof: For F in (26), we need only consider the case when t′ > t. For a fixed t and any u > t′, H−(t, ω, u, ω′)I(t′,+∞)(u)

is F -predictable with respect to u. Based on the martingale property associated with the conditional intensity,

J(u) =

∫

(0,u]

H−(t, ω, s, ω′) I(t′,+∞)(s) [N(ds, ω′) − λ(s, ω′)µ(ds)]

is also a zero-mean martingale when t and t′ are fixed, i.e.,

E[J(u) | Ft′ ] = J(t′) = 0, u > t.


i.e.,

E

[

∫

(t′, u]

H−(t, ω, s, ω′)N(ds, ω′)

∣

∣

∣

∣

∣

Ft′

]

= E

[

∫

(t′, u]

H−(t, ω, s, ω′)λ(s, ω′)µ(ds)

∣

∣

∣

∣

∣

Ft′

]

.

Since the conditional intensity measure is diffuse, the integral in the right-hand side of the above equation is

continuous with respect to t′. Thus, if u → +∞, for fixed t and ω, F (t, ω, t′, ω′) is left-continuous and adopted to

Ft′ . From Bremaud (1980, Theorem T5, Chapter I.3), F (t, ω, t′, ω′) is F -predictable with respect to (t′, ω′).

Moreover, if H−(t, ω, t′, ω′) takes a form of I(a,b] IU (ω) I(a′, b′](t′) IV (ω′), U ∈ Fa, V ∈ Fa′ , then F takes the

form I(a,b] IU (ω) f(t′, ω′) where f(t′, ω′) is F -predictable. That is to say, F is F⊗

F -predictable.

The related conclusions for G can similarly be proved. 2

Proposition 2. Assume that a second-order F-predictable process H(t, ω, t′, ω) can be similarly decomposed

into H−(t, ω, t′, ω), H+(t, ω, t′, ω) and H0(t, ω, t′, ω) as in (25). Then

E

[∫∫

D

H−(t, ω, t′, ω)N(dt, ω)N(dt′, ω)

]

= E

[∫∫

D

H−(t, ω, t′, ω, )λ(t, ω)λ(t′, ω)µ(dt′)µ(dt)

]

(29)

if the integral on the right-hand side exists, where D is a measurable subset of T × T .

Proof: Notice

I ≡ E

[∫∫

D

H−(t, ω, t′, ω)N(dt, ω)N(dt′, ω)

]

= E

∫

dom D

E

[∫ +∞

0

H−(t, ω, t′, ω) ID(t)(t′)N(dt′, ω)

∣

∣

∣

∣

Ft

]

N(dt, ω)

where domD = t : ∃t′ such that (t, t′) ∈ D and D(t) = t′ : (t, t′) ∈ D. From Lemma 1,

F (t, ω, t′, ω′) ≡ E

[∫ +∞

0

H−(t, ω, s, ω′) ID(t′)(s)N(ds, ω′)

∣

∣

∣

∣

Ft′

]

is F⊗

F -predictable, implying that

F (t, ω) ≡ F (t, ω, t, ω) = E

[∫ +∞

0

H−(t, ω, t′, ω) ID(t)(t′)N(dt′, ω)

∣

∣

∣

∣

Ft

]

26 J. Zhuang

is F -predictable.

Thus,

I = E

[∫

domD

F (t, ω)N(dt, ω)

]

= E

[∫

domD

F (t, ω)λ(t, ω)µ(dt)

]

.

On the other hand, from Lemma 1, the right-hand side of (29)

E

∫∫

D

H−(t, ω, t′, ω)λ(t′, ω)λ(t, ω)µ(dt′)µ(dt)

= E

∫

dom D

E

[

∫

D(t)

H−(t, ω, t′, ω)λ(t′, ω)µ(dt′)

∣

∣

∣

∣

∣

Ft

]

λ(t, ω)µ(dt)

= E

[∫

dom D

F (t, ω)λ(t, ω)µ(dt)

]

.

2

Similarly, (29) holds if we replace H− by H+, i.e.,

E

[∫∫

D

H+(t, ω, t′, ω)N(dt, ω)N(dt′, ω)

]

= E

[∫∫

D

H+(t, ω, t′, ω, )λ(t, ω)λ(t′, ω)µ(dt′)µ(dt)

]

. (30)

When t = t′, H0(t, ω) ≡ H0(t, ω, t′, ω) is a F -predictable process and N(dt, ω)N(dt′, ω) = N(dt, ω); when

t 6= t′, H0(t, ω, t′, ω) = 0. Thus

E

[∫∫

D

H0(t, ω, t′, ω)N(dt, ω)N(dt′, ω)

]

= E

[∫

diag D

H0(t, ω, t, ω)N(dt, ω)

]

= E

[∫

diag D

H0(t, ω, t, ω)λ(dt, ω)µ(dt)

]

, (31)

where diag D = t : (t, t) ∈ D.

Combining (29), (30) and (31), we have

Theorem 3. For any second-order predictable process H(t, ω, t′, ω), given a point process N(ω) with conditional

intensity λ(t, ω), the equalities

E

[∫∫

D

H(t, ω, t′, ω)N(dt, ω)N(dt′, ω)

]

= E

[∫∫

D

H(t, ω, t′, ω)λ(t, ω)λ(t′, ω)µ(dt′)µ(dt)

]

+ E

[∫

diag D

H0(t, ω, t, ω)λ(dt, ω)µ(dt)

]

,


and

E

[

∫∫

D\diag2 D

H(t, ω, t′, ω)N(dt, ω)N(dt′, ω)

]

= E

[∫∫

D

H(t, ω, t′, ω)λ(t, ω)λ(t′, ω)µ(dt′)µ(dt)

]

,

hold if the integrals on the right-hand side exist, where diag2 D = (t, t) ∈ D.

C. Estimation procedures for the ETAS model (Zhuang et al., 2002, 2004)

(a) Set up the initial background seismicity rate, for example, let u(x) = 1.

(b) Set

λ(t, x) = γ(s)

[

ν u(x) +∑

i:ti<t

κ(si) g(t − ti) f(x − xi | si)

]

,

and estimate ν and the other model parameters by maximizing the log-likelihood function

log L =∑

(ti,xi,si)∈D

log λ(ti, xi, si) −

∫∫∫

D

λ(t, x, s) dt dxds,

where D is a specified spatio-temporal region of interests.

(c) For each event i, set

ϕi = ν u(xi) γ(si)/λ(ti, xi, si). (32)

(d) Get a better estimate of the background rate by using the weighted kernel estimates

u(x) ∝∑

i

ϕi Z(x − xi, hi) (33)

where Z represent the gaussian kernel function, the bandwidth hi is the distance to the npth closest events

to i or is a threshold bandwidth if this distance is less than the threshold bandwidth.

(e) Replace the background rate by this better one, and return to Step 2. Repeat until the results converge.

28 J. Zhuang

130 135 140 145 150

3035

4045 (a)

Longitude

Latit

ude

0 5000 10000 15000 20000 25000

3035

4045 (b)

Time in days

Latit

ude

Fig. 1. Occurrence locations and times of shallow earthquakes in the central Japan area. (a) Epicentral locations. The polygonrepresents the target region. (b) Epicentral latitudes against occurrence times. Black and gray circles are target events andcomplementary events, respectively. Different sizes of circles show the magnitude of earthquakes from 4.2 to 8.1.


Magnitude

Trig

gerin

g ab

ilitie

s

5 6 7 8

0.05

0.1

0.2

0.5

12

510

2050

All eventsTriggeredBackgroundTheoretical

Fig. 2. Reconstructed triggering abilities (average number of children triggered by events of same magnitude) for all the eventsK

(1)(s), background events κ(1)0 (s) and triggered events κ

(1)1 (s), as in (14), (20) and (21), respectively (Zhuang et al., 2004).

The solid line represents the initial κ(s) = K(0)(s) = κ

(0)0 (s) = κ

(0)1 (s) obtained by MLE from fitting model (1) to the JMA

catalogue. Magnitude 4.2 on the horizontal axes corresponds to s = 0.

30 J. Zhuang

(a)

Magnitude

Sca

ling

Fac

tor

0.00

10.

010.

1

5 6 7 8

(b)

MagnitudeS

calin

g F

acto

r0.

001

0.01

0.1

5 6 7 8

(c)

Magnitude

Sca

ling

Fac

tor

0.00

10.

010.

1

5 6 7 8

(d)

Magnitude

Sca

ling

Fac

tor

0.00

10.

010.

1

5 6 7 8

Fig. 3. Reconstructed results of σ(1)(s) against the corresponding magnitudes (Magnitude 4.2 corresponds to s = 0) of parent

earthquakes (Ogata and Zhuang, 2006). Circles indicate the values of σ(1)(s). (a) shows the reconstructed results from the

original model for the JMA catalogue. (b) is the same as (a), but the catalogue is a simulation of the original model. (c)Reconstructed results from the model equipped with (24). (d) is the same as (c), but the catalogue is a simulation of the modelequipped with (24). In (c) and (d), the solid lines represent Ce

αs and the dashed lines represent Ceα′s.

Second-order residual analysis of spatio-temporal point ...bemlar.ism.ac.jp/zhuang/pubs/residual3.4.pdf · Temporal, spatial and spatio-temporal point processes have been increasingly

Documents