Thesis for the degree of Licentiate of Engineering in Mathematical Statistics
Modelling Precipitation in Sweden Using Multiple Step
Markov Chains and a Composite Model
Jan Lennartsson
Department of Mathematical Sciences
Chalmers University of Technology and University of Gothenburg
Goteborg, Sweden, 2008
Modelling Precipitation in Sweden Using Multiple Step Markov Chains and a Composite Model
Jan Lennartsson
© Jan Lennartsson, 2008
Department of Mathematical Sciences
Chalmers University of Technology and University of Gothenburg
SE–41296 Goteborg, Sweden
Phone: +4631–772 1000
ISSN 1652-9715
Technical Report 2008:38
Printed in Sweden, 2008.
Abstract
In this thesis, we propose a new method to model precipitation in Sweden. We consider a chain dependent stochastic model that consists of a first component that models the probability of occurrence of precipitation at a weather station and a second component that models the amount of precipitation when precipitation occurs. For the first component we fit a multiple order Markov chain. It turns out that for most of the weather stations in Sweden a Markov chain of an order higher than one is required. The second component is a temporal Gaussian process with marginals transformed to have a distribution composed of the empirical distribution of the amount of historically observed precipitation at each weather station below a suitable threshold and a fitted generalized Pareto distribution above that threshold. In other words, we model the temporal dependence between amounts of precipitation at different times by means of a Gaussian copula. The derived model is then used to compute different weather indices. The distributions of these indices according to our model show good agreement with the corresponding empirical distributions for the indices as computed from real world data, which supports the choice of the model.
Keywords: Multiple order Markov chain; Generalized Pareto distribution; Gaussian copula; Precipitation process; Empirical distribution; Sweden.
Acknowledgements
First, I would like to express my gratitude to my supervisor, Patrik Albin, for the support, guidance and belief in me. Thank you for always having time for my questions and ideas, for sharing your experience and for pushing me forward with great enthusiasm. You also deserve a rose for just being the lovely unique man that you are.
Secondly, Anastassia Baxevani for endlessly accepting my stripped declarations and lack of stringency. Also, I am in great debt to you for aiding, driving, pushing and forcing this project into land.
Thirdly, Dan Stromberg for committing and helping out and my previous co-advisor, Igor Rychlik, for his support.
Fourthly and in ascending order Daniel Drugge, Elin Lennartsson and Martin Norling for helping me complete this thesis.
Fifthly, Johan Tykesson, Dan Kuylenstierna, Marcus Warfheimer and Marcus Gavell deserve gratitude for putting up with me during courses and/or most of the time accepting or intriguing challenging discussions.
Ottmar Cronie for temporarily looking after my office.
Former and present colleagues at mathematical sciences.
*
This work was financially supported by Goteborgs MiljoVetenskapliga centrum (GMV).
Contents
Abstract
Acknowledgements
Introduction
    Motivation
    Description
    My contribution
Appended paper
Introduction
This licentiate thesis is based on the manuscript “Modelling precipitation in Sweden using multiple step Markov chains and a composite model”, which can be found as an appended paper after this introduction. This manuscript has been accepted for publication in Journal of Hydrology. The manuscript is joint work with Anastassia Baxevani and Deliang Chen. Below is an introductory motivation and a non-technical description of the manuscript, together with a statement of my personal contributions to the manuscript as compared with those of my co-authors.
Motivation
The distribution of precipitation is a major environmental issue on both a national and an international level. For example, it is clear that the effects of drought or anomalously wet weather conditions may have devastating consequences for agriculture. However, the entire distribution of the downfall is also of great interest, and realistic sequences of meteorological variables such as precipitation are key inputs in many hydrologic, ecologic and agricultural models.
Climate, and in particular precipitation, is governed by a set of physical principles which may in mathematical language be represented as differential equations. In meteorology, where only a short time span is considered, these differential equations are used to predict precipitation. However, the equations are so complex that no exact solutions are known. As a result, the variability and uncertainty in predicting precipitation over longer time periods, e.g., over a month or a year, grow to an unmanageable extent. In order to forecast precipitation for longer time spans the precipitation process is therefore usually modelled as a stochastic process.
In the absence of complete knowledge of all the underlying processes which govern climate, simulation models are needed to model the stochastic behaviour of the system when historical records are of insufficient duration or inadequate spatial and/or temporal coverage. In these cases synthetic sequences may be used to fill in gaps in historical records, to extend the historical record, or to generate realizations of weather that are stochastically similar to the historical record. A weather generator is a stochastic numerical model that generates daily weather series with the same statistical properties as observed real world weather series.
However, even if the measured information of interest (i.e., the amount of precipitation) is a fairly accessible entity, the actual building of a weather generator is made difficult by the long time periods and complex processes which govern it.
In contrast to meteorology, which focuses on short term weather systems, in the case of a weather generator the interest lies rather in unfolding the structure of the underlying process. The actual pattern of measures such as frequency or the trends of those systems are of more interest than the daily precipitation. When watching the news one is usually more interested in knowledge about the opposite, the precipitation in the near future.
The climate consists of more than averages and seasonal variations. Rare events with extreme weather are also a part of the climate. There are different kinds of climate extremes; some are violent, such as severe rainfall, which may have a devastating effect on the community. In order to optimize the response to these, a model that accurately describes the distribution of these events is a high priority.
Description
The aim is to create a weather generator that can produce realistic sequences of precipitation for some specific geographical sites in Sweden.
In developing the weather generator, the stochastic structure of the time series of daily amounts of precipitation is described by a statistical model. The parameters of this model are estimated from observed real world precipitation data. The thus completed weather generator allows us to generate arbitrarily long series with the stochastic structure of the model. We can also evaluate the degree of similarity between weather data produced by the model and the real world data series.
The statistical model is a chain-dependent model which consists of two steps: first, a model for the sequence of wet/dry days, for which we employ a multiple order Markov chain; secondly, a model for the amount of precipitation for the wet days, for which we employ a composite model that incorporates the empirical distribution together with an extreme value distribution for the tail.
In this study the actual physical processes that influence the climate are of less importance than what they accomplish, that is, the daily amount of precipitation.
In the mathematical part of our work a great deal revolves around Markov processes. Briefly put, a Markov process is a process that features the property of full-blown amnesia: the future is independent of the past, conditional on the present being known. A multiple step Markov chain relaxes the full amnesia property so that, given knowledge of the current state and the states some steps back, the future is independent of the past. However, even though the Markov property is very useful in terms of explicitly finding the probability of a vast number of interesting events, one must remember that this is just a mathematical model for a real world that sometimes might be much more complicated. Still it is a good and versatile model.
Many examples have demonstrated that Gaussian processes describe dependence structures in the real world well. So when confronted with the fact that the data indicated temporal dependence, it is natural to employ a Gaussian copula. Instead of estimating the parameter of a Gaussian copula directly from the data (a computationally very hard assignment), we sidestep that problem by using the fact that all computations for copulas are based on the empirical ranks. We transform these ranks to their corresponding normal quantiles, meaning that if the pairs of observations really are governed by the Gaussian copula, then the transformed pairs are multivariate normally distributed. Estimating the Gaussian copula parameter now reduces to estimating the correlation coefficient of a multivariate normally distributed variable, which is an elementary task.
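The rank-to-normal-quantile estimation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the thesis: the function names, the rank scaling r/(n+1), the restriction to lag-one pairs and the handling of ties (broken by position) are all choices made for this example.

```python
import math
from statistics import NormalDist

def normal_scores(values):
    """Replace each value by the standard normal quantile of its scaled rank.

    Ranks run from 1 to n and are scaled by 1/(n+1) (an assumption of this
    sketch) so that the quantile is finite. Ties are broken by position.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]

def pearson(x, y):
    """Ordinary sample correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lag_one_copula_correlation(amounts):
    """Estimate a Gaussian copula parameter for consecutive observations.

    The amounts are mapped to normal scores, and the copula parameter is
    taken to be the correlation between each score and the next one.
    """
    scores = normal_scores(amounts)
    return pearson(scores[:-1], scores[1:])
```

Because only ranks enter the computation, the estimate is unchanged by any strictly increasing transformation of the amounts, which is the property that makes the copula parameter separable from the marginal distribution.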
Since extreme rainfall so crucially affects the community, we spend substantial effort on analysing these rare events. By a point-process approach, singling out the dependence of close-by extremes, the extremal distribution is estimated.
My contribution
The problem of creating a weather generator, together with suggestions for suitable data sets, was proposed to me by environmentalist Deliang Chen. My head advisor, Patrik Albin, got me started by suggesting dividing the precipitation process into different parts and specifically modelling the precipitation/no precipitation part as a multiple step Markov chain and the rain distribution by a combination of the empirical distribution and an extreme value distribution. By the fruitful tutoring of my co-advisor, Anastassia Baxevani, we moved the problem forward and ended up with this thesis at hand. Except for the above mentioned crucial advisory contributions, everything in this thesis has been done by myself, the undersigned. That means building up the stochastic model from scratch into something that very accurately replicates historical data, as judged by means of statistical tests with respect to the established weather indices.
Appended paper
Lennartsson, J., Baxevani, A., Chen, D., Modelling precipitation in Sweden using multiple step Markov chains and a composite model, Journal of Hydrology (2008), doi: 10.1016/j.jhydrol.2008.10.003
Two versions of the manuscript are appended because of the differing typographical advantages: the final manuscript as accepted for publication and the present version of the manuscript as typeset by Journal of Hydrology.
Modelling Precipitation in Sweden Using Multiple Step Markov
Chains and a Composite Model
Jan Lennartsson1, Anastassia Baxevani1∗†, Deliang Chen2
1 Department of Mathematical Sciences, Chalmers University of Technology, University of Gothenburg, Gothenburg, Sweden
2 Department of Earth Sciences, University of Gothenburg, Gothenburg, Sweden
Abstract
In this paper, we propose a new method for modelling precipitation in Sweden. We consider a
chain dependent stochastic model that consists of a component that models the probability of
occurrence of precipitation at a weather station and a component that models the amount of
precipitation at the station when precipitation does occur. For the first component, we show
that for most of the weather stations in Sweden a Markov chain of an order higher than one is
required. For the second component, which is a Gaussian process with transformed marginals,
we use a composite of the empirical distribution of the amount of precipitation below a given
threshold and the generalized Pareto distribution for the excesses in the amount of precipitation
above the given threshold. The derived models are then used to compute different weather
indices. The distribution of the modelled indices and the empirical ones show good agreement,
which supports the choice of the model.
Key words: High order Markov chain, generalized Pareto distribution, copula, precipitation
process, Sweden
∗ Corresponding author
† Research supported partially by the Gothenburg Stochastic Center and the Swedish foundation for
Strategic Research through Gothenburg Mathematical Modelling Center.
1 Introduction
Realistic sequences of meteorological variables such as precipitation are key inputs in many
hydrologic, ecologic and agricultural models. Simulation models are needed to model the
stochastic behavior of the climate system when historical records are of insufficient duration
or inadequate spatial and/or temporal coverage. In these cases synthetic sequences may
be used to fill in gaps in the historical record, to extend the historical record, or to generate
realizations of weather that are stochastically similar to the historical record. A weather
generator is a stochastic numerical model that generates daily weather series with the
same statistical properties as the observed ones, see Liao et al. (2004).
In developing the weather generator, the stochastic structure of the series is described
by a statistical model. Then, the parameters of the model are estimated using the observed
series. This allows us to generate arbitrarily long series with stochastic structure similar
to the real data series.
Parameter estimation of stochastic precipitation models has been a topic of intense
research over the last 20 years. The estimation procedures are intrinsically linked to the nature
of the precipitation model itself and the timescale used to represent the process. There are
models which describe the precipitation process in continuous time and models describing
the probabilistic characteristics of precipitation accumulated over a given time period, say
daily or monthly totals. Different reviews of the available models have been presented: see
for example Woolhiser (1992), Cox and Isham (1988) and Smith and Robinson (1997).
Continuous time models for a single site with parameters related to the underlying
physical precipitation process are particularly important for the analysis of data at
short timescales, e.g. hourly. Some of these models are described in Rodríguez-Iturbe et
al. (1987, 1988) and Waymire and Gupta (1981).
When only accumulated precipitation amounts for a particular time period (daily) are
recorded, empirical statistical models, based on stochastic models that are calibrated
from actual data, are appealing. Empirical statistical models for generating daily
precipitation data at a given site can be classified into four different types: chain dependent
or two-part models, transition probability matrix models, resampling models and ARMA
time series models; see Srikanthan and McMahon (2001) for a complete review of the
different models.
A generalization of the precipitation models for a single site is the spatial extension
of these models to multiple sites, which tries to incorporate the intersite dependence while
preserving the marginal properties at each site. A more ambitious task is the modelling of
precipitation continuously in time and space; original work on this type of model,
based on point process theory, was presented by LeCam (1961) and further developed by
Waymire et al. (1984) and Cox and Isham (1994). Mellor (1996) has developed the modified
turning bands model, which reproduces some of the physical features of precipitation
fields in space, such as rainbands and cluster potential regions of rain cells.
In this study we concentrate on the chain-dependent model for the daily precipitation
in Sweden which consists of two steps, first a model for the sequence of wet/dry days
and second, a model for the amount of precipitation for the wet days. For the first, we
use high-order Markov chains and for the second we introduce a composite model that
incorporates the empirical distribution and the generalized Pareto distribution.
2 Data
Fig.1: Location of the stations.
Precipitation data from 20 stations in Sweden have been used in the studies presented
in this paper. The locations are shown in Fig. 1 and the names of the stations are given
in Table 1. The data consist of accumulated daily precipitation collected during 44 years
starting on the 1st of January 1961 and ending the 31st of December 2004 and are provided
by the Swedish Meteorological and Hydrological Institute (SMHI). The number of missing
observations in all stations is generally low (< 5%). The time plots of the annual number
of wet days (above the threshold 0.1 mm) at the 20 stations are presented in Fig. 2.
[Figure 2: twenty panels, one per station (Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malung, Vattholma, Myskelåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando). Horizontal axis: Years (ticks at 1970, 1980, 1990 and 2000). Vertical axis: annual no. of wet days (0 to 300).]
Fig.2: Time plot of annual number of wet days.
Time plots of the annual number of wet days show that the precipitation regime at
some stations (namely, Söderköping, Rösta and Stensele) contains possible trends. The
results presented in the next sections refer to the whole period of data from all stations,
but attention should be paid when we refer to the above mentioned stations. In Fig. 3,
time plots of the annual amount of precipitation of the wet days are presented. The total
amounts of precipitation seem to be stationary over the different years.
3 Model
To model precipitation in Sweden, we have decided to use a chain dependent model. The
first part of the model can be dealt with using Markov chains. Gabriel and Newman (1962)
used a first-order stationary Markov chain. The models have since been extended to allow
for non-stationarity, both by fitting separate chains to different periods of the year and
by fitting continuous curves to the transition probabilities, see Stern and Coe (1984) and
references within. The order of Markov chain required has also been discussed extensively,
for example Chin (1977) and references therein, with the obvious conclusion that different
sites require different orders. Still, the first order Markov chains are a popular choice since
they have been shown to perform well for a wide range of different climates, see for example
Bruhn et al. (1980), Lana and Burgueno (1998) and Castellvi and Stockle (2001). The
main deficiency associated with the use of first order models is that long dry spells are
not well reproduced, see Racsko et al. (1991), Guttorp (1995).
To model the amount of precipitation that has occurred during a wet day, different
models have been proposed in the literature, all of which assume that the daily amounts
of precipitation are independent and identically distributed. Stidd (1973) and Hutchinson
(1995) have proposed a truncated normal model for the amount of precipitation with
a time dependent parameter, while the Gamma and Weibull distributions have been selected
by Geng et al. (1986) as well as Selker and Heith (1990), because of their site-specific
shape.
In this study, we model the occurrence of wet/dry days using Markov chains of higher
order and for the amount of precipitation we use a composite model, consisting of the
empirical distribution function for values below a threshold and the distribution of excesses
for values above the given threshold. Such a model is more flexible, describes the
tail of the distribution better and additionally allows for dependence in the precipitation process.
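One way to read the composite model is as a distribution function that follows the empirical distribution up to the threshold and a generalized Pareto tail beyond it. The sketch below is a hedged illustration of that idea, not the paper's fitted model: the function names, the way the two pieces are joined at the threshold, and the parameter names `scale` and `shape` are assumptions of this example, and the GPD parameters are taken as given rather than estimated.

```python
import math
from bisect import bisect_right

def gpd_cdf(excess, scale, shape):
    """Distribution function of the generalized Pareto distribution.

    `excess` is the amount above the threshold; `scale` > 0 and `shape`
    are the usual GPD parameters (shape = 0 gives the exponential case).
    """
    if excess <= 0:
        return 0.0
    if abs(shape) < 1e-12:
        return 1.0 - math.exp(-excess / scale)
    base = 1.0 + shape * excess / scale
    if base <= 0:  # beyond the finite upper end point when shape < 0
        return 1.0
    return 1.0 - base ** (-1.0 / shape)

def composite_cdf(x, sample, threshold, scale, shape):
    """Empirical distribution below the threshold, GPD tail above it.

    Assumed form for this sketch:
      F(x) = F_n(x)                                  for x <= threshold
      F(x) = F_n(u) + (1 - F_n(u)) * G(x - u)        for x >  threshold
    where F_n is the empirical distribution function of `sample`,
    u is the threshold and G is the GPD distribution function.
    """
    data = sorted(sample)
    n = len(data)
    if x <= threshold:
        return bisect_right(data, x) / n
    p_threshold = bisect_right(data, threshold) / n
    return p_threshold + (1.0 - p_threshold) * gpd_cdf(x - threshold, scale, shape)
```

Joining the pieces this way keeps the distribution function continuous at the threshold and non-decreasing, while letting the tail extend beyond the largest observed amount.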
[Figure 3: twenty panels, one per station (Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malung, Vattholma, Myskelåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando). Horizontal axis: Years (ticks at 1980 and 2000). Vertical axis: amount of precip. (mm) (ticks at 500, 1000 and 1500).]
Fig.3: Time plot of annual amount of precipitation.
Number Name
1 Lund
2 Bolmen
3 Hanö
4 Borås
5 Varberg
6 Ungsberg
7 Säffle
8 Söderköping
9 Stockholm
10 Malung
11 Vattholma
12 Myskelåsen
13 Härnösand
14 Rösta
15 Piteå
16 Stensele
17 Haparanda
18 Kvikkjokk
19 Pajala
20 Karesuando
Table 1: Names of weather stations.
Let $Z_t$ be the precipitation at a certain site at time $t$ measured in days. Then, a
chain-dependent model for the precipitation is given by
$$Z_t = X_t W_t,$$
where $X_t$ and $W_t$ are stochastic processes such that $X_t$ takes values in $\{0, 1\}$ and $W_t$
takes values in $\mathbb{R}_+ \setminus \{0\}$. The processes $X_t$ and $W_t$ will be referred to as the occurrence of
precipitation process and the amount of precipitation process, respectively.
The approach presented in this study provides a mechanism to make predictions of
precipitation in time. This is particularly important for many applications in hydrology,
ecology and agriculture. For example, at a monthly level, the amount of precipitation and
the probability and length of a dry period are required quantities for many applications.
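The product structure of the chain-dependent model can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the fitted model of this paper: the function name, the second-order transition probabilities and the placeholder amount distribution (0.1 mm plus an exponential excess) are all hypothetical, and the amounts are drawn independently, whereas the paper's amount process allows for dependence. The order of the chain is assumed to be at least one.

```python
import random

def simulate_chain_dependent(n_days, p_wet, sample_amount, initial_word, rng):
    """Simulate Z_t = X_t * W_t for n_days days.

    p_wet maps each word (tuple of the k previous occurrence states,
    oldest first) to P(X_t = 1 | word); k = len(initial_word) >= 1.
    sample_amount(rng) must return a strictly positive amount W_t.
    """
    k = len(initial_word)
    history = list(initial_word)
    series = []
    for _ in range(n_days):
        word = tuple(history[-k:])
        x = 1 if rng.random() < p_wet[word] else 0   # occurrence X_t
        w = sample_amount(rng)                        # amount W_t > 0
        series.append(x * w)                          # Z_t = X_t * W_t
        history.append(x)
    return series

# Hypothetical second-order transition probabilities and a placeholder
# amount distribution, chosen only to make the example runnable.
p_wet = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.7}
rng = random.Random(1)
z = simulate_chain_dependent(
    365, p_wet, lambda r: 0.1 + r.expovariate(0.25), (0, 0), rng
)
```

A dry day appears as an exact zero in the simulated series, and every wet day carries a strictly positive amount, which mirrors the state spaces of the two processes.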
4 Models for the Occurrence of Precipitation
Let $\{X_t,\ t = t_1, \ldots, t_N\}$ denote the sequence of daily precipitation occurrence, i.e. $X_t = 1$
indicates a wet day and $X_t = 0$ a dry day. A wet day, in the context of this study, occurs
when at least 0.1 mm of precipitation was recorded by the rain gauge. The level has been
chosen above zero in order to avoid identifying dew and other noise as precipitation and
to avoid difficulties arising from the inconsistent recording of very small precipitation
amounts. Moreover, daily precipitation amounts of less than 0.1 mm can have relatively
large observational errors, and including them would cause a significant change in the
estimated transition probabilities of the occurrences, which in turn would introduce
additional errors into the fitted models. The model is fitted over different periods of the
year, that is, subsets of the $N$ days of the year that may be assumed stationary.
Before we continue any further we need to introduce some notation. Let $S = \{0, 1\}$
denote the state space of the $k$-Markov chain $X_t$. The elements of $S$ are called letters and
an ordering of letters $w \in S^l = S \times \cdots \times S$ is called a word of length $l$, while the word
composed of the letters from position $i$ to $j$ in $w$, for some $1 \le i \le j \le l$, is denoted by
$w_i^j = (w_i, w_{i+1}, \ldots, w_j)$. Finally, for $k \le l$ let $\tau_k(w) = w_{l-k+1}^{l}$ denote the $k$-tail of the
word $w$, i.e. $\tau_k(w)$ denotes the last $k$ letters of $w$. If no confusion will arise when $k \le j - i$,
we also write $\tau_k(w^j)$ instead of $\tau_k(w_i^j)$.
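As a concrete illustration of this notation (the particular word is chosen here only as an example), take $l = 5$ and $w = (0, 1, 1, 0, 1)$. Then
$$w_2^4 = (1, 1, 0), \qquad \tau_2(w) = w_4^5 = (0, 1), \qquad \tau_3(w) = w_3^5 = (1, 0, 1).$$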
It is assumed that the process $X_t$ is a $k$-Markov chain: a model completely characterized
by the transition probability
$$p_{w,j}(t) := P(X_t = j \mid \tau_k(X^{t-1}) = w), \quad j \in S,\ t = t_1, \ldots, t_N,$$
where $w$ is a word of length $k$ and $X^{t-1} = \{\ldots, X_{t-2}, X_{t-1}\}$ is the whole process up
to $t - 1$, so $\tau_k(X^{t-1})$ is the last $k$ days up to and including $X_{t-1}$; that is,
$\tau_k(X^{t-1}) = (X_{t-k}, \ldots, X_{t-1})$. Note that, for a 2-state Markov chain of any order, $p_{w,1}(t) + p_{w,0}(t) = 1$.
In the special case of a time homogeneous Markov chain, $p_{w,j}(t) = p_{w,j}$ for $t = t_1, \ldots, t_N$,
i.e. the transition probabilities are independent of time.
Let $n_{w,j}(t)$ denote the number of years during which day $t$ is in state $j$ and is preceded
by the word $w$ (i.e. $\tau_k(X^{t-1}) = w$, $w \in S^k$ and $X_t = j$). Then the probabilities $p_{w,j}(t)$ are
estimated by the observed proportions
$$\hat{p}_{w,j}(t) = \frac{n_{w,j}(t)}{n_{w,+}(t)}, \quad w \in S^k,\ j \in S,\ t = t_1, \ldots, t_N,$$
where + indicates summation over the subscript. Note also that day 60 (February 29th)
has data only in leap years, so day 59 precedes day 61 in non-leap years. Fig. 4 (left)
shows the unconditional probability of precipitation, pooled over 5 days for clarity, plotted
against $t$ for the data from the station in Lund.
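The observed proportions above can be computed by simple counting. The following sketch treats the time homogeneous case, pooling all days of one occurrence sequence instead of counting separately for each day of the year, which is a simplification made for this example; the function names are hypothetical.

```python
from collections import defaultdict

def transition_counts(sequence, k):
    """Count, for every word of length k, how often it is followed by 0 and by 1.

    `sequence` is a list of 0/1 occurrence indicators. The result maps each
    observed word (a tuple of k states, oldest first) to [n_w0, n_w1].
    """
    counts = defaultdict(lambda: [0, 0])
    for t in range(k, len(sequence)):
        word = tuple(sequence[t - k:t])
        counts[word][sequence[t]] += 1
    return counts

def transition_probabilities(sequence, k):
    """Estimate p_{w,j} by n_{w,j} / n_{w,+} for every observed word w."""
    probabilities = {}
    for word, (n_dry, n_wet) in transition_counts(sequence, k).items():
        total = n_dry + n_wet   # n_{w,+}: summation over the second subscript
        probabilities[word] = (n_dry / total, n_wet / total)
    return probabilities
```

For example, `transition_probabilities([1, 1, 0, 1, 1, 0, 1, 1, 0], 2)` assigns probability one to a dry day after the word (1, 1), since in that sequence two wet days are always followed by a dry one. Words that never occur in the data do not appear in the result.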
In the context of environmental processes, non-stationarity is often apparent, as in
this case, because of seasonal effects or different patterns in different months. A usual
practice is to specify different subsets of the year as seasons, which results in different
models for each season, although the determination of an appropriate segregation into
seasons is itself an issue.
[Figure 4: two panels. Left: prob. of precip. (vertical axis, 0.3 to 0.8) against month (Jan to Dec). Right: mean no. of wet days (vertical axis, 10 to 20) against month (Jan to Dec).]
Fig.4: Lund, Sweden (data from 1961 to 2004). (Left): Observed p(t) pooled over 5
days. (Right): Mean number of wet days per month (”+”), and per season (solid lines).
4.1 Fitting Models to the Occurrence of Precipitation
There is an inter-annual variation in the annual number of wet days, as can be seen in
Fig. 2. Moreover, there is also seasonal variation in the mean monthly number of wet days,
see Fig. 4 (Right) for data from Lund, although this is not as prominent as in other regions
of the world. It is possible that the optimum order of the chain describing the wet/dry
sequence varies within the year and from one year to another. It is therefore important
to properly identify the period of record that can be assumed as time homogeneous.
Moreover, the problem of finding an appropriate model for the occurrence of precipitation
process, $X_t$, is equivalent to the problem of finding the order of a multiple step
Markov chain. The Akaike Information Criterion (AIC), Bayesian Information Criterion
(BIC) and the Generalized Maximum Fluctuation Criterion (GMFC) order estimators, a
short description of which can be found in subsection 8.1, have been applied to the
data for each of the stations. Various block lengths were considered for determining the
order of the Markov chain, k, as suggested in Jimoh and Webster (1996):
• 1 month blocks (i.e. January, February, ..., December),
• 2 month blocks (January - February, February - March, ..., December - January),
• 3 month blocks (January - March, February - April, ..., December - February).
The effect of block length on the order of the Markov chain can be seen in Figs. 5-7.
Grouping the data in blocks longer than one month results
in Markov chains of "smoother" order, in the sense that the order of the chain does not
change so fast. It is also interesting to notice that while the order of the Markov chain for
stations 16-20 varies a lot according to the AIC and GMFC estimators, it seems to
be almost constant for the BIC order estimator. As expected, the BIC order
estimator underestimates the order k of the Markov chain relative to both the AIC and
GMFC order estimators for large k and moderate data sets, see Dalevi et al. (2006), while
the values of the GMFC order estimator lie between the BIC and AIC ones. The results
presented in Figs. 5-7 confirm that the model order is sensitive to the season (month) and
the length of the season (number of months) considered, as well as to the method used in
identifying the optimum order. Possible dependence on the threshold used for identifying
wet and dry days has not been studied here. For the rest of this study, we define as
seasons the 3 month periods December-February, March-May, June-August and
September-November. As can be seen in Fig. 4 for the station in Lund (the rest of the stations
give similar plots), the probability of precipitation is close to constant during
these periods, which makes the assumption of stationarity seem plausible. The orders of
the Markov chain for these periods can be found in Fig. 7. For the rest of this study the
order k of the Markov chain is decided according to the GMFC order estimator.
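To make the idea of an order estimator concrete, the sketch below selects the order of a two-state chain with the textbook forms of AIC and BIC. It is an illustration under assumptions and may differ in detail from the estimators described in subsection 8.1: the function names are hypothetical, an order-k chain is counted as having 2^k free parameters, all candidate orders are fitted to the same observations (those after the first `max_order` days), and the GMFC estimator is not implemented here.

```python
import math
from collections import Counter

def log_likelihood(sequence, k, max_order):
    """Maximised log-likelihood of an order-k chain for a 0/1 sequence.

    The first max_order observations are only used as history, so that
    every candidate order is evaluated on the same set of days.
    """
    pair_counts = Counter()   # n_{w,j}
    word_counts = Counter()   # n_{w,+}
    for t in range(max_order, len(sequence)):
        word = tuple(sequence[t - k:t])
        pair_counts[(word, sequence[t])] += 1
        word_counts[word] += 1
    return sum(
        n * math.log(n / word_counts[word])
        for (word, _), n in pair_counts.items()
    )

def estimate_order(sequence, max_order, criterion="bic"):
    """Return the order in 0..max_order with the smallest AIC or BIC score."""
    n_effective = len(sequence) - max_order
    best_order, best_score = 0, math.inf
    for k in range(max_order + 1):
        n_parameters = 2 ** k   # one free probability per word of length k
        if criterion == "aic":
            penalty = 2 * n_parameters
        else:
            penalty = n_parameters * math.log(n_effective)
        score = -2.0 * log_likelihood(sequence, k, max_order) + penalty
        if score < best_score:   # strict inequality: ties go to the lower order
            best_order, best_score = k, score
    return best_order
```

Since the BIC penalty exceeds the AIC penalty whenever the number of observations is larger than about seven, BIC can never select a higher order than AIC would on the same data in a pairwise comparison of two orders, which is consistent with the tendency noted above.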
[Figure 5: three groups of panels, titled "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator". Each group has one panel per station (st 01 to st 20), with the estimated order (ticks at 1, 3 and 5) on the vertical axis and the month (Jan to Dec) on the horizontal axis.]
Fig.5: k-Markov chain orders for block lengths of one month, (Jan, Feb, ...).
[Figure 6: three groups of panels, titled "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator". Each group has one panel per station (st 01 to st 20), with the estimated order (ticks at 1, 3 and 5) on the vertical axis and the month (Jan to Dec) on the horizontal axis.]
Fig.6: k-Markov chain orders for block lengths of two months, (Jan-Feb, Feb-Mar, ...).
Fig. 7: k-Markov chain orders for block lengths of three months, (Jan-Mar, Feb-Apr, ...).
Panels: Estimated Orders by Akaike order estimator, by Bayesian order estimator and by GMFC order estimator, each for stations st 01 to st 20. Vertical axes: order (ticks at 1, 3, 5); horizontal axes: month (Jan to Dec).
4.2 Distribution of Dry Spell length
An interesting aspect of the wet/dry behavior, i.e. the process Xt, is the distribution of
the dry (wet) spells, i.e., the number of consecutive dry (wet) days, which is an accessible
property of multiple step Markov chains.
For a time homogeneous (stationary) k-Markov chain $X_t$ ($k \ge 2$) with state space $S$,
let $T$ be the first time at which the process $X_t$ satisfies $\tau_2(X_t) = (1, 0)$, i.e.,
\[ T = \inf\{t \ge 0 : \tau_2(X_t) = (1, 0)\}. \]
So $T$ is the time of the start of the first dry period. Let also, for the words $u, v \in S^k$,
\[ a_{u,v} = P(\tau_k(X_T) = v \mid \tau_k(X_0) = u) \]
denote the probability that the process $X_t$ has at time $T$ a k-tail equal to $v$, given that
the k-tail at time 0 is equal to $u$. The probabilities $a_{u,v}$ are easily obtained for stationary
processes, see Norris (1997). Note that at t = 0, there may be the start of a dry period,
the start of a wet period, the continuation of a dry period or the continuation of a wet
period. If $D(X_t)$ denotes the length of the first dry period that starts at time t = 0 for the
k-Markov chain $X_t$, then, assuming additionally that the process $X_t$ is time homogeneous,
the distribution of the first dry spell can be computed as
\[ P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{\{w \in S^k : \tau_2(w) = (1,0)\}} a_{u,w}\, P(\tau_m(X_{m-1}) = \mathbf{0},\ X_m = 1 \mid \tau_k(X_0) = w), \tag{1} \]
where $\mathbf{0}$ is used to denote sequences of 0's of appropriate length.
Now, if $v = w\mathbf{0}1$ is a word of length m + k ($\mathbf{0}$ here is of order m − 1) and using the
fact that the process $X_t$ is a k-Markov chain, Eq. 1 can be rewritten as
\[ P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{\{w \in S^k : \tau_2(w) = (1,0)\}} a_{u,w} \prod_{i=1}^{m} P(X_i = v_{k+i} \mid \tau_k(X_{i-1}) = \tau_k(v^{k+i-1})). \tag{2} \]
Remark 1 Here we should notice that the distribution of the first dry spell differs
from the distribution of the subsequent dry spells for Markov chains of order greater than
two. For Markov chains of order one or two there is no need for this distinction. Moreover,
the equivalent of Eq. 1 for k = 1 is
\[ P(D(X_t) = m) = p_{0,0}^{m-1}\, p_{0,1}, \]
while for k = 2, Eq. 2 simplifies to
\[ P(D(X_t) = m) = \begin{cases} p_{10,1} & \text{for } m = 1, \\ p_{10,0}\, p_{00,0}^{m-2}\, p_{00,1} & \text{for } m \ge 2, \end{cases} \]
where $a_{u,v} = 1$ for all u, v in Eq. 1.
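The two closed forms in Remark 1 can be evaluated directly. The following is a minimal Python sketch, not code from the thesis: the function names, argument names and the numerical transition probabilities are all hypothetical and chosen only for illustration.

```python
def dry_spell_pmf_order1(m, p01):
    """P(D = m) for a 1-Markov chain: p_{0,0}^(m-1) * p_{0,1}."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return (1.0 - p01) ** (m - 1) * p01


def dry_spell_pmf_order2(m, p10_1, p00_1):
    """P(D = m) for a 2-Markov chain.

    p10_1 = P(X_t = 1 | previous two days were (1, 0))
    p00_1 = P(X_t = 1 | previous two days were (0, 0))
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        return p10_1
    # p_{10,0} * p_{00,0}^(m-2) * p_{00,1}
    return (1.0 - p10_1) * (1.0 - p00_1) ** (m - 2) * p00_1


# Hypothetical transition probabilities, for illustration only.
pmf1 = [dry_spell_pmf_order1(m, 0.40) for m in range(1, 200)]
pmf2 = [dry_spell_pmf_order2(m, 0.45, 0.35) for m in range(1, 200)]
print(sum(pmf1), sum(pmf2))  # both sums should be very close to 1
```

When the two conditional probabilities of the second order chain coincide, the second function reduces to the first, which is a quick sanity check on the formulas.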
The distribution of the first dry spell can also be used for model selection or model
validation purposes. For this, we use the Kolmogorov-Smirnov (KS) test, see Benjamin
and Cornell (1970). The one sample KS test compares the empirical distribution function
with the cumulative distribution function specified by the null hypothesis.
Assuming that $P_k(x)$ is the true distribution function (of a Markov chain of order k),
the KS statistic is
\[ D = \sup_{m \in \mathbb{N}^+} |P_k(D(X) \le m) - F_{emp}(m)|, \]
where $F_{emp}(x)$ is the empirical cumulative distribution of the length of the first dry spell.
If the data truly come from a Markov chain of order k and the transition probabilities are
the correct ones, then by the Glivenko-Cantelli theorem the KS statistic converges to zero
almost surely (a.s.).
To apply the test, the transition probabilities have been estimated from the data using
maximum likelihood for different values of the order k of the Markov chain. To obtain
the empirical distribution of the length of the first dry spell, we have computed the length
of the dry spells (sequences of zeros) following the first (1, 0). (Note that this is
equivalent to computing the length of the first dry spell for Markov chains of order k = 1
or k = 2. In the case of k = 3, although the distribution of the first dry spell is not
exactly the same as the distribution of any dry spell, we have still used all the dry spells
available due to shortage of data.) The procedure has been applied separately to data
from each station and season. If the first observations were zeros, they were ignored as
the continuation of a dry spell. Also, if a dry spell was not over by the end of the season,
it was followed into the next season.
To determine whether the theoretical model was correct or not, Monte Carlo simulations
were performed. We have obtained the empirical distribution of the length of the
first dry spell using 500 synthetic wet/dry records of 44 years of data (each station and
season was treated separately), and the KS statistic was computed for each one of them,
which resulted in the distribution of the KS statistic.
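The Monte Carlo step can be sketched as follows for the simplest case of a first order chain. This is an illustrative Python sketch under several assumptions that are not in the thesis: the transition probabilities are made up, the number and length of the synthetic records are reduced, the true value of p_{0,1} is used instead of a re-estimated one, and a dry spell that is still running at the end of a record is simply dropped.

```python
import random


def simulate_chain(n, p01, p11, rng):
    """Simulate n days of a two-state 1-Markov chain (0 = dry, 1 = wet)."""
    x = [1]
    for _ in range(n - 1):
        p_wet = p11 if x[-1] == 1 else p01
        x.append(1 if rng.random() < p_wet else 0)
    return x


def dry_spell_lengths(x):
    """Lengths of completed dry spells that start right after a wet day."""
    lengths, run, started = [], 0, False
    for prev, cur in zip(x, x[1:]):
        if prev == 1 and cur == 0:      # a (1, 0) pattern starts a dry spell
            started, run = True, 1
        elif started and cur == 0:      # the dry spell continues
            run += 1
        elif started and cur == 1:      # the dry spell ends
            lengths.append(run)
            started = False
    return lengths


def ks_statistic(lengths, p01):
    """sup_m |P(D <= m) - F_emp(m)| under the geometric law of order 1."""
    n = len(lengths)
    stat = 0.0
    for m in range(1, max(lengths) + 1):
        model_cdf = 1.0 - (1.0 - p01) ** m
        emp_cdf = sum(1 for d in lengths if d <= m) / n
        stat = max(stat, abs(model_cdf - emp_cdf))
    return stat


rng = random.Random(0)
p01, p11 = 0.4, 0.6                     # hypothetical transition probabilities
stats = sorted(
    ks_statistic(dry_spell_lengths(simulate_chain(2000, p01, p11, rng)), p01)
    for _ in range(200)
)
critical_90 = stats[int(0.9 * len(stats))]   # upper 10% point of the statistic
print(critical_90)
```

An observed KS statistic larger than `critical_90` would then fall in the 10% tail of the simulated distribution.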
Fig. 8: Order of Markov chain as suggested by the Kolmogorov-Smirnov statistic at 10%
tail value for each station and season.
Panel title: Estimated Orders by KS-criterion of dry spell order estimator, for stations st 01 to st 20. Vertical axes: order (ticks at 1, 3, 5); horizontal axes: month (ticks at Jan, May, Nov).
The suggested orders of the Markov chain using the Kolmogorov-Smirnov statistic at
the 10% tail value are collected in Fig. 8. The resulting orders of the Markov chain appear
to be close to those obtained by the BIC order estimator. In Table 2, we have collected
information on how many data sets have passed the Kolmogorov-Smirnov test at the 10%
tail value for the different seasons. Observe that the KS test suggests that the 1-Markov
chain, although widely used, is an inadequate model for the majority of the stations in
Sweden over the different seasons.
         Season
Model    S1   S2   S3   S4
k = 1     1    0    1    6
k = 2    20   20   20   20
k = 3    20   20   20   20
Table 2: Number of data sets that have passed the Kolmogorov-Smirnov test at the
10% tail value for different orders of the Markov chain. S1 stands for Dec.-Feb., S2 for
Mar.-May, S3 for Jun.-Aug. and S4 for Sep.-Nov.
4.3 Distribution of Long Dry Spells
Let us now define a long dry spell as a dry spell with length greater than or equal to the
order k of the Markov chain. Then it is easy to show that the distribution of the long dry
spell is actually geometric. Indeed, let a long dry spell that starts at time i have length m ≥ k
and let us also assume that we know that the length of the dry spell is at least l. Then,
for m ≥ l ≥ k,
\[ P(D(X_t) = m \mid \tau_l(X_{i+l-1}) = \mathbf{0}) = p_{\mathbf{0},1}\, p_{\mathbf{0},0}^{m-l} = p_{\mathbf{0},1} (1 - p_{\mathbf{0},1})^{m-l}, \]
where as before
\[ p_{\mathbf{0},1} = P(X_{n+1} = 1 \mid \tau_k(X_n) = \mathbf{0}), \quad \forall n. \]
Therefore, the expected length of long dry spells is given by
\[ E(D(X_t) \mid \tau_l(X_{i+l-1}) = \mathbf{0}) = l + \frac{1 - p_{\mathbf{0},1}}{p_{\mathbf{0},1}}. \tag{3} \]
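As a small numerical check of Eq. 3, the sketch below compares the closed form with a direct summation of the geometric distribution. The function names and the value of the transition probability are hypothetical and serve only as an illustration.

```python
def expected_long_dry_spell(l, p01):
    """Eq. 3: expected length given the spell has lasted at least l days."""
    if not 0.0 < p01 <= 1.0:
        raise ValueError("p01 must lie in (0, 1]")
    return l + (1.0 - p01) / p01


def expected_by_summation(l, p01, terms=2000):
    """Direct sum of m * p01 * (1 - p01)^(m - l) over m = l, l + 1, ..."""
    return sum(m * p01 * (1.0 - p01) ** (m - l) for m in range(l, l + terms))


p01 = 0.4  # hypothetical probability of a wet day after k dry days
for l in (1, 2, 3):
    print(l, expected_long_dry_spell(l, p01), expected_by_summation(l, p01))
```

Because the conditional distribution is geometric, the expected remaining length does not depend on l, so the expected total length grows by exactly one day for each additional day already observed.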
Fig. 9: Conditional distribution of Dry Spell given the Dry Spell is longer or equal to 3
days for k-Markov chain models of order k = 1, k = 2 and k = 3 and the data from Lund.
Data are from the winter months December-February.
Axes: time (days), 2 to 14, against probability, 0.3 to 1. Curves: Empirical, 3-Markov, 2-Markov, 1-Markov.
Fig. 9 shows the conditional distribution of the dry spell length, given that the spell has
lasted for more than two days, for the first season and the data from Lund. The estimated
order of the Markov chain for this data set is 2 using both the GMFC and the KS criterion.
A first order Markov chain, the popular model of choice, would in this case clearly
underestimate the risk of a long dry spell. A second order Markov chain seems to be the
best choice for this particular data set.
It is clear from Table 3 that underestimating the order k of the Markov chain leads
to underestimating the expected length of the long dry spells, where again a dry spell
is defined as long if its length is greater than or equal to the order of the Markov chain.
Model                 l = 1   l = 2   l = 3
k = 1                 2.49    3.49    4.49
k = 2                 -       3.91    4.91
k = 3                 -       -       5.11
Observed mean value   2.56    3.97    5.23
Table 3: Expected length of long dry spells for season Dec-Feb in Lund.
5 Modeling the Amount Precipitation Process
In this section we model the amounts of daily precipitation. This is done in two steps.
Firstly we model the dependence structure of the amount precipitation process and
secondly we estimate the marginal distribution.
One of the important features of any climatological data set is that it exhibits
dependence between nearby stations or successive days. In this work we are interested in
the latter case, and the dependence structure is modelled using a two-dimensional Gaussian
copula.
After the copula has been estimated, we remove the days with precipitation below the
cut-off level of 0.1 mm. That is, we let $Y_t$ be the thinned process obtained from the amount
of precipitation process $W_t$ when we consider only the wet days, i.e., $Y_t := W_t \mid X_t = 1$.
Then, the marginal distribution of the amounts of daily precipitation is modelled following
an approach that combines the fit of the distribution of excesses over a high threshold
with the empirical distribution of the thinned data below the threshold.
5.1 Copula
Almost every climatological data set exhibits dependence between successive days. To
model the temporal dependence structure of the data we use the two-dimensional Gaussian
copula C given by
\[ C(u, v; \rho) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left( -\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)} \right) dx\, dy = \Phi_\rho(\Phi^{-1}(u), \Phi^{-1}(v)), \tag{4} \]
where Φ is the cumulative distribution function of the standard normal distribution and
Φρ is the joint cumulative distribution function of two standard normal random variables
with correlation coefficient ρ.
To estimate the copula, let
\[ A = \{t : Y_t > 0 \text{ and } Y_{t+1} > 0\} \]
be the set of all days with non-zero precipitation that were followed by days also with
non-zero precipitation (greater than 0.1 mm), and let
\[ u = [Y_{a_1}, Y_{a_2}, \dots], \quad v = [Y_{a_1+1}, Y_{a_2+1}, \dots], \quad a_1, a_2, \dots \in A \]
be the vectors consisting of the amounts of precipitation during the days indicated in the
set A and the following days respectively, both with marginal distribution F(x). Then,
transforming the vectors u and v by taking the empirical cumulative distribution corrected
by the factor $n/(n+1)$ (n is the number of days with positive precipitation in the data set)
results in vectors U and V respectively that follow the discrete uniform distribution in
(0, 1). If the Gaussian copula in Eq. 4 describes correctly the dependence structure of the
data, then
\[ (\Phi^{-1}(U), \Phi^{-1}(V)) \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right). \]
Finally the copula parameter ρ is estimated using Pearson's correlation coefficient. A
detailed description of the method and its application can be found in Lennartsson and
Shu (2005). The dependence between successive days is demonstrated in Fig. 13, where
the transformed data from Lund are plotted.
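The estimation steps described above (select consecutive wet-day pairs, rescale the empirical distribution by n/(n+1), map to normal scores, take Pearson's correlation) can be sketched in Python as follows. This is an illustration rather than the thesis implementation: all names are hypothetical, the data are synthetic, ties are broken arbitrarily by sort order, and each vector is ranked on its own instead of using the marginal distribution of all wet days.

```python
import math
import random
from statistics import NormalDist


def normal_scores(values):
    """Rank-transform to (0, 1) with rank / (n + 1), then apply Phi^{-1}."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(r / (n + 1)) for r in ranks]


def pearson(a, b):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def estimate_copula_rho(amounts):
    """Estimate rho from the pairs (Y_t, Y_{t+1}) with both days wet."""
    pairs = [(a, b) for a, b in zip(amounts, amounts[1:]) if a > 0 and b > 0]
    u = normal_scores([a for a, _ in pairs])
    v = normal_scores([b for _, b in pairs])
    return pearson(u, v)


# Synthetic record (not thesis data): latent Gaussian AR(1) with rho = 0.3,
# mapped to positive amounts, with about 30% of the days set to dry.
rng = random.Random(42)
true_rho, z, amounts = 0.3, 0.0, []
for _ in range(5000):
    z = true_rho * z + math.sqrt(1.0 - true_rho ** 2) * rng.gauss(0.0, 1.0)
    amounts.append(math.exp(z) if rng.random() < 0.7 else 0.0)

rho_hat = estimate_copula_rho(amounts)
print(round(rho_hat, 3))  # should be in the vicinity of 0.3
```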
Fig. 13: Plot of the dependence structure with the marginal distributions transformed to
standard normal.
Axes: T(Y_t) against T(Y_{t+1}), both from −4 to 4.
For a thorough coverage of bivariate copulas and their properties see Hutchinson and
Lai (1990), Joe (1997), Nelsen (2006), and Trivedi and Zimmer (2005), who provide
a copula tutorial for practitioners. The values of the correlation coefficient ρ estimated
for each station are collected in Table 4. Notice that all the estimates of the correlation
coefficient ρ are statistically significant, which makes the assumption of independence
between the data points seem unreasonable.
5.2 Marginal Distribution
Finally, to model the amount precipitation process we propose an approach that combines
the fit of the distribution of excesses over a high threshold with the empirical distribution
of the original data below the threshold. We commence our analysis by introducing
some notation followed by some introductory remarks. Let X1, X2, . . . be a sequence of
independent and identically distributed random variables having marginal distribution
F (x). Let us also denote by
\[ F_u(x) = P(X \le x \mid X > u), \]
for x > u, the conditional distribution of X given that it exceeds level u, and assume that
$F_u(x)$ can be modelled by means of a generalized Pareto distribution, that is
\[ F_u(x) = 1 - \left( 1 + \xi\, \frac{x - u}{\sigma} \right)^{-1/\xi}, \tag{5} \]
for some σ > 0 and ξ, over the set $\{x : x > u \text{ and } 1 + \xi (x - u)/\sigma > 0\}$, and zero otherwise.
Let also $F_{emp}(x)$ denote the empirical distribution, i.e.,
\[ F_{emp}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}, \]
where $\mathbf{1}\{\cdot\}$ denotes the indicator function of an event, i.e. the 0−1 random variable which
takes value 1 if the condition between brackets is satisfied and 0 otherwise.
Finally, define the function
\[ F_C(x; u) = F_{emp}(x \wedge u) + (1 - F_{emp}(u)) F_u(x), \]
which, as can be easily checked, is a probability distribution function that will be used to
model the amount precipitation process. Thus what needs to be addressed is the choice
of the level u above which the excesses can be accurately modelled using a generalized
Pareto distribution, as well as methods for the estimation of the distribution parameters.
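A minimal Python sketch of the composite distribution function is given below. It is an illustration under stated assumptions: the sample of wet-day amounts is made up, the function names are hypothetical, the case ξ = 0 is handled by its exponential limit, and for ξ < 0 the generalized Pareto part is set to 1 beyond its upper end point so that the result remains a distribution function. The parameter values in the example are the Lund estimates from Table 4.

```python
import math


def gpd_cdf(x, u, sigma, xi):
    """Generalized Pareto CDF of Eq. 5 for x > u; zero for x <= u."""
    if x <= u:
        return 0.0
    z = (x - u) / sigma
    if abs(xi) < 1e-12:            # limiting exponential case as xi -> 0
        return 1.0 - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0.0:                   # beyond the upper end point when xi < 0
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def empirical_cdf(data, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for d in data if d <= x) / len(data)


def composite_cdf(x, data, u, sigma, xi):
    """F_C(x; u) = F_emp(min(x, u)) + (1 - F_emp(u)) * F_u(x)."""
    f_at_u = empirical_cdf(data, u)
    return empirical_cdf(data, min(x, u)) + (1.0 - f_at_u) * gpd_cdf(x, u, sigma, xi)


# Hypothetical wet-day amounts in mm, for illustration only.
data = [0.3, 0.8, 1.2, 2.5, 4.0, 6.1, 9.7, 12.4, 16.0, 22.5]
u, sigma, xi = 15.0, 5.91, 0.076
for x in (5.0, 15.0, 20.0, 40.0):
    print(x, composite_cdf(x, data, u, sigma, xi))
```

Below the threshold the function is a step function that follows the data, while above it the tail is smooth and can be extrapolated beyond the largest observation.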
5.2.1 Choice of Threshold Level
Selection of a threshold level u above which the generalized Pareto distribution
assumption is appropriate is a difficult task in practice; see, for example, McNeil (1996),
Davison and Smith (1990) and Rootzen and Tajvidi (1997). Frigessi et al. (2002) suggest a
dynamic mixture model for the estimation of the tail distribution without having to specify
a threshold in advance. Once the threshold u is fixed, the model parameters ξ and σ are
estimated using maximum likelihood, although there exist a number of alternative
methods, see for instance Resnick (1997) and Crovella and Taqqu (1999) and references
therein.
5.2.2 Extreme Value Analysis for Dependent Sequences
The generalized Pareto distribution is asymptotically a good model for the marginal
distribution of high excesses of independent and identically distributed random variables,
see Coles (2001), Leadbetter et al. (1983). Unfortunately, independence is an
unreasonable assumption for most climatological data sets, since dependence between
successive days is to be expected. A way of dealing with the dependence between the excesses
is either to choose the level u high enough so that enough time has passed between successive
excesses to make them independent, or to use declustering, which is probably the most widely
adopted method for dealing with dependent exceedances; it corresponds to filtering the
dependent observations to obtain a set of threshold excesses that are approximately
independent, see Coles (2001). A simple way of determining m-clusters of extremes, after
specifying a threshold u, is to define consecutive excesses of u to belong to the same
m-cluster as long as they are separated by less than m + 1 days. It should be noted that the
separation of extreme events into clusters is likely to be sensitive to the choice of u,
although we do not study this effect in this work. The effect of declustering on the
generalized Pareto distribution in Eq. 5 is the replacement of the parameters σ and ξ by
$\sigma\theta^{-1}$ and ξ, where θ is the so-called extremal index and is loosely defined as
\[ \theta = (\text{limiting mean cluster size})^{-1}. \]
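The declustering rule and the loose definition of θ can be illustrated with a short Python sketch. It rests on one interpretive assumption that should be stated clearly: two successive exceedances are placed in the same m-cluster when at most m non-exceeding days lie between them. The function names and the example series are hypothetical.

```python
def m_clusters(series, u, m):
    """Group the exceedances of u into m-clusters.

    Assumption: successive exceedances share a cluster when at most m
    non-exceeding days lie between them.
    """
    clusters = []
    last = None                       # day index of the previous exceedance
    for t, y in enumerate(series):
        if y > u:
            if last is not None and t - last - 1 <= m:
                clusters[-1].append(y)
            else:
                clusters.append([y])
            last = t
    return clusters


def extremal_index(series, u, m):
    """Reciprocal of the mean cluster size: clusters / exceedances."""
    clusters = m_clusters(series, u, m)
    n_exceedances = sum(len(c) for c in clusters)
    return len(clusters) / n_exceedances if n_exceedances else float("nan")


def declustered_excesses(series, u, m):
    """Excess of each cluster maximum over the threshold u."""
    return [max(c) - u for c in m_clusters(series, u, m)]


# Hypothetical daily amounts in mm, for illustration only.
series = [2, 17, 20, 3, 1, 16, 4, 18, 0, 0, 0, 25]
for m in (0, 1, 2):
    print(m, extremal_index(series, 15, m), declustered_excesses(series, 15, m))
```

Larger values of m merge more exceedances into the same cluster, so the estimate of θ can only decrease as m grows.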
5.3 Method Application
In this subsection we apply the method described in subsection 5.2 to model the thinning
of the amount of precipitation process, i.e. Yt. To demonstrate the method we use data
from the station in Lund. The rest of the stations give similar results.
As we have already seen, the data exhibit temporal dependence. The correlation
coefficient ρ, using the Gaussian copula for the data from Lund, was estimated to be 0.1362.
The dependence in the data can also be seen in Fig. 10, where the expected number of
m-clusters (with more than one observation) for different values of m and u = 15 mm
is plotted. The expected number of these m-clusters assuming the observations are
independent is denoted by 'o' and is consistently less than the observed number of
m-clusters, which is denoted by '+'. The expected number of m-clusters computed assuming
the observations are actually correlated (ρ = 0.1362) is denoted by '*' and is an
obvious improvement on the assumption of independence. We also provide 95%
exact confidence intervals for both cases. The observed values fall inside the confidence
interval constructed assuming correlated data.
Fig. 10: Number of m-clusters with more than one observation. '+' denotes the observed
and 'o' the theoretical number of m-clusters assuming that the observations are independent,
while '*' denotes the number of m-clusters using ρ = 0.1362. Line '–' denotes the
95% confidence interval for the theoretical number of m-clusters assuming independence,
while '-.' denotes the 95% confidence interval for the theoretical number of m-clusters
assuming ρ = 0.1362.
Axes: m, 0 to 6, against number of clusters with more than one observation, 0 to 50.
After the cluster size has been decided (in the case of the station in Lund, m = 0), we
turn to the problem of estimating the parameters ξ, σ and θ for the generalized Pareto
model. The choice of the specific threshold (u = 15 mm) was based on the mean residual life
plot. It is expected, see Coles (2001), that for a threshold u for which the generalized
Pareto model provides a good approximation for the excesses above that level, the mean
residual life plot, i.e. the locus of the points
\[ \left\{ \left( u,\ \frac{1}{n_u} \sum_{i=1}^{n_u} (Y_{t(i)} - u) \right) : u < Y_t^{\max} \right\}, \]
where $Y_{t(1)}, \dots, Y_{t(n_u)}$ are the $n_u$ observations that exceed u and $Y_t^{\max}$ is the largest
observation of the process $Y_t$, should be approximately linear in u. Fig. 11 shows the mean
residual life plot with approximate 95% confidence interval for the daily precipitation in
Lund. The graph appears to curve from u = 0 until u = 15 and is approximately linear
after that threshold. It is tempting to conclude that there is no stability until u = 28,
after which there is approximate linearity, which suggests u = 28. However, such a threshold
leaves too few excesses for any meaningful inference (33 observations out of 16000). So
we decided to work initially with the threshold set at u = 15.
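The points of a mean residual life plot can be computed with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical and the data are simulated from an exponential distribution with mean 5, for which the mean excess is constant in u, so the computed values should stay close to 5 at every threshold.

```python
import random


def mean_residual_life(data, thresholds):
    """Return (u, mean excess over u, number of exceedances) for each u."""
    points = []
    for u in thresholds:
        excesses = [y - u for y in data if y > u]
        if excesses:                      # skip thresholds with no exceedances
            points.append((u, sum(excesses) / len(excesses), len(excesses)))
    return points


# Synthetic data (not thesis data): exponential amounts with mean 5 mm.
rng = random.Random(1)
data = [rng.expovariate(1.0 / 5.0) for _ in range(20000)]
mrl = mean_residual_life(data, [0.0, 5.0, 10.0])
for u, mean_excess, n_u in mrl:
    print(u, round(mean_excess, 2), n_u)
```

Reporting the number of exceedances next to each point makes the trade-off discussed above visible: higher thresholds give a cleaner tail approximation but leave fewer observations for inference.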
Fig. 11: Mean residual life plot of amount precipitation process from Lund, dotted lines
give the 95% confidence interval.
Axes: u, 0 to 60, against E[X − u | X > u], 0 to 12.
Finally, the different diagnostic plots for the fit of the generalized Pareto distribution
are collected in Fig. 12. The data from the rest of the stations have produced similar plots,
none of which gave any reason for concern about the quality of the fitted models. The
parameters of the generalized Pareto model for the data from all the stations, together
with 95% confidence intervals, are collected in Table 4. For three stations (Bolmen,
Boras and Haparanda), the estimates of the shape parameter ξ are negative.
Fig. 12: Diagnostic plots for threshold excess model fitted to daily precipitation data
from the station in Lund.
Panels: Probability Plot (GP model against extreme empirical model), Quantile Plot (GP model (mm) against extreme empirical model (mm)), Return Level Plot (return period (years) against return level) and Density Plot (amount of precipitation (mm) against number of observations).
Table 5 shows θ for different values of m-clusters and threshold u = 15 for the data
from Lund.
6 Evaluation
To verify the validity of the model, we have obtained distribution functions of the
different precipitation indices as stipulated by the Expert Team and its predecessor, the
CCl/CLIVAR Working Group (WG) on Climate Change Detection, see Peterson et al. (2001)
and Karl et al. (1999). Sixteen of those indices are of relevance to this work: two
regarding only the occurrence of precipitation process (CDD and CWD), another two regarding
only the amount precipitation process (SDII and Prec90p) and the remaining twelve
concerning both processes, see Table 6. Using the chain dependent model, we have obtained
the distribution of each index based on 100,000 simulations. This has been compared
to the empirical distribution ('.-' line in Figs. 14 - 18). The agreement between the two
distributions is more than satisfactory. Moreover, the empirical distribution always falls
inside the 90% exact confidence intervals. The results are presented for the weather
station in Lund. The rest of the stations give similar results.
Station       σ      CI for σ          ξ        CI for ξ           θ      u (mm)  ρ
Lund          5.91   (4.93, 7.03)      0.076    (-0.041, 0.236)    0.935  15      0.1362
Bolmen        6.44   (5.56, 7.41)      -0.0002  (-0.095, 0.116)    0.921  15      0.2008
Hano          5.29   (3.044, 8.737)    0.458    (0.115, 1.05)      0.977  25      0.1649
Boras         7.63   (7.01, 8.28)      -0.011   (-0.067, 0.053)    0.794  10      0.1982
Varberg       5.48   (4.687, 6.378)    0.106    (0.001, 0.236)     0.926  15      0.1206
Ungsberg      5.768  (4.622, 7.115)    0.245    (0.089, 0.445)     0.925  15      0.1843
Saffle        6.62   (5.96, 7.329)     0.099    (0.027, 0.183)     0.857  10      0.1809
Soderkoping   6.259  (4.32, 8.884)     0.297    (0.1, 0.649)       0.984  25      0.1678
Stockholm     5.597  (4.827, 6.453)    0.135    (0.033, 0.259)     0.903  10      0.1523
Malung        6.355  (5.676, 7.095)    0.08     (0.004, 0.17)      0.86   10      0.2280
Vattholma     4.964  (3.521, 6.784)    0.334    (0.098, 0.667)     0.984  20      0.1709
Myskelasen    6.854  (5.962, 7.844)    0.019    (-0.072, 0.13)     0.849  10      0.2311
Harnosand     7.863  (7.053, 8.742)    0.087    (0.011, 0.175)     0.832  10      0.2068
Rosta         6.276  (5.453, 7.19)     0.032    (-0.062, 0.145)    0.876  10      0.2116
Pitea         5.937  (4.429, 7.822)    0.19     (0.004, 0.456)     0.96   20      0.2010
Stensele      7.66   (6.098, 9.5)      0.041    (-0.11, 0.236)     0.915  15      0.2249
Haparanda     5.628  (4.405, 7.07)     -0.073   (-0.196, 0.125)    0.984  18      0.1871
Kvikkjokk     5.66   (5.01, 6.36)      0.04     (-0.04, 0.137)     0.864  10      0.2526
Pajala        5.033  (3.705, 6.728)    0.356    (0.153, 0.646)     0.966  18      0.2385
Karesuando    5.303  (4.117, 6.754)    0.12     (-0.037, 0.34)     0.922  15      0.2206
Table 4: Extremal parameters and their 95% confidence intervals for each weather station.
m θ
0 0.9144
1 0.8836
2 0.8425
3 0.8322
Table 5: Values of the parameter θ for different choices of m clusters.
Index      Description                                     Formula
R10mm      Heavy precipitation days                        $\sum \mathbf{1}\{Z_i > 10\}$
R20mm      Very heavy precipitation days                   $\sum \mathbf{1}\{Z_i > 20\}$
RX1day     Highest 1 day precipitation amount              $\max_i Z_i$
RX5day     Highest 5 day precipitation amount              $\max_i \sum_{j=0}^{4} Z_{i+j}$
CDD        Max number of consecutive dry days              $\max\{j : \tau_j(X_i) = \mathbf{0}\}$
CWD        Max number of consecutive wet days              $\max\{j : w = \tau_j(X_i),\ w_k > 0,\ \forall k\}$
R75p       Moderate wet days                               $\sum \mathbf{1}\{Z_i > q_{0.75}\}$
R90p       Above moderate wet days                         $\sum \mathbf{1}\{Z_i > q_{0.90}\}$
R95p       Very wet days                                   $\sum \mathbf{1}\{Z_i > q_{0.95}\}$
R99p       Extremely wet days                              $\sum \mathbf{1}\{Z_i > q_{0.99}\}$
R75pTOT    Precipitation fraction due to R75p              $\sum Z_i \mathbf{1}\{Z_i > q_{0.75}\} / \sum Z_i$
R90pTOT    Precipitation fraction due to R90p              $\sum Z_i \mathbf{1}\{Z_i > q_{0.90}\} / \sum Z_i$
R95pTOT    Precipitation fraction due to R95p              $\sum Z_i \mathbf{1}\{Z_i > q_{0.95}\} / \sum Z_i$
R99pTOT    Precipitation fraction due to R99p              $\sum Z_i \mathbf{1}\{Z_i > q_{0.99}\} / \sum Z_i$
SDII       Simple daily intensity index                    $\sum Y_i / \sum \mathbf{1}\{Y_i > 0\}$
Prec90p    90%-quant. of thinned amount of precipitation   $F_Y^{-1}(0.9)$
Table 6: Weather Indices and their mathematical expressions. The quantiles q(·) have
been estimated using the observed data.
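To make the definitions in Table 6 concrete, the following Python sketch computes a few of the indices for a single record of daily amounts. It is an illustration under assumptions: the function name and the example record are hypothetical, a day is counted as wet when its amount is at least 0.1 mm (the cut-off level used earlier), and the record is assumed to contain at least five days.

```python
def precipitation_indices(z, wet_threshold=0.1):
    """A few of the Table 6 indices for one record z of daily amounts (mm)."""
    wet = [1 if a >= wet_threshold else 0 for a in z]

    def longest_run(flags, value):
        """Length of the longest run of `value` in `flags`."""
        best = run = 0
        for f in flags:
            run = run + 1 if f == value else 0
            best = max(best, run)
        return best

    wet_amounts = [a for a, w in zip(z, wet) if w]
    return {
        "R10mm": sum(1 for a in z if a > 10),
        "R20mm": sum(1 for a in z if a > 20),
        "RX1day": max(z),
        "RX5day": max(sum(z[i:i + 5]) for i in range(len(z) - 4)),
        "CDD": longest_run(wet, 0),
        "CWD": longest_run(wet, 1),
        "SDII": sum(wet_amounts) / len(wet_amounts) if wet_amounts else 0.0,
    }


# Hypothetical twelve-day record in mm, for illustration only.
record = [0, 0, 12.5, 3.0, 0, 0, 0, 25.0, 11.0, 0.05, 1.0, 0]
indices = precipitation_indices(record)
for name, value in indices.items():
    print(name, value)
```

Applying such a function to each simulated record and collecting the results gives the model distribution of an index, which can then be compared with the values computed from the observed record.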
As we can see in Fig. 14 (top left), over a period of approximately two years we expect to
have about 17 days with precipitation of more than 10 mm and, Fig. 14 (top right), about 3
days with precipitation of more than 20 mm. But then, see Fig. 14 (bottom left), the
precipitation during each one of these three days will be quite a lot more than 20 mm.
Fig. 14 (bottom right) tells us that the probability of having 5 consecutive days of really
heavy precipitation in Lund is quite high.
Fig. 14: Plots of R10mm (top left), R20mm (top right), RX1day (bottom left) and
RX5day (bottom right). Theoretical distribution '-' and empirical distribution '.-'.
Panels: Index R10mm and Index R20mm (horizontal axis: no. of days), Index RX1day and Index RX5day (horizontal axis: amount of prec. (mm)); vertical axes from 0 to 1; legend: Model cumul. distr., Empirical distr.
Fig. 15: Plot of maximum number of consecutive dry days (left), and maximum number
of consecutive wet days (right).
Panels: Index CDD and Index CWD (horizontal axis: no. of days); vertical axes from 0 to 1; legend: Model cumul. distr., Empirical distr.
As we notice in Fig. 15 (left), once every two years we should expect to have a dry
spell with length more than two weeks, and a wet spell of approximately 12 days.
Fig. 16: Plot of the probability of number of moderate wet days (top left), above
moderate wet days (top right), very wet days (bottom left) and extremely wet days (bottom
right).
Panels: Index R75p, Index R90p, Index R95p and Index R99p (horizontal axis: no. of days); vertical axes from 0 to 1; legend: Model cumul. distr., Empirical distr.
In Fig. 16, we see that every two years in Lund we expect to have almost
fifty moderately wet days (top left), almost 18 above moderately wet days (top right),
almost 8 very wet days (bottom left) and almost 2 extremely wet days (bottom right).
In Fig. 17 (top left), we see that the fifty moderately wet days that we expect
over a period of two years in Lund will account for about 70% of the total amount of
precipitation. Similarly, during the 18 above moderate wet days we expect on average a
little more than 40% of the total precipitation amount (top right), for the 8 very wet days
about 25% of the total amount (bottom left) and for the 2 extremely wet days about 10%
(bottom right) of the total amount.
In Fig. 18 (left), we see that the average amount of precipitation per day of precipitation
is 3.5 mm, and also that every year, on average, the downfall exceeds 9.5 mm on only 1 out
of 10 precipitation days.
Fig. 17: Percentage of precipitation during the moderately wet days (top left), the above
moderate wet days (top right), the very wet days (bottom left) and the extremely wet
days (bottom right).
Panels: Index R75pTOT, Index R90pTOT, Index R95pTOT and Index R99pTOT; vertical axes from 0 to 1; legend: Model cumul. distr., Empirical distr.
Fig. 18: Plot of the average amount of precipitation per day of precipitation (left) and the
90% quantile of the amount of precipitation of the thinned precipitation process (right).
Panels: Index SDII and Index Prec90p (horizontal axis: amount of prec. (mm)); vertical axes from 0 to 1; legend: Model cumul. distr., Empirical distr.
7 Conclusions
In this paper, we have modelled the temporal variability of the precipitation in Sweden.
The different weather stations have been assumed not to have any spatial dependence.
It is among our future research plans to try to model also the spatial variability of the
precipitation in the different weather stations in Sweden. Some interesting conclusions
can be drawn.
We have used a chain dependent model for the precipitation, which consists of a component
for the occurrence of precipitation and a component for the amount of precipitation.
For the first component, we have used high order Markov chains with two states. We have
shown that the 1-Markov chain model, which has been used extensively, is an inadequate
model for most of the Swedish stations. For example, when the distribution of the long
dry spell is of interest, the 1-Markov chain underestimates the length of the long dry
spell, in some cases by up to half a day.
For the amount of precipitation process, we have used a copula to describe the temporal
dependence structure between successive days, which in reality is a Gaussian process with
transformed marginals. Then, the cumulative distribution has been modelled in two
steps: first using the empirical distribution for the amounts of precipitation that are less
than a given threshold, and then using a generalised Pareto distribution to model the
excesses above the threshold. Such models have the advantage that they provide the
mathematical platform that allows computation of such quantities as return periods.
Finally, the distributions of different weather indices have been computed using Monte
Carlo Markov Chain techniques, and have been compared to the empirical distributions
obtained from the data. The agreement between the two distributions has been really good,
which supports the choice of the models.
References
[1] Akaike, H. (1974). A new look at statistical model identification, IEEE Trans. Auto.
Control, AC, 19, pp. 716-722.
[2] Benjamin, J.R. and Cornell, C.A., (1970). Probability, Statistics and Decision for Civil
Engineers, McGraw-Hill, Inc., New York, 685 pp.
[3] Bruhn, J.A., Fry, W.E. and Fick, G.W., (1980). Simulation of daily weather data
using theoretical probability distributions. J. Appl. Meteorol. 19, pp. 1029-1036.
[4] Castellvi, F. and Stockle, C.O., (2001). Comparing a locally-calibrated versus a gen-
eralised temperature weather generation. Trans. ASAE 44 5, pp. 1143-1148.
[5] Chin, E. H., (1977). Modelling daily precipitation process with Markov chain, Wat.
Resources Res., 13, 949-956.
[6] Coles, S., (2001). An Introduction to Statistical Modeling of Extreme Values. Springer,
London.
[7] Cox, D.R. and Isham, V., (1988). A simple spatial-temporal model for rainfall (with
discussion). Proc. R. Soc. Lond. A, 415, 317-328.
[8] Cox, D.R. and Isham, V., (1994). Stochastic models of precipitation. In Statistics for
the Environment 2: Water Related Issues (eds V. Barnett and K.F. Turkman), ch. 1, pp.
3-18. Chichester: Wiley.
[9] Crovella, M. and Taqqu, M., (1999). Estimating the heavy tail index from scaling
properties, Methodology and Computing in Applied Probability 1, 55-79.
[10] Dalevi, D., Dubhashi, D. and Hermansson, M., (2006). A New Order Estimator for
Fixed and Variable Length Markov Models with Applications to DNA Sequence Similar-
ity, Stat. Appl. Genet. Mol. Biol., 5, Article 8.
[11] Davison, A.C. and Smith, R.L., (1990). Models for exceedances over high thresholds,
J. Roy. Statist. Soc. B, 52, pp. 393-442.
[12] Frigessi, A., Haug, O., Rue, H., (2002). A Dynamic Mixture Model for Unsupervised
Tail Estimation without Threshold Selection, Extremes, 5, pp.219-235.
[13] Gabriel, K.R. and Neumann, J., (1962). A Markov chain model for daily rainfall
occurrences at Tel Aviv. Quart.J.royal Met.Soc. 88, 90-95.
[14] Geng, S., Frits, W.T., de Vries, P. and Supit, I., (1986). A simple method for
generating daily rainfall data. Agric. For. Meteorol. 36, pp. 363-376.
[15] Guttorp, P. (1995). Stochastic Modelling of Scientific Data,Chapman & Hall, London
Chapter 2
[16] Hutchinson, M.F., (1995). Stochastic space-time weather models from ground-based
data. Agric. For. Meteorol., 73, 237-264.
[17] Hutchinson, T.P. and Lai, C.D., (1990). Continuous Bivariate Distributions, Empha-
sising Applications. Sydney, Australia: Rumsby.
[18] Jimoh, O.D. and Webster, P., (1996). The optimum order of a Markov chain model
for daily rainfall in Nigeria. Journal of Hydrology, 185, 45-69.
[19] Joe, H., (1997). Multivariate Models and Dependence Concepts. London: Chapman
& Hall
[20] Karl, T.R., Nicholls, N. and Ghazi, A., (1999). CLIVAR/GCOS/WMO workshop on
indices and indicators for climate extremes: Workshop summary, Climatic Change. Vol.
32, pp. 3-7.
[21] Lana, X. and Burgueno, A., (1998). Daily dry-wet behaviour in Catalonia (NE Spain)
from the viewpoint of Markov chains, Int. J. Climatol. 18, 793-815.
[22] Leadbetter, M.R., Lindgren, G., and Rootzen, H., (1983). Extremes and Related
Properties of Random Sequences and Series. Springer Verlag, New York.
[23] LeCam, L., (1961). A stochastic description of precipitation. Proc. 4th Berkeley Symp.,
pp. 165-186.
[24] Lennartsson, J., and Shu, M., (2005). Copula Dependence Structure on Real Stock
Markets, Masters thesis, Chalmers University of Technology, 2005-01.
[25] Liao, Y., Zhang, Q. and Chen, D., (2004). Stochastic modeling of daily precipitation
in China. Journal of Geographical Sciences, 14(4), 417-426.
[26] Mellor, D., (1996). The modified turning bands (MTB) model for space-time rainfall: I,
model definition and properties. J. Hydrol., 175, 113-127.
[27] McNeil, A.J., (1996). Estimating the tails of loss severity distributions using extreme
value theory, Technical report, Department Mathematik, ETH Zentrum, Zurich.
[28] Nelsen, R. B., (2006). An Introduction to Copulas 2nd edition. New York: Springer.
[29] Norris, J.R., (2005). Markov chains, Cambridge University Press.
[30] Peterson, T.C., Folland, C., Gruza, G., Hogg, W., Mokssit, A. and Plummer, N.,
(2001). Report on the Activities of the Working Group on Climate Change Detection
and Related Rapporteurs 1998-2001. World Meteorological Organisation, WCDMP-47,
WMO-TD 1071.
[31] Racsko, P., Szeidl, L. and Semenov, M., (1991). A serial approach to local stochastic
weather models. Ecol. Model. 57, pp. 27-41.
[32] Resnick, S.I., (1997). Heavy tail modeling and teletraffic data, The Annals of Statistics
25 5, 1805-1869.
[33] Rootzen, H. and Tajvidi, N., (1997). Extreme value statistics and wind storm losses:
A case study, Scandinavian Actuarial Journal 1, 70-94.
[34] Rodríguez-Iturbe, I., Cox, D. and Isham, V., (1987). Some models for rainfall based
on stochastic point processes. Proc. R. Soc. Lond., A 410, 269-288.
[35] Rodríguez-Iturbe, I., Cox, D. and Isham, V., (1988). A point process model for rainfall:
further developments. Proc. R. Soc. Lond., A 417, 283-298.
[36] Schwarz, G., (1978). Estimating the dimension of a model. Ann. Stat. 6, pp. 461-
464.
[37] Selker, J.S. and Haith, D.A., (1990). Development and testing of simple parameter
precipitation distributions. Water Resour. Res. 26 11, pp. 2733-2740.
[38] Smith, R. L. and Robinson, P.J., (1997). A Bayesian approach to the modelling of
spatial-temporal precipitation data. Lect. Notes Statist., 237-269.
[39] Srikanthan, R. and McMahon, T.A., (2001). Stochastic generation of annual, monthly
and daily climate data: A review. Hydrol. Earth Syst. Sci. 5 4, pp. 653-670.
[40] Stern, R.D. and Coe, R., (1984). A Model Fitting Analysis of Daily Rainfall Data,
J. R. Statist. Soc. A, 147, Part 1, pp. 1-34.
[41] Stidd, C.K., (1973). Estimating the precipitation climate. Wat. Resour. Res., 9
1235-1241.
[42] Trivedi, P. K. and Zimmer, D.M., (2007). Copula Modelling: An Introduction for
Practitioners, Foundations and Trends in Econometrics, Vol. 1, No 1, 1-111.
[43] Waymire, E. and Gupta, V. K., (1981). The mathematical structure of rainfall representations:
3, Some applications of the point process theory to rainfall processes. Wat.
Resour. Res. 17, 1287-1294.
[44] Waymire, E., Gupta, V. K. and Rodríguez-Iturbe, I., (1984). Spectral theory of rainfall
intensity at the meso-β scale. Wat. Resour. Res. 20, 1453-1465.
[45] Woolhiser, D.A., (1992). Modelling daily precipitation-progress and problems. In:
A. Walden and P. Guttorp (Editors), Statistics in the Environmental and Earth Sciences.
Edward Arnold, London, pp. 71-89.
8 Appendix
8.1 Review of Mathematical Order Estimators
Let $X_t$ denote a $k$-Markov chain that is defined on a state space $S$ and $x_1^n$ its realisation.
Let also $P_{ML(k)}(x_1^n)$ be the $k$th order maximum likelihood, i.e.
\[ P_{ML(k)}(x_1^n) = \max P(X_1^k) \prod_{i=k+1}^{n} P(X_i = x_i \mid \tau_k(X^{i-1}) = \tau_k(x^{i-1})). \]
Tong (1975) reported that the Akaike Information Criterion (AIC) order estimator could
be used as an objective technique for determining the optimum order $k$ of the chain, see
also Akaike (1974). The optimum order $k$ is the order that minimises the loss function:
\[ k_{AIC}(x_1^n) = \arg\min_k \left( -\log P_{ML(k)}(x_1^n) + |S|^k \right). \]
Schwarz (1978) presented an alternative technique, the Bayesian Information Criterion
(BIC) order estimator, whose consistency under general conditions was established only
recently. The optimum order $k$ is the order that minimises the loss function, which
now is given by:
\[ k_{BIC}(x_1^n) = \arg\min_k \left( -\log P_{ML(k)}(x_1^n) + \frac{|S|^k(|S| - 1)}{2} \log(n) \right). \]
Dalevi et al. (2006) showed using experimental results that the BIC order estimator tends
to under-estimate the order as k gets larger for moderate data sizes.
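The two loss functions above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the thesis: the function names `log_ml` and `estimate_order`, the cap `max_k`, and the choice to score every order on the same symbols (dropping the initial-word factor) are assumptions made here for the example. The synthetic record is a two-state chain in which today's state depends only on the state two days earlier.

```python
import math
import random
from collections import Counter

def log_ml(x, k, start):
    """Maximised log-likelihood of x[start:] under a k-th order chain.

    Transition probabilities are replaced by observed proportions. Every
    order is scored on the same symbols x[start:], so values are comparable
    (assumption of this sketch: the initial-word factor is dropped)."""
    context = Counter()      # occurrences of each k-word used as a context
    transition = Counter()   # occurrences of (k-word, next letter)
    for i in range(start, len(x)):
        w = tuple(x[i - k:i])
        context[w] += 1
        transition[(w, x[i])] += 1
    return sum(c * math.log(c / context[w]) for (w, _), c in transition.items())

def estimate_order(x, n_states=2, max_k=5):
    """Return (k_AIC, k_BIC) over k = 0, ..., max_k using the losses above."""
    n = len(x)
    aic, bic = {}, {}
    for k in range(max_k + 1):
        ll = log_ml(x, k, start=max_k)
        aic[k] = -ll + n_states ** k
        bic[k] = -ll + n_states ** k * (n_states - 1) / 2 * math.log(n)
    return min(aic, key=aic.get), min(bic, key=bic.get)

# Synthetic wet/dry record: the state depends only on the state two days ago.
rng = random.Random(1)
x = [0, 1]
for _ in range(3000):
    p_wet = 0.9 if x[-2] == 1 else 0.1
    x.append(1 if rng.random() < p_wet else 0)
k_aic, k_bic = estimate_order(x)
```

On such a strongly second-order record both estimators should reject orders 0 and 1; the AIC penalty is lighter, so it may occasionally select an order above 2.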
Finally, the Maximal Fluctuation Criterion (MFC), contrary to the AIC and BIC
order estimators, was specifically designed for multiple step Markov chains. For any
realisation $x \in S^n$ of the $k$-Markov chain, let $N_x(w) = |\{i \in [1, n] : \tau_l(x^i) = w\}|$, $w \in S^l$,
denote the number of times $w$ occurs in $x$. The Peres-Shields fluctuation function is
defined as
\[ \Delta_k(v) = \max_{s \in S} \left| N_x(vs) - \frac{N_x(\tau_k(v)s)}{N_x(\tau_k(v))} N_x(v) \right|. \]
When the order of the Markov chain is $k$ or less, this fluctuation is small. Therefore, the
Maximal Fluctuation Criterion (MFC) order estimator is defined as
\[ k_{MFC}(x_1^n) = \min\left\{ k \ge 0 : \max_{k < |v| < \log\log(n)} \Delta_k(v) < n^{3/4} \right\}. \]
In practice the function $\log\log(\cdot)$ is substituted by any function that grows slower than
$\log(\cdot)$. Dalevi et al. (2006) suggested the Generalized Maximum Fluctuation Criterion
(GMFC) order estimator, which is closely related to the Maximal Fluctuation Criterion
(MFC) order estimator,
\[ k_{GMFC}(x_1^n) = \arg\max_k \frac{\max_{k-1 < |v| < f(n)} \Delta_{k-1}(v)}{\max_{k < |v| < f(n)} \Delta_k(v)}, \]
where $f(n)$ is any function that satisfies the same conditions as for the MFC order estimator.
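The fluctuation function on which both the MFC and GMFC estimators rest can be illustrated as follows. This is only a sketch of the quantity $\Delta_k(v)$ defined above, under assumptions made here: words are counted with overlaps, $k \ge 1$ is required, and the helper names `count_word` and `fluctuation` are hypothetical. It does not implement the full estimators.

```python
def count_word(x, w):
    """N_x(w): number of (overlapping) occurrences of the word w in x."""
    w = tuple(w)
    l = len(w)
    return sum(1 for i in range(len(x) - l + 1) if tuple(x[i:i + l]) == w)

def fluctuation(x, v, k, states=(0, 1)):
    """Peres-Shields fluctuation Delta_k(v) for a word v with len(v) > k >= 1."""
    v = tuple(v)
    if not 1 <= k < len(v):
        raise ValueError("this sketch needs 1 <= k < len(v)")
    tail = v[len(v) - k:]              # tau_k(v): the last k letters of v
    n_tail = count_word(x, tail)
    n_v = count_word(x, v)
    if n_tail == 0:                    # then v does not occur either
        return 0.0
    return max(
        abs(count_word(x, v + (s,)) - count_word(x, tail + (s,)) / n_tail * n_v)
        for s in states
    )

# A period-4 record: the next letter is determined by the last two letters,
# but not by the last letter alone.
x = [0, 0, 1, 1] * 25
delta_1 = fluctuation(x, (0, 0), k=1)      # large: order 1 is not enough
delta_2 = fluctuation(x, (1, 0, 0), k=2)   # zero: order 2 explains the data
```

The contrast between `delta_1` and `delta_2` is the behaviour the estimators exploit: the fluctuation drops once $k$ reaches the true order.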
Accepted Manuscript
Modelling precipitation in Sweden using multiple step Markov chains and a
composite model
Jan Lennartsson, Anastassia Baxevani, Deliang Chen
PII: S0022-1694(08)00484-8
DOI: 10.1016/j.jhydrol.2008.10.003
Reference: HYDROL 16319
To appear in: Journal of Hydrology
Received Date: 28 May 2008
Revised Date: 3 October 2008
Accepted Date: 4 October 2008
Please cite this article as: Lennartsson, J., Baxevani, A., Chen, D., Modelling precipitation in Sweden using multiple
step Markov chains and a composite model, Journal of Hydrology (2008), doi: 10.1016/j.jhydrol.2008.10.003
Modelling precipitation in Sweden using multiple step Markov chains and a
composite model
Jan Lennartsson1, Anastassia Baxevani1∗†, Deliang Chen2
1 Department of Mathematical Sciences, Chalmers University of Technology, University of Gothenburg, Gothenburg, Sweden
2 Department of Earth Sciences, University of Gothenburg, Gothenburg, Sweden
Abstract
In this paper, we propose a new method for modelling precipitation in Sweden. We consider a chain dependent
stochastic model that consists of a component that models the probability of occurrence of precipitation at a
weather station and a component that models the amount of precipitation at the station when precipitation does
occur. For the first component, we show that for most of the weather stations in Sweden a Markov chain of
an order higher than one is required. For the second component, which is a Gaussian process with transformed
marginals, we use a composite of the empirical distribution of the amount of precipitation below a given threshold
and the generalized Pareto distribution for the excesses in the amount of precipitation above the given threshold.
The derived models are then used to compute different weather indices. The distribution of the modelled indices
and the empirical ones show good agreement, which supports the choice of the model.
Key words: High order Markov chain, generalized Pareto distribution, copula, precipitation process, Sweden
1 Introduction
Realistic sequences of meteorological variables such as precipitation are key inputs in many hydrologic,
ecologic and agricultural models. Simulation models are needed to model the stochastic behavior of the climate
system when historical records are of insufficient duration or inadequate spatial and/or temporal coverage.
In these cases synthetic sequences may be used to fill in gaps in the historical record, to extend the
historical record, or to generate realizations of weather that are stochastically similar to the historical
record. A weather generator is a stochastic numerical model that generates daily weather series with the
same statistical properties as the observed ones, see Liao et al. (2004).
In developing the weather generator, the stochastic structure of the series is described by a statistical
model. Then, the parameters of the model are estimated using the observed series. This allows us to
generate arbitrarily long series with stochastic structure similar to the real data series.
∗ Corresponding author
† Research supported partially by the Gothenburg Stochastic Center and the Swedish Foundation for Strategic Research
through Gothenburg Mathematical Modelling Center.
Parameter estimation of stochastic precipitation models has been a topic of intense research in the last
20 years. The estimation procedures are intrinsically linked to the nature of the precipitation model itself
and the timescale used to represent the process. There are models which describe the precipitation process
in continuous time and models describing the probabilistic characteristics of precipitation accumulated
over a given time period, say daily or monthly totals. Different reviews of the available models have been
presented: see for example Woolhiser (1992), Cox and Isham (1988) and Smith and Robinson (1997).
Continuous time models for a single site with parameters related to the underlying physical precipitation
process are particularly important for the analysis of data at short timescales, e.g. hourly. Some
of these models are described in Rodríguez-Iturbe et al. (1987, 1988) and Waymire and Gupta (1981).
When only accumulated precipitation amounts for a particular time period (daily) are recorded, then
empirical statistical models, based on stochastic models that are calibrated from actual data, are appealing.
Empirical statistical models for generating daily precipitation data at a given site can be classified into
four different types: chain dependent or two-part models, transition probability matrix models, resampling
models and ARMA time series models, see Srikanthan and McMahon (2001) for a complete review of the
different models.
A generalization of the precipitation models for a single site is the spatial extension of these models
to multiple sites, which tries to incorporate the intersite dependence while preserving the marginal properties
at each site. A more ambitious task is the modelling of precipitation continuously in time and space.
Original work on this type of model, based on point process theory, was presented by LeCam (1961) and
further developed by Waymire et al. (1984) and Cox and Isham (1994). Mellor (1996) has developed the
modified turning bands model which reproduces some of the physical features of precipitation fields in
space as rainbands, cluster potential regions of rain cells.
In this study we concentrate on the chain-dependent model for the daily precipitation in Sweden,
which consists of two steps: first a model for the sequence of wet/dry days and second, a model for the
amount of precipitation for the wet days. For the first, we use high-order Markov chains and for the
second we introduce a composite model that incorporates the empirical distribution and the generalized
Pareto distribution.
2 Data
Precipitation data from 20 stations in Sweden have been used in the studies presented in this paper. The
locations are shown in Fig. 1 and the names of the stations are given in Table 1. The data consist of
accumulated daily precipitation collected during 44 years starting on the 1st of January 1961 and ending
the 31st of December 2004 and are provided by the Swedish Meteorological and Hydrological Institute
(SMHI). The number of missing observations in all stations is generally low (< 5%). The time plots of
the annual number of wet days (above the threshold 0.1 mm) at the 20 stations are presented in Fig. 2.
Time plots of annual number of wet days showed that the precipitation regime in some stations
(namely, Soderkopping, Rosta and Stensele) contains possible trends. The results presented in the next
sections refer to the whole period of data from all stations, but attention should be paid when we refer
to the above mentioned stations. In Fig. 3, time plots of the annual amount of precipitation of the wet
days are presented. The total amounts of precipitation seem to be stationary over the different years.
3 Model
To model precipitation in Sweden, we have decided to use a chain dependent model. The first part of the
model can be dealt with using Markov chains. Gabriel and Neumann (1962) used a first-order stationary
Markov chain. The models have since been extended to allow for non-stationarity, both by fitting separate
chains to different periods of the year and by fitting continuous curves to the transition probabilities, see
Stern and Coe (1984) and references therein. The order of Markov chain required has also been discussed
extensively, for example Chin (1977) and references therein, with the obvious conclusion that different
sites require different orders. Still, the first order Markov chains are a popular choice since they have been
shown to perform well for a wide range of different climates, see for example Bruhn et al. (1980), Lana
and Burgueno (1998), Aksoy and Bayazit (2000) and Castellvi and Stockle (2001). The main deficiency
associated with the use of first order models is that long dry spells are not well reproduced, see Racsko
et al. (1991), Guttorp (1995).
To model the amount of precipitation that has occurred during a wet day, different models have been
proposed in the literature, all of which assume that the daily amounts of precipitation are independent
and identically distributed. Stidd (1973) and Hutchinson (1995) have proposed a truncated normal
model for the amount of precipitation with a time dependent parameter, while the Gamma and Weibull
distributions have been selected by Geng et al. (1986) as well as Selker and Haith (1990), because of their
site-specific shape.
In this study, we model the occurrence of wet/dry days using Markov chains of higher order and for
the amount of precipitation we use a composite model, consisting of the empirical distribution function
for values below a threshold and the distribution of excesses for values above the given threshold. Such a
model is more flexible, describes better the tail of the distribution and additionally allows for dependence
in the precipitation process.
Let $Z_t$ be the precipitation at a certain site at time $t$ measured in days. Then, a chain-dependent
model for the precipitation is given by
\[ Z_t = X_t W_t, \]
where $X_t$ and $W_t$ are stochastic processes such that $X_t$ takes values in $\{0, 1\}$ and $W_t$ takes values in
$\mathbb{R}^+ \setminus \{0\}$. The processes $X_t$ and $W_t$ will be referred to as the occurrence of precipitation and the amount
of precipitation process, respectively.
The approach presented in this study provides a mechanism to make predictions of precipitation in
time. This is particularly important for many applications in hydrology, ecology and agriculture. For
example, at a monthly level, the amount of precipitation and the probability and length of a dry period
are required quantities for many applications.
4 Models for the Occurrence of Precipitation
Let $\{X_t, t = t_1, \ldots, t_N\}$ denote the sequence of daily precipitation occurrence, i.e. $X_t = 1$ indicates a
wet day and $X_t = 0$ a dry day. A wet day, in the context of this study, occurs when at least 0.1 mm of
precipitation was recorded by the rain gauge. The level has been chosen above zero in order to avoid
identifying dew and other noise as precipitation and to also avoid difficulties arising from the inconsistent
recording of very small precipitation amounts. Moreover, daily precipitation amounts of less than 0.1 mm
can have relatively large observational errors, and including them would cause a significant change in the
estimated transition probabilities of the occurrences. As a consequence this introduces additional errors
into the fitted models. The model is fitted over different periods of the year, that is subsets of the $N$
days of the year, that may be assumed stationary.
Before we continue any further we need to introduce some notation. Let $S = \{0, 1\}$ denote the
state space of the $k$-Markov chain $X_t$. The elements of $S$ are called letters and an ordering of letters
$w \in S^l = S \times \cdots \times S$ is called a word of length $l$, while the words composed of the letters from position $i$ to
$j$ in $w$, for some $1 \le i \le j \le l$, are denoted as $w_i^j = (w_i, w_{i+1}, \ldots, w_j)$. Finally, for $k \le l$ let $\tau_k(w) = w_{l-k+1}^l$
denote the $k$-tail of the word $w$, i.e. $\tau_k(w)$ denotes the last $k$ letters of $w$. If no confusion will arise when
$k \le j - i$, we also write $\tau_k(w^j)$ instead of $\tau_k(w_i^j)$.
It is assumed that the process $X_t$ is a $k$-Markov chain: a model completely characterized by the
transition probability
\[ p_{w,j}(t) := P(X_t = j \mid \tau_k(X^{t-1}) = w), \quad j \in S, \ t = t_1, \ldots, t_N, \]
where $w$ is a word of length $k$ and $X^{t-1} = \{\ldots, X_{t-2}, X_{t-1}\}$ is the whole process up to $t - 1$, so $\tau_k(X^{t-1})$
is the last $k$ days up to and including $X_{t-1}$; that is, $\tau_k(X^{t-1}) = (X_{t-k}, \ldots, X_{t-1})$. Note that, for a
2-state Markov chain of any order, $p_{w,1}(t) + p_{w,0}(t) = 1$. In the special case of a time homogeneous Markov
chain, $p_{w,j}(t) = p_{w,j}$ for $t = t_1, \ldots, t_N$, i.e. the transition probabilities are independent of time.
Let $n_{w,j}(t)$ denote the number of years during which day $t$ is in state $j$ and is preceded by the word
$w$ (i.e. $\tau_k(X^{t-1}) = w$, $w \in S^k$ and $X_t = j$). Then the probabilities $p_{w,j}(t)$ are estimated by the observed
proportions
\[ \hat{p}_{w,j}(t) = \frac{n_{w,j}(t)}{n_{w,+}(t)}, \quad w \in S^k, \ j \in S, \ t = t_1, \ldots, t_N, \]
where $+$ indicates summation over the subscript. Note also that day 60 (February 29th) has data only in
leap years, so day 59 precedes day 61 in non-leap years. Fig. 4 (left) shows the unconditional probability
of precipitation, pooled over 5 days for clarity, plotted against $t$ for the data from the station in Lund.
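The observed proportions above amount to a count over years for a fixed day of the year. The following sketch shows one possible way to compute them; the data layout (one 0/1 list per year), the function name `transition_estimates` and the restriction to day indices $t \ge k$ within the same year are assumptions of this example, and missing values and leap days are not handled.

```python
from collections import defaultdict

def transition_estimates(years, t, k):
    """Estimate p_{w,j}(t) = n_{w,j}(t) / n_{w,+}(t) for a day index t >= k.

    `years` is a list of 0/1 sequences, one per year (hypothetical layout).
    Returns a dict mapping each observed word w to (p_{w,0}, p_{w,1})."""
    counts = defaultdict(lambda: [0, 0])   # w -> [n_{w,0}(t), n_{w,1}(t)]
    for x in years:
        w = tuple(x[t - k:t])              # the k days preceding day t
        counts[w][x[t]] += 1
    return {
        w: (n0 / (n0 + n1), n1 / (n0 + n1))
        for w, (n0, n1) in counts.items()
    }

# Four toy "years" of three days each; day index 2 with a second order chain.
years = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
p_hat = transition_estimates(years, t=2, k=2)
```

With only 44 years of data each count $n_{w,+}(t)$ is small, which is one reason for pooling days into seasons that can be treated as time homogeneous.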
In the context of environmental processes, non-stationarity is often apparent, as in this case, because
of seasonal effects or different patterns in different months. A usual practice is to specify different subsets
of the year as seasons, which results in different models for each season, although the determination of
an appropriate segregation into seasons is itself an issue.
4.1 Fitting Models to the Occurrence of Precipitation
There is an inter-annual variation in the annual number of wet days, as can be seen in Fig. 2. Moreover,
there is also seasonal variation in the mean monthly number of wet days, see Fig. 4 (right) for data
from Lund, although this is not as prominent as in other regions of the world. It is possible that the
optimum order of the chain describing the wet/dry sequence varies within the year and from one year to
another. It is therefore important to properly identify the period of record that can be assumed to be time
homogeneous.
Moreover, the problem of finding an appropriate model for the occurrence of precipitation process, $X_t$,
is equivalent to the problem of finding the order of a multiple step Markov chain. The Akaike Information
Criterion (AIC), Bayesian Information Criterion (BIC) and the Generalized Maximum Fluctuation
Criterion (GMFC) order estimators, a short description of which can be found in subsection 8.1, have
been applied to the data for each of the stations. Various block lengths were considered for determining
the order of the Markov chain, $k$, as suggested in Jimoh and Webster (1996):
• 1 month blocks (i.e. January, February, ..., December),
• 2 month blocks (January - February, February - March, ..., December - January),
• 3 month blocks (January - March, February - April, ..., December - February).
The effect of block length on the order of the Markov chain can be seen in Figs. 5-7. We can notice
that grouping the data in blocks of length more than one month results in Markov chains of "smoother"
order, in the sense that the order of the chain does not change so fast. It is also interesting to notice
that while the order of the Markov chain for the stations 16-20 varies a lot according to the AIC and
GMFC estimators, it seems to be almost constant for the BIC order estimator. As expected,
the BIC order estimator underestimates the order $k$ of the Markov chain relative to both the AIC and
GMFC order estimators for large $k$ and moderate data sets, see Dalevi et al. (2006), while the values of
the GMFC order estimator lie between the BIC and AIC ones. The results presented in Figs. 5-7 confirm
that the model order is sensitive to the season (month) and the length of the season (number of months)
considered, as well as the method used in identifying the optimum order. Possible dependence on the
threshold used for identifying wet and dry days has not been studied here. For the rest of this study,
we define as seasons the 3 month periods December-February, March-May, June-August and September-November.
As can be seen in Fig. 3 for the station in Lund (the rest of the stations give similar
plots), the probability of precipitation is close to constant during these periods, which makes the
assumption of stationarity seem plausible. The orders of the Markov chain for these periods can be
found in Fig. 7. For the rest of this study the order $k$ of the Markov chain is decided according to the
GMFC order estimator.
4.2 Distribution of Dry Spell Length
An interesting aspect of the wet/dry behavior, i.e. the process $X_t$, is the distribution of the dry (wet)
spells, i.e., the number of consecutive dry (wet) days, which is an accessible property of multiple step
Markov chains.
For a time homogeneous (stationary) $k$-Markov chain $X_t$ ($k \ge 2$) with state space $S$, let $T$ be the
first time the process $X_t$ is such that $\tau_2(X^t) = (1, 0)$, i.e.,
\[ T = \inf\{t \ge 0 : \tau_2(X^t) = (1, 0)\}. \]
So $T$ is the time of the start of the first dry period. Let also, for the words $u, v \in S^k$,
\[ a_{u,v} = P(\tau_k(X^T) = v \mid \tau_k(X^0) = u) \]
denote the probability that the process $X_t$ has at time $T$ a $k$-tail equal to $v$ given that the $k$-tail at time 0 is
equal to $u$. The probabilities $a_{u,v}$ are easily obtained for stationary processes, see Norris (2005). Note
that at $t = 0$, there may be the start of a dry period, the start of a wet period, the continuation of a
dry period or the continuation of a wet period. If $D(X_t)$ denotes the length of the first dry period that
starts at time $t = 0$ for the $k$-Markov chain $X_t$, then assuming additionally that the process $X_t$ is time
homogeneous, the distribution of the first dry spell can be computed as
\[ P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{\{w \in S^k : \tau_2(w) = (1,0)\}} a_{u,w}\, P(\tau_m(X^{m-1}) = \mathbf{0}, X_m = 1 \mid \tau_k(X^0) = w), \tag{1} \]
where $\mathbf{0}$ is used to denote sequences of 0's of appropriate length.
Now, if $v = w\mathbf{0}1$ is a word of length $m + k$ ($\mathbf{0}$ here is of length $m - 1$), then using the fact that the process
$X_t$ is a $k$-Markov chain, Eq. (1) can be rewritten as
\[ P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{\{w \in S^k : \tau_2(w) = (1,0)\}} a_{u,w} \prod_{i=1}^{m} P(X_i = v_{k+i} \mid \tau_k(X^{i-1}) = \tau_k(v^{k+i-1})). \tag{2} \]
Remark 1 Here we should notice that the distribution of the first dry spell is different from the distribution
of the subsequent dry spells for Markov chains of order greater than two. For Markov chains of order one or two
there is no need for this distinction. Moreover, the equivalent of Eq. (1) for $k = 1$ is
\[ P(D(X_t) = m) = p_{0,0}^{m-1}\, p_{0,1}, \]
while for $k = 2$, Eq. (2) simplifies to
\[ P(D(X_t) = m) = \begin{cases} p_{10,1} & \text{for } m = 1, \\ p_{10,0}\, p_{00,0}^{m-2}\, p_{00,1} & \text{for } m \ge 2, \end{cases} \]
where $a_{u,v} = 1$ for all $u, v$ in Eq. (1).
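The two closed forms in Remark 1 are easy to check numerically. The sketch below, with hypothetical function names and illustrative parameter values, evaluates both probability mass functions, using $p_{w,0} = 1 - p_{w,1}$ for the two-state chain.

```python
def dry_spell_pmf_k1(m, p0_1):
    """P(D = m) for a first order chain: p_{0,0}^(m-1) * p_{0,1}."""
    return (1 - p0_1) ** (m - 1) * p0_1

def dry_spell_pmf_k2(m, p10_1, p00_1):
    """P(D = m) for a second order chain, as in Remark 1."""
    if m == 1:
        return p10_1
    return (1 - p10_1) * (1 - p00_1) ** (m - 2) * p00_1

# Illustrative values only: rain is less likely after two dry days than
# after a wet day followed by a dry day.
total = sum(dry_spell_pmf_k2(m, p10_1=0.4, p00_1=0.25) for m in range(1, 500))
```

The probabilities sum to one, and setting `p10_1 == p00_1` recovers the first order formula, which is the sense in which the second order model generalises the first.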
The distribution of the first dry spell can also be used for model selection or model validation purposes.
For this, we use the Kolmogorov-Smirnov (KS) test, see Benjamin and Cornell (1970). The one sample
KS test compares the empirical distribution function with the cumulative distribution function specified
by the null hypothesis.
Assuming that $P_k(x)$ is the true distribution function (of a Markov chain of order $k$), the KS test statistic is
\[ D = \sup_{m \in \mathbb{N}^+} |P_k(D(X) \le m) - F_{emp}(m)|, \]
where $F_{emp}(x)$ is the empirical cumulative distribution of the length of the first dry spell. If the data
truly come from a $k$ order Markov chain and the transition probabilities are the correct ones, then by the
Glivenko-Cantelli theorem, see Dudewicz and Mishra (1988), the KS statistic converges to zero almost surely
(a.s.).
To apply the test, the transition probabilities have been estimated from the data using maximum
likelihood for different values of the order $k$ of the Markov chain. To obtain the empirical distribution
of the length of the first dry spell, we have computed the length of the dry spells (sequence of zeros)
following the first (1, 0). (Here note that this is equivalent to computing the length of the first dry spell
for Markov chains of order $k = 1$ or $k = 2$. In the case of $k = 3$, although the distribution of the first
dry spell is not exactly the same as the distribution of any dry spell, we have still used all the dry spells
available due to shortage of data.) The procedure has been applied separately to data from each station
and season. If the first observations were zeros, they were ignored as the continuation of a dry spell. Also,
if a dry spell was not over by the end of the season then it was followed inside the next season.
To determine whether the theoretical model was correct or not, Monte Carlo simulations were performed.
We have obtained the empirical distribution of the length of the first dry spell using 500 synthetic
wet/dry records of 44 years of data (each station and season was treated separately), and the KS statistic
was computed for each one of them, which resulted in the distribution of the KS statistic.
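The Monte Carlo step can be sketched as follows. For brevity the example simulates dry spell lengths directly from a first order model (geometric spells) rather than full 44 year wet/dry records, which is a simplification made here; the function names, the number of spells and the value of $p_{0,1}$ are illustrative assumptions.

```python
import random
from collections import Counter

def ks_statistic(model_cdf, spells):
    """sup over m of |model_cdf(m) - F_emp(m)| for integer spell lengths >= 1."""
    n = len(spells)
    counts = Counter(spells)
    d, cum = 0.0, 0
    for m in range(1, max(spells) + 1):
        cum += counts.get(m, 0)
        d = max(d, abs(model_cdf(m) - cum / n))
    return d

def simulate_spell(p0_1, rng):
    """Length of one dry spell under a first order chain."""
    m = 1
    while rng.random() >= p0_1:
        m += 1
    return m

def ks_critical_value(p0_1, n_spells, n_rep=500, level=0.10, seed=0):
    """Upper `level` tail value of the simulated KS statistic."""
    rng = random.Random(seed)
    cdf = lambda m: 1 - (1 - p0_1) ** m
    stats = sorted(
        ks_statistic(cdf, [simulate_spell(p0_1, rng) for _ in range(n_spells)])
        for _ in range(n_rep)
    )
    return stats[int((1 - level) * n_rep) - 1]

crit = ks_critical_value(p0_1=0.3, n_spells=200)
```

An observed KS statistic larger than `crit` would then count as evidence against the candidate order at the 10% tail value.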
The suggested orders of the Markov chain using the Kolmogorov-Smirnov statistic at the 10% tail
value are collected in Fig. 8. The resulting orders of the Markov chain appear to be close to those obtained
by the BIC order estimator. In Table 2, we have collected information on how many data sets have passed
the Kolmogorov-Smirnov test at the 10% tail value for the different seasons. Observe that the KS test
suggests that the 1-Markov chain, although widely used, is an inadequate model for the majority of the
stations in Sweden over the different seasons.
4.3 Distribution of Long Dry Spells
Let us now define as a long dry spell a dry spell with length longer than or equal to the order $k$ of the Markov
chain. Then it is easy to show that the distribution of the long dry spell is actually geometric. Indeed,
let a long dry spell that starts at time $i$ have length $m \ge k$ and let us also assume that we know that the
length of the dry spell is at least $l$. Then, for $m \ge l \ge k$,
\[ P(D(X_t) = m \mid \tau_l(X^{i+l-1}) = \mathbf{0}) = p_{\mathbf{0},1}\, p_{\mathbf{0},0}^{m-l} = p_{\mathbf{0},1}(1 - p_{\mathbf{0},1})^{m-l}, \]
where as before
\[ p_{\mathbf{0},1} = P(X_{n+1} = 1 \mid \tau_k(X^n) = \mathbf{0}), \quad \forall n. \]
Therefore, the expected length of long dry spells is given by
\[ E(D(X_t) \mid \tau_l(X^{i+l-1}) = \mathbf{0}) = l + \frac{1 - p_{\mathbf{0},1}}{p_{\mathbf{0},1}}. \tag{3} \]
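As a quick numerical check of Eq. (3), the closed form can be compared with a direct summation of the geometric distribution. The function names and the parameter values are illustrative assumptions.

```python
def expected_long_dry_spell(l, p0_1):
    """Closed form of Eq. (3): l + (1 - p) / p with p = p_{0,1}."""
    return l + (1 - p0_1) / p0_1

def expected_by_summation(l, p0_1, terms=10_000):
    """Sum of m * p * (1 - p)^(m - l) over m = l, l + 1, ... (truncated)."""
    return sum(m * p0_1 * (1 - p0_1) ** (m - l) for m in range(l, l + terms))

closed = expected_long_dry_spell(l=2, p0_1=0.3)
summed = expected_by_summation(l=2, p0_1=0.3)
```

The formula also makes the sensitivity explicit: a smaller estimate of $p_{\mathbf{0},1}$ gives a longer expected spell, so a model that overestimates this probability shortens the long dry spells.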
Fig. 9 shows the conditional distribution of the dry spell length, given that the spell has lasted for more than two days,
for the first season and the data from Lund. The estimated order of the Markov chain for this data set is
2 using both the GMFC and the KS criterion. A first order Markov chain, the popular model of choice,
would in this case obviously underestimate the risk of a long dry spell. A second order Markov chain seems
to be the best choice for this particular data set.
It is clear from Table 3 that underestimation of the order $k$ of the Markov chain leads to underestimation
of the expected length of the long dry spells, where again a dry spell is defined as long if it has
length larger than or equal to the order of the Markov chain.
5 Modeling the Amount Precipitation Process
In this section we model the amounts of daily precipitation. This is done in two steps. Firstly we model
the dependence structure of the amount precipitation process and secondly we estimate the marginal
distribution.
One of the important features of any climatological data set is that it exhibits dependence between
nearby stations or successive days. In this work we are interested in the latter case and the dependence
structure is modelled using a two-dimensional Gaussian copula.
After the copula has been estimated, we remove the days with precipitation below the cut-off level
of 0.1 mm. That is, we let $Y_t$ be the thinning process resulting from the amount of precipitation process
$W_t$ when we consider only the wet days, i.e., $Y_t := W_t \mid X_t = 1$. Then, the marginal distribution of the
amounts of daily precipitation is modelled following an approach that combines the fit of the distribution
of excesses over a high threshold with the empirical distribution of the thinned data below the threshold.
5.1 Copula
Almost every climatological data set exhibits dependence between successive days. To model the temporal
dependence structure of the data we use the two-dimensional Gaussian copula $C$ given by
\[ C(u, v; \rho) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left( -\frac{x^2 - 2\rho x y + y^2}{2(1 - \rho^2)} \right) dx\, dy = \Phi_\rho(\Phi^{-1}(u), \Phi^{-1}(v)), \tag{4} \]
where $\Phi$ is the cumulative distribution function of the standard normal distribution and $\Phi_\rho$ is the joint
cumulative distribution function of two standard normal random variables with correlation coefficient $\rho$.
To estimate the copula, let
\[ A = \{t : Y_t > 0 \text{ and } Y_{t+1} > 0\}, \]
be the set of all days with non zero precipitation that were followed by days also with non zero precipitation
(greater than 0.1 mm) and
\[ u = [Y_{a_1}, Y_{a_2}, \ldots], \quad v = [Y_{a_1+1}, Y_{a_2+1}, \ldots], \quad a_1, a_2, \cdots \in A \]
be the vectors consisting of the amounts of precipitation during the days indicated in the set $A$ and
the following days respectively, both with marginal distribution $F(x)$. Then, transforming the vectors
$u$ and $v$ by taking the empirical cumulative distribution corrected by the factor $\frac{n}{n+1}$ ($n$ is the number
of days with positive precipitation in the data set) results in vectors $U$ and $V$ respectively that follow
the discrete uniform distribution in $(0, 1)$. If the Gaussian copula in Eq. (4) describes correctly the
dependence structure of the data, then
\[ (\Phi^{-1}(U), \Phi^{-1}(V)) \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \right). \]
Finally, the copula parameter $\rho$ is estimated using Pearson's correlation coefficient. An analytic description
of the method and its application can be found in Lennartsson and Shu (2005). The dependence between
successive days is demonstrated in Fig. 10, where the transformed data from Lund are plotted.
For a thorough coverage of bivariate copulas and their properties see Hutchinson and Lai (1990),
Joe (1997), Nelsen (2006), and Trivedi and Zimmer (2005), who provide a copula tutorial for practitioners.
The estimates of the correlation coefficient $\rho$ for each station are collected in Table 4.
Note that all the estimates of $\rho$ are statistically significant, which makes the
assumption of independence between the data points unreasonable.
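The estimation steps of this subsection can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for the results reported here: the function names are hypothetical, the empirical distribution is assumed to be built from all wet-day amounts of the station, and only the standard library is used.

```python
from bisect import bisect_right
from math import sqrt
from statistics import NormalDist


def successive_wet_pairs(precip, cutoff=0.1):
    """Pairs (amount on day t, amount on day t+1) where both exceed the cut-off."""
    return [(a, b) for a, b in zip(precip, precip[1:]) if a > cutoff and b > cutoff]


def normal_score(x, sorted_wet):
    """Phi^{-1} of the empirical CDF rescaled by n/(n+1), i.e. rank/(n+1)."""
    n = len(sorted_wet)
    rank = bisect_right(sorted_wet, x)  # number of wet-day amounts <= x
    return NormalDist().inv_cdf(rank / (n + 1))


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def estimate_copula_rho(precip, cutoff=0.1):
    """Estimate rho from the normal scores of successive wet days.

    Assumption: the empirical CDF is computed from all wet days in `precip`.
    """
    wet = sorted(p for p in precip if p > cutoff)
    pairs = successive_wet_pairs(precip, cutoff)
    zu = [normal_score(a, wet) for a, _ in pairs]
    zv = [normal_score(b, wet) for _, b in pairs]
    return pearson(zu, zv)
```

Calling `estimate_copula_rho(daily_amounts)` on a list of daily amounts in mm returns a value in [-1, 1]; at least two pairs of successive wet days with distinct amounts are needed for the correlation to be defined.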
5.2 Marginal Distribution
Finally, to model the precipitation amount process we propose an approach that combines the fit of the
distribution of excesses over a high threshold with the empirical distribution of the original data below
the threshold. We begin by introducing some notation and a few preliminary
remarks. Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables
having marginal distribution $F(x)$. Let us also denote by

$$F_u(x) = P(X \le x \mid X > u),$$

for $x > u$, the conditional distribution of $X$ given that it exceeds level $u$, and assume that $F_u(x)$ can be
modelled by means of a generalized Pareto distribution, that is,

$$F_u(x) = 1 - \left(1 + \xi\,\frac{x-u}{\sigma}\right)^{-1/\xi}, \tag{5}$$

for some $\sigma > 0$ and $\xi$, over the set $\{x : x > u \text{ and } 1 + \xi\frac{x-u}{\sigma} > 0\}$, and zero otherwise. Let also $F_{emp}(x)$
denote the empirical distribution, i.e.,

$$F_{emp}(x) = \frac{1}{n}\sum_{i=1}^{n} \{X_i \le x\},$$

where $\{\cdot\}$ denotes the indicator function of an event, i.e. the 0-1 random variable which takes the value 1
if the condition between brackets is satisfied and 0 otherwise.
Finally, define the function

$$F_C(x; u) = F_{emp}(x \wedge u) + \left(1 - F_{emp}(u)\right) F_u(x),$$

which, as can be easily checked, is a probability distribution function that will be used to model the
precipitation amount process. What remains to be addressed is the choice of the level $u$ above which
the excesses can be accurately modelled using a generalized Pareto distribution, as well as methods for
the estimation of the distribution parameters.
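To make the construction of $F_C$ concrete, the following minimal Python sketch evaluates the composite distribution for a given sample. The function names and the parameter values used in any call are hypothetical and serve only as illustration. Two choices go beyond the text and are assumptions of the sketch: the case $\xi = 0$ is handled by the usual exponential limit, and for $\xi < 0$ the generalized Pareto part is set to 1 beyond its upper end point so that the result remains a distribution function.

```python
from bisect import bisect_right
from math import exp


def gpd_cdf(x, u, sigma, xi):
    """Generalized Pareto CDF of Eq. (5) for x > u, and 0 for x <= u."""
    if x <= u:
        return 0.0
    if xi == 0.0:
        # Assumption: exponential limit of Eq. (5) as xi -> 0.
        return 1.0 - exp(-(x - u) / sigma)
    z = 1.0 + xi * (x - u) / sigma
    if z <= 0.0:
        # Only possible for xi < 0: x lies beyond the upper end point u - sigma/xi.
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def composite_cdf(x, u, sorted_sample, sigma, xi):
    """F_C(x; u) = F_emp(min(x, u)) + (1 - F_emp(u)) * F_u(x)."""
    n = len(sorted_sample)

    def f_emp(t):
        # Fraction of observations less than or equal to t.
        return bisect_right(sorted_sample, t) / n

    return f_emp(min(x, u)) + (1.0 - f_emp(u)) * gpd_cdf(x, u, sigma, xi)
```

Below the threshold the function reproduces the empirical distribution exactly; above it, the remaining probability mass $1 - F_{emp}(u)$ is distributed according to the fitted tail.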
5.2.1 Choice of Threshold Level
Selecting a threshold level $u$ above which the generalized Pareto distribution assumption is appropriate
is a difficult task in practice; see, for example, McNeil (1996), Davison and Smith (1990) and Rootzen and
Tajvidi (1997). Frigessi et al. (2002) suggest a dynamic mixture model for the estimation of the tail
distribution without having to specify a threshold in advance. Once the threshold $u$ is fixed, the model
parameters $\xi$ and $\sigma$ are estimated using maximum likelihood, although a number of
alternative methods exist; see for instance Resnick (1997) and Crovella and Taqqu (1999) and references therein.
5.2.2 Extreme Value Analysis for Dependent Sequences
The generalized Pareto distribution is asymptotically a good model for the marginal distribution of high
excesses of independent and identically distributed random variables; see Coles (2001) and Leadbetter et
al. (1983). Unfortunately, independence is an unreasonable assumption for most climatological data
sets, since dependence between successive days is to be expected. One way of dealing with the dependence between
the excesses is to choose the level $u$ high enough that enough time has passed between successive
excesses to make them approximately independent. Another is to use declustering, which is probably the most widely adopted
method for dealing with dependent exceedances; it corresponds to filtering the dependent observations
to obtain a set of threshold excesses that are approximately independent, see Coles (2001). A simple way
of determining m-clusters of extremes, after specifying a threshold $u$, is to define consecutive excesses
of $u$ to belong to the same m-cluster as long as they are separated by less than $m + 1$ days. It
should be noted that the separation of extreme events into clusters is likely to be sensitive to the choice
of $u$, although we do not study this effect in this work. The effect of declustering on the generalized
Pareto distribution in Eq. (5) is the replacement of the parameters $\sigma$ and $\xi$ by $\sigma\theta^{-1}$ and $\xi$, where $\theta$ is
the so-called extremal index, loosely defined as

$$\theta = (\text{limiting mean cluster size})^{-1}.$$
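The declustering rule can be illustrated with a short Python sketch. It rests on one interpretation that should be read as an assumption: two successive exceedances are placed in the same m-cluster when at most m non-exceeding days lie between them, so that for m = 0 only exceedances on adjacent days are grouped. The function names are hypothetical, and the extremal index is estimated simply as the reciprocal of the observed mean cluster size.

```python
def m_clusters(series, u, m):
    """Group the indices of exceedances of u into m-clusters.

    Assumption: successive exceedances share a cluster when at most
    m non-exceeding days separate them.
    """
    clusters = []
    last = None  # index of the most recent exceedance
    for t, y in enumerate(series):
        if y > u:
            if last is not None and t - last - 1 <= m:
                clusters[-1].append(t)
            else:
                clusters.append([t])
            last = t
    return clusters


def extremal_index(series, u, m):
    """Estimate theta as (number of clusters) / (number of exceedances)."""
    clusters = m_clusters(series, u, m)
    n_exceedances = sum(len(c) for c in clusters)
    if n_exceedances == 0:
        raise ValueError("no observation exceeds the threshold")
    return len(clusters) / n_exceedances
```

For a given threshold, increasing m can only merge clusters, so the estimate of theta is non-increasing in m.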
5.3 Method Application
In this subsection we apply the method described in subsection 5.2 to model the thinned precipitation
amount process, i.e. $Y_t$. To demonstrate the method we use data from the station in Lund. The
rest of the stations give similar results.
As we have already seen, the data exhibit temporal dependence. The correlation coefficient $\rho$ of
the Gaussian copula for the data from Lund was estimated to be 0.1362. The dependence in the data can
also be seen in Fig. 11, where the number of m-clusters (with more than one observation) is plotted for
different values of $m$ and $u = 15$ mm. The expected number of these m-clusters assuming that the
observations are independent is denoted by 'o' and is consistently less than the observed number of
m-clusters, which is denoted by '+'. The expected number of m-clusters computed assuming that the observations
are correlated ($\rho = 0.1362$) is denoted by '*' and is a clear improvement over the
assumption of independence. We also provide 95% exact confidence intervals for both cases. The
observed values fall inside the confidence interval constructed assuming correlated data.
After the cluster size has been decided ($m = 0$ in the case of the station in Lund), we turn to the
problem of estimating the parameters $\xi$, $\sigma$ and $\theta$ for the generalized Pareto model. The choice of the
specific threshold ($u = 15$ mm) was based on the mean residual life plot. It is expected, see Coles (2001),
that for a threshold $u$ above which the generalized Pareto model provides a good approximation for the
excesses, the mean residual life plot, i.e. the locus of the points

$$\left\{\left(u, \frac{1}{n_u}\sum_{i=1}^{n_u}\left(Y_{t(i)} - u\right)\right) : u < Y_t^{\max}\right\},$$

where $Y_{t(1)}, \ldots, Y_{t(n_u)}$ are the $n_u$ observations that exceed $u$ and $Y_t^{\max}$ is the largest observation of the
process $Y_t$, should be approximately linear in $u$. Fig. 12 shows the mean residual life plot with approximate
95% confidence interval for the daily precipitation in Lund. The graph appears to curve from $u = 0$ mm
until $u = 15$ mm and is approximately linear after that threshold. It is tempting to conclude that there
is no stability until $u = 28$ mm, after which there is approximate linearity, which suggests $u = 28$ mm.
However, such a threshold leaves too few excesses for any meaningful inference (33 observations out of
16000). We therefore decided to work initially with the threshold set at $u = 15$ mm.
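The points of the mean residual life plot are straightforward to compute. The following Python sketch is only an illustration: the function name and the grid of thresholds are hypothetical, and the confidence bands shown in Fig. 12 are not computed.

```python
def mean_residual_life(sample, thresholds):
    """Return the points (u, mean excess over u) for each threshold u.

    Thresholds that no observation exceeds are skipped, which mirrors
    the restriction u < max(sample) in the definition of the plot.
    """
    points = []
    for u in thresholds:
        excesses = [y - u for y in sample if y > u]
        if excesses:
            points.append((u, sum(excesses) / len(excesses)))
    return points
```

Plotting the second coordinate against the first and looking for the lowest threshold beyond which the points are roughly linear reproduces the threshold diagnostic described above.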
Finally, the different diagnostic plots for the fit of the generalized Pareto distribution are collected in
Fig. 13. The data from the rest of the stations produced similar plots, none of which gave any reason
for concern about the quality of the fitted models. The parameters of the generalized Pareto model for
the data from all the stations, together with 95% confidence intervals, are collected in Table 4. For three
stations (Bolmen, Borås and Haparanda), the estimates of the shape parameter $\xi$ are
negative.
Table 5 shows $\theta$ for different values of $m$ and threshold $u = 15$ mm for the data from Lund.
6 Evaluation
To verify the validity of the model, we have obtained distribution functions of the different precipitation
indices as stipulated by the Expert Team and its predecessor, the CCl/CLIVAR Working Group (WG) on
Climate Change Detection; see Peterson et al. (2001) and Karl et al. (1999). Sixteen of those indices are
of relevance to this work: two regard only the precipitation occurrence process (CDD and CWD),
another two regard only the precipitation amount process (SDII and Prec90p), and the remaining
twelve concern both processes, see Table 6.
Using the chain dependent model, we have obtained the distribution of each index based on 100,000
simulations. This has been compared to the empirical distribution ('.-' line in Figs. 14-18). The
agreement between the two distributions is more than satisfactory. Moreover, the empirical distribution
always falls inside the 90% exact confidence intervals. The results are presented for the weather
station in Lund. The rest of the stations give similar results.
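As an illustration of how such indices are obtained from one simulated or observed series of daily amounts, the sketch below computes a few of them in Python. It is a hedged example rather than the evaluation code: the function names are hypothetical, the wet-day cut-off of 0.1 mm is carried over from Section 5, and the strict inequalities follow the wording of the text ("more than 10 mm") rather than the exact definitions of Table 6.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def precipitation_indices(precip, wet_cutoff=0.1):
    """Compute a few indices from one series of daily amounts in mm."""
    wet = [p > wet_cutoff for p in precip]
    wet_amounts = [p for p in precip if p > wet_cutoff]
    return {
        "CDD": longest_run(not w for w in wet),   # longest dry spell (days)
        "CWD": longest_run(wet),                  # longest wet spell (days)
        "R10mm": sum(p > 10.0 for p in precip),   # days with more than 10 mm
        "R20mm": sum(p > 20.0 for p in precip),   # days with more than 20 mm
        # mean amount per wet day (mm)
        "SDII": sum(wet_amounts) / len(wet_amounts) if wet_amounts else 0.0,
    }
```

Applying the function to each simulated series and collecting the values of one index gives the simulated distribution that is compared with the empirical one.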
As we can see in Fig. 14 (top left), over a period of approximately two years we expect to have about 17 days
with precipitation of more than 10 mm and (top right) about 3 days with precipitation of more than
20 mm. However (Fig. 14, bottom left), the precipitation during each of these three days will be
considerably more than 20 mm. Fig. 14 (bottom right) tells us that the probability of having 5 consecutive
days of really heavy precipitation in Lund is quite high.
As we notice in Fig. 15, once every two years we should expect a dry spell longer than
two weeks (left) and a wet spell of approximately 12 days (right).
In Fig. 16, we see that every two years in Lund we expect to have almost fifty moderately
wet days (top left), almost 18 above moderate wet days (top right), almost 9 very wet days (bottom
left) and almost 2 extremely wet days (bottom right).
In Fig. 17 (top left), we see that the fifty moderately wet days that we expect over a period of
two years in Lund account for about 70% of the total amount of precipitation. Similarly, the 18
above moderate wet days account on average for a little more than 40% of the total precipitation amount
(top right), the 8 very wet days for about 25% of the total amount (bottom left) and the 2 extremely
wet days for about 10% of the total amount (bottom right).
In Fig. 18 (left), we see that the average amount of precipitation per day of precipitation is 3.5 mm,
and also that on average the precipitation exceeds 9.5 mm on only 1 out of 10 precipitation days each year.
7 Conclusions
In this paper, we have modelled the temporal variability of the precipitation in Sweden. The different
weather stations have been assumed to have no spatial dependence. Modelling the spatial variability of
the precipitation across the weather stations in Sweden is among our future research plans.
Some interesting conclusions can be drawn.
We have used a chain dependent model for the precipitation, which consists of a component for the
occurrence of precipitation and a component for the amount of precipitation. For the first component,
we have used high order Markov chains with two states. We have shown that the 1-Markov chain model,
which has been used extensively, is an inadequate model for most of the Swedish stations. For example,
when the distribution of the long dry spell is of interest, the 1-Markov chain underestimates the length
of the long dry spell, in some cases by up to half a day.
For the amount of precipitation process, we have used a copula to describe the temporal dependence
structure between successive days, which in reality is a Gaussian process with transformed marginals.
Then, the cumulative distribution has been modelled in two steps: first using the empirical distribution
for the amounts of precipitation that are less than a given threshold, and then using a generalised Pareto
distribution to model the excesses above the threshold. Such models have the advantage that they provide
a mathematical platform that allows the computation of quantities such as return periods.
Finally, the distributions of different weather indices have been computed using Monte Carlo Markov
Chain techniques and compared to the empirical distributions obtained from the data. The agreement
between the two distributions has been very good, which supports the choice of the models.
References
[1] Akaike, H. (1974). A new look at statistical model identification. IEEE Trans. Auto. Control, AC-19, pp. 716-722.
[2] Aksoy, H. and Bayazit, M. (2000). A model for daily flows of intermittent streams. Hydrological Processes, 14, 1725-1744.
[3] Benjamin, J.R. and Cornell, C.A. (1970). Probability, Statistics and Decision for Civil Engineers. McGraw-Hill, Inc., New York, 685 pp.
[4] Bruhn, J.A., Fry, W.E. and Fick, G.W. (1980). Simulation of daily weather data using theoretical probability distributions. J. Appl. Meteorol., 19, pp. 1029-1036.
[5] Castellvi, F. and Stockle, C.O. (2001). Comparing a locally-calibrated versus a generalised temperature weather generation. Trans. ASAE, 44(5), pp. 1143-1148.
[6] Chin, E.H. (1977). Modelling daily precipitation process with Markov chain. Wat. Resources Res., 13, 949-956.
[7] Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer, London.
[8] Cox, D.R. and Isham, V. (1988). A simple spatial-temporal model for rainfall (with discussion). Proc. R. Soc. Lond. A, 415, 317-328.
[9] Cox, D.R. and Isham, V. (1994). Stochastic models of precipitation. In Statistics for the Environment 2: Water Related Issues (eds V. Barnett and K.F. Turkman), ch. 1, pp. 3-18. Chichester: Wiley.
[10] Crovella, M. and Taqqu, M. (1999). Estimating the heavy tail index from scaling properties. Methodology and Computing in Applied Probability, 1, 55-79.
[11] Dalevi, D., Dubhashi, D. and Hermansson, M. (2006). A New Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity. Stat. Appl. Genet. Mol. Biol., 5, Article 8.
[12] Davison, A.C. and Smith, R.L. (1990). Models for exceedances over high thresholds. J. Roy. Statist. Soc. B, 52, pp. 393-442.
[13] Dudewicz, E.J. and Mishra, S.N. (1988). Modern Mathematical Statistics. Wiley Series in Probability and Mathematical Statistics.
[14] Frigessi, A., Haug, O. and Rue, H. (2002). A Dynamic Mixture Model for Unsupervised Tail Estimation without Threshold Selection. Extremes, 5, pp. 219-235.
[15] Gabriel, K.R. and Neumann, J. (1962). A Markov chain model for daily rainfall occurrences at Tel Aviv. Quart. J. Royal Met. Soc., 88, 90-95.
[16] Geng, S., Frits, W.T., de Vries, P. and Supit, I. (1986). A simple method for generating daily rainfall data. Agric. For. Meteorol., 36, pp. 363-376.
[17] Guttorp, P. (1995). Stochastic Modelling of Scientific Data. Chapman & Hall, London, Chapter 2.
[18] Hutchinson, M.F. (1995). Stochastic space-time weather models from ground-based data. Agric. For. Meteorol., 73, 237-264.
[19] Hutchinson, T.P. and Lai, C.D. (1990). Continuous Bivariate Distributions, Emphasising Applications. Sydney, Australia: Rumsby.
[20] Jimoh, O.D. and Webster, P. (1996). The optimum order of a Markov chain model for daily rainfall in Nigeria. Journal of Hydrology, 185, 45-69.
[21] Joe, H. (1997). Multivariate Models and Dependence Concepts. London: Chapman & Hall.
[22] Karl, T.R., Nicholls, N. and Ghazi, A. (1999). CLIVAR/GCOS/WMO workshop on indices and indicators for climate extremes: Workshop summary. Climatic Change, Vol. 32, pp. 3-7.
[23] Lana, X. and Burgueno, A. (1998). Daily dry-wet behaviour in Catalonia (NE Spain) from the viewpoint of Markov chains. Int. J. Climatol., 18, 793-815.
[24] Leadbetter, M.R., Lindgren, G. and Rootzen, H. (1983). Extremes and Related Properties of Random Sequences and Series. Springer Verlag, New York.
[25] LeCam, L. (1961). A stochastic description of precipitation. Proc. 4th Berkeley Symp., pp. 165-186.
[26] Lennartsson, J. and Shu, M. (2005). Copula Dependence Structure on Real Stock Markets. Masters thesis, Chalmers University of Technology, 2005-01.
[27] Liao, Y., Zhang, Q. and Chen, D. (2004). Stochastic modeling of daily precipitation in China. Journal of Geographical Sciences, 14(4), 417-426.
[28] Mellor, D. (1996). The modified turning bands (mtb) model for space-time rainfall: I. Model definition and properties. J. Hydrol., 175, 113-127.
[29] McNeil, A.J. (1996). Estimating the tails of loss severity distributions using extreme value theory. Technical report, Department Mathematik, ETH Zentrum, Zurich.
[30] Nelsen, R.B. (2006). An Introduction to Copulas, 2nd edition. New York: Springer.
[31] Norris, J.R. (2005). Markov chains. Cambridge University Press.
[32] Peterson, T.C., Folland, C., Gruza, G., Hogg, W., Mokssit, A. and Plummer, N. (2001). Report on the Activities of the Working Group on Climate Change Detection and Related Rapporteurs 1998-2001. World Meteorological Organisation, WCDMP-47, WMO-TD 1071.
[33] Racsko, P., Szeidl, L. and Semenov, M. (1991). A serial approach to local stochastic weather models. Ecol. Model., 57, pp. 27-41.
[34] Resnick, S.I. (1997). Heavy tail modeling and teletraffic data. The Annals of Statistics, 25(5), 1805-1869.
[35] Rootzen, H. and Tajvidi, N. (1997). Extreme value statistics and wind storm losses: A case study. Scandinavian Actuarial Journal, 1, 70-94.
[36] Rodríguez-Iturbe, I., Cox, D. and Isham, V. (1987). Some models for rainfall based on stochastic point processes. Proc. R. Soc. Lond. A, 410, 269-288.
[37] Rodríguez-Iturbe, I., Cox, D. and Isham, V. (1988). A point process model for rainfall: further developments. Proc. R. Soc. Lond. A, 417, 283-298.
[38] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Stat., 6, pp. 461-464.
[39] Selker, J.S. and Haith, D.A. (1990). Development and testing of simple parameter precipitation distributions. Water Resour. Res., 26(11), pp. 2733-2740.
[40] Smith, R.L. and Robinson, P.J. (1997). A Bayesian approach to the modelling of spatial-temporal precipitation data. Lect. Notes Statist., 237-269.
[41] Srikanthan, R. and McMahon, T.A. (2001). Stochastic generation of annual, monthly and daily climate data: A review. Hydrol. Earth Syst. Sci., 5(4), pp. 653-670.
[42] Stern, R.D. and Coe, R. (1984). A Model fitting Analysis of Daily Rainfall Data. J. R. Statist. Soc. A, 147, Part 1, pp. 1-34.
[43] Stidd, C.K. (1973). Estimating the precipitation climate. Wat. Resour. Res., 9, 1235-1241.
[44] Tong, H. (1975). Determination of the order of a Markov chain by Akaike's information criterion. J. Appl. Prob., 12, 486-497.
[45] Trivedi, P.K. and Zimmer, D.M. (2007). Copula Modelling: An Introduction for Practitioners. Foundations and Trends in Econometrics, Vol. 1, No. 1, 1-111.
[46] Waymire, E. and Gupta, V.K. (1981). The mathematical structure of rainfall representations: 3. Some applications of the point process theory to rainfall processes. Wat. Resour. Res., 17, 1287-1294.
[47] Waymire, E., Gupta, V.K. and Rodríguez-Iturbe, I. (1984). Spectral theory of rainfall intensity at the meso-β scale. Wat. Resour. Res., 20, 1453-1465.
[48] Woolhiser, D.A. (1992). Modelling daily precipitation-progress and problems. In: A. Walden and P. Guttorp (Editors), Statistics in the Environmental and Earth Sciences. Edward Arnold, London, pp. 71-89.
8 Appendix
8.1 Review of Mathematical Order Estimators
Let $X_t$ denote a k-Markov chain that is defined on a state space $S$ and $x_1^n$ its realisation. Let also
$P_{ML(k)}(x_1^n)$ be the kth order maximum likelihood, i.e.

$$P_{ML(k)}(x_1^n) = \max P(X_1^k) \prod_{i=k+1}^{n} P\left(X_i = x_i \mid \tau_k(X^{i-1}) = \tau_k(x^{i-1})\right).$$

Tong (1975) reported that the Akaike Information Criterion (AIC) order estimator could be used as
an objective technique for determining the optimum order $k$ of the chain, see also Akaike (1974). The
optimum order $k$ is the order that minimises the loss function:

$$k_{AIC}(x_1^n) = \arg\min_k \left(-\log P_{ML(k)}(x_1^n) + |S|^k\right).$$

Schwarz (1978) presented an alternative technique, the Bayesian Information Criterion (BIC) order
estimator, whose consistency under general conditions was established only recently. The
optimum order $k$ is the order that minimises the loss function, which is now given by:

$$k_{BIC}(x_1^n) = \arg\min_k \left(-\log P_{ML(k)}(x_1^n) + \frac{|S|^k(|S| - 1)}{2}\log(n)\right).$$

Dalevi et al. (2006) showed using experimental results that the BIC order estimator tends to
underestimate the order as $k$ gets larger for moderate data sizes.
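The two penalised likelihood criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation behind Figs. 5-7: the function names are hypothetical, the contribution of the initial segment $P(X_1^k)$ is dropped, and, so that all candidate orders are scored on the same transitions, the likelihood is conditioned on the first `max_order` observations.

```python
from collections import Counter
from math import log


def max_log_likelihood(x, k, start):
    """Maximised conditional log-likelihood of x[start:] under an order-k chain.

    Transition probabilities are replaced by their relative frequencies;
    `start` must be at least k so that every context is fully observed.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(start, len(x)):
        context = tuple(x[i - k:i])  # the k symbols preceding x[i]
        context_counts[context] += 1
        transition_counts[(context, x[i])] += 1
    return sum(c * log(c / context_counts[context])
               for (context, _), c in transition_counts.items())


def estimate_order(x, max_order, n_states=2):
    """Return (k_AIC, k_BIC) using the penalties |S|^k and |S|^k (|S|-1)/2 log n."""
    n = len(x)
    aic, bic = {}, {}
    for k in range(max_order + 1):
        ll = max_log_likelihood(x, k, start=max_order)
        aic[k] = -ll + n_states ** k
        bic[k] = -ll + n_states ** k * (n_states - 1) / 2 * log(n)
    return min(aic, key=aic.get), min(bic, key=bic.get)
```

For a wet/dry occurrence series coded as 0 and 1, `estimate_order(series, max_order=5)` returns the two selected orders; ties are resolved in favour of the lower order.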
Finally, the Maximal Fluctuation Criterion (MFC), contrary to the AIC and BIC order estimators,
was specifically designed for multiple step Markov chains. For any realisation $x \in S^n$ of the k-Markov
chain, let $N_x(w) = |\{i \in [1, n] : \tau_l(x^i) = w,\ w \in S^l\}|$ denote the number of times $w$ occurs in $x$. The
Peres-Shields fluctuation function is defined as

$$\Delta_k(v) = \max_{s \in S} \left| N_x(vs) - \frac{N_x(\tau_k(v)s)}{N_x(\tau_k(v))}\, N_x(v) \right|.$$

When the order of the Markov chain is $k$ or less, this fluctuation is small. Therefore, the Maximal
Fluctuation Criterion (MFC) order estimator is defined as

$$k_{MFC}(x_1^n) = \min\left\{k \ge 0 : \max_{k < |v| < \log\log(n)} \Delta_k(v) < n^{3/4}\right\}.$$

In practice the function $\log\log(\cdot)$ is substituted by any function that grows slower than $\log(\cdot)$. Dalevi et
al. (2006) suggested the Generalized Maximum Fluctuation Criterion (GMFC) order estimator, which is
closely related to the Maximal Fluctuation Criterion (MFC) order estimator,

$$k_{GMFC}(x_1^n) = \arg\max_k \frac{\max_{k-1 < |v| < f(n)} \Delta_{k-1}(v)}{\max_{k < |v| < f(n)} \Delta_k(v)},$$

where $f(n)$ is any function that satisfies the same conditions as for the MFC order estimator.
Fig. 1: Location of the stations.
Fig. 2: Time plot of annual number of wet days.
Fig. 3: Time plot of annual amount of precipitation.
Fig. 4: Lund, Sweden (data from 1961 to 2004). (Left): Observed p(t) pooled over 5 days. (Right): Mean number of wet days per month ('+'), and per season (solid lines).
Fig. 5: k-Markov chain orders for block lengths of one month (Jan, Feb, ...).
Fig. 6: k-Markov chain orders for block lengths of two months (Jan-Feb, Feb-Mar, ...).
Fig. 7: k-Markov chain orders for block lengths of three months (Jan-Mar, Feb-Apr, ...).
Fig. 8: Order of Markov chain as suggested by the Kolmogorov-Smirnov statistic at 10% tail value for each station and season.
Fig. 9: Conditional distribution of the dry spell given that the dry spell is longer than or equal to 3 days, for k-Markov chain models of order k = 1, k = 2 and k = 3 and the data from Lund. Data are from the winter months December-February.
Fig. 10: Plot of the dependence structure with the marginal distributions transformed to standard normal.
Fig. 11: Number of m-clusters with more than one observation. '+' denotes the observed and 'o' the theoretical number of m-clusters assuming that the observations are independent, while '*' denotes the number of m-clusters using ρ = 0.1362. Line '–' denotes the 95% confidence interval for the theoretical number of m-clusters assuming independence, while '-.' denotes the 95% confidence interval for the theoretical number of m-clusters assuming ρ = 0.1362.
Fig. 12: Mean residual life plot of the amount precipitation process from Lund; dotted lines give the 95% confidence interval.
Fig. 13: Diagnostic plots for the threshold excess model fitted to daily precipitation data from the station in Lund.
Fig. 14: Plots of R10mm (top left), R20mm (top right), RX1day (bottom left) and RX5day (bottom right). Theoretical distribution: '-'; empirical distribution: '.-'.
Fig. 15: Plot of maximum number of consecutive dry days (left), and maximum number of consecutive wet days (right).
Fig. 16: Plot of the probability of number of moderate wet days (top left), above moderate wet days (top right), very wet days (bottom left) and extremely wet days (bottom right).
Fig. 17: Percentage of precipitation during the moderately wet days (top left), the above moderate wet days (top right), the very wet days (bottom left) and the extremely wet days (bottom right).
Fig. 18: Plot of the average amount of precipitation per day of precipitation (left) and the 90% quantile of the amount of precipitation of the thinned precipitation process (right).
Table 1: Names of weather stations.
Table 2: Number of data sets that have passed the Kolmogorov-Smirnov test at the 10% tail value for different orders of the Markov chain. S1 stands for Dec.-Feb., S2 for Mar.-May, S3 for Jun.-Aug. and S4 for Sep.-Nov.
Table 3: Expected length of long dry spells in days for season Dec-Feb in Lund.
Table 4: Extremal parameters and their 95% confidence intervals for each weather station.
Table 5: Values of the parameter θ for different choices of m-clusters.
Table 6: Weather indices and their mathematical expressions. The quantiles q(·) have been estimated using the observed data.
Figure 1: location of the stations (image not reproduced).
Figure 2 (plots not reproduced): annual no. of wet days (vertical axis, 0 to 300) against years (1970 to 2000), one panel per station: Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malugn, Vattholma, Myskåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando.
Figure 3 (plots not reproduced): annual amount of precip. in mm (vertical axis, 500 to 1500) against years (1980 to 2000), one panel per station: Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malugn, Vattholma, Myskåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando.
Figure 4 (plots not reproduced): two panels with months Jan to Dec on the horizontal axis; vertical axes: prob. of precip. (0.3 to 0.8) and mean no. of wet days (10 to 20).
Figure 5 (plots not reproduced): three pages titled "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator"; each has panels st 01 to st 20, with the estimated order (1, 3, 5) on the vertical axis and months Jan to Dec on the horizontal axis.
Figure 6 (plots not reproduced): three pages titled "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator"; each has panels st 01 to st 20, with the estimated order (1, 3, 5) on the vertical axis and months Jan to Dec on the horizontal axis.
Figure 7 (plots not reproduced): three pages titled "Estimated Orders by Akaike order estimator", "Estimated Orders by Bayesian order estimator" and "Estimated Orders by GMFC order estimator"; each has panels st 01 to st 20, with the estimated order (1, 3, 5) on the vertical axis and months Jan to Dec on the horizontal axis.
Figure 8 (plots not reproduced): "Estimated Orders by KS-criterion of dry spell order estimator"; panels st 01 to st 20, with the estimated order (1, 3, 5) on the vertical axis and months (Jan, May, Nov) on the horizontal axis.
[Plot: probability (0.3 to 1) against time in days (2 to 14); curves: Empirical, 3-Markov, 2-Markov, 1-Markov]
Figure 9
[Scatter plot: T(Y_{t+1}) on the vertical axis against T(Y_t) on the horizontal axis, both from -4 to 4]
Figure 10
[Plot: number of clusters with more than one observation against m; tick ranges 0 to 6 and 0 to 50]
Figure 11
[Plot: E[X - u | X > u] (0 to 12) against u (0 to 60)]
Figure 12
[Four diagnostic panels:
 Probability Plot: Extreme Empirical model against GP model, both from 0 to 1;
 Quantile Plot: Extreme Empirical model (mm) against GP model (mm), 20 to 60;
 Return Level Plot: Return Level against Return Period (Years), ticks 10, 40, 160;
 Density Plot: no. of observations against Amount of precipitation (mm)]
Figure 13
[Plot: Index R10mm; model cumulative distribution and empirical distribution (0 to 1) against no. of days (10 to 30)]
Figure 14
[Plot: Index R20mm; model cumulative distribution and empirical distribution (0 to 1) against no. of days (0 to 10)]
Figure 14
[Plot: Index RX1day; model cumulative distribution and empirical distribution (0 to 1) against amount of prec. (mm), 15 to 65]
Figure 14
[Plot: Index RX5day; model cumulative distribution and empirical distribution (0 to 1) against amount of prec. (mm), 30 to 110]
Figure 14
[Plot: Index CDD; model cumulative distribution and empirical distribution (0 to 1) against no. of days (10 to 35)]
Figure 15
[Plot: Index CWD; model cumulative distribution and empirical distribution (0 to 1) against no. of days (8 to 24)]
Figure 15
[Plot: Index R75p; model cumulative distribution and empirical distribution (0 to 1) against no. of days (30 to 70)]
Figure 16
[Plot: Index R90p; model cumulative distribution and empirical distribution (0 to 1) against no. of days (10 to 30)]
Figure 16
[Plot: Index R95p; model cumulative distribution and empirical distribution (0 to 1) against no. of days (4 to 18)]
Figure 16
[Plot: Index R99p; model cumulative distribution and empirical distribution (0 to 1) against no. of days (0 to 9)]
Figure 16
[Plot: Index R75pTOT; model cumulative distribution and empirical distribution (0 to 1) against index value (0.6 to 0.8)]
Figure 17
[Plot: Index R90pTOT; model cumulative distribution and empirical distribution (0 to 1) against index value (0.25 to 0.55)]
Figure 17
[Plot: Index R95pTOT; model cumulative distribution and empirical distribution (0 to 1) against index value (0.15 to 0.4)]
Figure 17
[Plot: Index R99pTOT; model cumulative distribution and empirical distribution (0 to 1) against index value (0 to 0.25)]
Figure 17
[Plot: Index SDII; model cumulative distribution and empirical distribution (0 to 1) against amount of prec. (mm), 3 to 5]
Figure 18
[Plot: Index Prec90p; model cumulative distribution and empirical distribution (0 to 1) against amount of prec. (mm), 8 to 13]
Figure 18
Number Name
1 Lund
2 Bolmen
3 Hano
4 Boras
5 Varberg
6 Ungsberg
7 Saffle
8 Soderkoping
9 Stockholm
10 Malung
11 Vattholma
12 Myskelasen
13 Harnosand
14 Rosta
15 Pitea
16 Stensele
17 Haparanda
18 Kvikkjokk
19 Pajala
20 Karesuando
Table 1
         Season
Model    S1   S2   S3   S4
k = 1     1    0    1    6
k = 2    20   20   20   20
k = 3    20   20   20   20
Table 2
Model                  l = 1   l = 2   l = 3
k = 1                  2.49    3.49    4.49
k = 2                  -       3.91    4.91
k = 3                  -       -       5.11
Observed mean value    2.56    3.97    5.23
Table 3
Station       σ       CI for σ          ξ         CI for ξ           θ       u (mm)   ρ
Lund          5.91    (4.93, 7.03)      0.076     (-0.041, 0.236)    0.935   15       0.1362
Bolmen        6.44    (5.56, 7.41)      -0.0002   (-0.095, 0.116)    0.921   15       0.2008
Hano          5.29    (3.044, 8.737)    0.458     (0.115, 1.05)      0.977   25       0.1649
Boras         7.63    (7.01, 8.28)      -0.011    (-0.067, 0.053)    0.794   10       0.1982
Varberg       5.48    (4.687, 6.378)    0.106     (0.001, 0.236)     0.926   15       0.1206
Ungsberg      5.768   (4.622, 7.115)    0.245     (0.089, 0.445)     0.925   15       0.1843
Saffle        6.62    (5.96, 7.329)     0.099     (0.027, 0.183)     0.857   10       0.1809
Soderkoping   6.259   (4.32, 8.884)     0.297     (0.1, 0.649)       0.984   25       0.1678
Stockholm     5.597   (4.827, 6.453)    0.135     (0.033, 0.259)     0.903   10       0.1523
Malung        6.355   (5.676, 7.095)    0.08      (0.004, 0.17)      0.86    10       0.2280
Vattholma     4.964   (3.521, 6.784)    0.334     (0.098, 0.667)     0.984   20       0.1709
Myskelasen    6.854   (5.962, 7.844)    0.019     (-0.072, 0.13)     0.849   10       0.2311
Harnosand     7.863   (7.053, 8.742)    0.087     (0.011, 0.175)     0.832   10       0.2068
Rosta         6.276   (5.453, 7.19)     0.032     (-0.062, 0.145)    0.876   10       0.2116
Pitea         5.937   (4.429, 7.822)    0.19      (0.004, 0.456)     0.96    20       0.2010
Stensele      7.66    (6.098, 9.5)      0.041     (-0.11, 0.236)     0.915   15       0.2249
Haparanda     5.628   (4.405, 7.07)     -0.073    (-0.196, 0.125)    0.984   18       0.1871
Kvikkjokk     5.66    (5.01, 6.36)      0.04      (-0.04, 0.137)     0.864   10       0.2526
Pajala        5.033   (3.705, 6.728)    0.356     (0.153, 0.646)     0.966   18       0.2385
Karesuando    5.303   (4.117, 6.754)    0.12      (-0.037, 0.34)     0.922   15       0.2206
Table 4
m    θ
0    0.9144
1    0.8836
2    0.8425
3    0.8322
Table 5
Index      Description                                         Formula
R10mm      Heavy precipitation days                            $\sum_i \mathbf{1}\{Z_i > 10\}$
R20mm      Very heavy precipitation days                       $\sum_i \mathbf{1}\{Z_i > 20\}$
RX1day     Highest 1 day precipitation amount                  $\max_i Z_i$
RX5day     Highest 5 day precipitation amount                  $\max_i \sum_{j=0}^{4} Z_{i+j}$
CDD        Max number of consecutive dry days                  $\max\{j : \tau_j(X_i) = 0\}$
CWD        Max number of consecutive wet days                  $\max\{j : w = \tau_j(X_i),\ w_k > 0,\ \forall k\}$
R75p       Moderate wet days                                   $\sum_i \mathbf{1}\{Z_i > q_{0.75}\}$
R90p       Above moderate wet days                             $\sum_i \mathbf{1}\{Z_i > q_{0.90}\}$
R95p       Very wet days                                       $\sum_i \mathbf{1}\{Z_i > q_{0.95}\}$
R99p       Extremely wet days                                  $\sum_i \mathbf{1}\{Z_i > q_{0.99}\}$
R75pTOT    Precipitation fraction due to R75p                  $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.75}\} / \sum_i Z_i$
R90pTOT    Precipitation fraction due to R90p                  $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.90}\} / \sum_i Z_i$
R95pTOT    Precipitation fraction due to R95p                  $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.95}\} / \sum_i Z_i$
R99pTOT    Precipitation fraction due to R99p                  $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.99}\} / \sum_i Z_i$
SDII       Simple daily intensity index                        $\sum_i Y_i / \sum_i \mathbf{1}\{Y_i > 0\}$
Prec90p    90%-quant. of thinned amount of precipitation       $F_Y^{-1}(0.9)$
Table 6
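
The index formulas in Table 6 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the code used in the thesis: the function names and the example series `z` are hypothetical, a dry day is assumed to mean exactly zero recorded precipitation, the percentile thresholds q are assumed to be supplied by the caller, and the same daily series is used for the SDII-style ratio.

```python
from itertools import groupby

def count_above(z, threshold):
    """Days with amount strictly above threshold (R10mm, R20mm, R75p, ...)."""
    return sum(1 for v in z if v > threshold)

def rx_kday(z, k):
    """Highest total over k consecutive days (RX1day: k=1, RX5day: k=5)."""
    return max(sum(z[i:i + k]) for i in range(len(z) - k + 1))

def longest_run(z, wet):
    """Longest run of consecutive wet days (wet=True, CWD) or dry days (wet=False, CDD).

    Assumption: a day counts as wet when its amount is > 0.
    """
    runs = [len(list(g)) for is_wet, g in groupby(z, key=lambda v: v > 0)
            if is_wet == wet]
    return max(runs, default=0)

def fraction_above(z, threshold):
    """Share of the total amount that fell on days above threshold (RxxpTOT)."""
    return sum(v for v in z if v > threshold) / sum(z)

def sdii(z):
    """Total amount divided by the number of wet days (SDII)."""
    wet_days = [v for v in z if v > 0]
    return sum(wet_days) / len(wet_days)

# Hypothetical ten-day series of daily precipitation amounts in mm.
z = [0.0, 12.5, 3.0, 0.0, 0.0, 0.0, 25.0, 1.0, 11.0, 0.0]

r10mm = count_above(z, 10)       # three days exceed 10 mm
rx5day = rx_kday(z, 5)           # wettest five-day window
cdd = longest_run(z, wet=False)  # longest dry spell
cwd = longest_run(z, wet=True)   # longest wet spell
```

In the thesis these indices are evaluated both on observed series and on series simulated from the fitted model, which is what the model and empirical distribution curves in Figures 14 to 18 compare; the sketch only shows the per-series computation.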