
Thesis for the degree of Licentiate of Engineering in Mathematical Statistics

Modelling Precipitation in Sweden Using Multiple Step Markov Chains and a Composite Model

Jan Lennartsson

Department of Mathematical Sciences
Chalmers University of Technology and University of Gothenburg

Göteborg, Sweden, 2008

Modelling Precipitation in Sweden Using Multiple Step Markov Chains and a Composite Model

Jan Lennartsson

© Jan Lennartsson, 2008

Department of Mathematical Sciences
Chalmers University of Technology and University of Gothenburg
SE-412 96 Göteborg, Sweden
Phone: +46 31-772 1000

ISSN 1652-9715
Technical Report 2008:38

Printed in Sweden, 2008.

Abstract

In this thesis, we propose a new method to model precipitation in Sweden. We consider a chain-dependent stochastic model with two components. The first component models the probability that precipitation occurs at a weather station, and the second models the amount of precipitation when precipitation occurs. For the first component we fit a multiple order Markov chain. It turns out that for most of the weather stations in Sweden a Markov chain of order higher than one is required. The second component is a temporal Gaussian process whose marginals are transformed to a composite distribution. Below a suitable threshold this distribution is the empirical distribution of the historically observed amounts of precipitation at each weather station, and above that threshold it is a fitted generalized Pareto distribution. In other words, we model the temporal dependence between amounts of precipitation at different times by means of a Gaussian copula. The derived model is then used to compute different weather indices. The distributions of these indices under our model show good agreement with the corresponding empirical distributions computed from real-world data, which supports the choice of the model.

Keywords: Multiple order Markov chain; Generalized Pareto distribution; Gaussian copula; Precipitation process; Empirical distribution; Sweden.


Acknowledgements

First, I would like to express my gratitude to my supervisor, Patrik Albin, for the support, guidance and belief in me. Thank you for always having time for my questions and ideas, for sharing your experience and for pushing me forward with great enthusiasm. You also deserve a rose for just being the lovely, unique man that you are.

Secondly, Anastassia Baxevani, for endlessly accepting my stripped-down statements and lack of stringency. I am also greatly indebted to you for aiding, driving, pushing and forcing this project to land.

Thirdly, Dan Stromberg, for committing and helping out, and my previous co-advisor, Igor Rychlik, for his support.

Fourthly, and in ascending order, Daniel Drugge, Elin Lennartsson and Martin Norling, for helping me complete this thesis.

Fifthly, Johan Tykesson, Dan Kuylenstierna, Marcus Warfheimer and Marcus Gavell deserve gratitude for putting up with me during courses and, most of the time, accepting or instigating challenging discussions.

Ottmar Cronie for temporarily looking after my office.

Former and present colleagues at Mathematical Sciences.

*

This work was financially supported by Göteborgs Miljövetenskapliga Centrum (GMV).


Contents

Abstract

Acknowledgements

Introduction
  Motivation
  Description
  My contribution

Appended paper


Introduction

This licentiate thesis is based on the manuscript “Modelling precipitation in Sweden using multiple step Markov chains and a composite model”, which is appended after this introduction. The manuscript has been accepted for publication in the Journal of Hydrology and is joint work with Anastassia Baxevani and Deliang Chen. Below are an introductory motivation and a non-technical description of the manuscript, together with a statement of my personal contributions to it as compared with those of my co-authors.

Motivation

The distribution of precipitation is a major environmental issue at both the national and the international level. For example, drought or anomalously wet weather conditions may have devastating consequences for agriculture. However, the entire distribution of precipitation, not only its extremes, is of great interest. Realistic sequences of meteorological variables such as precipitation are key inputs to many hydrologic, ecologic and agricultural models.

Climate, and in particular precipitation, is governed by a set of physical principles that can be represented mathematically as differential equations. In meteorology, where only short time spans are considered, these differential equations are used to predict precipitation. However, the equations are so complex that no exact solutions are known. As a consequence, the variability and uncertainty in predicting precipitation over longer time periods, e.g. a month or a year, grow to an unmanageable extent. To describe precipitation over longer time spans, the precipitation process is therefore usually modelled as a stochastic process.

In the absence of complete knowledge of all the underlying processes that govern climate, simulation models are needed to model the stochastic behaviour of the system. This applies when historical records are of insufficient duration or have inadequate spatial and/or temporal coverage. In these cases synthetic sequences may be used to fill in gaps in historical records, to extend the historical record, or to generate realizations of weather that are stochastically similar to the historical record. A weather generator is a stochastic numerical model that generates daily weather series with the same statistical properties as observed real-world weather series.

However, even though the quantity of interest, the amount of precipitation, is fairly easy to measure, building a weather generator is made difficult by the long time periods involved and the complex processes that govern precipitation.

In contrast to meteorology, which focuses on short-term weather systems, a weather generator is concerned with uncovering the structure of the underlying process. Patterns such as the frequency of wet days, or the trends of these systems, are of more interest than the precipitation on any particular day. When watching the weather forecast on the news, on the other hand, one is usually interested in the opposite: the precipitation in the near future.

The climate consists of more than averages and seasonal variations. Rare events with extreme weather are also part of the climate. There are different kinds of climate extremes, and some are violent, such as severe rainfall, which may have a devastating effect on the community. To optimize the response to such events, a model that accurately describes their distribution is a high priority.

Description

The aim is to create a weather generator that can produce realistic sequences of precipitation for some specific geographical sites in Sweden.

In developing the weather generator, the stochastic structure of the time series of daily amounts of precipitation is described by a statistical model. The parameters of this model are estimated from observed real-world precipitation data. The resulting weather generator allows us to generate arbitrarily long series with the stochastic structure of the model. We can also evaluate the degree of similarity between the weather data produced by the model and the real-world data series.

The statistical model is a chain-dependent model consisting of two steps. The first is a model for the sequence of wet/dry days, for which we employ a multiple order Markov chain. The second is a model for the amount of precipitation on wet days, for which we employ a composite model that combines the empirical distribution with an extreme value distribution for the tail.

In this study, the physical processes that influence the climate are of less importance than what they produce, namely the daily amount of precipitation.

In the mathematical part of our work, a great deal revolves around Markov processes. Briefly put, a Markov process has the property of full-blown amnesia: the future is independent of the past given that the present is known. A multiple step Markov chain relaxes this amnesia property. Given the current state and the states some fixed number of steps back, the future is independent of the rest of the past. The Markov property is very useful for explicitly finding the probabilities of a vast number of interesting events. Even so, one must remember that this is just a mathematical model of a real world that may be much more complicated. Still, it is a good and versatile model.
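The following minimal Python sketch makes the multiple step (order k) Markov property concrete. It simulates a binary wet/dry chain in which the probability of a wet day depends only on the previous k days. The function name and the transition probabilities in the example are hypothetical and chosen for illustration. They are not estimates from the Swedish data.

```python
import numpy as np


def simulate_k_markov(p_wet, k, n, init, rng=None):
    """Simulate n days of a k-th order Markov chain on S = {0, 1}.

    p_wet maps each word w (a tuple of the k previous states, oldest first)
    to P(X_t = 1 | previous k states = w). `init` gives the first k states.
    """
    rng = np.random.default_rng() if rng is None else rng
    if k < 1 or len(init) != k:
        raise ValueError("need k >= 1 and exactly k initial states")
    x = [int(s) for s in init]
    for _ in range(n):
        word = tuple(x[-k:])                      # only the last k days matter
        x.append(int(rng.random() < p_wet[word]))
    return np.array(x[k:])                        # drop the initial states


if __name__ == "__main__":
    # Hypothetical second-order probabilities, for illustration only.
    p_wet = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.7}
    x = simulate_k_markov(p_wet, k=2, n=365, init=(0, 0))
    print(x.mean())  # fraction of simulated wet days
```

Setting k = 1 recovers the ordinary first-order chain. Each increase of k doubles the number of words, and hence of transition probabilities, to estimate.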

Many applications have shown that Gaussian processes describe real-world dependence structures well. Since the data indicated temporal dependence, it was therefore natural to choose a Gaussian copula. Estimating the parameter of a Gaussian copula directly from the data is a computationally very hard task. We sidestep that problem by using the fact that all computations for copulas are based on the empirical ranks. We transform these ranks to the corresponding standard normal quantiles. If the pairs of variables really are governed by the Gaussian copula, the transformed pairs are then multivariate normally distributed. Estimating the Gaussian copula parameter thus reduces to estimating the correlation coefficient of a multivariate normal variable, which is an elementary task.
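A minimal sketch of this rank-based estimate is given below. It assumes the amounts form a 1-D array of values on consecutive wet days, and it glosses over how dry days interrupt the series. Ranks are rescaled to r/(n+1) so that the standard normal quantile function stays finite. The lag-h correlation of the resulting normal scores then estimates the Gaussian-copula parameter. The function names are hypothetical, while `rankdata` and `norm.ppf` are standard SciPy functions. Tied amounts, which are common when precipitation is recorded to 0.1 mm, receive average ranks (the `rankdata` default). This is a simplification.

```python
import numpy as np
from scipy.stats import norm, rankdata


def normal_scores(x):
    """Map data to standard normal quantiles through their ranks.

    Ranks are rescaled to r / (n + 1), which lies strictly in (0, 1),
    so norm.ppf stays finite.
    """
    x = np.asarray(x, dtype=float)
    u = rankdata(x) / (x.size + 1)      # pseudo-observations in (0, 1)
    return norm.ppf(u)


def copula_lag_correlation(w, lag=1):
    """Estimate the Gaussian-copula correlation between W_t and W_{t+lag}."""
    z = normal_scores(w)
    return np.corrcoef(z[:-lag], z[lag:])[0, 1]
```

Only ranks enter the estimate, so any strictly increasing transformation of the amounts leaves it unchanged. This is why the marginal distribution can be modelled separately.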


Since extreme rainfall affects the community so crucially, we spend substantial effort on analysing these rare events. The distribution of the extremes is estimated by a point-process approach that singles out the dependence between extremes occurring close together in time.
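The point-process approach is not spelled out at this point. As a rough, hypothetical illustration of handling the dependence of close-by extremes, the sketch below swaps in a different and simpler technique, runs declustering. Exceedances of a threshold u that are separated by fewer than r non-exceedances form one cluster. A generalized Pareto distribution is then fitted by maximum likelihood, using `scipy.stats.genpareto`, to the excesses of the cluster maxima over u. The threshold u and the run length r are user choices, not values from the paper.

```python
import numpy as np
from scipy.stats import genpareto


def decluster_runs(x, u, r):
    """Runs declustering: exceedances of u separated by fewer than r
    non-exceedances belong to one cluster. Returns the cluster maxima."""
    x = np.asarray(x, dtype=float)
    idx = np.flatnonzero(x > u)                # days exceeding the threshold
    if idx.size == 0:
        return np.array([])
    maxima, current = [], [x[idx[0]]]
    for prev, i in zip(idx[:-1], idx[1:]):
        if i - prev - 1 >= r:                  # at least r days at or below u
            maxima.append(max(current))        # close the current cluster
            current = []
        current.append(x[i])
    maxima.append(max(current))
    return np.array(maxima)


def fit_gpd_declustered(x, u, r=1):
    """Fit a GPD (location fixed at 0) to excesses of cluster maxima over u."""
    excess = decluster_runs(x, u, r) - u
    shape, _, scale = genpareto.fit(excess, floc=0)
    return shape, scale
```

For example, `decluster_runs([0, 5, 6, 0, 0, 7, 0, 8], u=4, r=2)` forms the clusters {5, 6} and {7, 8} and returns their maxima 6 and 8.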

My contribution

The problem of creating a weather generator, together with suggestions for suitable data sets, was proposed to me by environmentalist Deliang Chen. My head advisor, Patrik Albin, got me started by suggesting dividing the precipitation process into different parts. Specifically, he suggested modelling the precipitation/no-precipitation part as a multiple step Markov chain and the rain distribution by a combination of the empirical distribution and an extreme value distribution. Through the fruitful tutoring of my co-advisor, Anastassia Baxevani, we moved the problem forward and ended up with the thesis at hand. Apart from the crucial advisory contributions mentioned above, everything in this thesis has been done by myself, the undersigned. This includes building the stochastic model from scratch into something that closely replicates historical data, as assessed by statistical tests with respect to the established weather indices.


Appended paper

Lennartsson, J., Baxevani, A., Chen, D., Modelling precipitation in Sweden using multiple step Markov chains and a composite model, Journal of Hydrology (2008), doi: 10.1016/j.jhydrol.2008.10.003

Two versions of the manuscript are appended because of their differing typographical advantages: the final manuscript as accepted for publication, and the present version of the manuscript as typeset by the Journal of Hydrology.


Modelling Precipitation in Sweden Using Multiple Step Markov Chains and a Composite Model

Jan Lennartsson¹, Anastassia Baxevani¹*†, Deliang Chen²

¹ Department of Mathematical Sciences, Chalmers University of Technology, University of Gothenburg, Gothenburg, Sweden

² Department of Earth Sciences, University of Gothenburg, Gothenburg, Sweden

Abstract

In this paper, we propose a new method for modelling precipitation in Sweden. We consider a chain-dependent stochastic model with two components. One component models the probability that precipitation occurs at a weather station, and the other models the amount of precipitation at the station when precipitation does occur. For the first component, we show that for most of the weather stations in Sweden a Markov chain of order higher than one is required. The second component is a Gaussian process with transformed marginals. For it we use a composite of the empirical distribution of the amount of precipitation below a given threshold and the generalized Pareto distribution for the excesses above that threshold. The derived models are then used to compute different weather indices. The distributions of the modelled indices and of the empirical ones show good agreement, which supports the choice of the model.

Key words: High order Markov chain, generalized Pareto distribution, copula, precipitation process, Sweden

* Corresponding author.
† Research partially supported by the Gothenburg Stochastic Center and by the Swedish Foundation for Strategic Research through the Gothenburg Mathematical Modelling Center.


1 Introduction

Realistic sequences of meteorological variables such as precipitation are key inputs to many hydrologic, ecologic and agricultural models. Simulation models are needed to model the stochastic behavior of the climate system when historical records are of insufficient duration or have inadequate spatial and/or temporal coverage. In these cases synthetic sequences may be used to fill in gaps in the historical record, to extend the historical record, or to generate realizations of weather that are stochastically similar to the historical record. A weather generator is a stochastic numerical model that generates daily weather series with the same statistical properties as the observed ones, see Liao et al. (2004).

In developing the weather generator, the stochastic structure of the series is described by a statistical model. Then, the parameters of the model are estimated using the observed series. This allows us to generate arbitrarily long series with stochastic structure similar to the real data series.

Parameter estimation of stochastic precipitation models has been a topic of intense research over the last 20 years. The estimation procedures are intrinsically linked to the nature of the precipitation model itself and to the timescale used to represent the process. Some models describe the precipitation process in continuous time. Others describe the probabilistic characteristics of precipitation accumulated over a given time period, say daily or monthly totals. Several reviews of the available models have been presented; see for example Woolhiser (1992), Cox and Isham (1988) and Smith and Robinson (1997).

Continuous time models for a single site, with parameters related to the underlying physical precipitation process, are particularly important for the analysis of data at short timescales, e.g. hourly. Some of these models are described in Rodríguez-Iturbe et al. (1987, 1988) and Waymire and Gupta (1981).

When only precipitation amounts accumulated over a particular time period (e.g. daily) are recorded, empirical statistical models, i.e. stochastic models calibrated on actual data, are appealing. Empirical statistical models for generating daily precipitation data at a given site can be classified into four types: chain-dependent or two-part models, transition probability matrix models, resampling models and ARMA time series models. See Srikanthan and McMahon (2001) for a complete review of the different models.

A generalization of the precipitation models for a single site is their spatial extension to multiple sites. Such extensions try to incorporate the inter-site dependence while preserving the marginal properties at each site. A more ambitious task is the modelling of precipitation continuously in time and space. Original work on this type of model, based on point process theory, was presented by LeCam (1961) and further developed by Waymire et al. (1984) and Cox and Isham (1994). Mellor (1996) has developed the modified turning bands model, which reproduces some of the physical features of precipitation fields in space, such as rainbands, cluster potential regions and rain cells.

In this study we concentrate on the chain-dependent model for daily precipitation in Sweden, which consists of two steps. The first is a model for the sequence of wet/dry days and the second is a model for the amount of precipitation on wet days. For the first, we use high-order Markov chains. For the second, we introduce a composite model that incorporates the empirical distribution and the generalized Pareto distribution.

2 Data

Fig.1: Location of the stations.

Precipitation data from 20 stations in Sweden have been used in the studies presented in this paper. The locations are shown in Fig. 1 and the names of the stations are given in Table 1. The data consist of accumulated daily precipitation collected over 44 years, from 1 January 1961 to 31 December 2004, and are provided by the Swedish Meteorological and Hydrological Institute (SMHI). The proportion of missing observations is generally low (< 5%) at all stations. The time plots of the annual number of wet days (above the threshold 0.1 mm) at the 20 stations are presented in Fig. 2.

Fig.2: Time plot of annual number of wet days. One panel per station (Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malung, Vattholma, Myskelåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando); x-axis: years; y-axis: annual no. of wet days (0 to 300).

Time plots of the annual number of wet days show that the precipitation regime at some stations (namely Söderköping, Rösta and Stensele) may contain trends. The results presented in the following sections refer to the whole period of data from all stations, but caution is needed whenever these three stations are discussed. In Fig. 3, time plots of the annual amount of precipitation on the wet days are presented. The total amounts of precipitation appear to be stationary over the years.

3 Model

To model precipitation in Sweden, we have decided to use a chain-dependent model. The first part of the model can be dealt with using Markov chains. Gabriel and Neumann (1962) used a first-order stationary Markov chain. The models have since been extended to allow for non-stationarity, both by fitting separate chains to different periods of the year and by fitting continuous curves to the transition probabilities; see Stern and Coe (1984) and references therein. The required order of the Markov chain has also been discussed extensively, see for example Chin (1977) and references therein, with the obvious conclusion that different sites require different orders. Still, first-order Markov chains are a popular choice, since they have been shown to perform well for a wide range of climates. See for example Bruhn et al. (1980), Lana and Burgueno (1998) and Castellvi and Stockle (2001). The main deficiency associated with the use of first-order models is that long dry spells are not well reproduced, see Racsko et al. (1991) and Guttorp (1995).

Different models have been proposed in the literature for the amount of precipitation on a wet day. All of them assume that the daily amounts of precipitation are independent and identically distributed. Stidd (1973) and Hutchinson (1995) have proposed a truncated normal model for the amount of precipitation with a time-dependent parameter. The gamma and Weibull distributions have been selected by Geng et al. (1986) as well as Selker and Haith (1990), because of their site-specific shape.

In this study, we model the occurrence of wet/dry days using Markov chains of higher order. For the amount of precipitation we use a composite model, consisting of the empirical distribution function for values below a threshold and the distribution of excesses for values above that threshold. Such a model is more flexible and describes the tail of the distribution better. In addition, it allows for dependence in the precipitation process.
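Below is a minimal sketch of such a composite distribution. It assumes the wet-day amounts are given as an array and that a threshold u has already been chosen; threshold selection is not shown. Below u it uses the empirical distribution F_n of the observed amounts. Above u it uses F(x) = F_n(u) + (1 - F_n(u)) G(x - u), where G is a generalized Pareto CDF fitted by maximum likelihood with `scipy.stats.genpareto`. The class name and the linear interpolation used for the empirical quantiles are illustrative choices, not details taken from the paper.

```python
import numpy as np
from scipy.stats import genpareto


class CompositeAmount:
    """Empirical distribution below u, GPD for excesses above u (a sketch)."""

    def __init__(self, amounts, u):
        a = np.asarray(amounts, dtype=float)
        self.u = float(u)
        self.n = a.size
        self.below = np.sort(a[a <= u])          # data used for the body
        self.p_u = self.below.size / self.n      # F_n(u): empirical mass <= u
        excess = a[a > u] - u
        # Location fixed at 0 because the GPD models excesses over u.
        self.xi, _, self.sigma = genpareto.fit(excess, floc=0)

    def cdf(self, x):
        x = np.asarray(x, dtype=float)
        body = np.searchsorted(self.below, x, side="right") / self.n
        tail = self.p_u + (1 - self.p_u) * genpareto.cdf(
            x - self.u, self.xi, scale=self.sigma)
        return np.where(x <= self.u, body, tail)

    def ppf(self, q):
        q = np.asarray(q, dtype=float)
        # Body: empirical quantile of sub-threshold data at level q / F_n(u).
        body = np.quantile(self.below, np.clip(q / self.p_u, 0.0, 1.0))
        # Tail: GPD quantile at the rescaled level, shifted by u.
        tail_q = np.clip((q - self.p_u) / (1 - self.p_u), 0.0, 1.0)
        tail = self.u + genpareto.ppf(tail_q, self.xi, scale=self.sigma)
        return np.where(q <= self.p_u, body, tail)

    def sample(self, size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        return self.ppf(rng.random(size))        # inverse-transform sampling
```

The empirical body reproduces the bulk of the observed amounts exactly. The fitted tail can extrapolate beyond the largest observed value, which an empirical distribution alone cannot do.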

Fig.3: Time plot of annual amount of precipitation. One panel per station (the same 20 stations as in Fig. 2); x-axis: years; y-axis: amount of precip. (mm), 0 to 1500.

Let $Z_t$ be the precipitation at a certain site at time $t$, measured in days. Then a chain-dependent model for the precipitation is given by

$$Z_t = X_t W_t,$$

where $X_t$ and $W_t$ are stochastic processes such that $X_t$ takes values in $\{0, 1\}$ and $W_t$ takes values in $\mathbb{R}_+ \setminus \{0\}$. The processes $X_t$ and $W_t$ will be referred to as the occurrence of precipitation process and the amount of precipitation process, respectively.

Number  Name
1       Lund
2       Bolmen
3       Hanö
4       Borås
5       Varberg
6       Ungsberg
7       Säffle
8       Söderköping
9       Stockholm
10      Malung
11      Vattholma
12      Myskelåsen
13      Härnösand
14      Rösta
15      Piteå
16      Stensele
17      Haparanda
18      Kvikkjokk
19      Pajala
20      Karesuando

Table 1: Names of weather stations.

The approach presented in this study provides a mechanism for making predictions of precipitation over time. This is particularly important for many applications in hydrology, ecology and agriculture. For example, at a monthly level, many applications require the amount of precipitation and the probability and length of a dry period.
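The decomposition Z_t = X_t W_t can be made concrete with a minimal Python sketch. It combines a given occurrence series with positive amounts drawn only on wet days. The amount sampler is a hypothetical placeholder. Amounts are drawn independently here, which is a simplification, since the model in this paper makes W_t temporally dependent through a Gaussian copula.

```python
import numpy as np


def chain_dependent_series(x, sample_amounts, rng=None):
    """Build Z_t = X_t * W_t from an occurrence series X_t in {0, 1}.

    sample_amounts(n, rng) must return n strictly positive amounts. They are
    drawn independently here, a simplification of the dependent model.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=int)
    z = np.zeros(x.size)                            # dry days: Z_t = 0
    wet = x == 1
    z[wet] = sample_amounts(int(wet.sum()), rng)    # W_t matters only if X_t = 1
    return z


if __name__ == "__main__":
    # Hypothetical amount distribution: 0.1 mm plus an exponential part.
    amounts = lambda n, rng: 0.1 + rng.exponential(3.0, n)
    print(chain_dependent_series([0, 1, 1, 0, 1], amounts))
```

Because the two processes enter only through their product, the occurrence model and the amount model can be fitted and replaced independently of each other.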


4 Models for the Occurrence of Precipitation

Let $\{X_t, t = t_1, \ldots, t_N\}$ denote the sequence of daily precipitation occurrences, i.e. $X_t = 1$ indicates a wet day and $X_t = 0$ a dry day. In this study, a wet day occurs when at least 0.1 mm of precipitation was recorded by the rain gauge. The threshold has been chosen above zero to avoid identifying dew and other noise as precipitation. It also avoids difficulties arising from the inconsistent recording of very small precipitation amounts. Moreover, daily precipitation amounts of less than 0.1 mm can have relatively large observational errors. Including them would significantly change the estimated transition probabilities of the occurrences and thereby introduce additional errors into the fitted models. The model is fitted over different periods of the year, that is, subsets of the $N$ days of the year, that may be assumed stationary.
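The next sketch shows how the occurrence series could be derived from daily amounts with the 0.1 mm threshold. It assumes missing observations are stored as NaN and keeps them missing rather than treating them as dry days. The season labels follow the 3-month periods adopted in Section 4.1. The constant and function names are hypothetical.

```python
import numpy as np

WET_THRESHOLD_MM = 0.1   # a wet day has at least 0.1 mm of precipitation

# 3-month seasons as adopted in Section 4.1 (DJF, MAM, JJA, SON).
SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def occurrence(amounts_mm):
    """X_t as floats: 1.0 wet, 0.0 dry, NaN where the observation is missing."""
    a = np.asarray(amounts_mm, dtype=float)
    x = np.where(a >= WET_THRESHOLD_MM, 1.0, 0.0)
    x[np.isnan(a)] = np.nan              # keep missing days missing, not dry
    return x


def season_label(month):
    """Season of a calendar month (1-12)."""
    return SEASON_OF_MONTH[int(month)]
```

When transitions are later counted within a season, each season of each year should be treated as its own contiguous block. Concatenating, for example, the end of February with the following December would create spurious transitions.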

Before we continue any further we need to introduce some notation. Let $S = \{0, 1\}$ denote the state space of the $k$-Markov chain $X_t$. The elements of $S$ are called letters, and an ordering of letters $w \in S^l = S \times \cdots \times S$ is called a word of length $l$. For some $1 \le i \le j \le l$, the word composed of the letters from position $i$ to $j$ in $w$ is denoted by $w_i^j = (w_i, w_{i+1}, \ldots, w_j)$. Finally, for $k \le l$ let $\tau_k(w) = w_{l-k+1}^l$ denote the $k$-tail of the word $w$, i.e. $\tau_k(w)$ denotes the last $k$ letters of $w$. If no confusion will arise, when $k \le j - i$ we also write $\tau_k(w^j)$ instead of $\tau_k(w_i^j)$.
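As a small worked illustration of this notation, take $l = 5$ and the word below. It is an arbitrary example chosen here, not one taken from the data.

```latex
\begin{aligned}
w &= (w_1, w_2, w_3, w_4, w_5) = (1, 1, 0, 1, 0) \in S^5, \\
w_2^4 &= (w_2, w_3, w_4) = (1, 0, 1), \\
\tau_2(w) &= w_{5-2+1}^5 = w_4^5 = (1, 0), \\
\tau_3(w) &= w_{5-3+1}^5 = w_3^5 = (0, 1, 0).
\end{aligned}
```

Read as a sequence of days, this $w$ ends with a wet day followed by a dry day. For a chain of order $k = 2$, only this 2-tail $(1, 0)$ matters for the next day.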

It is assumed that the process $X_t$ is a $k$-Markov chain, a model completely characterized by the transition probabilities

$$p_{w,j}(t) := P\big(X_t = j \mid \tau_k(X^{t-1}) = w\big), \quad j \in S,\ t = t_1, \ldots, t_N,$$

where $w$ is a word of length $k$ and $X^{t-1} = \{\ldots, X_{t-2}, X_{t-1}\}$ is the whole process up to $t-1$. Thus $\tau_k(X^{t-1})$ is the last $k$ days up to and including $X_{t-1}$, that is, $\tau_k(X^{t-1}) = (X_{t-k}, \ldots, X_{t-1})$. Note that, for a 2-state Markov chain of any order, $p_{w,1}(t) + p_{w,0}(t) = 1$. In the special case of a time-homogeneous Markov chain, $p_{w,j}(t) = p_{w,j}$ for $t = t_1, \ldots, t_N$, i.e. the transition probabilities are independent of time.

Let $n_{w,j}(t)$ denote the number of years during which day $t$ is in state $j$ and is preceded by the word $w$, i.e. $\tau_k(X^{t-1}) = w$ with $w \in S^k$, and $X_t = j$. Then the probabilities $p_{w,j}(t)$ are estimated by the observed proportions

$$\hat p_{w,j}(t) = \frac{n_{w,j}(t)}{n_{w,+}(t)}, \quad w \in S^k,\ j \in S,\ t = t_1, \ldots, t_N,$$

where $+$ indicates summation over the subscript. Note also that day 60 (February 29th) has data only in leap years, so day 59 precedes day 61 in non-leap years. Fig. 4 (left) shows the unconditional probability of precipitation, pooled over 5 days for clarity, plotted against $t$ for the data from the station in Lund.
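The following minimal Python sketch computes the observed-proportion estimator p̂_{w,j} = n_{w,j} / n_{w,+}. It covers the time-homogeneous case, where counts are pooled over days within blocks assumed stationary, for example one season per year with several years pooled. The estimator above counts across years separately for each calendar day t, so the sketch is the pooled counterpart, not a transcription of it. Windows that contain a missing value (NaN) are skipped. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

import numpy as np


def transition_counts(x, k):
    """n_{w,j}: how often the word w (length k) is followed by state j.
    Windows containing a missing value (NaN) are skipped."""
    x = np.asarray(x, dtype=float)
    counts = Counter()
    for t in range(k, x.size):
        window = x[t - k:t + 1]
        if np.isnan(window).any():
            continue
        w = tuple(int(v) for v in window[:-1])
        counts[(w, int(window[-1]))] += 1
    return counts


def estimate_transition_probs(blocks, k):
    """p_hat_{w,j} = n_{w,j} / n_{w,+}, pooling counts over several blocks
    (e.g. the same season in different years) assumed time homogeneous."""
    n = Counter()
    for block in blocks:
        n.update(transition_counts(block, k))   # Counter.update adds counts
    p = {}
    for w in product((0, 1), repeat=k):
        n_w = n[(w, 0)] + n[(w, 1)]               # n_{w,+}
        if n_w > 0:                               # skip words never observed
            p[w] = (n[(w, 0)] / n_w, n[(w, 1)] / n_w)
    return p
```

Counting each block separately avoids creating transitions across the gap between, say, consecutive winters. Words that are never observed get no estimate instead of a division by zero.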

In the context of environmental processes, non-stationarity is often apparent, as in this case, because of seasonal effects or different patterns in different months. A common practice is to specify different subsets of the year as seasons, which results in a different model for each season. Determining an appropriate division into seasons is, however, itself an issue.

Fig.4: Lund, Sweden (data from 1961 to 2004). (Left): Observed p(t) pooled over 5 days; x-axis: months Jan to Dec; y-axis: prob. of precip. (0.3 to 0.8). (Right): Mean number of wet days per month ("+") and per season (solid lines); x-axis: months Jan to Dec; y-axis: mean no. of wet days (10 to 20).

4.1 Fitting Models to the Occurrence of Precipitation

There is inter-annual variation in the annual number of wet days, as can be seen in Fig. 2. There is also seasonal variation in the mean monthly number of wet days, see Fig. 4 (right) for data from Lund, although it is less prominent than in other regions of the world. The optimum order of the chain describing the wet/dry sequence may therefore vary within the year and from one year to another. It is therefore important to properly identify the periods of record that can be assumed to be time homogeneous.

Moreover, finding an appropriate model for the occurrence of precipitation process $X_t$ is equivalent to finding the order of a multiple step Markov chain. We applied three order estimators to the data for each of the stations: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Generalized Maximum Fluctuation Criterion (GMFC). They are briefly described in Subsection 8.1. Following Jimoh and Webster (1996), various block lengths were considered for determining the order $k$ of the Markov chain:

• 1 month blocks (i.e. January, February, ..., December),

• 2 month blocks (January - February, February - March, ..., December - January),

• 3 month blocks (January - March, February - April, ..., December - February).
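The exact definitions of the order estimators used in the paper are given in Subsection 8.1. The minimal Python sketch below assumes the standard forms of AIC and BIC for a 2-state chain. Let ℓ_k = Σ n_{w,j} log(n_{w,j} / n_{w,+}) be the maximized log-likelihood of order k, which has 2^k free parameters. Then AIC(k) = -2ℓ_k + 2·2^k and BIC(k) = -2ℓ_k + 2^k log m, where m is the number of transitions used. All orders are scored on the same m transitions so that they are comparable. GMFC is not covered by this sketch, and the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def max_loglik(x, k, k_max):
    """Maximized log-likelihood of an order-k chain on S = {0, 1}.

    Counting starts at t = k_max, so every order k <= k_max is scored on
    the same m = len(x) - k_max transitions."""
    x = np.asarray(x, dtype=int)
    n = Counter()
    for t in range(k_max, x.size):
        w = tuple(int(v) for v in x[t - k:t])   # empty word when k = 0
        n[(w, int(x[t]))] += 1
    ll = 0.0
    for (w, j), c in n.items():
        n_w = n[(w, 0)] + n[(w, 1)]             # n_{w,+}
        ll += c * np.log(c / n_w)
    return ll


def select_order(x, k_max=5):
    """Return (k_AIC, k_BIC), the orders minimizing AIC and BIC over 0..k_max."""
    m = len(x) - k_max                          # transitions used for every k
    aic, bic = [], []
    for k in range(k_max + 1):
        ll = max_loglik(x, k, k_max)
        n_par = 2 ** k                          # |S|^k (|S| - 1) free parameters
        aic.append(-2 * ll + 2 * n_par)
        bic.append(-2 * ll + n_par * np.log(m))
    return int(np.argmin(aic)), int(np.argmin(bic))
```

The log m penalty of BIC grows with the sample size, while the AIC penalty does not. This is consistent with the tendency, discussed below, of BIC to choose lower orders than AIC. In practice the function would be applied separately to each block (month, 2-month or 3-month window).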

The effect of block length on the order of the Markov chain can be seen in Figs. 5-7. Grouping the data in blocks longer than one month results in Markov chains of "smoother" order, in the sense that the order of the chain changes less abruptly from one block to the next. For stations 16-20, the order varies a lot according to the AIC and GMFC estimators, whereas it is almost constant according to the BIC order estimator. As expected, the BIC order estimator underestimates the order k of the Markov chain relative to both the AIC and GMFC order estimators for large k and moderate data sets, see Dalevi et al. (2006). The values of the GMFC order estimator lie between the BIC and AIC ones. The results presented in Figs. 5-7 confirm that the model order is sensitive to the season (month), to the length of the season (number of months) considered, and to the method used to identify the optimum order. Possible dependence on the threshold used for identifying wet and dry days has not been studied here. For the rest of this study, we define as seasons the 3-month periods December-February, March-May, June-August and September-November. Fig. 4 (left) shows, for the station in Lund, that the probability of precipitation is close to constant during these periods, and the other stations give similar plots. This makes the assumption of stationarity plausible. The orders of the Markov chain for these periods can be found in Fig. 7. For the rest of this study, the order k of the Markov chain is chosen according to the GMFC order estimator.

Fig.5: k-Markov chain orders for block lengths of one month (Jan, Feb, ...). Three panels: orders estimated by the Akaike, Bayesian and GMFC order estimators. Each panel shows stations st 01 to st 20, with months (Jan to Dec) on the x-axis and estimated order (1 to 5) on the y-axis.


Fig. 6: k-Markov chain orders for block lengths of two months (Jan-Feb, Feb-Mar, ...).
[Figure: three panels, each with one plot per station (st 01-st 20); vertical axis estimated order (1-5).]


Fig. 7: k-Markov chain orders for block lengths of three months (Jan-Mar, Feb-Apr, ...).
[Figure: three panels, each with one plot per station (st 01-st 20); vertical axis estimated order (1-5).]

4.2 Distribution of Dry Spell Length

An interesting aspect of the wet/dry behavior, i.e. of the process $X_t$, is the distribution of the dry (wet) spells, i.e., of the number of consecutive dry (wet) days, which can be computed explicitly for multiple step Markov chains.


For a time homogeneous (stationary) k-Markov chain $X_t$ ($k \ge 2$) with state space $S$, let $T$ be the first time at which the 2-tail of the process equals $(1, 0)$, i.e., the first time a wet day is followed by a dry day:

$$T = \inf\{t \ge 0 : \tau_2(X_t) = (1, 0)\}.$$

So $T$ is the start time of the first dry period. For words $u, v \in S^k$, let

$$a_{u,v} = P\big(\tau_k(X_T) = v \mid \tau_k(X_0) = u\big)$$

denote the probability that the process $X_t$ has k-tail equal to $v$ at time $T$, given that its k-tail at time 0 is equal to $u$. The probabilities $a_{u,v}$ are easily obtained for stationary processes, see Norris (1997). Note that at $t = 0$ there may be the start of a dry period, the start of a wet period, the continuation of a dry period or the continuation of a wet period. Let $D(X_t)$ denote the length of the first dry period of the k-Markov chain $X_t$, i.e. the one starting at time $T$. Assuming additionally that $X_t$ is time homogeneous, so that time $T$ can be shifted to time 0, the distribution of the first dry spell can be computed as

$$P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{w \in S^k : \tau_2(w) = (1,0)} a_{u,w}\, P\big(\tau_m(X_{m-1}) = \mathbf{0},\ X_m = 1 \mid \tau_k(X_0) = w\big), \qquad (1)$$

where $\mathbf{0}$ denotes a sequence of 0's of appropriate length.

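Now, let $v = w\mathbf{0}1$ be the word of length $m + k$ obtained by appending $m - 1$ zeros and a single 1 to $w$, and let $v^{j} = v_1 \cdots v_j$ denote its prefix of length $j$, so that $\tau_k(v^{k+i-1}) = v_i \cdots v_{k+i-1}$. Using the fact that $X_t$ is a k-Markov chain, Eq. 1 can be rewritten as

$$P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{w \in S^k : \tau_2(w) = (1,0)} a_{u,w} \prod_{i=1}^{m} P\big(X_i = v_{k+i} \mid \tau_k(X_{i-1}) = \tau_k(v^{k+i-1})\big). \qquad (2)$$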

Remark 1. The distribution of the first dry spell differs from the distribution of the subsequent dry spells for Markov chains of order greater than two; for first- and second-order Markov chains there is no need for this distinction. Moreover, the equivalent of Eq. 1 for $k = 1$ is

$$P(D(X_t) = m) = p_{0,0}^{\,m-1}\, p_{0,1},$$

while for $k = 2$, Eq. 2 simplifies to

$$P(D(X_t) = m) = \begin{cases} p_{10,1}, & m = 1,\\ p_{10,0}\, p_{00,0}^{\,m-2}\, p_{00,1}, & m \ge 2, \end{cases}$$

since for $k = 2$ the only word $w$ with $\tau_2(w) = (1,0)$ is $w = (1,0)$ itself, so that $a_{u,w} = 1$ for all $u$ in Eq. 1.
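Eq. 2 translates directly into a product of transition probabilities. The sketch below is a minimal illustration that assumes the transition probabilities are stored in a dictionary keyed by k-tails (a hypothetical layout) and that the mixing weights $\sum_u \pi_u a_{u,w}$ have already been computed for each admissible word $w$; computing $a_{u,w}$ itself is not shown. For $k = 2$ and $w = (1, 0)$ it reproduces the closed form of Remark 1.

```python
def dry_spell_pmf(p1, weights, m):
    """P(D = m) for a two-state k-Markov chain (0 = dry, 1 = wet), following Eq. 2.

    p1      -- dict: k-tuple context c -> P(X_{n+1} = 1 | tau_k(X_n) = c)
    weights -- dict: k-tuple w with tau_2(w) == (1, 0) -> sum_u pi_u * a_{u,w}
               (assumed precomputed; the weights sum to one)
    """
    if m < 1:
        return 0.0
    total = 0.0
    for w, weight in weights.items():
        k = len(w)
        v = list(w) + [0] * (m - 1) + [1]       # v = w 0...0 1, a word of length k + m
        prob = 1.0
        for i in range(1, m + 1):
            context = tuple(v[i - 1:k + i - 1])  # tau_k(v^{k+i-1}) = v_i ... v_{k+i-1}
            nxt = v[k + i - 1]                   # v_{k+i} in the 1-based notation of Eq. 2
            q = p1[context]
            prob *= q if nxt == 1 else 1.0 - q
        total += weight * prob
    return total
```

With, say, $p_{10,1} = 0.4$ and $p_{00,1} = 0.3$, the function gives $P(D = 2) = 0.6 \cdot 0.3 = 0.18$, matching $p_{10,0}\,p_{00,1}$ from Remark 1.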


The distribution of the first dry spell can also be used for model selection or model validation. For this purpose we use the Kolmogorov-Smirnov (KS) test, see Benjamin and Cornell (1970). The one-sample KS test compares the empirical distribution function with the cumulative distribution function specified by the null hypothesis.

Assuming that $P_k$ is the true distribution (that of a Markov chain of order $k$), the KS statistic is

$$D_{KS} = \sup_{m \in \mathbb{N}_+} \big|P_k(D(X) \le m) - F_{emp}(m)\big|,$$

where $F_{emp}$ is the empirical cumulative distribution function of the length of the first dry spell. If the data truly come from a Markov chain of order $k$ with the correct transition probabilities, then, by the Glivenko-Cantelli theorem, the KS statistic converges to zero almost surely (a.s.) as the number of observed dry spells grows.
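Because dry spell lengths are integers, the supremum in the KS statistic only needs to be evaluated at $m = 1, 2, \dots$ up to the longest observed spell: beyond it the empirical CDF equals 1 and the gap can only shrink. A minimal sketch, where `model_cdf` is a hypothetical callable returning $P_k(D(X) \le m)$ under the fitted chain; for $k = 1$, Remark 1 gives $P(D \le m) = 1 - p_{0,0}^m$.

```python
from bisect import bisect_right


def ks_statistic(model_cdf, spells):
    """sup over m >= 1 of |P_k(D <= m) - F_emp(m)| for observed dry spell lengths.

    model_cdf -- callable m -> P_k(D <= m) under the fitted k-Markov chain
    spells    -- observed dry spell lengths (positive integers)
    For m >= max(spells), F_emp(m) = 1 and 1 - P_k(D <= m) can only decrease,
    so it suffices to scan m = 1, ..., max(spells).
    """
    s = sorted(spells)
    n = len(s)
    return max(abs(model_cdf(m) - bisect_right(s, m) / n) for m in range(1, s[-1] + 1))


def geometric_cdf(p00):
    """P(D <= m) = 1 - p00**m for a first-order chain (Remark 1)."""
    return lambda m: 1.0 - p00 ** m
```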

To apply the test, the transition probabilities have been estimated from the data by maximum likelihood for different values of the order k of the Markov chain. To obtain the empirical distribution of the length of the first dry spell, we have computed the lengths of the dry spells (sequences of zeros) following the first (1, 0). (Note that this is equivalent to computing the length of the first dry spell for Markov chains of order k = 1 or k = 2. For k = 3, although the distribution of the first dry spell is not exactly the same as the distribution of any dry spell, we have still used all the available dry spells because of the shortage of data.) The procedure has been applied separately to the data from each station and season. Zeros at the start of a season were ignored, since they are the continuation of an earlier dry spell. Conversely, if a dry spell was not over by the end of the season, it was followed into the next season.
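The extraction rule just described can be sketched for one contiguous 0/1 record (1 = wet, 0 = dry). To follow spells across season boundaries, the record would be the concatenation of consecutive seasons; dropping a run that is still open at the end of the record is an assumption of this sketch, and the function name is hypothetical.

```python
def dry_spells(x):
    """Lengths of the dry spells (runs of 0) that start right after a wet day.

    Leading zeros are ignored (they continue an earlier dry spell), and a run
    still open at the end of x is dropped because its length is unknown.
    """
    spells = []
    run = None                 # None until the first wet day has been seen
    for day in x:
        if day == 1:
            if run:            # a dry run of length >= 1 has just ended
                spells.append(run)
            run = 0
        elif run is not None:
            run += 1
    return spells
```

For example, the record `[0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0]` yields `[2, 1]`: the two leading zeros and the final open run are not counted.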

To assess whether the theoretical model is adequate, Monte Carlo simulations were performed. We have obtained the empirical distribution of the length of the first dry spell for each of 500 synthetic wet/dry records of 44 years of data (each station and season treated separately) and computed the KS statistic for each of them, which gave the sampling distribution of the KS statistic under the model.
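The Monte Carlo step can be sketched as a parametric bootstrap: simulate records from the fitted chain, evaluate the KS statistic on each, and use an upper quantile of the simulated values as the critical value at the 10% tail. The function names, the choice of initial states and the quantile convention are assumptions of this sketch; `statistic` stands for any callable that maps a 0/1 record to its KS distance from the fitted model (for instance by extracting the dry spells and comparing them with the model CDF).

```python
import random
from math import ceil


def simulate_chain(p1, k, n_days, init, rng):
    """Simulate n_days of a two-state k-Markov chain.

    p1   -- dict: k-tuple of past states -> probability that the next day is wet (1)
    init -- the first k states (e.g. taken from the observed record)
    """
    x = list(init)
    while len(x) < n_days:
        c = tuple(x[len(x) - k:])          # current k-tail
        x.append(1 if rng.random() < p1[c] else 0)
    return x[:n_days]


def null_quantile(statistic, p1, k, n_days, init, n_rep=500, level=0.90, seed=1):
    """Empirical `level`-quantile of statistic(record) over n_rep simulated records."""
    rng = random.Random(seed)
    values = sorted(statistic(simulate_chain(p1, k, n_days, init, rng))
                    for _ in range(n_rep))
    return values[ceil(level * n_rep) - 1]
```

An observed KS value above the returned quantile would then lead to rejecting the order-k model at the 10% tail value.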


Fig. 8: Order of the Markov chain suggested by the Kolmogorov-Smirnov statistic at the 10% tail value for each station and season.
[Figure: "Estimated Orders by KS-criterion of dry spell order estimator"; three panels, each with one plot per station (st 01-st 20); horizontal axis month (Jan, May, Nov), vertical axis order (1, 3, 5).]

The suggested orders of the Markov chain using the Kolmogorov-Smirnov statistic at

the 10% tail value are collected in Fig. 8. The resulting orders of the Markov chain appear

to be close to those obtained by the BIC order estimator. In Table 2, we have collected

information on how many data sets have passed the Kolmogorov-Smirnov test at the 10%

tail value for the different seasons. Observe that the KS test suggests that the 1-Markov

chain, although widely used, is an inadequate model for the majority of the stations in

Sweden over the different seasons.

                 Season
Model      S1    S2    S3    S4
k = 1       1     0     1     6
k = 2      20    20    20    20
k = 3      20    20    20    20

Table 2: Number of data sets (out of 20 stations) that have passed the Kolmogorov-Smirnov test at the 10% tail value for different orders of the Markov chain. S1 stands for Dec.-Feb., S2 for Mar.-May, S3 for Jun.-Aug. and S4 for Sep.-Nov.

4.3 Distribution of Long Dry Spells

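Let us now define a long dry spell as a dry spell of length greater than or equal to the order $k$ of the Markov chain. It is then easy to show that the length of a long dry spell is geometrically distributed. Indeed, consider a long dry spell that starts at time $i$ and has length $m \ge k$, and assume that we know that its length is at least $l$. Then, for $m \ge l \ge k$,

$$P\big(D(X_t) = m \mid \tau_l(X_{i+l-1}) = \mathbf{0}\big) = p_{0,1}\, p_{0,0}^{\,m-l} = p_{0,1}\,(1 - p_{0,1})^{m-l},$$

where, as before,

$$p_{0,1} = P\big(X_{n+1} = 1 \mid \tau_k(X_n) = \mathbf{0}\big), \quad \forall n.$$

This holds because, once $l \ge k$ dry days have occurred, the k-tail of the process is $\mathbf{0}$ and stays so while the spell continues, so every further day ends the spell with the same probability $p_{0,1}$. Therefore, the expected length of long dry spells is

$$E\big(D(X_t) \mid \tau_l(X_{i+l-1}) = \mathbf{0}\big) = l + \frac{1 - p_{0,1}}{p_{0,1}}. \qquad (3)$$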
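Eq. 3 and the geometric law behind it are easy to check numerically. The sketch below uses hypothetical function names; it also makes explicit that, for a fixed $p_{0,1}$, the expected length grows by exactly one day per unit increase in $l$, which is the pattern seen along each row of Table 3.

```python
def long_dry_spell_pmf(m, l, p01):
    """P(D = m | dry spell has already lasted l >= k days) = p01 (1 - p01)^(m - l)."""
    return p01 * (1.0 - p01) ** (m - l) if m >= l else 0.0


def expected_long_dry_spell(l, p01):
    """Eq. 3: l + (1 - p01) / p01."""
    return l + (1.0 - p01) / p01
```

For instance, the k = 1 row of Table 3 (2.49, 3.49, 4.49) corresponds to $1/p_{0,1} = 2.49$, i.e. $p_{0,1} \approx 0.40$ for Lund in winter.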

Fig. 9: Conditional distribution of the dry spell length given that the dry spell is longer than or equal to 3 days, for k-Markov chain models of order k = 1, k = 2 and k = 3 and the data from Lund. Data are from the winter months December-February.
[Figure: horizontal axis time (days), 2 to 14; vertical axis probability, 0.3 to 1; curves: Empirical, 3-Markov, 2-Markov, 1-Markov.]

Fig. 9 shows the conditional distribution of the dry spell length, given that the spell has lasted more than two days, for the first season (December-February) and the data from Lund. The estimated order of the Markov chain for this data set is 2 according to both the GMFC and the KS criteria. A first-order Markov chain, the popular model of choice in this case, clearly underestimates the risk of a long dry spell, while a second-order Markov chain seems to be the best choice for this particular data set.

It is clear from Table 3 that underestimating the order k of the Markov chain leads to underestimating the expected length of long dry spells, where, again, a dry spell is called long if its length is greater than or equal to the order of the Markov chain.


Model                  l = 1   l = 2   l = 3
k = 1                  2.49    3.49    4.49
k = 2                  -       3.91    4.91
k = 3                  -       -       5.11
Observed mean value    2.56    3.97    5.23

Table 3: Expected length (days) of long dry spells for the season Dec-Feb in Lund.

5 Modeling the Amount Precipitation Process

In this section we model the amounts of daily precipitation. This is done in two steps.

Firstly we model the dependence structure of the amount precipitation process and sec-

ondly we estimate the marginal distribution.

An important feature of climatological data sets is that they exhibit dependence between nearby stations or between successive days. In this work we are interested in the latter, and the dependence structure is modelled using a two-dimensional Gaussian copula.

After the copula has been estimated, we remove the days with precipitation below the cut-off level of 0.1mm. That is, we let $Y_t$ be the thinned process obtained from the amount of precipitation process $W_t$ by keeping only the wet days, i.e., $Y_t := W_t \mid X_t = 1$. The marginal distribution of the daily amounts of precipitation is then modelled by an approach that combines a fit of the distribution of excesses over a high threshold with the empirical distribution of the thinned data below the threshold.

5.1 Copula

Almost every climatological data set exhibits dependence between successive days. To model the temporal dependence structure of the data we use the two-dimensional Gaussian copula $C$ given by

$$C(u, v; \rho) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right) dy\, dx = \Phi_\rho\big(\Phi^{-1}(u), \Phi^{-1}(v)\big), \qquad (4)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution and $\Phi_\rho$ is the joint cumulative distribution function of two standard normal random variables with correlation coefficient $\rho$.
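Eq. 4 can be evaluated without special libraries by writing $\Phi_\rho(a, b) = \int_{-\infty}^{a} \varphi(x)\, \Phi\big((b - \rho x)/\sqrt{1-\rho^2}\big)\,dx$, since given $X = x$ the second variable is $N(\rho x, 1 - \rho^2)$, and integrating numerically. This is a minimal sketch using Python's standard-library `NormalDist` and Simpson's rule truncated at $-8$; it assumes $|\rho| < 1$ and $0 < u, v < 1$, and the function names are hypothetical. A convenient check is the orthant formula $C(1/2, 1/2; \rho) = 1/4 + \arcsin(\rho)/(2\pi)$, which gives $1/3$ for $\rho = 0.5$.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf, pdf and inv_cdf


def bivariate_normal_cdf(a, b, rho, n=2000):
    """Phi_rho(a, b) for two standard normals with correlation |rho| < 1.

    Uses Phi_rho(a, b) = int_{-inf}^{a} phi(x) Phi((b - rho x) / sqrt(1 - rho^2)) dx,
    approximated by composite Simpson's rule on [-8, a] (n must be even).
    """
    lo = -8.0
    if a <= lo:
        return 0.0
    s = sqrt(1.0 - rho * rho)
    h = (a - lo) / n

    def f(x):
        return _STD.pdf(x) * _STD.cdf((b - rho * x) / s)

    total = f(lo) + f(a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def gaussian_copula(u, v, rho):
    """C(u, v; rho) = Phi_rho(Phi^{-1}(u), Phi^{-1}(v)) of Eq. 4, for 0 < u, v < 1."""
    return bivariate_normal_cdf(_STD.inv_cdf(u), _STD.inv_cdf(v), rho)
```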

To estimate the copula, let

$$A = \{t : Y_t > 0 \text{ and } Y_{t+1} > 0\}$$

be the set of all days with non-zero precipitation (greater than 0.1mm) that were followed by days also with non-zero precipitation, and let

$$\mathbf{u} = [Y_{a_1}, Y_{a_2}, \dots], \qquad \mathbf{v} = [Y_{a_1+1}, Y_{a_2+1}, \dots], \qquad a_1, a_2, \dots \in A,$$

be the vectors consisting of the amounts of precipitation on the days in $A$ and on the following days, respectively, both with marginal distribution $F(x)$. Transforming $\mathbf{u}$ and $\mathbf{v}$ by the empirical cumulative distribution function corrected by the factor $n/(n+1)$, where $n$ is the number of days with positive precipitation in the data set, results in vectors $\mathbf{U}$ and $\mathbf{V}$ that follow the discrete uniform distribution on $(0, 1)$. If the Gaussian copula in Eq. 4 correctly describes the dependence structure of the data, then

$$\big(\Phi^{-1}(\mathbf{U}), \Phi^{-1}(\mathbf{V})\big) \sim N\left(\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}, \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}\right).$$

Finally, the copula parameter $\rho$ is estimated by Pearson's correlation coefficient of the transformed pairs. A detailed description of the method and its application can be found in Lennartsson and Shu (2005). The dependence between successive days is illustrated in Fig. 13, where the transformed data from Lund are plotted.
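The estimation procedure above can be sketched as follows for a daily record `w` (mm), in which a day counts as wet when its amount is at least a cut-off (assumed here to be 0.1 mm). The empirical CDF of all wet-day amounts is multiplied by $n/(n+1)$, so the transformed values stay strictly inside $(0, 1)$ and $\Phi^{-1}$ is finite; tied amounts simply share the same transformed value. The function name and interface are hypothetical.

```python
from bisect import bisect_right
from math import sqrt
from statistics import NormalDist


def copula_rho(w, cutoff=0.1):
    """Gaussian-copula correlation between successive wet days of a daily record w (mm)."""
    wet = sorted(x for x in w if x >= cutoff)     # all thinned amounts Y_t
    n = len(wet)
    inv = NormalDist().inv_cdf

    def score(x):
        # empirical CDF times n / (n + 1) = #{Y <= x} / (n + 1), strictly inside (0, 1)
        return inv(bisect_right(wet, x) / (n + 1))

    # pairs (Y_t, Y_{t+1}) for t in A, i.e. both days wet
    pairs = [(score(w[t]), score(w[t + 1]))
             for t in range(len(w) - 1)
             if w[t] >= cutoff and w[t + 1] >= cutoff]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)                  # Pearson correlation
```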

Fig. 13: Plot of the dependence structure with the marginal distributions transformed to standard normal.
[Figure: scatter plot of T(Y_{t+1}) against T(Y_t), both axes from -4 to 4.]

For a thorough coverage of bivariate copulas and their properties, see Hutchinson and Lai (1990), Joe (1997) and Nelsen (2006), as well as Trivedi and Zimmer (2005), who provide a copula tutorial for practitioners. The values of the correlation coefficient $\rho$ estimated for each station are collected in Table 4. All the estimates of $\rho$ are statistically significant, which makes the assumption of independence between successive days unreasonable.

5.2 Marginal Distribution

Finally, to model the amount precipitation process we propose an approach that combines the fit of the distribution of excesses over a high threshold with the empirical distribution of the original data below the threshold. We begin by introducing some notation and a few preliminary remarks. Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables with marginal distribution $F(x)$, and denote by

$$F_u(x) = P(X \le x \mid X > u), \qquad x > u,$$

the conditional distribution of $X$ given that it exceeds the level $u$. Assume that $F_u(x)$ can be modelled by a generalized Pareto distribution, that is

$$F_u(x) = 1 - \left(1 + \xi\,\frac{x - u}{\sigma}\right)^{-1/\xi}, \qquad (5)$$

for some $\sigma > 0$ and $\xi \in \mathbb{R}$, on the set $\{x : x > u \text{ and } 1 + \xi (x - u)/\sigma > 0\}$, with $F_u(x) = 0$ for $x \le u$ (and, when $\xi < 0$, $F_u(x) = 1$ beyond the upper endpoint $u - \sigma/\xi$). For $\xi = 0$, Eq. 5 is interpreted as its limit $1 - \exp(-(x - u)/\sigma)$.

Let also $F_{emp}(x)$ denote the empirical distribution function, i.e.,

$$F_{emp}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\},$$

where $\mathbf{1}\{\cdot\}$ denotes the indicator function of an event, i.e. the 0-1 random variable that takes the value 1 if the condition between the brackets is satisfied and 0 otherwise.

Finally, define the function

$$F_C(x; u) = F_{emp}(x \wedge u) + (1 - F_{emp}(u))\, F_u(x),$$

where $x \wedge u = \min(x, u)$. For $x \le u$ this reduces to $F_{emp}(x)$, while for $x > u$ it equals $F_{emp}(u) + (1 - F_{emp}(u)) F_u(x)$, which increases to 1 as $x \to \infty$; hence $F_C$ is a probability distribution function, and it is the one we use to model the amount precipitation process. What remains to be addressed is the choice of the level $u$ above which the excesses can be accurately modelled by a generalized Pareto distribution, as well as methods for estimating the distribution parameters.
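A minimal sketch of $F_C(x; u)$ and of its inverse, which is what a simulation needs to turn uniform variates into amounts. The generalized Pareto part follows Eq. 5, with the $\xi = 0$ exponential limit and the upper endpoint for $\xi < 0$ handled explicitly; below $u$ the inverse is the empirical quantile. The name `make_composite` and its interface are hypothetical.

```python
from bisect import bisect_right
from math import ceil, exp, log


def make_composite(data, u, sigma, xi):
    """Return (cdf, quantile) for F_C(x; u) = F_emp(min(x, u)) + (1 - F_emp(u)) F_u(x).

    data  -- observed wet-day amounts (mm); F_emp is their empirical CDF
    u     -- threshold; sigma > 0 and xi are the fitted GPD parameters of Eq. 5
    """
    s = sorted(data)
    n = len(s)
    p_u = bisect_right(s, u) / n                 # F_emp(u)

    def gpd_cdf(x):                              # F_u(x), Eq. 5
        if x <= u:
            return 0.0
        z = (x - u) / sigma
        if xi == 0.0:
            return 1.0 - exp(-z)                 # exponential limit
        t = 1.0 + xi * z
        return 1.0 if t <= 0.0 else 1.0 - t ** (-1.0 / xi)  # t <= 0: past endpoint (xi < 0)

    def cdf(x):
        return bisect_right(s, min(x, u)) / n + (1.0 - p_u) * gpd_cdf(x)

    def quantile(p):                             # F_C^{-1}(p) for 0 < p < 1
        if p <= p_u:
            return s[max(ceil(p * n) - 1, 0)]    # empirical quantile below u
        q = (p - p_u) / (1.0 - p_u)              # probability level inside the tail
        if xi == 0.0:
            return u - sigma * log(1.0 - q)
        return u + sigma / xi * ((1.0 - q) ** (-xi) - 1.0)

    return cdf, quantile
```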

5.2.1 Choice of Threshold Level

Selecting a threshold level $u$ above which the generalized Pareto assumption is appropriate is a difficult task in practice; see, for example, McNeil (1996), Davison and Smith (1990) and Rootzen and Tajvidi (1997). Frigessi et al. (2002) suggest a dynamic mixture model for estimating the tail distribution without having to specify a threshold in advance. Once the threshold $u$ is fixed, the model parameters $\xi$ and $\sigma$ are estimated by maximum likelihood, although a number of alternative methods exist; see, for instance, Resnick (1997), Crovella and Taqqu (1999) and references therein.
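For the maximum likelihood step, differentiating Eq. 5 gives the negative log-likelihood of the excesses $y_i = x_i - u$ as $n\log\sigma + (1 + 1/\xi)\sum_i \log(1 + \xi y_i/\sigma)$, or $n\log\sigma + \sum_i y_i/\sigma$ when $\xi = 0$. The sketch below (hypothetical name `gpd_nll`) returns infinity outside the parameter space and the support, so it can be handed to a numerical minimiser; the optimiser itself is not shown.

```python
from math import inf, log, log1p


def gpd_nll(params, excesses):
    """Negative log-likelihood of the GPD in Eq. 5 for excesses y = x - u > 0.

    params = (sigma, xi). Returns inf outside the parameter space or support so
    the function can be passed directly to a numerical minimiser.
    """
    sigma, xi = params
    if sigma <= 0.0:
        return inf
    n = len(excesses)
    if abs(xi) < 1e-12:                          # exponential limit xi -> 0
        return n * log(sigma) + sum(excesses) / sigma
    total = 0.0
    for y in excesses:
        t = xi * y / sigma
        if t <= -1.0:                            # 1 + xi * y / sigma <= 0: outside support
            return inf
        total += log1p(t)
    return n * log(sigma) + (1.0 + 1.0 / xi) * total
```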

5.2.2 Extreme Value Analysis for Dependent Sequences

The generalized Pareto distribution is asymptotically a good model for the distribution of high excesses of independent and identically distributed random variables, see Coles (2001) and Leadbetter et al. (1983). Unfortunately, the independence assumption is unrealistic for most climatological data sets, since dependence between successive days is to be expected. The dependence between the excesses can be dealt with either by choosing the level $u$ high enough that enough time passes between successive excesses to make them independent, or by declustering. Declustering is probably the most widely adopted method for dealing with dependent exceedances; it corresponds to filtering the dependent observations to obtain a set of threshold excesses that are approximately independent, see Coles (2001). A simple way of determining m-clusters of extremes, after specifying a threshold $u$, is to let consecutive excesses of $u$ belong to the same m-cluster as long as they are separated by fewer than $m + 1$ days. Note that the separation of extreme events into clusters is likely to be sensitive to the choice of $u$, although we do not study this effect in this work. The effect of declustering on the generalized Pareto distribution in Eq. 5 is the replacement of the parameters $\sigma$ and $\xi$ by $\sigma\theta^{-1}$ and $\xi$, where $\theta$ is the so-called extremal index, loosely defined as

$$\theta = (\text{limiting mean cluster size})^{-1}.$$
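Runs declustering and the corresponding estimate $\hat\theta = (\text{number of clusters})/(\text{number of exceedances})$, i.e. the reciprocal of the mean cluster size, can be sketched as below. We read "separated by fewer than $m + 1$ days" as at most $m$ non-exceeding days between two exceedances; under this reading $m = 0$ merges exceedances on consecutive days, which is consistent with $\theta < 1$ for $m = 0$ in Table 5. The estimator and names are a standard illustration, not necessarily the procedure used for Tables 4 and 5.

```python
def decluster(times, m):
    """Group sorted exceedance times into m-clusters.

    Two successive exceedances share a cluster when at most m non-exceeding
    days lie between them (our reading of 'separated by fewer than m + 1 days').
    """
    clusters = []
    for t in times:
        if clusters and t - clusters[-1][-1] - 1 <= m:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def extremal_index(series, u, m):
    """Runs estimate of theta: number of clusters / number of exceedances of u."""
    times = [t for t, y in enumerate(series) if y > u]
    if not times:
        raise ValueError("no exceedances of the threshold")
    return len(decluster(times, m)) / len(times)
```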

5.3 Method Application

In this subsection we apply the method described in Subsection 5.2 to model the thinned amount of precipitation process $Y_t$. To demonstrate the method we use data from the station in Lund; the other stations give similar results.

As we have already seen, the data exhibit temporal dependence. The correlation coefficient $\rho$ of the Gaussian copula for the data from Lund was estimated to be 0.1362. The dependence in the data can also be seen in Fig. 10, where the observed and expected numbers of m-clusters (with more than one observation) are plotted for different values of m and u = 15mm. The expected number of m-clusters assuming independent observations, denoted by 'o', is consistently smaller than the observed number of m-clusters, denoted by '+'. The expected number of m-clusters computed assuming correlated observations (ρ = 0.1362), denoted by '*', is a clear improvement over the assumption of independence. We also show exact 95% confidence intervals for both cases; the observed values fall inside the confidence interval constructed assuming correlated data.

Fig. 10: Number of m-clusters with more than one observation. '+' denotes the observed and 'o' the theoretical number of m-clusters assuming that the observations are independent, while '*' denotes the number of m-clusters using ρ = 0.1362. Line '–' denotes the 95% confidence interval for the theoretical number of m-clusters assuming independence, while '-.' denotes the 95% confidence interval for the theoretical number of m-clusters assuming ρ = 0.1362.
[Figure: horizontal axis m, 0 to 6; vertical axis number of clusters with more than one observation, 0 to 50.]

Once the cluster size has been chosen (m = 0 for the station in Lund), we turn to the problem of estimating the parameters $\xi$, $\sigma$ and $\theta$ of the generalized Pareto model. The choice of the specific threshold (u = 15mm) was based on the mean residual life plot. According to Coles (2001), if the generalized Pareto model provides a good approximation for the excesses above a threshold $u_0$, then the mean residual life plot, i.e. the locus of the points

$$\left\{\left(u, \frac{1}{n_u} \sum_{i=1}^{n_u} \big(Y_{t(i)} - u\big)\right) : u < Y_t^{\max}\right\},$$

where $Y_{t(1)}, \dots, Y_{t(n_u)}$ are the $n_u$ observations that exceed $u$ and $Y_t^{\max}$ is the largest observation of the process $Y_t$, should be approximately linear in $u$ for $u > u_0$. Fig. 11 shows the mean residual life plot, with approximate 95% confidence intervals, for the daily precipitation in Lund. The graph appears to curve from u = 0 to u = 15 and is approximately linear above that threshold. One might instead conclude that there is no stability until u = 28, above which the plot is approximately linear, which would suggest u = 28. However, such a threshold leaves too few excesses for meaningful inference (33 observations out of 16000), so we decided to work, initially, with the threshold u = 15.
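The points of the mean residual life plot, together with an approximate 95% band (mean excess plus or minus 1.96 standard errors, a common normal approximation), can be computed as follows. The function name is hypothetical, and thresholds with fewer than two excesses are skipped because no standard deviation can be formed.

```python
from math import sqrt


def mean_residual_life(data, thresholds, z=1.96):
    """Return (u, mean excess, lower, upper) for each threshold u.

    The band mean +/- z * sd / sqrt(n_u) is a normal approximation.
    """
    points = []
    for u in thresholds:
        exc = [y - u for y in data if y > u]
        n_u = len(exc)
        if n_u < 2:
            continue                     # too few excesses for a standard deviation
        mean = sum(exc) / n_u
        sd = sqrt(sum((e - mean) ** 2 for e in exc) / (n_u - 1))
        half = z * sd / sqrt(n_u)
        points.append((u, mean, mean - half, mean + half))
    return points
```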

Fig. 11: Mean residual life plot of the amount precipitation process from Lund; dotted lines give the 95% confidence interval.
[Figure: horizontal axis u, 0 to 60; vertical axis E[X - u | X > u], 0 to 12.]

Finally, the diagnostic plots for the fit of the generalized Pareto distribution are collected in Fig. 12. The data from the other stations produced similar plots, none of which gave any reason for concern about the quality of the fitted models. The parameters of the generalized Pareto model for the data from all the stations, together with 95% confidence intervals, are collected in Table 4. For three stations (Bolmen, Boras and Haparanda), the estimates of the shape parameter $\xi$ are negative.

Fig. 12: Diagnostic plots for the threshold excess model fitted to daily precipitation data from the station in Lund.
[Figure: four panels: Probability Plot (empirical vs. GP model), Quantile Plot (empirical vs. GP model, mm), Return Level Plot (return level vs. return period in years) and Density Plot (number of observations vs. amount of precipitation, mm).]

Table 5 shows $\theta$ for different cluster sizes m, with threshold u = 15mm, for the data from Lund.

6 Evaluation

To verify the validity of the model, we have obtained the distribution functions of the different precipitation indices stipulated by the Expert Team and its predecessor, the CCl/CLIVAR Working Group (WG) on Climate Change Detection, see Peterson et al. (2001) and Karl et al. (1999). Sixteen of those indices are relevant to this work: two concern only the occurrence of precipitation process (CDD and CWD), another two concern only the amount precipitation process (SDII and Prec90p), and the remaining twelve concern both processes, see Table 6. Using the chain dependent model, we have obtained the distribution of each index from 100,000 simulations and compared it with the empirical distribution ('.-' line in Figs. 14-18). The agreement between the two distributions is very good; moreover, the empirical distribution always falls inside the exact 90% confidence intervals. The results are presented for the weather station in Lund; the other stations give similar results.
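One simulation step of the chain dependent model can be sketched as follows: given a 0/1 occurrence record (for example simulated from the fitted k-Markov chain), wet-day amounts are produced by passing a latent Gaussian sequence through $\Phi$ and then through an inverse marginal distribution such as the inverse of $F_C$. How the latent process behaves across dry days is not specified in the text, so restarting it after each dry day is an explicit assumption here; it makes every pair of successive wet days follow the Gaussian copula with parameter $\rho$, which matches how $\rho$ was estimated. The function name is hypothetical.

```python
import random
from statistics import NormalDist


def simulate_amounts(occurrence, rho, quantile, seed=0):
    """Daily amounts for a given 0/1 occurrence record (1 = wet day).

    Assumptions (illustrative):
    * a latent N(0, 1) AR(1) process with lag-one correlation rho runs through
      consecutive wet days and is restarted after each dry day;
    * quantile is the inverse marginal distribution of wet-day amounts.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    s = (1.0 - rho * rho) ** 0.5
    amounts, z, prev_wet = [], 0.0, False
    for wet in occurrence:
        if not wet:
            amounts.append(0.0)
            prev_wet = False
            continue
        eps = rng.gauss(0.0, 1.0)
        z = rho * z + s * eps if prev_wet else eps   # keeps z ~ N(0, 1)
        p = min(max(phi(z), 1e-12), 1.0 - 1e-12)    # guard the quantile at 0 and 1
        amounts.append(quantile(p))
        prev_wet = True
    return amounts
```

Repeating this over many simulated records and computing each index on every record gives Monte Carlo approximations of the index distributions.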


Station       σ       95% CI for σ      ξ         95% CI for ξ       θ       u (mm)   ρ
Lund          5.91    (4.93, 7.03)      0.076     (-0.041, 0.236)    0.935   15       0.1362
Bolmen        6.44    (5.56, 7.41)      -0.0002   (-0.095, 0.116)    0.921   15       0.2008
Hano          5.29    (3.044, 8.737)    0.458     (0.115, 1.05)      0.977   25       0.1649
Boras         7.63    (7.01, 8.28)      -0.011    (-0.067, 0.053)    0.794   10       0.1982
Varberg       5.48    (4.687, 6.378)    0.106     (0.001, 0.236)     0.926   15       0.1206
Ungsberg      5.768   (4.622, 7.115)    0.245     (0.089, 0.445)     0.925   15       0.1843
Saffle        6.62    (5.96, 7.329)     0.099     (0.027, 0.183)     0.857   10       0.1809
Soderkoping   6.259   (4.32, 8.884)     0.297     (0.1, 0.649)       0.984   25       0.1678
Stockholm     5.597   (4.827, 6.453)    0.135     (0.033, 0.259)     0.903   10       0.1523
Malung        6.355   (5.676, 7.095)    0.08      (0.004, 0.17)      0.86    10       0.2280
Vattholma     4.964   (3.521, 6.784)    0.334     (0.098, 0.667)     0.984   20       0.1709
Myskelasen    6.854   (5.962, 7.844)    0.019     (-0.072, 0.13)     0.849   10       0.2311
Harnosand     7.863   (7.053, 8.742)    0.087     (0.011, 0.175)     0.832   10       0.2068
Rosta         6.276   (5.453, 7.19)     0.032     (-0.062, 0.145)    0.876   10       0.2116
Pitea         5.937   (4.429, 7.822)    0.19      (0.004, 0.456)     0.96    20       0.2010
Stensele      7.66    (6.098, 9.5)      0.041     (-0.11, 0.236)     0.915   15       0.2249
Haparanda     5.628   (4.405, 7.07)     -0.073    (-0.196, 0.125)    0.984   18       0.1871
Kvikkjokk     5.66    (5.01, 6.36)      0.04      (-0.04, 0.137)     0.864   10       0.2526
Pajala        5.033   (3.705, 6.728)    0.356     (0.153, 0.646)     0.966   18       0.2385
Karesuando    5.303   (4.117, 6.754)    0.12      (-0.037, 0.34)     0.922   15       0.2206

Table 4: Extremal parameters and their 95% confidence intervals for each weather station.

m θ

0 0.9144

1 0.8836

2 0.8425

3 0.8322

Table 5: Values of the parameter θ for different choices of m clusters.


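Index     Description                                        Formula
R10mm     Heavy precipitation days                           $\sum_i \mathbf{1}\{Z_i > 10\}$
R20mm     Very heavy precipitation days                      $\sum_i \mathbf{1}\{Z_i > 20\}$
RX1day    Highest 1-day precipitation amount                 $\max_i Z_i$
RX5day    Highest 5-day precipitation amount                 $\max_i \sum_{j=0}^{4} Z_{i+j}$
CDD       Max number of consecutive dry days                 $\max\{j : \tau_j(X_i) = \mathbf{0}\}$
CWD       Max number of consecutive wet days                 $\max\{j : w = \tau_j(X_i),\ w_k > 0\ \forall k\}$
R75p      Moderate wet days                                  $\sum_i \mathbf{1}\{Z_i > q_{0.75}\}$
R90p      Above moderate wet days                            $\sum_i \mathbf{1}\{Z_i > q_{0.90}\}$
R95p      Very wet days                                      $\sum_i \mathbf{1}\{Z_i > q_{0.95}\}$
R99p      Extremely wet days                                 $\sum_i \mathbf{1}\{Z_i > q_{0.99}\}$
R75pTOT   Precipitation fraction due to R75p                 $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.75}\} / \sum_i Z_i$
R90pTOT   Precipitation fraction due to R90p                 $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.90}\} / \sum_i Z_i$
R95pTOT   Precipitation fraction due to R95p                 $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.95}\} / \sum_i Z_i$
R99pTOT   Precipitation fraction due to R99p                 $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.99}\} / \sum_i Z_i$
SDII      Simple daily intensity index                       $\sum_i Y_i / \sum_i \mathbf{1}\{Y_i > 0\}$
Prec90p   90% quantile of thinned amount of precipitation    $F_Y^{-1}(0.9)$

Table 6: Weather indices and their mathematical expressions. The quantiles $q_{(\cdot)}$ have been estimated using the observed data.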
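Several of the Table 6 indices for one block of daily amounts $Z_i$ (mm) can be computed as follows. The wet-day cut-off (0.1 mm), treating $Y_i$ as the wet-day subset of $Z_i$, and the dictionary interface are assumptions of this sketch; the percentile thresholds $q_p$ must be supplied from the observed data, as stated in the table caption. Function names are hypothetical.

```python
from math import ceil


def max_run(flags):
    """Length of the longest run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def weather_indices(z, q, wet=0.1):
    """Selected Table 6 indices for one block of daily amounts z (mm, at least 5 days).

    q   -- dict {p: q_p} of percentile thresholds estimated from the observed data
    wet -- assumed wet-day cut-off; Y_i are the amounts on wet days
    """
    y = sorted(x for x in z if x >= wet)
    total = sum(z)
    out = {
        "R10mm": sum(1 for x in z if x > 10),
        "R20mm": sum(1 for x in z if x > 20),
        "RX1day": max(z),
        "RX5day": max(sum(z[i:i + 5]) for i in range(len(z) - 4)),
        "CDD": max_run(x < wet for x in z),
        "CWD": max_run(x >= wet for x in z),
        "SDII": sum(y) / len(y),
        "Prec90p": y[ceil(0.9 * len(y)) - 1],    # empirical 0.9-quantile of Y
    }
    for p, qp in q.items():
        name = f"R{round(100 * p)}p"
        out[name] = sum(1 for x in z if x > qp)
        out[name + "TOT"] = sum(x for x in z if x > qp) / total
    return out
```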

As Fig. 14 (top left) shows, over a period of two years we expect about 17 days with more than 10mm of precipitation and, from Fig. 14 (top right), about 3 days with more than 20mm. The precipitation on these days can, however, be considerably more than 20mm, as the distribution of the highest one-day amount in Fig. 14 (bottom left) shows. Fig. 14 (bottom right) tells us that the probability of 5 consecutive days of heavy precipitation in Lund is quite high.


Fig. 14: Plots of R10mm (top left), R20mm (top right), RX1day (bottom left) and RX5day (bottom right); theoretical (model) cumulative distribution '-' and empirical distribution '.-'.
[Figure: four panels (Index R10mm, Index R20mm, Index RX1day, Index RX5day); horizontal axes number of days (R10mm, R20mm) and amount of precipitation in mm (RX1day, RX5day); vertical axes cumulative probability, 0 to 1.]

Fig. 15: Plot of the maximum number of consecutive dry days (left) and the maximum number of consecutive wet days (right).
[Figure: two panels (Index CDD, Index CWD); horizontal axes number of days; vertical axes cumulative probability, 0 to 1.]

As Fig. 15 shows, once every two years we should expect a dry spell longer than two weeks (left) and a wet spell of approximately 12 days (right).


Fig. 16: Plot of the probability of the number of moderately wet days (top left), above moderately wet days (top right), very wet days (bottom left) and extremely wet days (bottom right).
[Figure: four panels (Index R75p, Index R90p, Index R95p, Index R99p); horizontal axes number of days; vertical axes cumulative probability, 0 to 1.]

Fig. 16 shows that, over every two-year period in Lund, we expect almost fifty moderately wet days (top left), almost 18 above moderately wet days (top right), almost 8 very wet days (bottom left) and almost 2 extremely wet days (bottom right).

Fig. 17 (top left) shows that the roughly fifty moderately wet days expected over a two-year period in Lund account for about 70% of the total amount of precipitation. Similarly, the 18 above moderately wet days account on average for a little more than 40% of the total precipitation amount (top right), the 8 very wet days for about 25% (bottom left) and the 2 extremely wet days for about 10% (bottom right).

Fig. 18 (left) shows that the average amount of precipitation per day of precipitation is about 3.5mm, and Fig. 18 (right) that, on average, only 1 in 10 days with precipitation has more than about 9.5mm.


Fig. 17: Percentage of precipitation during the moderately wet days (top left), the above moderately wet days (top right), the very wet days (bottom left) and the extremely wet days (bottom right).
[Figure: four panels (Index R75pTOT, Index R90pTOT, Index R95pTOT, Index R99pTOT); horizontal axes fraction of the total precipitation; vertical axes cumulative probability, 0 to 1.]

Fig. 18: Plot of the average amount of precipitation per day of precipitation (left) and the 90% quantile of the amount of precipitation of the thinned precipitation process (right).
[Figure: two panels (Index SDII, Index Prec90p); vertical axes cumulative probability, 0 to 1.]

7 Conclusions

In this thesis, we have modelled the temporal variability of precipitation in Sweden. The weather stations have been assumed to have no spatial dependence; modelling the spatial variability of precipitation across the Swedish weather stations is among our future research plans. Some interesting conclusions can be drawn.

We have used a chain dependent model for precipitation, consisting of a component for the occurrence of precipitation and a component for the amount of precipitation. For the first component, we have used two-state high order Markov chains. We have shown that the widely used 1-Markov chain model is inadequate for most of the Swedish stations. For example, when the distribution of long dry spells is of interest, the 1-Markov chain underestimates the expected length of long dry spells, in some cases by more than half a day (see Table 3).

For the amount of precipitation process, we have used a Gaussian copula to describe the temporal dependence between successive days, which amounts to a Gaussian process with transformed marginals. The cumulative distribution has then been modelled in two steps: first using the empirical distribution for amounts of precipitation below a given threshold, and then using a generalized Pareto distribution for the excesses above the threshold. Such models have the advantage of providing a mathematical framework for computing quantities such as return periods.

Finally, the distributions of different weather indices have been computed by Monte Carlo simulation of the fitted chain dependent model and compared with the empirical distributions obtained from the data. The agreement between the two distributions has been very good, which supports the choice of the models.

References

[1] Akaike, H., (1974). A new look at statistical model identification, IEEE Trans. Auto. Control, AC-19, pp. 716-722.

[2] Benjamin, J.R. and Cornell, C.A., (1970). Probability, Statistics and Decision for Civil

Engineers, McGraw-Hill, Inc., New York, 685 pp.

[3] Bruhn, J.A., Fry, W.E. and Fick, G.W., (1980). Simulation of daily weather data

using theoretical probability distributions. J. Appl. Meteorol. 19, pp. 1029-1036.

[4] Castellvi, F. and Stockle, C.O., (2001). Comparing a locally-calibrated versus a gen-

eralised temperature weather generation. Trans. ASAE 44 5, pp. 1143-1148.


[5] Chin, E. H., (1977). Modelling daily precipitation process with Markov chain, Wat.

Resources Res., 13, 949-956.

[6] Coles, S., (2001). An Introduction to Statistical Modeling of Extreme Values. Springer,

London.

[7] Cox, D.R. and Isham, V., (1988). A simple spatial-temporal model for rainfall (with

discussion). Proc. R. Soc. Lond. A, 415, 317-328.

[8] Cox, D.R. and Isham, V., (1994). Stochastic models of precipitation. In Statistics for

the Environment 2: Water Related Issues (eds V. Barnett and K.F. Turkman), ch. 1, pp.

3-18. Chichester: Wiley.

[9] Crovella, M. and Taqqu, M., (1999). Estimating the heavy tail index from scaling

properties, Methodology and Computing in Applied Probability 1, 55-79.

[10] Dalevi, D., Dubhashi, D. and Hermansson, M., (2006). A New Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity, Stat. Appl. Genet. Mol. Biol., 5, Article 8.

[11] Davison, A.C. and Smith, R.L., (1990). Models for exceedances over high thresholds,

J. Roy. Statist. Soc. B, 52, pp. 393-442.

[12] Frigessi, A., Haug, O., Rue, H., (2002). A Dynamic Mixture Model for Unsupervised

Tail Estimation without Threshold Selection, Extremes, 5, pp.219-235.

[13] Gabriel, K.R. and Neumann, J., (1962). A Markov chain model for daily rainfall occurrences at Tel Aviv. Quart. J. Roy. Meteor. Soc., 88, 90-95.

[14] Geng, S., Frits, W.T., de Vries, P. and Supit, I., (1986). A simple method for generating daily rainfall data. Agric. For. Meteorol. 36, pp. 363-376.

[15] Guttorp, P., (1995). Stochastic Modelling of Scientific Data, Chapman & Hall, London, Chapter 2.

[16] Hutchinson, M.F., (1995). Stochastic space-time weather models from ground-based

data. Agric. For. Meteorol., 73, 237-264.

[17] Hutchinson, T.P. and Lai, C.D., (1990). Continuous Bivariate Distributions, Empha-

sising Applications. Sydney, Australia: Rumsby.

[18] Jimoh, O.D. and Webster, P., (1996). the optimum order of a Markov chain model

for daily rainfall in Nigeria. Journal of Hydrology, 185, 45-69.

[19] Joe, H., (1997). Multivariate Models and Dependence Concepts. London: Chapman

29

& Hall

[20] Karl, T.R., Nicholls, N. and Ghazi, A., (1999). CLIVAR/GCOS/WMO workshop on

indices and indicators for climate extremes: Workshop summary, Climatic Change. Vol.

32, pp. 3-7.

[21] Lana, X. and Burgueno, A., (1998). Daily dry-wet behaviour in Catalonia (NE Spain)

from the viewpoint of Markov chains, Int. J. Climatol. 18, 793-815.

[22] Leadbetter, M.R., Lindgren, G., and Rootzen, H., (1983). Extremes and Related

Properties of Random Sequences and Series. Springer Verlag, New York.

[23] LeCam, L., (1961). A stochastic description of precipitation Proc.4th Berkeley Symp.,

pp.165-186.

[24] Lennartsson, J., and Shu, M., (2005). Copula Dependence Structure on Real Stock

Markets, Masters thesis, Chalmers University of Technology, 2005-01.

[25] Liao, Y., Zhang, Q. and Chen, D., (2004). Stochastic modeling of daily precipitation

in China. Journal of Geographical Sciences, 14(4), 417-426.

[26] Mellor, D., (1996). The modified turning bands (mtb) model for space-time rainfall:i,

model definition and properties. J. Hydrol., 175 113-127.

[27] McNeil, A.J., (1996). Estimating the tails of loss severity distributions using extreme

value theory, Technical report, Department Mathematik, ETH Zentrum, Zurich.

[28] Nelsen, R. B., (2006). An Introduction to Copulas 2nd edition. New York: Springer.

[29] Norris, J.R., (2005). Markov chains, Cambridge University Press.

[30] Peterson, T.C., Folland, C., Gruza, G., Hogg, W., Mokssit, A. and Plummer, N.,

(2001). Report on the Activities of the Working Group on Climate Change Detection

and Related Rapporteurs 1998-2001. World Meteorological Organisation, WCDMP-47,

WMO-TD 1071.

[31] Racsko, P., Szeidl, L. and Semenov, M., (1991). A serial approach to local stochastic

weather models. Ecol. Model. 57, pp. 27-41.

[32] Resnick, S.I., (1997). Heavy tail modeling and teletraffic data, The Annals of Statis-

tics 255, 1805-1869.

[33] Rootzen, H. and Tajvidi, N., (1997). Extreme value statistics and wind storm losses:

A case study, Scandinavian Actuarial Journal 1, 70-94.

[34] Rodrıguez-Iturbe, I., Cox, D. and Isham, V., (1987). Some models for rainfall based

30

on stochastic point processes. Proc. R. Soc. Lond., A 410, 269-288.

[35] Rodrıguez-Iturbe, I., Cox, D. and Isham, V., (1988). A point process model for rain-

fall: further developments. Proc. R. Soc. Lond., A 417, 283-298.

[36] Schwarz, G., (1978). Estimating the dimension of a model. Ann. Stat. 6, pp. 461-

464.

[37] Selker, J.S. and Haith, D.A., (1990). Development and testing of simple parameter

precipitation distributions. Water Resour. Res. 26 11, pp. 2733-2740.

[38] Smith, R. L. and Robinson, P.J., (1997). A Bayesian approach to the modelling of

spatial-temporal precipitation data. Lect. Notes Statist., 237-269.

[39] Srikanthan, R. and McMAhon, T.A., (2001). Stochastic generation of annual, monthly

and daily climate data: A review. Hydrol. Earth Syst. Sci. 5 4, pp. 653-670.

[40] Stern, R.D. and Coe, R. (1984)., A Model fitting Analysis of Daily Rainfall Data,

J.R.Statist.Soc. A, 147, Part1, pp.1-34.

[41] Stidd, C.K., (1973). Estimating the precipitation climate. Wat. Resour. Res., 9

1235-1241.

[42] Trivedi, P. K. and Zimmer, D.M., (2007). Copula Modelling: An Introduction for

Practitioners, Foundations and Trends in Econometrics, Vol. 1, No 1, 1-111.

[43] Waymire, E., Gupta, V. K., (1981). The mathematical structure of rainfall repre-

sentations: 3,, Some applications of the point process theory to rainfall processes. Wat.

Resour. Res.17, 1287-1294.

[44] Waymire, E., Gupta, V. K. and Rodrıguez-Iturbe, I., (1984). Spectral theory of rain-

fall intensity at the meso-β scale. Wat. Resour. Res.20, 1453-1465.

[45] Woolhiser, D.A., (1992). Modelling daily precipitation-progress and problems. In:

A. Walden and P. Guttorp (Editors), Statistics in the Environmental and Earth Sciences.

Edward Arnold, London, pp.71-89.

31

8 Appendix

8.1 Review of Mathematical Order Estimators

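Let $X_t$ denote a k-Markov chain defined on a state space $S$, and let $x_1^n$ be a realisation of it. Let also $P_{ML(k)}(x_1^n)$ be the kth-order maximum likelihood, i.e.

$$P_{ML(k)}(x_1^n) = \max P(X_1^k = x_1^k) \prod_{i=k+1}^{n} P\left(X_i = x_i \mid \tau_k(X^{i-1}) = \tau_k(x^{i-1})\right),$$

where the maximum is taken over all k-Markov chains on $S$.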

Tong (1975) reported that the Akaike Information Criterion (AIC) order estimator could be used as an objective technique for determining the optimum order $k$ of the chain; see also Akaike (1974). The optimum order is the one that minimises the loss function:

$$k_{AIC}(x_1^n) = \arg\min_k \left( -\log P_{ML(k)}(x_1^n) + |S|^k \right).$$

Schwarz (1978) presented an alternative technique, the Bayesian Information Criterion (BIC) order estimator, whose consistency under general conditions was established only recently. The optimum order $k$ is the one that minimises the loss function, which is now given by

$$k_{BIC}(x_1^n) = \arg\min_k \left( -\log P_{ML(k)}(x_1^n) + \frac{|S|^k(|S|-1)}{2} \log(n) \right).$$

Dalevi et al. (2006) showed experimentally that, for moderate data sizes, the BIC order estimator tends to underestimate the order as $k$ gets larger.
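The two penalised-likelihood criteria above can be sketched in Python for a binary sequence. This is a minimal illustration under stated assumptions, not the implementation used in this study: the likelihood conditions on the first `k_max` observations (so the factor $P(X_1^k = x_1^k)$ is dropped and every candidate order is scored on the same observations), and the helper names (`simulate_chain`, `max_log_likelihood`, `order_estimates`) and the example transition probabilities `P2` are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical second-order chain: P2[w] = P(X_t = 1 | (X_{t-2}, X_{t-1}) = w).
P2 = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.5, (1, 1): 0.5}


def simulate_chain(p1, k, n, seed=0):
    """Simulate n values of a binary k-Markov chain (k >= 1)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(k)]  # arbitrary initial k-tail
    while len(x) < n:
        x.append(1 if rng.random() < p1[tuple(x[-k:])] else 0)
    return x


def max_log_likelihood(x, k, start):
    """Maximised log-likelihood of a k-th order chain, summed over i >= start.

    The ML transition probabilities are the observed proportions
    n_{w,j} / n_{w,+}, so the maximum equals
    sum_{w,j} n_{w,j} * log(n_{w,j} / n_{w,+}).
    """
    trans, ctx = Counter(), Counter()
    for i in range(start, len(x)):
        w = tuple(x[i - k:i])  # tau_k(x^{i-1}); the empty word when k = 0
        trans[(w, x[i])] += 1
        ctx[w] += 1
    return sum(c * math.log(c / ctx[w]) for (w, _), c in trans.items())


def order_estimates(x, k_max, alphabet_size=2):
    """Return (k_AIC, k_BIC) over the orders 0..k_max."""
    n = len(x)
    aic, bic = {}, {}
    for k in range(k_max + 1):
        ll = max_log_likelihood(x, k, start=k_max)
        aic[k] = -ll + alphabet_size ** k
        bic[k] = -ll + alphabet_size ** k * (alphabet_size - 1) / 2 * math.log(n)
    return min(aic, key=aic.get), min(bic, key=bic.get)


if __name__ == "__main__":
    x = simulate_chain(P2, k=2, n=20000, seed=1)
    print(order_estimates(x, k_max=4))
```

For a long sample from `P2`, BIC should recover order 2, while AIC, with its lighter penalty, may occasionally pick a higher order, in line with the behaviour reported by Dalevi et al. (2006).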

Finally, the Maximal Fluctuation Criterion (MFC), unlike the AIC and BIC order estimators, was specifically designed for multiple step Markov chains. For any realisation $x \in S^n$ of the k-Markov chain and any word $w \in S^l$, let

$$N_x(w) = |\{i \in [1, n] : \tau_l(x^i) = w\}|$$

denote the number of times $w$ occurs in $x$. The Peres-Shields fluctuation function is defined as

$$\Delta_k(v) = \max_{s \in S} \left| N_x(vs) - \frac{N_x(\tau_k(v)s)}{N_x(\tau_k(v))} N_x(v) \right|.$$

When the order of the Markov chain is $k$ or less, this fluctuation is small. Therefore, the MFC order estimator is defined as

$$k_{MFC}(x_1^n) = \min\left\{ k \ge 0 : \max_{k < |v| < \log\log(n)} \Delta_k(v) < n^{3/4} \right\}.$$

In practice, the function $\log\log(\cdot)$ may be replaced by any function that grows more slowly than $\log(\cdot)$. Dalevi et al. (2006) suggested the Generalized Maximum Fluctuation Criterion (GMFC) order estimator, which is closely related to the MFC order estimator:

$$k_{GMFC}(x_1^n) = \arg\max_k \frac{\max_{k-1 < |v| < f(n)} \Delta_{k-1}(v)}{\max_{k < |v| < f(n)} \Delta_k(v)},$$

where $f(n)$ is any function that satisfies the same condition as in the MFC order estimator, i.e. it grows more slowly than $\log(n)$.
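The Peres-Shields fluctuation and the GMFC ratio can be sketched in the same way. Again this is only an illustration under assumptions: $f(n)$ is replaced by a fixed, user-chosen word-length bound `max_len` (a hypothetical tuning choice, not a value used in this study), candidate orders run over $1 \le k \le$ `max_len` $- 2$ so that both maxima in the ratio are taken over non-empty sets of words, $N_x(\cdot)$ counts overlapping occurrences, and words whose tail $\tau_k(v)$ never occurs are skipped. The function names and the example chain `P2` are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

# Hypothetical second-order chain used only for the demonstration.
P2 = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.5, (1, 1): 0.5}


def simulate_chain(p1, k, n, seed=0):
    """Simulate n values of a binary k-Markov chain (k >= 1)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(k)]
    while len(x) < n:
        x.append(1 if rng.random() < p1[tuple(x[-k:])] else 0)
    return x


def word_counts(x, max_len):
    """N_x(w): overlapping occurrence counts of every word of length 0..max_len."""
    counts = Counter()
    for l in range(max_len + 1):
        counts.update(tuple(x[i:i + l]) for i in range(len(x) - l + 1))
    return counts


def fluctuation(counts, v, k, alphabet=(0, 1)):
    """Delta_k(v) = max_s |N(vs) - N(tau_k(v)s) * N(v) / N(tau_k(v))|."""
    tail = v[len(v) - k:]  # tau_k(v); the empty word when k = 0
    if counts[tail] == 0:
        return 0.0
    return max(abs(counts[v + (s,)] - counts[tail + (s,)] * counts[v] / counts[tail])
               for s in alphabet)


def max_fluctuation(counts, k, lo, hi, alphabet=(0, 1)):
    """Maximum of Delta_k(v) over all words v with lo < |v| < hi."""
    return max((fluctuation(counts, v, k, alphabet)
                for l in range(lo + 1, hi)
                for v in product(alphabet, repeat=l)), default=0.0)


def gmfc_order(x, max_len, alphabet=(0, 1)):
    """k_GMFC, with f(n) replaced by the fixed bound max_len."""
    counts = word_counts(x, max_len)
    ratios = {}
    for k in range(1, max_len - 1):
        num = max_fluctuation(counts, k - 1, k - 1, max_len, alphabet)
        den = max_fluctuation(counts, k, k, max_len, alphabet)
        ratios[k] = num / den if den > 0 else math.inf
    return max(ratios, key=ratios.get)


if __name__ == "__main__":
    print(gmfc_order(simulate_chain(P2, 2, 50000, seed=2), max_len=6))
```

The ratio is large at the true order because, below it, $\Delta_{k-1}$ grows linearly with $n$, whereas at and above it both fluctuations are only of sampling-noise size.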


Accepted Manuscript

Modelling precipitation in Sweden using multiple step Markov chains and a composite model

Jan Lennartsson, Anastassia Baxevani, Deliang Chen

PII: S0022-1694(08)00484-8

DOI: 10.1016/j.jhydrol.2008.10.003

Reference: HYDROL 16319

To appear in: Journal of Hydrology

Received Date: 28 May 2008

Revised Date: 3 October 2008

Accepted Date: 4 October 2008

Please cite this article as: Lennartsson, J., Baxevani, A., Chen, D., Modelling precipitation in Sweden using multiple step Markov chains and a composite model, Journal of Hydrology (2008), doi: 10.1016/j.jhydrol.2008.10.003

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Modelling precipitation in Sweden using multiple step Markov chains and a composite model

Jan Lennartsson¹, Anastassia Baxevani¹∗†, Deliang Chen²

¹ Department of Mathematical Sciences, Chalmers University of Technology, University of Gothenburg, Gothenburg, Sweden
² Department of Earth Sciences, University of Gothenburg, Gothenburg, Sweden

Abstract

In this paper, we propose a new method for modelling precipitation in Sweden. We consider a chain-dependent stochastic model that consists of a component that models the probability of occurrence of precipitation at a weather station and a component that models the amount of precipitation at the station when precipitation does occur. For the first component, we show that for most of the weather stations in Sweden a Markov chain of an order higher than one is required. For the second component, which is a Gaussian process with transformed marginals, we use a composite of the empirical distribution of the amount of precipitation below a given threshold and the generalized Pareto distribution for the excesses in the amount of precipitation above that threshold. The derived models are then used to compute different weather indices. The distributions of the modelled indices and the empirical ones show good agreement, which supports the choice of the model.

Key words: High order Markov chain, generalized Pareto distribution, copula, precipitation process, Sweden

1 Introduction

Realistic sequences of meteorological variables such as precipitation are key inputs to many hydrologic, ecologic and agricultural models. Simulation models are needed to model the stochastic behavior of the climate system when historical records are of insufficient duration or have inadequate spatial and/or temporal coverage. In these cases synthetic sequences may be used to fill in gaps in the historical record, to extend it, or to generate realizations of weather that are stochastically similar to it. A weather generator is a stochastic numerical model that generates daily weather series with the same statistical properties as the observed ones; see Liao et al. (2004).

In developing a weather generator, the stochastic structure of the series is described by a statistical model whose parameters are then estimated from the observed series. This allows us to generate arbitrarily long series with a stochastic structure similar to that of the real data.

∗ Corresponding author.
† Research supported partially by the Gothenburg Stochastic Center and the Swedish Foundation for Strategic Research through the Gothenburg Mathematical Modelling Center.


Parameter estimation of stochastic precipitation models has been a topic of intense research over the last 20 years. The estimation procedures are intrinsically linked to the nature of the precipitation model itself and to the timescale used to represent the process. There are models that describe the precipitation process in continuous time and models that describe the probabilistic characteristics of precipitation accumulated over a given time period, say daily or monthly totals. Reviews of the available models have been presented by, for example, Woolhiser (1992), Cox and Isham (1988) and Smith and Robinson (1997).

Continuous-time models for a single site, with parameters related to the underlying physical precipitation process, are particularly important for the analysis of data at short timescales, e.g. hourly. Some of these models are described in Rodríguez-Iturbe et al. (1987, 1988) and Waymire and Gupta (1981).

When only precipitation amounts accumulated over a fixed period (e.g. daily totals) are recorded, empirical statistical models, i.e. stochastic models calibrated from observed data, are appealing. Empirical statistical models for generating daily precipitation data at a given site can be classified into four types: chain-dependent (two-part) models, transition probability matrix models, resampling models and ARMA time series models; see Srikanthan and McMahon (2001) for a comprehensive review.

Single-site precipitation models have been extended spatially to multiple sites, with the aim of incorporating inter-site dependence while preserving the marginal properties at each site. A more ambitious task is modelling precipitation continuously in both time and space; original work on this type of model, based on point process theory, was presented by LeCam (1961) and further developed by Waymire et al. (1984) and Cox and Isham (1994). Mellor (1996) developed the modified turning bands model, which reproduces some of the physical features of precipitation fields in space, such as rainbands, cluster potential regions and rain cells.

In this study we concentrate on a chain-dependent model for daily precipitation in Sweden, which consists of two parts: first, a model for the sequence of wet/dry days and second, a model for the amount of precipitation on wet days. For the first, we use high-order Markov chains, and for the second we introduce a composite model that incorporates the empirical distribution and the generalized Pareto distribution.

2 Data

Precipitation data from 20 stations in Sweden have been used in the studies presented in this paper. The locations are shown in Fig. 1 and the names of the stations are given in Table 1. The data consist of accumulated daily precipitation collected over 44 years, from 1 January 1961 to 31 December 2004, and were provided by the Swedish Meteorological and Hydrological Institute (SMHI). The number of missing observations is generally low (< 5%) at all stations. Time plots of the annual number of wet days (threshold 0.1 mm) at the 20 stations are presented in Fig. 2.

The time plots of the annual number of wet days suggest that the precipitation regime at some stations (namely Soderkopping, Rosta and Stensele) may contain trends. The results presented in the following sections refer to the whole period of data from all stations, but they should be interpreted with care for these three stations. Fig. 3 shows time plots of the annual amount of precipitation on wet days. The total amounts of precipitation appear to be stationary over the years.

3 Model

To model precipitation in Sweden, we use a chain-dependent model. The first part of the model can be handled with Markov chains. Gabriel and Neumann (1962) used a first-order stationary Markov chain. Such models have since been extended to allow for non-stationarity, both by fitting separate chains to different periods of the year and by fitting continuous curves to the transition probabilities; see Stern and Coe (1984) and references therein. The required order of the Markov chain has also been discussed extensively, see for example Chin (1977) and references therein, with the conclusion that different sites require different orders. Still, first-order Markov chains are a popular choice, since they have been shown to perform well for a wide range of climates; see for example Bruhn et al. (1980), Lana and Burgueno (1998), Aksoy and Bayazit (2000) and Castellvi and Stockle (2001). The main deficiency of first-order models is that long dry spells are not well reproduced; see Racsko et al. (1991) and Guttorp (1995).

Different models have been proposed in the literature for the amount of precipitation on a wet day, all of which assume that the daily amounts of precipitation are independent and identically distributed. Stidd (1973) and Hutchinson (1995) proposed a truncated normal model with a time-dependent parameter for the amount of precipitation, while Geng et al. (1986) and Selker and Haith (1990) chose the Gamma and Weibull distributions, respectively, because their shape can be adapted to each site.

In this study, we model the occurrence of wet/dry days using higher-order Markov chains, and for the amount of precipitation we use a composite model consisting of the empirical distribution function for values below a threshold and the distribution of excesses for values above it. Such a model is more flexible, better describes the tail of the distribution, and additionally allows for dependence in the precipitation process.

Let $Z_t$ be the precipitation at a certain site at time $t$, measured in days. Then a chain-dependent model for the precipitation is given by

$$Z_t = X_t W_t,$$

where $X_t$ and $W_t$ are stochastic processes such that $X_t$ takes values in $\{0, 1\}$ and $W_t$ takes values in $\mathbb{R}_+ \setminus \{0\}$. The processes $X_t$ and $W_t$ will be referred to as the occurrence of precipitation process and the amount of precipitation process, respectively.
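To make the decomposition $Z_t = X_t W_t$ concrete, here is a minimal Python sketch that simulates a chain-dependent series. It is an illustration under explicit assumptions only: the occurrence process is a binary k-Markov chain with hypothetical transition probabilities, and the amounts $W_t$ are drawn i.i.d. as 0.1 mm plus an exponential variable with a hypothetical mean. This placeholder is not the dependent composite model developed in Section 5; the function name is also hypothetical.

```python
import random


def simulate_chain_dependent(p_wet, k, n_days, mean_excess_mm, seed=0):
    """Simulate Z_t = X_t * W_t for t = 0..n_days-1 (requires n_days >= k).

    p_wet[w] = P(X_t = 1 | last k occurrences = w)      (hypothetical values)
    W_t = 0.1 + Exp(mean_excess_mm): a placeholder i.i.d. amount model, so
    every wet-day amount is at least the 0.1 mm wet-day threshold.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(k)]      # arbitrary initial k-tail
    while len(x) < n_days:
        x.append(1 if rng.random() < p_wet[tuple(x[-k:])] else 0)
    w = [0.1 + rng.expovariate(1.0 / mean_excess_mm) for _ in range(n_days)]
    z = [xi * wi for xi, wi in zip(x, w)]          # zero exactly on dry days
    return x, w, z


if __name__ == "__main__":
    p_wet = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.7}
    x, w, z = simulate_chain_dependent(p_wet, k=2, n_days=10, mean_excess_mm=3.0)
    print([round(v, 1) for v in z])
```

Because the two components are simulated separately, either one can be swapped out (for example, a different Markov order or a different amount model) without touching the other.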

The approach presented in this study provides a mechanism for making predictions of precipitation over time. This is particularly important for many applications in hydrology, ecology and agriculture. For example, at a monthly level, the amount of precipitation and the probability and length of a dry period are required quantities for many applications.


4 Models for the Occurrence of Precipitation

Let $\{X_t, t = t_1, \dots, t_N\}$ denote the sequence of daily precipitation occurrences, i.e. $X_t = 1$ indicates a wet day and $X_t = 0$ a dry day. In this study, a wet day is a day on which at least 0.1 mm of precipitation was recorded by the rain gauge. The level has been chosen above zero to avoid identifying dew and other noise as precipitation, and to avoid difficulties arising from inconsistent recording of very small precipitation amounts. Moreover, daily precipitation amounts below 0.1 mm can have relatively large observational errors, and including them would significantly change the estimated transition probabilities of the occurrences, thereby introducing additional errors into the fitted models. The model is fitted over different periods of the year, i.e. subsets of the $N$ days of the year, that may be assumed stationary.

Before we continue, we need to introduce some notation. Let $S = \{0, 1\}$ denote the state space of the k-Markov chain $X_t$. The elements of $S$ are called letters, and a sequence of letters $w \in S^l = S \times \cdots \times S$ is called a word of length $l$. The word composed of the letters of $w$ from position $i$ to $j$, for some $1 \le i \le j \le l$, is denoted by $w_i^j = (w_i, w_{i+1}, \dots, w_j)$. Finally, for $k \le l$ let $\tau_k(w) = w_{l-k+1}^l$ denote the k-tail of the word $w$, i.e. the last $k$ letters of $w$. When no confusion arises and $k \le j - i$, we also write $\tau_k(w^j)$ instead of $\tau_k(w_i^j)$.

It is assumed that the process $X_t$ is a k-Markov chain, a model completely characterized by the transition probabilities

$$p_{w,j}(t) := P\left(X_t = j \mid \tau_k(X^{t-1}) = w\right), \quad j \in S,\ t = t_1, \dots, t_N,$$

where $w$ is a word of length $k$ and $X^{t-1} = \{\dots, X_{t-2}, X_{t-1}\}$ is the whole process up to time $t-1$, so that $\tau_k(X^{t-1})$ consists of the last $k$ days up to and including $X_{t-1}$, i.e. $\tau_k(X^{t-1}) = (X_{t-k}, \dots, X_{t-1})$. Note that for a 2-state Markov chain of any order, $p_{w,1}(t) + p_{w,0}(t) = 1$. In the special case of a time-homogeneous Markov chain, $p_{w,j}(t) = p_{w,j}$ for $t = t_1, \dots, t_N$, i.e. the transition probabilities do not depend on time.

Let $n_{w,j}(t)$ denote the number of years in which day $t$ is in state $j$ and is preceded by the word $w$ (i.e. $\tau_k(X^{t-1}) = w$, $w \in S^k$, and $X_t = j$). The probabilities $p_{w,j}(t)$ are then estimated by the observed proportions

$$\hat{p}_{w,j}(t) = \frac{n_{w,j}(t)}{n_{w,+}(t)}, \quad w \in S^k,\ j \in S,\ t = t_1, \dots, t_N,$$

where $+$ indicates summation over the replaced subscript, i.e. $n_{w,+}(t) = \sum_{j \in S} n_{w,j}(t)$. Note also that day 60 (29 February) has data only in leap years, so day 59 precedes day 61 in non-leap years. Fig. 4 (left) shows the unconditional probability of precipitation, pooled over 5 days for clarity, plotted against $t$ for the data from the station in Lund.
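The estimator $\hat{p}_{w,j}(t) = n_{w,j}(t)/n_{w,+}(t)$ amounts to counting, across years, which word precedes day $t$ and which state day $t$ is in. Below is a minimal Python sketch. It assumes each year's record is a 0/1 list aligned by day index (with the 29 February issue handled beforehand) and that only days $t \ge k$ of each yearly list are used, so the preceding word never reaches into the previous year. A pooled variant, which sums the counts over all days of a season, gives the time-homogeneous estimate $\hat{p}_{w,j}$. All function names are hypothetical.

```python
from collections import defaultdict


def _proportions(n_wj, alphabet=(0, 1)):
    """Turn counts n_{w,j} into proportions n_{w,j} / n_{w,+}."""
    n_w = defaultdict(int)
    for (w, _), c in n_wj.items():
        n_w[w] += c  # n_{w,+}: sum over the replaced subscript j
    return {(w, j): n_wj.get((w, j), 0) / n_w[w] for w in n_w for j in alphabet}


def estimate_p_day(years, k, t, alphabet=(0, 1)):
    """p_hat_{w,j}(t) from the yearly records, for a single day index t >= k."""
    n_wj = defaultdict(int)
    for series in years:
        n_wj[(tuple(series[t - k:t]), series[t])] += 1
    return _proportions(n_wj, alphabet)


def estimate_p_pooled(years, k, alphabet=(0, 1)):
    """Time-homogeneous p_hat_{w,j}: counts pooled over all days t >= k of a season."""
    n_wj = defaultdict(int)
    for series in years:
        for t in range(k, len(series)):
            n_wj[(tuple(series[t - k:t]), series[t])] += 1
    return _proportions(n_wj, alphabet)


if __name__ == "__main__":
    years = [[0, 1, 1], [1, 1, 0], [0, 1, 0], [1, 0, 1]]  # four toy "years"
    print(estimate_p_day(years, k=1, t=2))
```

In the toy data, day $t = 2$ follows a wet day in three of the four years, and only one of those three is wet, so $\hat{p}_{1,1}(2) = 1/3$.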

Non-stationarity is often apparent in environmental processes, as it is here, because of seasonal effects or different patterns in different months. A usual practice is to specify different subsets of the year as seasons, which results in a separate model for each season, although determining an appropriate division into seasons is itself an issue.


4.1 Fitting Models to the Occurrence of Precipitation

There is inter-annual variation in the annual number of wet days, as can be seen in Fig. 2. There is also seasonal variation in the mean monthly number of wet days (see Fig. 4 (right) for data from Lund), although it is not as prominent as in other regions of the world. The optimum order of the chain describing the wet/dry sequence may therefore vary within the year and from one year to another, so it is important to properly identify the periods of record that can be assumed to be time homogeneous.

Within such a period, finding an appropriate model for the occurrence of precipitation process $X_t$ is equivalent to finding the order of a multiple step Markov chain. The Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Generalized Maximum Fluctuation Criterion (GMFC) order estimators, which are briefly described in Subsection 8.1, have been applied to the data from each station. Following Jimoh and Webster (1996), various block lengths were considered for determining the order $k$ of the Markov chain:

• 1-month blocks (January, February, ..., December),
• 2-month blocks (January-February, February-March, ..., December-January),
• 3-month blocks (January-March, February-April, ..., December-February).

The effect of block length on the order of the Markov chain can be seen in Figs. 5-7. Grouping the data in blocks longer than one month results in a "smoother" order, in the sense that the estimated order changes less abruptly from one block to the next. It is also interesting that, while the order of the Markov chain for stations 16-20 varies considerably according to the AIC and GMFC estimators, it is almost constant according to the BIC estimator. As expected, the BIC order estimator underestimates the order $k$ of the Markov chain relative to both the AIC and GMFC order estimators for large $k$ and moderate data sets (see Dalevi et al. (2006)), while the GMFC estimates lie between the BIC and AIC ones. The results presented in Figs. 5-7 confirm that the model order is sensitive to the season (month), to the length of the season (number of months) and to the method used to identify the optimum order. Possible dependence on the threshold used to distinguish wet and dry days has not been studied here. For the rest of this study, we define the seasons as the 3-month periods December-February, March-May, June-August and September-November. As Fig. 4 (left) shows for the station in Lund (the other stations give similar plots), the probability of precipitation is close to constant within these periods, which makes the assumption of stationarity plausible. The orders of the Markov chain for these periods can be found in Fig. 7. For the rest of this study the order $k$ of the Markov chain is chosen according to the GMFC order estimator.


4.2 Distribution of Dry Spell Length

An interesting aspect of the wet/dry behavior, i.e. of the process $X_t$, is the distribution of the length of dry (wet) spells, i.e. the number of consecutive dry (wet) days, which is an accessible property of multiple step Markov chains.

For a time-homogeneous (stationary) k-Markov chain $X_t$ ($k \ge 2$) with state space $S$, let $T$ be the first time at which $\tau_2(X^t) = (1, 0)$, i.e.

$$T = \inf\{t \ge 0 : \tau_2(X^t) = (1, 0)\}.$$

So $T$ is the time at which the first dry period starts. For words $u, v \in S^k$, let

$$a_{u,v} = P\left(\tau_k(X^T) = v \mid \tau_k(X^0) = u\right)$$

denote the probability that the k-tail of $X_t$ at time $T$ equals $v$, given that the k-tail at time 0 equals $u$. The probabilities $a_{u,v}$ are easily obtained for stationary processes; see Norris (2005). Note that at $t = 0$ there may be the start of a dry period, the start of a wet period, the continuation of a dry period or the continuation of a wet period. Let $D(X_t)$ denote the length of the first dry period of the k-Markov chain $X_t$ that starts at or after time $t = 0$. Assuming additionally that $X_t$ is time homogeneous, with $\pi_u$ the stationary probability of the k-tail $u$, the distribution of the first dry spell can be computed as

$$P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{w \in S^k:\ \tau_2(w) = (1,0)} a_{u,w}\, P\left(\tau_m(X^{m-1}) = \mathbf{0},\ X_m = 1 \mid \tau_k(X^0) = w\right), \qquad (1)$$

where $\mathbf{0}$ denotes a sequence of 0's of appropriate length.

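Now let $v = w\mathbf{0}1$ be the word of length $m + k$ obtained by appending to $w$ a run of $m - 1$ zeros and then a one. Using the fact that $X_t$ is a k-Markov chain, Eq. (1) can be rewritten as

$$P(D(X_t) = m) = \sum_{u \in S^k} \pi_u \sum_{w \in S^k:\ \tau_2(w) = (1,0)} a_{u,w} \prod_{i=1}^{m} P\left(X_i = v_{k+i} \mid \tau_k(X^{i-1}) = \tau_k(v^{k+i-1})\right). \qquad (2)$$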
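The product in Eq. (2) is straightforward to evaluate once the transition probabilities are known. The Python sketch below computes the inner factor, the probability that a dry spell starting from the k-tail $w$ has length $m$, and then mixes over the starting words. It assumes the combined weights $q(w) = \sum_u \pi_u a_{u,w}$ are supplied by the user (computing $a_{u,w}$ from hitting probabilities, as in Norris (2005), is not shown). The names `p1` and `start_weights` and the example numbers are hypothetical.

```python
def dry_spell_pmf_given_start(p1, w, m):
    """P(D = m | tau_k(X^0) = w): the product over i = 1..m in Eq. (2).

    p1[u] = P(X_t = 1 | tau_k(X^{t-1}) = u) for every word u of length k,
    and w must end in (1, 0), i.e. a dry spell starts at time 0.
    """
    k = len(w)
    if k < 2 or tuple(w[-2:]) != (1, 0):
        raise ValueError("w must have length >= 2 and end with (1, 0)")
    v = list(w) + [0] * (m - 1) + [1]        # v = w 0...0 1, length k + m
    prob = 1.0
    for i in range(1, m + 1):
        ctx = tuple(v[i - 1:k + i - 1])      # tau_k(v^{k+i-1}): letters i..k+i-1 (1-based)
        target = v[k + i - 1]                # letter v_{k+i} (1-based)
        prob *= p1[ctx] if target == 1 else 1.0 - p1[ctx]
    return prob


def dry_spell_pmf(p1, start_weights, m):
    """Eq. (2), with user-supplied weights q(w) = sum_u pi_u a_{u,w}."""
    return sum(q * dry_spell_pmf_given_start(p1, w, m)
               for w, q in start_weights.items())


if __name__ == "__main__":
    p1 = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.7}   # hypothetical, k = 2
    # For k = 2 the only admissible starting word is (1, 0), with weight 1.
    print([round(dry_spell_pmf(p1, {(1, 0): 1.0}, m), 4) for m in range(1, 5)])
    # prints [0.3, 0.14, 0.112, 0.0896]
```

For $k \ge 3$ the same functions apply unchanged; only `start_weights` then has several entries, one for each word ending in (1, 0).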

Remark 1. The distribution of the first dry spell differs from the distribution of subsequent dry spells for Markov chains of order greater than two; for first- and second-order Markov chains this distinction is unnecessary. For $k = 1$, the equivalent of Eq. (1) is

$$P(D(X_t) = m) = p_{0,0}^{m-1}\, p_{0,1},$$

while for $k = 2$, Eq. (2) simplifies to

$$P(D(X_t) = m) = \begin{cases} p_{10,1}, & m = 1, \\ p_{10,0}\, p_{00,0}^{m-2}\, p_{00,1}, & m \ge 2, \end{cases}$$

since the only admissible starting word is $w = (1, 0)$ and $a_{u,(1,0)} = 1$ for all $u$ in Eq. (1).
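The closed forms in Remark 1 can be checked numerically. The short Python sketch below uses hypothetical transition probabilities. It evaluates both probability mass functions, confirms that each sums to one, and compares the numerically summed mean dry-spell length for $k = 2$ with the closed form $p_{10,1} + p_{10,0}(1 + 1/p_{00,1})$. That mean formula is derived here by summing the geometric tail, only for the check, and is not stated in the text.

```python
def dry_spell_pmf_k1(p01, m):
    """k = 1: P(D = m) = p_{0,0}^{m-1} p_{0,1}, with p_{0,0} = 1 - p_{0,1}."""
    return (1.0 - p01) ** (m - 1) * p01


def dry_spell_pmf_k2(p10_1, p00_1, m):
    """k = 2: p_{10,1} if m = 1, else p_{10,0} p_{00,0}^{m-2} p_{00,1}."""
    if m == 1:
        return p10_1
    return (1.0 - p10_1) * (1.0 - p00_1) ** (m - 2) * p00_1


def mean_dry_spell_k2(p10_1, p00_1):
    """Mean of the k = 2 law: p_{10,1} + p_{10,0} * (1 + 1 / p_{00,1})."""
    return p10_1 + (1.0 - p10_1) * (1.0 + 1.0 / p00_1)


if __name__ == "__main__":
    p01, p10_1, p00_1 = 0.35, 0.3, 0.2                 # hypothetical values
    ms = range(1, 2000)
    print(sum(dry_spell_pmf_k1(p01, m) for m in ms))            # close to 1
    print(sum(dry_spell_pmf_k2(p10_1, p00_1, m) for m in ms))   # close to 1
    print(sum(m * dry_spell_pmf_k2(p10_1, p00_1, m) for m in ms),
          mean_dry_spell_k2(p10_1, p00_1))                      # both close to 4.5
```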

The distribution of the first dry spell can also be used for model selection or model validation. For this purpose we use the Kolmogorov-Smirnov (KS) test; see Benjamin and Cornell (1970). The one-sample KS test compares the empirical distribution function with the cumulative distribution function specified by the null hypothesis.

Assuming that $P_k$ is the true distribution function (that of a Markov chain of order $k$), the KS test statistic is

$$D = \sup_{m \in \mathbb{N}_+} \left| P_k(D(X) \le m) - F_{emp}(m) \right|,$$

where $F_{emp}$ is the empirical cumulative distribution function of the length of the first dry spell. If the data truly come from a Markov chain of order $k$ and the transition probabilities are the correct ones, then by the Glivenko-Cantelli theorem (see Dudewicz and Mishra (1988)) the KS statistic converges to zero almost surely (a.s.).

To apply the test, the transition probabilities were estimated from the data by maximum likelihood for different values of the order $k$ of the Markov chain. To obtain the empirical distribution of the length of the first dry spell, we computed the lengths of the dry spells (runs of zeros) that follow a (1, 0) pattern. For Markov chains of order $k = 1$ or $k = 2$ this is equivalent to computing the length of the first dry spell. For $k = 3$ the distribution of the first dry spell is not exactly the same as the distribution of an arbitrary dry spell, but we still used all available dry spells because of the shortage of data. The procedure was applied separately to the data from each station and season. Zeros at the start of a record were ignored, since they continue a dry spell that began earlier, and a dry spell that was not over by the end of the season was followed into the next season.

To determine whether the theoretical model is adequate, Monte Carlo simulations were performed. For each station and season separately, we generated 500 synthetic wet/dry records of 44 years, obtained the empirical distribution of the length of the first dry spell for each record and computed the corresponding KS statistic; together these give the distribution of the KS statistic.
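The Monte Carlo version of the KS test can be sketched as follows for a second-order chain. This is a simplified illustration, not the exact procedure used here. The record is treated as one continuous 0/1 sequence, a dry spell still running at the end of the record is dropped, and the synthetic records start from the word (1, 0). The theoretical distribution function is the $k = 2$ law of Remark 1 with maximum-likelihood transition probabilities, each synthetic record is compared with that same fitted distribution (no refitting), words that never occur default to probability 0.5, and every record is assumed to contain at least one completed dry spell. All function names are hypothetical.

```python
import random
from collections import Counter

WORDS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def fit_k2(x):
    """ML estimates p[w] = P(X_t = 1 | (X_{t-2}, X_{t-1}) = w); 0.5 if w never occurs."""
    n, n1 = Counter(), Counter()
    for t in range(2, len(x)):
        w = (x[t - 2], x[t - 1])
        n[w] += 1
        n1[w] += x[t]
    return {w: n1[w] / n[w] if n[w] else 0.5 for w in WORDS}


def dry_spells(x):
    """Lengths of completed dry spells that begin with the pattern (1, 0)."""
    spells, i = [], 1                      # leading zeros are never counted
    while i < len(x):
        if x[i] == 0 and x[i - 1] == 1:
            j = i
            while j < len(x) and x[j] == 0:
                j += 1
            if j < len(x):                 # drop a spell still running at the end
                spells.append(j - i)
            i = j
        else:
            i += 1
    return spells


def cdf_k2(p, m):
    """P(D <= m) for the k = 2 dry-spell law of Remark 1."""
    if m < 1:
        return 0.0
    return p[(1, 0)] + (1.0 - p[(1, 0)]) * (1.0 - (1.0 - p[(0, 0)]) ** (m - 1))


def ks_distance(spells, p):
    """sup_m |P_k(D <= m) - F_emp(m)|, attained on the integers for these step functions."""
    counts, n = Counter(spells), len(spells)
    d, cum = 0.0, 0
    for m in range(1, max(spells) + 1):
        cum += counts[m]
        d = max(d, abs(cdf_k2(p, m) - cum / n))
    return d


def simulate_k2(p, n, rng):
    x = [1, 0]                             # start right after a wet day
    while len(x) < n:
        x.append(1 if rng.random() < p[(x[-2], x[-1])] else 0)
    return x


def ks_monte_carlo(x, n_sim=500, level=0.10, seed=1):
    """Observed KS distance, Monte Carlo critical value, and whether k = 2 passes."""
    p = fit_k2(x)
    d_obs = ks_distance(dry_spells(x), p)
    rng = random.Random(seed)
    sims = sorted(ks_distance(dry_spells(simulate_k2(p, len(x), rng)), p)
                  for _ in range(n_sim))
    critical = sims[int(round((1.0 - level) * n_sim)) - 1]   # upper 10% tail value
    return d_obs, critical, d_obs <= critical


if __name__ == "__main__":
    truth = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.7}   # hypothetical
    record = simulate_k2(truth, 44 * 90, random.Random(0))        # about 44 seasons
    print(ks_monte_carlo(record))
```

Treating the critical value as the 90th percentile of the simulated statistics mirrors the 10% tail value used below; refitting the transition probabilities on every synthetic record would be a more conservative variant.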

The orders of the Markov chain suggested by the Kolmogorov-Smirnov statistic at the 10% tail value are collected in Fig. 8. They appear to be close to those obtained by the BIC order estimator. Table 2 shows how many data sets passed the Kolmogorov-Smirnov test at the 10% tail value for the different seasons. Observe that the KS test suggests that the first-order Markov chain, although widely used, is an inadequate model for the majority of the stations in Sweden across the different seasons.

4.3 Distribution of Long Dry Spells

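Let us now define a long dry spell as a dry spell whose length is greater than or equal to the order $k$ of the Markov chain. The length of a long dry spell then has a geometric distribution: once a dry spell has lasted $l \ge k$ days, the k-tail of the chain is the all-zero word $\mathbf{0}$, so every further day ends the spell with the same probability $p_{\mathbf{0},1}$. Indeed, let a long dry spell that starts at time $i$ have length $m \ge k$, and assume we know that its length is at least $l$. Then, for $m \ge l \ge k$,

$$P\left(D(X_t) = m \mid \tau_l(X^{i+l-1}) = \mathbf{0}\right) = p_{\mathbf{0},1}\, p_{\mathbf{0},0}^{m-l} = p_{\mathbf{0},1} (1 - p_{\mathbf{0},1})^{m-l},$$

where, as before,

$$p_{\mathbf{0},1} = P\left(X_{n+1} = 1 \mid \tau_k(X^n) = \mathbf{0}\right) \quad \text{for all } n.$$

Therefore, the expected length of long dry spells is given by

$$E\left(D(X_t) \mid \tau_l(X^{i+l-1}) = \mathbf{0}\right) = l + \frac{1 - p_{\mathbf{0},1}}{p_{\mathbf{0},1}}. \qquad (3)$$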
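Eq. (3) can be checked by simulation. The Python sketch below uses hypothetical transition probabilities for a second-order chain ($k = 2$, so $\mathbf{0} = (0, 0)$). It averages the lengths of completed dry spells that last at least $l = k$ days and compares the result with $l + (1 - p_{\mathbf{0},1})/p_{\mathbf{0},1}$. It is an illustration only, not part of the analysis in this study, and the function names are hypothetical.

```python
import random


def expected_long_dry_spell(p0_1, l):
    """Eq. (3): l + (1 - p_{0,1}) / p_{0,1}, valid for l >= k."""
    return l + (1.0 - p0_1) / p0_1


def simulated_long_spell_mean(p1, l, n, seed=0):
    """Mean length of completed dry spells of length >= l in a simulated k = 2 chain."""
    rng = random.Random(seed)
    x = [1, 0]                                   # start right after a wet day
    while len(x) < n:
        x.append(1 if rng.random() < p1[(x[-2], x[-1])] else 0)
    long_spells, run = [], 0
    for value in x:
        if value == 0:
            run += 1
        else:                                    # a wet day closes the current spell
            if run >= l:
                long_spells.append(run)
            run = 0
    return sum(long_spells) / len(long_spells)  # a final unfinished spell is ignored


if __name__ == "__main__":
    p1 = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.7}   # hypothetical
    print(simulated_long_spell_mean(p1, 2, 200000, seed=5),
          expected_long_dry_spell(p1[(0, 0)], 2))              # both close to 6
```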

Fig. 9 shows, for the first season and the data from Lund, the conditional distribution of the dry spell length given that the spell has lasted more than two days. The estimated order of the Markov chain for this data set is 2 according to both the GMFC and the KS criterion. A first-order Markov chain, the popular model of choice, would clearly underestimate the risk of a long dry spell in this case, whereas a second-order Markov chain seems to be the best choice for this particular data set.

Table 3 makes clear that underestimating the order $k$ of the Markov chain leads to underestimating the expected length of long dry spells, where, again, a dry spell is called long if its length is greater than or equal to the order of the Markov chain.

5 Modeling the Amount of Precipitation Process

In this section we model the amounts of daily precipitation. This is done in two steps: first we model the dependence structure of the amount of precipitation process, and then we estimate its marginal distribution.

An important feature of climatological data sets is that they exhibit dependence between nearby stations or successive days. In this work we are interested in the latter, and the dependence structure is modelled using a two-dimensional Gaussian copula.

After the copula has been estimated, we remove the days with precipitation below the cut-off level of 0.1 mm. That is, we let $Y_t$ be the thinned process obtained from the amount of precipitation process $W_t$ by keeping only the wet days, i.e. $Y_t := W_t \mid X_t = 1$. The marginal distribution of the daily precipitation amounts is then modelled by combining a fit of the distribution of excesses over a high threshold with the empirical distribution of the thinned data below the threshold.

5.1 Copula

Almost every climatological data set exhibits dependence between successive days. To model the temporal dependence structure of the data we use the two-dimensional Gaussian copula $C$ given by

$$C(u, v; \rho) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left( -\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)} \right) dy\, dx = \Phi_\rho\left(\Phi^{-1}(u), \Phi^{-1}(v)\right), \qquad (4)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution and $\Phi_\rho$ is the joint cumulative distribution function of two standard normal random variables with correlation coefficient $\rho$.

To estimate the copula, let

A = {t : Yt > 0 and Yt+1 > 0},

8

ACCEPTED MANUSCRIPT

be the set of all days with non zero precipitation that were followed by days also with non zero precipitation

(greater than 0.1mm) and

u = [Ya1 , Ya2 , . . . ] , v = [Ya1+1, Ya2+1, . . . ] , a1, a2, · · · ∈ A

be the vectors of precipitation amounts on the days in $A$ and on the days that follow them, respectively, both with marginal distribution $F(x)$. Transforming $\mathbf{u}$ and $\mathbf{v}$ by the empirical cumulative distribution function multiplied by the correction factor $n/(n+1)$, where $n$ is the number of days with positive precipitation in the data set, yields vectors $\mathbf{U}$ and $\mathbf{V}$ that approximately follow a discrete uniform distribution on $(0, 1)$. If the Gaussian copula in Eq. (4) correctly describes the dependence structure of the data, then

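$$\left(\Phi^{-1}(\mathbf{U}), \Phi^{-1}(\mathbf{V})\right) \sim N\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}\right).$$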

Finally, the copula parameter $\rho$ is estimated by Pearson's correlation coefficient of the transformed vectors $\Phi^{-1}(\mathbf{U})$ and $\Phi^{-1}(\mathbf{V})$. A detailed description of the method and its application can be found in Lennartsson and Shu (2005). The dependence between successive days is illustrated in Fig. 10, where the transformed data from Lund are plotted.
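The estimation steps above can be sketched as follows: form the pairs indexed by $A$, map each amount to $\#\{\text{wet amounts} \le x\}/(n+1)$ (the empirical distribution function times $n/(n+1)$), apply $\Phi^{-1}$ and take Pearson's correlation. This is an illustrative sketch, not the author's code; the names `estimate_copula_rho` and `pearson`, and the convention that a day is wet when its amount is at least the 0.1mm cut-off, are assumptions.

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def estimate_copula_rho(w, cutoff=0.1):
    """Estimate the Gaussian-copula parameter rho from daily amounts w (mm).

    A day is treated as wet when w[t] >= cutoff (assumed convention).
    """
    wet = sorted(x for x in w if x >= cutoff)
    n = len(wet)

    def pseudo_obs(x):
        # Empirical cdf times n/(n+1), i.e. #{wet <= x}/(n+1); always inside (0, 1).
        return bisect_right(wet, x) / (n + 1)

    # The set A: days t such that day t and day t+1 are both wet.
    pairs = [(w[t], w[t + 1]) for t in range(len(w) - 1)
             if w[t] >= cutoff and w[t + 1] >= cutoff]
    phi_inv = NormalDist().inv_cdf
    z_u = [phi_inv(pseudo_obs(a)) for a, _ in pairs]  # Phi^{-1}(U)
    z_v = [phi_inv(pseudo_obs(b)) for _, b in pairs]  # Phi^{-1}(V)
    return pearson(z_u, z_v)
```

Because the rescaled ranks never reach 0 or 1, $\Phi^{-1}$ is always finite, which is the practical reason for the $n/(n+1)$ correction.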

For a thorough treatment of bivariate copulas and their properties see Hutchinson and Lai (1990), Joe (1997) and Nelsen (2006); Trivedi and Zimmer (2005) provide a copula tutorial for practitioners. The estimates of the correlation coefficient $\rho$ for each station are collected in Table 4. All of these estimates are statistically significant, which makes the assumption of independence between successive observations unreasonable.
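The text does not say which test was used to assess significance. One standard option, shown here only as an assumed illustration, is Fisher's z-transformation: under $\rho = 0$, $\operatorname{atanh}(\hat\rho)\sqrt{n-3}$ is approximately standard normal, where $n$ is the number of pairs in $A$. The function name and the sample size in the example are hypothetical.

```python
import math


def correlation_z_test(r, n_pairs):
    """Two-sided test of H0: rho = 0 from a sample correlation r of n_pairs pairs.

    Fisher's transformation: atanh(r) * sqrt(n_pairs - 3) is approximately N(0, 1)
    under H0. Returns (z statistic, two-sided p-value).
    """
    z = math.atanh(r) * math.sqrt(n_pairs - 3)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|)) without cancellation
    return z, p


# Hypothetical sample size; rho = 0.1362 is the Lund estimate reported in Section 5.3.
z, p = correlation_z_test(0.1362, 5000)
print(f"z = {z:.2f}, p = {p:.1e}")
```

With the hypothetical $n = 5000$, the Lund estimate $\hat\rho = 0.1362$ gives $z \approx 9.7$, far beyond the two-sided 5% critical value 1.96.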

5.2 Marginal Distribution

Finally, to model the amount-of-precipitation process we propose an approach that combines a fit of the distribution of excesses over a high threshold with the empirical distribution of the original data below the threshold. We begin by introducing some notation. Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables with marginal distribution $F(x)$, and denote by

$$F_u(x) = P(X \le x \mid X > u),$$

for $x > u$, the conditional distribution of $X$ given that it exceeds the level $u$. Assume that $F_u(x)$ can be modelled by a generalized Pareto distribution, that is,

$$F_u(x) = 1 - \left(1 + \xi\,\frac{x - u}{\sigma}\right)^{-1/\xi}, \tag{5}$$

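for some $\sigma > 0$ and $\xi$, on the set $\{x : x > u \text{ and } 1 + \xi(x - u)/\sigma > 0\}$, with $F_u(x) = 0$ for $x \le u$. (For $\xi = 0$, Eq. (5) is read as its limit $1 - \exp(-(x - u)/\sigma)$, and for $\xi < 0$, $F_u(x) = 1$ beyond the upper end point $u - \sigma/\xi$.) Let also $F_{\mathrm{emp}}(x)$ denote the empirical distribution function, i.e.,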

$$F_{\mathrm{emp}}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\},$$


where $\mathbf{1}\{\cdot\}$ denotes the indicator function of an event, i.e. the 0-1 random variable that takes the value 1 if the condition in braces is satisfied and 0 otherwise.

Finally, define the function

$$F_C(x; u) = F_{\mathrm{emp}}(x \wedge u) + \left(1 - F_{\mathrm{emp}}(u)\right) F_u(x),$$

which, as is easily checked, is a probability distribution function: it equals $F_{\mathrm{emp}}(x)$ for $x \le u$ and increases from $F_{\mathrm{emp}}(u)$ to 1 above $u$. This function will be used to model the amount-of-precipitation process. What remains to be addressed is the choice of the level $u$ above which the excesses can be accurately modelled by a generalized Pareto distribution, and the methods for estimating the distribution parameters.
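As a concrete illustration of $F_C$, the sketch below evaluates it from a sample and given generalized Pareto parameters. The names `gpd_cdf` and `make_composite_cdf` are hypothetical; the $\xi \to 0$ exponential limit and the value 1 beyond the upper end point when $\xi < 0$ are the standard conventions, made explicit here.

```python
import math
from bisect import bisect_right


def gpd_cdf(x, u, sigma, xi):
    """Generalized Pareto cdf F_u(x) of Eq. (5) for a threshold u."""
    if x <= u:
        return 0.0
    y = (x - u) / sigma
    if abs(xi) < 1e-12:        # xi -> 0 limit: exponential excesses
        return 1.0 - math.exp(-y)
    t = 1.0 + xi * y
    if t <= 0.0:               # beyond the upper end point u - sigma/xi (xi < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def make_composite_cdf(data, u, sigma, xi):
    """Return x -> F_C(x; u) = F_emp(min(x, u)) + (1 - F_emp(u)) * F_u(x)."""
    s = sorted(data)
    n = len(s)

    def f_emp(z):
        return bisect_right(s, z) / n  # fraction of observations <= z

    tail_mass = 1.0 - f_emp(u)         # probability mass modelled by the GPD

    def f_c(x):
        return f_emp(min(x, u)) + tail_mass * gpd_cdf(x, u, sigma, xi)

    return f_c
```

For the Lund data, `data` would be the thinned amounts $Y_t$, `u` = 15 and (`sigma`, `xi`) the fitted parameters reported in Table 4.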

5.2.1 Choice of Threshold Level

Selecting a threshold level $u$ above which the generalized Pareto assumption is appropriate is a difficult task in practice; see, for example, McNeil (1996), Davison and Smith (1990) and Rootzen and Tajvidi (1997). Frigessi et al. (2002) suggest a dynamic mixture model for estimating the tail distribution without having to specify a threshold in advance. Once the threshold $u$ is fixed, the model parameters $\xi$ and $\sigma$ are estimated by maximum likelihood, although a number of alternative methods exist; see for instance Resnick (1997), Crovella and Taqqu (1999) and references therein.
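Maximum likelihood for $(\sigma, \xi)$ has no closed form. The following dependency-free sketch profiles the negative log-likelihood of the excesses $y_i = x_i - u$ over a grid of $\xi$ values in $[-0.4, 1]$ and, for each $\xi$, minimises over $\sigma$ by golden-section search on $\log\sigma$ (for fixed $\xi > -1$ the likelihood is unimodal in $\sigma$). It is an illustration under these assumptions, not the estimation code used in this work; in practice a general-purpose optimiser, with standard errors from the observed information, would be used. All names are hypothetical.

```python
import math


def gpd_nll(y, sigma, xi):
    """Negative log-likelihood of GPD(sigma, xi) for excesses y > 0."""
    if sigma <= 0.0:
        return math.inf
    k = len(y)
    if abs(xi) < 1e-9:                    # exponential limit
        return k * math.log(sigma) + sum(y) / sigma
    acc = 0.0
    for v in y:
        t = 1.0 + xi * v / sigma
        if t <= 0.0:                      # outside the support
            return math.inf
        acc += math.log(t)
    return k * math.log(sigma) + (1.0 + 1.0 / xi) * acc


def _golden_min(f, a, b, iters=60):
    """Golden-section search for the minimiser of a unimodal f on [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                       # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                             # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def fit_gpd(excesses, xi_grid=None):
    """Approximate ML estimates (sigma, xi) by profiling over a grid of xi."""
    y = list(excesses)
    y_max, y_mean = max(y), sum(y) / len(y)
    if xi_grid is None:
        xi_grid = [i / 50 for i in range(-20, 51)]  # xi in [-0.4, 1.0], step 0.02
    best_nll, best_sigma, best_xi = math.inf, None, None
    for xi in xi_grid:
        lo = math.log(y_mean * 1e-3)
        if xi < 0:                        # support requires sigma > -xi * max(y)
            lo = max(lo, math.log(-xi * y_max) + 1e-9)
        hi = math.log(y_mean * 1e3)
        log_s = _golden_min(lambda s: gpd_nll(y, math.exp(s), xi), lo, hi)
        nll = gpd_nll(y, math.exp(log_s), xi)
        if nll < best_nll:
            best_nll, best_sigma, best_xi = nll, math.exp(log_s), xi
    return best_sigma, best_xi
```

The grid step limits the resolution of $\hat\xi$ to 0.02; a finer grid or a local refinement around the best grid point would sharpen it at extra cost.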

5.2.2 Extreme Value Analysis for Dependent Sequences

The generalized Pareto distribution is asymptotically a good model for the distribution of high excesses of independent and identically distributed random variables; see Coles (2001) and Leadbetter et al. (1983). Unfortunately, independence is an unreasonable assumption for most climatological data sets, since dependence between successive days is to be expected. There are two ways of dealing with dependence between the excesses: either choose the level $u$ high enough that sufficient time passes between successive excesses to make them approximately independent, or use declustering. Declustering, which is probably the most widely adopted method for dealing with dependent exceedances, filters the dependent observations to obtain a set of threshold excesses that are approximately independent; see Coles (2001). A simple way of determining $m$-clusters of extremes, after specifying a threshold $u$, is to assign consecutive excesses of $u$ to the same $m$-cluster as long as they are separated by fewer than $m + 1$ days. The separation of extreme events into clusters is likely to be sensitive to the choice of $u$, although we do not study this effect in this work. The effect of declustering on the generalized Pareto distribution in Eq. (5) is to replace the parameters $\sigma$ and $\xi$ by $\sigma\theta^{-1}$ and $\xi$, where $\theta$ is the so-called extremal index, loosely defined as

$$\theta = (\text{limiting mean cluster size})^{-1}.$$
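The $m$-cluster rule and the estimate $\hat\theta = (\text{number of clusters})/(\text{number of exceedances})$, the reciprocal of the observed mean cluster size, can be sketched as below. The function names are hypothetical, and keeping the cluster maxima as the declustered sample follows the usual practice described by Coles (2001) rather than a detail stated in this work.

```python
def m_clusters(series, u, m):
    """Group the indices of exceedances of u into m-clusters.

    Successive exceedances whose indices differ by at most m (i.e. that are
    separated by fewer than m + 1 days) are put in the same cluster.
    """
    clusters, last = [], None
    for t, x in enumerate(series):
        if x > u:
            if last is not None and t - last <= m:
                clusters[-1].append(t)
            else:
                clusters.append([t])
            last = t
    return clusters


def extremal_index(series, u, m):
    """theta estimate: number of clusters / number of exceedances,
    i.e. the reciprocal of the observed mean cluster size."""
    clusters = m_clusters(series, u, m)
    n_exc = sum(len(c) for c in clusters)
    return len(clusters) / n_exc if n_exc else float("nan")


def cluster_maxima(series, u, m):
    """Declustered sample: the largest value in each m-cluster."""
    return [max(series[t] for t in c) for c in m_clusters(series, u, m)]


y = [0, 20, 18, 0, 0, 25, 0, 16]
print(m_clusters(y, 15, 1))      # [[1, 2], [5], [7]]
print(extremal_index(y, 15, 2))  # 0.5
print(cluster_maxima(y, 15, 2))  # [20, 25]
```

With $m = 0$ every exceedance forms its own cluster and $\hat\theta = 1$, which corresponds to the choice made for Lund in Section 5.3.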


5.3 Method Application

In this subsection we apply the method described in Subsection 5.2 to model the thinned amount-of-precipitation process $Y_t$. To demonstrate the method we use data from the station in Lund; the other stations give similar results.

As we have already seen, the data exhibit temporal dependence. For the data from Lund, the correlation coefficient $\rho$ of the Gaussian copula was estimated to be 0.1362. The dependence can also be seen in Fig. 11, which plots the number of $m$-clusters with more than one observation for different values of $m$ and $u = 15$mm. The expected numbers of such $m$-clusters under the assumption of independent observations are denoted by 'o' and are consistently smaller than the observed numbers, denoted by '+'. The expected numbers computed assuming correlated observations ($\rho = 0.1362$), denoted by '*', are a clear improvement over the independence assumption. We also provide exact 95% confidence intervals for both cases; the observed values fall inside the confidence interval constructed assuming correlated data.
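The exact computation behind the 'o' and '*' values is not given here. As a hedged alternative, the expected number of $m$-clusters with more than one observation can be approximated by Monte Carlo: simulate a stationary Gaussian AR(1) latent sequence with lag-one correlation $\rho$ ($\rho = 0$ giving independence) and declare an exceedance when the latent value exceeds $\Phi^{-1}(1 - p)$, where $p$ is the exceedance probability of $u$. The AR(1) latent structure, which gives successive days exactly the Gaussian copula of Eq. (4), the function names and all parameter values are our assumptions; the sketch does not reproduce the exact confidence intervals of Fig. 11.

```python
import math
import random
from statistics import NormalDist


def count_multi_clusters(flags, m):
    """Number of m-clusters that contain more than one exceedance."""
    count, size, last = 0, 0, None
    for t, hit in enumerate(flags):
        if not hit:
            continue
        if last is not None and t - last <= m:
            size += 1                     # same cluster as the previous exceedance
        else:
            count += int(size > 1)        # close the previous cluster
            size = 1
        last = t
    return count + int(size > 1)


def expected_multi_clusters(n, p, m, rho, reps=200, seed=0):
    """Monte Carlo mean of count_multi_clusters over `reps` sequences of length n.

    Exceedances are Z_t > Phi^{-1}(1 - p) for the AR(1) process
    Z_t = rho * Z_{t-1} + sqrt(1 - rho^2) * e_t, so that P(exceedance) = p and
    (Z_t, Z_{t+1}) is bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    z_u = NormalDist().inv_cdf(1.0 - p)
    s = math.sqrt(1.0 - rho * rho)
    total = 0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)           # start in the stationary N(0, 1) law
        flags = []
        for _ in range(n):
            flags.append(z > z_u)
            z = rho * z + s * rng.gauss(0.0, 1.0)
        total += count_multi_clusters(flags, m)
    return total / reps
```

For Fig. 11, `p` would be the proportion of thinned observations exceeding $u = 15$mm and `rho` = 0.1362 for Lund; quantiles of the simulated counts would give Monte Carlo intervals in place of the exact ones.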

Having decided the cluster size (for the station in Lund, $m = 0$), we turn to the problem of estimating the parameters $\xi$, $\sigma$ and $\theta$ of the generalized Pareto model. The threshold $u = 15$mm was chosen on the basis of the mean residual life plot. As discussed by Coles (2001), if the generalized Pareto model provides a good approximation for the excesses above a threshold $u_0$, then the mean residual life plot, i.e. the locus of the points

$$\left\{\left(u, \frac{1}{n_u}\sum_{i=1}^{n_u}\left(Y_{t(i)} - u\right)\right) : u < Y_t^{\max}\right\},$$

where $Y_{t(1)}, \dots, Y_{t(n_u)}$ are the $n_u$ observations that exceed $u$ and $Y_t^{\max}$ is the largest observation of the process $Y_t$, should be approximately linear in $u$ for $u > u_0$. Fig. 12 shows the mean residual life plot, with approximate 95% confidence intervals, for the daily precipitation in Lund. The graph appears to curve from $u = 0$mm to $u = 15$mm and is approximately linear above that threshold. It is tempting to conclude instead that there is no stability until $u = 28$mm, after which the plot is approximately linear, which would suggest $u = 28$mm. However, such a threshold leaves too few excesses for meaningful inference (33 observations out of 16000). We therefore decided to work initially with the threshold $u = 15$mm.
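The points of the mean residual life plot, with the usual approximate 95% band given by the mean excess $\pm 1.96\,s/\sqrt{n_u}$ (where $s$ is the sample standard deviation of the excesses), can be computed as follows. The function name and the choice to skip thresholds with fewer than two exceedances are ours.

```python
import math


def mean_residual_life(data, thresholds, z=1.96):
    """Points of the mean residual life plot with approximate normal bands.

    For each u, returns (u, mean excess, lower, upper), where the mean excess is
    (1/n_u) * sum(y - u for y > u) and the band is mean +/- z * sd / sqrt(n_u).
    Thresholds with fewer than two exceedances are skipped.
    """
    points = []
    for u in thresholds:
        exc = [y - u for y in data if y > u]
        n_u = len(exc)
        if n_u < 2:
            continue
        mean = sum(exc) / n_u
        sd = math.sqrt(sum((e - mean) ** 2 for e in exc) / (n_u - 1))
        half = z * sd / math.sqrt(n_u)
        points.append((u, mean, mean - half, mean + half))
    return points
```

Plotting the second element against $u$ over a grid such as 0, 0.5, ..., 60mm, together with the band, reproduces the kind of display shown in Fig. 12; the band widens as $n_u$ shrinks, which is why very high thresholds are hard to judge.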

Finally, the diagnostic plots for the fit of the generalized Pareto distribution are collected in Fig. 13. The data from the other stations produced similar plots, none of which gave any reason for concern about the quality of the fitted models. The parameters of the generalized Pareto model for all the stations, together with 95% confidence intervals, are collected in Table 4. For three stations (Bolmen, Borås and Haparanda), the estimates of the shape parameter $\xi$ are negative.

Table 5 shows $\theta$ for different values of $m$ and threshold $u = 15$mm for the data from Lund.


6 Evaluation

To verify the validity of the model, we have obtained the distribution functions of different precipitation indices as stipulated by the Expert Team and its predecessor, the CCl/CLIVAR Working Group (WG) on Climate Change Detection; see Peterson et al. (2001) and Karl et al. (1999). Sixteen of these indices are relevant to this work: two concern only the occurrence-of-precipitation process (CDD and CWD), another two concern only the amount-of-precipitation process (SDII and Prec90p), and the remaining twelve concern both processes; see Table 6.
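Table 6 is not reproduced in this section, so the sketch below follows the common ETCCDI definitions of R10mm, R20mm, RX1day, RX5day, CDD, CWD and SDII; these definitions, and the 1mm wet-day threshold that ETCCDI uses (this work otherwise uses a 0.1mm cut-off), are assumptions rather than the exact expressions of Table 6. Applying the function to every observed block and to every simulated block gives the empirical and model distributions of each index compared in Figs. 14-18.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def precipitation_indices(w, wet=1.0):
    """Selected ETCCDI-style indices for one block of daily amounts w (mm).

    `wet` is the wet-day threshold (1mm in the ETCCDI definitions; assumed here).
    """
    wet_amounts = [x for x in w if x >= wet]
    return {
        "R10mm": sum(x >= 10.0 for x in w),                           # days with >= 10mm
        "R20mm": sum(x >= 20.0 for x in w),                           # days with >= 20mm
        "RX1day": max(w),                                             # max 1-day amount
        "RX5day": max(sum(w[i:i + 5]) for i in range(len(w) - 4)),  # max 5-day total
        "CDD": longest_run([x < wet for x in w]),                     # longest dry spell
        "CWD": longest_run([x >= wet for x in w]),                    # longest wet spell
        "SDII": sum(wet_amounts) / len(wet_amounts) if wet_amounts else 0.0,
    }
```

Replacing the default `wet=1.0` by `wet=0.1` would align the dry/wet classification with the 0.1mm cut-off used elsewhere in this work.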

Using the chain-dependent model, we have obtained the distribution of each index from 100,000 simulations and compared it with the empirical distribution ('.-' line in Figs. 14-18). The agreement between the two distributions is good; moreover, the empirical distribution always falls inside the 90% exact confidence intervals. The results are presented for the weather station in Lund; the other stations give similar results.

As Fig. 14 (top left) shows, over a period of two years we expect about 17 days with precipitation above 10mm and, from Fig. 14 (top right), about 3 days with precipitation above 20mm. Fig. 14 (bottom left) shows, however, that the precipitation on each of these three days will be considerably more than 20mm. Fig. 14 (bottom right) shows that the probability of having 5 consecutive days of very heavy precipitation in Lund is quite high.

As Fig. 15 shows, once every two years we should expect a dry spell longer than two weeks (left) and a wet spell of approximately 12 days (right).

Fig. 16 shows that every two years in Lund we expect almost fifty moderately wet days (top left), almost 18 above-moderately wet days (top right), almost 9 very wet days (bottom left) and almost 2 extremely wet days (bottom right).

Fig. 17 (top left) shows that the fifty moderately wet days expected over a period of two years in Lund account for about 70% of the total amount of precipitation. Similarly, the 18 above-moderately wet days account on average for a little more than 40% of the total amount (top right), the 8 very wet days for about 25% (bottom left) and the 2 extremely wet days for about 10% (bottom right).
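The counts in Fig. 16 and the shares in Fig. 17 can be computed along the following lines. We assume (Table 6 is not shown here) that moderately wet, above-moderately wet, very wet and extremely wet days are those exceeding the 75th, 90th, 95th and 99th percentiles of the observed wet-day amounts, consistent with the index names R75p and R90p in Fig. 16, and that shares are relative to the block total. The percentile rule (`statistics.quantiles` with `method="inclusive"`, i.e. linear interpolation) and the function names are our choices.

```python
from statistics import quantiles


def percentile_thresholds(observed_wet, levels=(75, 90, 95, 99)):
    """Percentile thresholds q(p) of the observed wet-day amounts."""
    cuts = quantiles(observed_wet, n=100, method="inclusive")  # cuts[p-1] = p-th percentile
    return {p: cuts[p - 1] for p in levels}


def count_and_share(w, q):
    """Number of days in block w with amount > q and their share (%) of the block total."""
    above = [x for x in w if x > q]
    total = sum(w)
    return len(above), (100.0 * sum(above) / total if total > 0 else 0.0)
```

As the caption of Table 6 indicates, the thresholds are estimated once from the observed data and then held fixed when the counts and shares are computed for each simulated block.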

Fig. 18 (left) shows that the average amount of precipitation per wet day is 3.5mm, and Fig. 18 (right) that, on average, only 1 in 10 precipitation days each year has a downfall exceeding 9.5mm.

7 Conclusions

In this paper, we have modelled the temporal variability of precipitation in Sweden. The weather stations have been assumed to have no spatial dependence; modelling the spatial variability of precipitation across the Swedish weather stations is part of our future research plans. Some interesting conclusions can be drawn.

We have used a chain-dependent model for precipitation, consisting of a component for the occurrence of precipitation and a component for the amount of precipitation. For the first component we have used high-order Markov chains with two states. We have shown that the 1-Markov chain model, which has been used extensively, is inadequate for most of the Swedish stations. For example, when the distribution of long dry spells is of interest, the 1-Markov chain underestimates the length of long dry spells, in some cases by up to half a day.

For the amount-of-precipitation process, we have used a copula to describe the temporal dependence structure between successive days; in effect, this is a Gaussian process with transformed marginals. The cumulative distribution has then been modelled in two parts: the empirical distribution for amounts of precipitation below a given threshold, and a generalised Pareto distribution for the excesses above the threshold. Such models have the advantage of providing a mathematical framework for computing quantities such as return periods.

Finally, the distributions of different weather indices have been computed by Monte Carlo simulation of the fitted Markov chain model and compared with the empirical distributions obtained from the data. The agreement between the two distributions is good, which supports the choice of the models.

References

[1] Akaike, H., (1974). A new look at statistical model identification. IEEE Trans. Auto. Control, AC-19, pp. 716-722.

[2] Aksoy, H. and Bayazit, M., (2000). A model for daily flows of intermittent streams. Hydrological Processes, 14, pp. 1725-1744.

[3] Benjamin, J.R. and Cornell, C.A., (1970). Probability, Statistics and Decision for Civil Engineers. McGraw-Hill, New York, 685 pp.

[4] Bruhn, J.A., Fry, W.E. and Fick, G.W., (1980). Simulation of daily weather data using theoretical probability distributions. J. Appl. Meteorol., 19, pp. 1029-1036.

[5] Castellvi, F. and Stockle, C.O., (2001). Comparing a locally-calibrated versus a generalised temperature weather generation. Trans. ASAE, 44(5), pp. 1143-1148.

[6] Chin, E.H., (1977). Modelling daily precipitation process with Markov chain. Wat. Resour. Res., 13, pp. 949-956.

[7] Coles, S., (2001). An Introduction to Statistical Modeling of Extreme Values. Springer, London.

[8] Cox, D.R. and Isham, V., (1988). A simple spatial-temporal model for rainfall (with discussion). Proc. R. Soc. Lond. A, 415, pp. 317-328.

[9] Cox, D.R. and Isham, V., (1994). Stochastic models of precipitation. In: V. Barnett and K.F. Turkman (Editors), Statistics for the Environment 2: Water Related Issues, ch. 1, pp. 3-18. Wiley, Chichester.

[10] Crovella, M. and Taqqu, M., (1999). Estimating the heavy tail index from scaling properties. Methodology and Computing in Applied Probability, 1, pp. 55-79.

[11] Dalevi, D., Dubhashi, D. and Hermansson, M., (2006). A new order estimator for fixed and variable length Markov models with applications to DNA sequence similarity. Stat. Appl. Genet. Mol. Biol., 5, Article 8.

[12] Davison, A.C. and Smith, R.L., (1990). Models for exceedances over high thresholds. J. Roy. Statist. Soc. B, 52, pp. 393-442.

[13] Frigessi, A., Haug, O. and Rue, H., (2002). A dynamic mixture model for unsupervised tail estimation without threshold selection. Extremes, 5, pp. 219-235.

[14] Dudewicz, E.J. and Mishra, S.N., (1988). Modern Mathematical Statistics. Wiley Series in Probability and Mathematical Statistics.

[15] Gabriel, K.R. and Neumann, J., (1962). A Markov chain model for daily rainfall occurrences at Tel Aviv. Quart. J. Royal Met. Soc., 88, pp. 90-95.

[16] Geng, S., Frits, W.T., de Vries, P. and Supit, I., (1986). A simple method for generating daily rainfall data. Agric. For. Meteorol., 36, pp. 363-376.

[17] Guttorp, P., (1995). Stochastic Modelling of Scientific Data. Chapman & Hall, London, Chapter 2.

[18] Hutchinson, M.F., (1995). Stochastic space-time weather models from ground-based data. Agric. For. Meteorol., 73, pp. 237-264.

[19] Hutchinson, T.P. and Lai, C.D., (1990). Continuous Bivariate Distributions, Emphasising Applications. Rumsby, Sydney, Australia.

[20] Jimoh, O.D. and Webster, P., (1996). The optimum order of a Markov chain model for daily rainfall in Nigeria. Journal of Hydrology, 185, pp. 45-69.

[21] Joe, H., (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London.

[22] Karl, T.R., Nicholls, N. and Ghazi, A., (1999). CLIVAR/GCOS/WMO workshop on indices and indicators for climate extremes: Workshop summary. Climatic Change, 32, pp. 3-7.

[23] Lana, X. and Burgueno, A., (1998). Daily dry-wet behaviour in Catalonia (NE Spain) from the viewpoint of Markov chains. Int. J. Climatol., 18, pp. 793-815.

[24] Leadbetter, M.R., Lindgren, G. and Rootzen, H., (1983). Extremes and Related Properties of Random Sequences and Series. Springer Verlag, New York.

[25] LeCam, L., (1961). A stochastic description of precipitation. Proc. 4th Berkeley Symp., pp. 165-186.

[26] Lennartsson, J. and Shu, M., (2005). Copula Dependence Structure on Real Stock Markets. Master's thesis, Chalmers University of Technology, 2005-01.

[27] Liao, Y., Zhang, Q. and Chen, D., (2004). Stochastic modeling of daily precipitation in China. Journal of Geographical Sciences, 14(4), pp. 417-426.

[28] Mellor, D., (1996). The modified turning bands (MTB) model for space-time rainfall: I. Model definition and properties. J. Hydrol., 175, pp. 113-127.

[29] McNeil, A.J., (1996). Estimating the tails of loss severity distributions using extreme value theory. Technical report, Department Mathematik, ETH Zentrum, Zurich.

[30] Nelsen, R.B., (2006). An Introduction to Copulas, 2nd edition. Springer, New York.

[31] Norris, J.R., (2005). Markov Chains. Cambridge University Press.

[32] Peterson, T.C., Folland, C., Gruza, G., Hogg, W., Mokssit, A. and Plummer, N., (2001). Report on the Activities of the Working Group on Climate Change Detection and Related Rapporteurs 1998-2001. World Meteorological Organisation, WCDMP-47, WMO-TD 1071.

[33] Racsko, P., Szeidl, L. and Semenov, M., (1991). A serial approach to local stochastic weather models. Ecol. Model., 57, pp. 27-41.

[34] Resnick, S.I., (1997). Heavy tail modeling and teletraffic data. The Annals of Statistics, 25(5), pp. 1805-1869.

[35] Rootzen, H. and Tajvidi, N., (1997). Extreme value statistics and wind storm losses: A case study. Scandinavian Actuarial Journal, 1, pp. 70-94.

[36] Rodríguez-Iturbe, I., Cox, D. and Isham, V., (1987). Some models for rainfall based on stochastic point processes. Proc. R. Soc. Lond. A, 410, pp. 269-288.

[37] Rodríguez-Iturbe, I., Cox, D. and Isham, V., (1988). A point process model for rainfall: further developments. Proc. R. Soc. Lond. A, 417, pp. 283-298.

[38] Schwarz, G., (1978). Estimating the dimension of a model. Ann. Stat., 6, pp. 461-464.

[39] Selker, J.S. and Haith, D.A., (1990). Development and testing of simple parameter precipitation distributions. Water Resour. Res., 26(11), pp. 2733-2740.

[40] Smith, R.L. and Robinson, P.J., (1997). A Bayesian approach to the modelling of spatial-temporal precipitation data. Lect. Notes Statist., pp. 237-269.

[41] Srikanthan, R. and McMahon, T.A., (2001). Stochastic generation of annual, monthly and daily climate data: A review. Hydrol. Earth Syst. Sci., 5(4), pp. 653-670.

[42] Stern, R.D. and Coe, R., (1984). A model fitting analysis of daily rainfall data. J. R. Statist. Soc. A, 147, Part 1, pp. 1-34.

[43] Stidd, C.K., (1973). Estimating the precipitation climate. Wat. Resour. Res., 9, pp. 1235-1241.

[44] Tong, H., (1975). Determination of the order of a Markov chain by Akaike's information criterion. J. Appl. Prob., 12, pp. 486-497.

[45] Trivedi, P.K. and Zimmer, D.M., (2007). Copula Modelling: An Introduction for Practitioners. Foundations and Trends in Econometrics, 1(1), pp. 1-111.

[46] Waymire, E. and Gupta, V.K., (1981). The mathematical structure of rainfall representations: 3. Some applications of the point process theory to rainfall processes. Wat. Resour. Res., 17, pp. 1287-1294.

[47] Waymire, E., Gupta, V.K. and Rodríguez-Iturbe, I., (1984). Spectral theory of rainfall intensity at the meso-β scale. Wat. Resour. Res., 20, pp. 1453-1465.

[48] Woolhiser, D.A., (1992). Modelling daily precipitation: progress and problems. In: A. Walden and P. Guttorp (Editors), Statistics in the Environmental and Earth Sciences. Edward Arnold, London, pp. 71-89.


8 Appendix

8.1 Review of Mathematical Order Estimators

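Let $X_t$ denote a $k$-Markov chain defined on a state space $S$, and let $x_1^n$ be a realisation of it. Let also $P_{ML(k)}(x_1^n)$ be the $k$th-order maximum likelihood, i.e.

$$P_{ML(k)}(x_1^n) = \max P\left(X_1^k = x_1^k\right) \prod_{i=k+1}^{n} P\left(X_i = x_i \mid \tau_k(X^{i-1}) = \tau_k(x^{i-1})\right),$$

where $\tau_k(x^{i-1})$ denotes the last $k$ symbols of $x^{i-1} = x_1 \cdots x_{i-1}$ and the maximum is taken over all $k$-Markov chains on $S$.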

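Tong (1975) reported that the Akaike Information Criterion (AIC) order estimator could be used as an objective technique for determining the optimum order $k$ of the chain; see also Akaike (1974). The optimum order is the $k$ that minimises the loss function

$$k_{AIC}(x_1^n) = \arg\min_k \left(-\log P_{ML(k)}(x_1^n) + |S|^k(|S| - 1)\right),$$

where $|S|^k(|S| - 1)$ is the number of free parameters of a $k$-Markov chain on $S$ (equal to $2^k$ for the two-state chains used in this work).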

Schwarz (1978) presented an alternative technique, the Bayesian Information Criterion (BIC) order estimator, whose consistency under general conditions was established only recently. The optimum order $k$ now minimises the loss function

$$k_{BIC}(x_1^n) = \arg\min_k \left(-\log P_{ML(k)}(x_1^n) + \frac{|S|^k(|S| - 1)}{2}\log n\right).$$

Dalevi et al. (2006) showed experimentally that, for moderate data sizes, the BIC order estimator tends to underestimate the order as $k$ gets larger.
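A minimal sketch of the AIC and BIC order estimators with natural logarithms follows. It uses the free-parameter count $|S|^k(|S|-1)$, which equals $2^k$ for the two-state chains used in this work; the function names are hypothetical and `n_states` must equal $|S|$.

```python
import math
from collections import Counter


def max_log_likelihood(x, k):
    """Natural log of P_ML(k)(x_1^n) for a k-th order chain.

    Maximising over the initial law makes the factor P(X_1^k = x_1^k) equal 1,
    so only the transitions i = k+1, ..., n contribute.
    """
    contexts, transitions = Counter(), Counter()
    for i in range(k, len(x)):
        w = tuple(x[i - k:i])             # tau_k(x^{i-1}): the previous k symbols
        contexts[w] += 1
        transitions[(w, x[i])] += 1
    return sum(c * math.log(c / contexts[w]) for (w, _), c in transitions.items())


def select_order(x, k_max, n_states=2):
    """(k_AIC, k_BIC) minimising the two loss functions over k = 0, ..., k_max."""
    n = len(x)
    aic, bic = {}, {}
    for k in range(k_max + 1):
        ll = max_log_likelihood(x, k)
        n_params = n_states ** k * (n_states - 1)  # |S|^k (|S| - 1) free parameters
        aic[k] = -ll + n_params
        bic[k] = -ll + n_params / 2 * math.log(n)
    return min(aic, key=aic.get), min(bic, key=bic.get)
```

For example, the perfectly alternating sequence 0101... has zero log-likelihood loss from order 1 onwards, so both criteria select $k = 1$; the BIC penalty grows with $\log n$, which is consistent with its tendency to choose smaller orders than AIC.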

Finally, the Maximal Fluctuation Criterion (MFC), unlike the AIC and BIC order estimators, was designed specifically for multiple step Markov chains. For any realisation $x \in S^n$ of the $k$-Markov chain, let $N_x(w) = |\{i \in [1, n] : \tau_l(x^i) = w\}|$, for $w \in S^l$, denote the number of times $w$ occurs in $x$. The Peres-Shields fluctuation function is defined as

$$\Delta_k(v) = \max_{s \in S}\left|N_x(vs) - \frac{N_x(\tau_k(v)s)}{N_x(\tau_k(v))}\,N_x(v)\right|.$$

When the order of the Markov chain is $k$ or less, this fluctuation is small. The Maximal Fluctuation Criterion (MFC) order estimator is therefore defined as

$$k_{MFC}(x_1^n) = \min\left\{k \ge 0 : \max_{k < |v| < \log\log n} \Delta_k(v) < n^{3/4}\right\}.$$

In practice the function $\log\log(\cdot)$ is replaced by any function that grows more slowly than $\log(\cdot)$. Dalevi et al. (2006) suggested the Generalized Maximum Fluctuation Criterion (GMFC) order estimator, which is closely related to the MFC order estimator:

$$k_{GMFC}(x_1^n) = \arg\max_k \frac{\max_{k-1 < |v| < f(n)} \Delta_{k-1}(v)}{\max_{k < |v| < f(n)} \Delta_k(v)},$$

where $f(n)$ is any function satisfying the same conditions as for the MFC order estimator.
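The fluctuation function and the MFC rule can be sketched as follows. Here $\tau_k(v)$ is taken to be the last $k$ symbols of $v$, words are counted with overlaps as in the definition of $N_x(w)$, and the slowly growing bound is replaced by a user-chosen maximal word length (in the spirit of the remark that $\log\log(\cdot)$ may be replaced in practice). These choices and all names are assumptions for illustration.

```python
from itertools import product


def word_counts(x, max_len):
    """N_x(w) for all words w of length <= max_len that occur in x (tuple keys)."""
    counts = {(): len(x)}
    for length in range(1, max_len + 1):
        for i in range(length, len(x) + 1):
            w = tuple(x[i - length:i])
            counts[w] = counts.get(w, 0) + 1
    return counts


def fluctuation(counts, v, k, alphabet):
    """Peres-Shields fluctuation Delta_k(v); tau_k(v) is the last k symbols of v."""
    v = tuple(v)
    if k >= len(v):
        raise ValueError("Delta_k(v) needs |v| > k")
    suffix = v[len(v) - k:]
    n_v, n_suf = counts.get(v, 0), counts.get(suffix, 0)
    if n_suf == 0:                        # then N_x(v) = 0 as well
        return 0.0
    return max(abs(counts.get(v + (s,), 0) - counts.get(suffix + (s,), 0) * n_v / n_suf)
               for s in alphabet)


def mfc_order(x, alphabet, max_word_len):
    """Smallest k with max over k < |v| <= max_word_len of Delta_k(v) < n**(3/4).

    The integer max_word_len stands in for the slowly growing bound f(n).
    """
    n = len(x)
    counts = word_counts(x, max_word_len + 1)
    for k in range(max_word_len + 1):
        worst = 0.0
        for length in range(k + 1, max_word_len + 1):
            for v in product(alphabet, repeat=length):
                worst = max(worst, fluctuation(counts, v, k, alphabet))
        if worst < n ** 0.75:
            return k
    return max_word_len  # not reached: for k = max_word_len no word is checked
```

For an alternating sequence 0101... with `max_word_len` = 2, the fluctuations $\Delta_0$ are of order $n/4$ while $\Delta_1 = 0$, so the rule returns order 1 once $n/4$ exceeds $n^{3/4}$, i.e. for $n > 256$.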

Fig. 1: Location of the stations.

Fig. 2: Time plot of annual number of wet days.

Fig. 3: Time plot of annual amount of precipitation.

Fig. 4: Lund, Sweden (data from 1961 to 2004). (Left): Observed p(t) pooled over 5 days. (Right): Mean number of wet days per month ("+"), and per season (solid lines).

Fig. 5: k-Markov chain orders for block lengths of one month (Jan, Feb, ...).

Fig. 6: k-Markov chain orders for block lengths of two months (Jan-Feb, Feb-Mar, ...).

Fig. 7: k-Markov chain orders for block lengths of three months (Jan-Mar, Feb-Apr, ...).

Fig. 8: Order of Markov chain as suggested by the Kolmogorov-Smirnov statistic at 10% tail value for each station and season.

Fig. 9: Conditional distribution of dry spell length given that the dry spell is at least 3 days long, for k-Markov chain models of order k = 1, k = 2 and k = 3 and the data from Lund. Data are from the winter months December-February.

Fig. 10: Plot of the dependence structure with the marginal distributions transformed to standard normal.

Fig. 11: Number of m-clusters with more than one observation. '+' denotes the observed and 'o' the theoretical number of m-clusters assuming that the observations are independent, while '*' denotes the number of m-clusters using ρ = 0.1362. Line '–' denotes the 95% confidence interval for the theoretical number of m-clusters assuming independence, while '-.' denotes the 95% confidence interval for the theoretical number of m-clusters assuming ρ = 0.1362.

Fig. 12: Mean residual life plot of the amount-of-precipitation process from Lund; dotted lines give the 95% confidence interval.

Fig. 13: Diagnostic plots for the threshold excess model fitted to daily precipitation data from the station in Lund.

Fig. 14: Plots of R10mm (top left), R20mm (top right), RX1day (bottom left) and RX5day (bottom right). Theoretical distribution '-' and empirical distribution '.-'.

Fig. 15: Plot of maximum number of consecutive dry days (left) and maximum number of consecutive wet days (right).

Fig. 16: Plot of the probability of the number of moderately wet days (top left), above-moderately wet days (top right), very wet days (bottom left) and extremely wet days (bottom right).

Fig. 17: Percentage of precipitation during the moderately wet days (top left), the above-moderately wet days (top right), the very wet days (bottom left) and the extremely wet days (bottom right).

Fig. 18: Plot of the average amount of precipitation per day of precipitation (left) and the 90% quantile of the amount of precipitation of the thinned precipitation process (right).

Table 1: Names of weather stations.

Table 2: Number of data sets that have passed the Kolmogorov-Smirnov test at the 10% tail value for different orders of the Markov chain. S1 stands for Dec.-Feb., S2 for Mar.-May, S3 for Jun.-Aug. and S4 for Sep.-Nov.

Table 3: Expected length of long dry spells in days for season Dec-Feb in Lund.

Table 4: Extremal parameters and their 95% confidence intervals for each weather station.

Table 5: Values of the parameter θ for different choices of m-clusters.

Table 6: Weather indices and their mathematical expressions. The quantiles q(·) have been estimated using the observed data.

[Figure 1: Location of the stations.]

[Figure 2: Time plot of annual number of wet days, one panel per station (Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malugn, Vattholma, Myskåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando); x-axis: years, y-axis: annual no. of wet days.]

[Figure 3: Time plot of annual amount of precipitation, one panel per station (Lund, Bolmen, Hanö, Borås, Varberg, Ungsberg, Säffle, Söderköping, Stockholm, Malugn, Vattholma, Myskåsen, Härnösand, Rösta, Piteå, Stensele, Haparanda, Kvikkjokk, Pajala, Karesuando); x-axis: years, y-axis: amount of precip. (mm).]

[Figure 4: Lund. Left panel: probability of precipitation, Jan to Dec. Right panel: mean number of wet days, Jan to Dec.]


[Figure 5: Estimated k-Markov chain orders (scale 1 to 5) for stations st 01 to st 20, block lengths of one month.]


[Figure 6: Estimated k-Markov chain orders (scale 1 to 5) for stations st 01 to st 20, block lengths of two months.]


[Figure 7: Estimated k-Markov chain orders (scale 1 to 5) for stations st 01 to st 20, block lengths of three months.]

[Figure 8: Estimated orders by KS-criterion of dry spell order estimator, stations st 01 to st 20.]

[Figure 9: probability (0.3 to 1) against time (days, 2 to 14); curves: Empirical, 3-Markov, 2-Markov, 1-Markov]

[Figure 10: scatter plot of $T(Y_{t+1})$ against $T(Y_t)$; both axes from -4 to 4]
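Figure 10 plots the transformed amount on one day against the transformed amount on the next. As a rough illustration only, the sketch below assumes that $T$ maps amounts to standard normal scores and that the transformed series behaves like a stationary Gaussian AR(1) sequence (a simplification of ours, not necessarily the temporal Gaussian process used in the paper). It simulates such a series and estimates the lag-one correlation that a scatter plot like Figure 10 displays. Both function names are hypothetical.

```python
import math
import random


def simulate_gaussian_ar1(n, rho, seed=0):
    """Simulate a stationary Gaussian AR(1) series with N(0, 1) marginals.

    Hypothetical stand-in for transformed amounts T(Y_t); the AR(1)
    structure is an assumption made for this illustration only.
    """
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho * rho)  # keeps the marginal variance at 1
    t = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        t.append(rho * t[-1] + innov_sd * rng.gauss(0.0, 1.0))
    return t


def lag1_correlation(t):
    """Sample Pearson correlation between t[i] and t[i + 1]."""
    a, b = t[:-1], t[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Usage with made-up settings: the estimate should be near the chosen rho.
series = simulate_gaussian_ar1(20_000, rho=0.2, seed=1)
print(lag1_correlation(series))
```

Applied to real data, the input would be the transformed daily amounts rather than a simulated series.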

[Figure 11: number of clusters with more than one observation (0 to 6) against m (0 to 50)]

[Figure 12: mean residual life plot, $E[X - u \mid X > u]$ (0 to 12) against $u$ (0 to 60)]
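Figure 12 shows the empirical mean excess $E[X - u \mid X > u]$ as a function of $u$. For a generalized Pareto tail with shape $\xi < 1$ this function is linear in $u$ above a suitable threshold, which is why such a plot helps when choosing thresholds like the values of $u$ in Table 4. A minimal sketch of the empirical curve follows; the function name and the data in the usage line are hypothetical.

```python
def mean_excess(amounts, thresholds):
    """Empirical mean excess e(u) = mean of (x - u) over observations x > u.

    Returns (u, e(u), number of exceedances) for each threshold u;
    e(u) is NaN when no observation exceeds u.
    """
    rows = []
    for u in thresholds:
        excess = [x - u for x in amounts if x > u]
        e_u = sum(excess) / len(excess) if excess else float("nan")
        rows.append((u, e_u, len(excess)))
    return rows


# Hypothetical daily amounts (mm); in practice, the wet-day amounts at a station.
data = [0.3, 1.1, 2.4, 4.0, 6.5, 9.2, 12.8, 17.0, 23.5, 31.0]
for u, e_u, n_exc in mean_excess(data, [0, 5, 10, 15]):
    print(u, e_u, n_exc)
```

Reporting the number of exceedances alongside $e(u)$ matters because the curve becomes noisy at high thresholds where few observations remain.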

[Figure 13: four diagnostic panels. Probability Plot (Extreme Empirical model against GP model, both 0 to 1); Quantile Plot (Extreme Empirical model (mm) against GP model (mm), 20 to 60); Return Level Plot (Return Level, 40 to 80, against Return Period (Years), 10 to 160); Density Plot (no. of observations, 0 to 80, against Amount of precipitation (mm), 20 to 60)]
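The return level plot in Figure 13 relates a return period to a precipitation amount. For a generalized Pareto model of excesses over $u$ with scale $\sigma$, shape $\xi$ and extremal index $\theta$, the standard $N$-year return level is $x_N = u + \frac{\sigma}{\xi}\left[(N n_y \zeta_u \theta)^{\xi} - 1\right]$, with $x_N = u + \sigma \log(N n_y \zeta_u \theta)$ when $\xi = 0$. Here $n_y$ is the number of observations per year and $\zeta_u = P(X > u)$. The sketch below assumes that $\theta$ in Table 4 is this extremal index. $\zeta_u$ is not given in this section, so the value in the usage line is hypothetical, as is the function name.

```python
import math


def return_level(n_years, u, sigma, xi, zeta_u, theta=1.0, obs_per_year=365.25):
    """N-year return level for a threshold-excess (GP) model.

    zeta_u: probability that a single observation exceeds u.
    theta:  extremal index (1.0 means no clustering of exceedances).
    """
    # Expected number of exceedance clusters in n_years.
    m = n_years * obs_per_year * zeta_u * theta
    if m <= 0:
        raise ValueError("n_years, zeta_u and theta must be positive")
    if abs(xi) < 1e-9:               # exponential-tail limit xi -> 0
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)


# Lund parameters from Table 4 with a hypothetical exceedance probability.
print(return_level(10, u=15, sigma=5.91, xi=0.076, zeta_u=0.02, theta=0.935))
```

Evaluating the function over a grid of return periods and plotting against the logarithm of $N$ gives a curve of the kind shown in the Return Level Plot panel.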

[Figure 14: four panels, vertical axis 0 to 1. Index R10mm (no. of days, 10 to 30); Index R20mm (no. of days, 0 to 10); Index RX1day (15 to 65); Index RX5day (30 to 110)]

[Figure 15: two panels, vertical axis 0 to 1. Index CDD (no. of days, 10 to 35); Index CWD (no. of days, 8 to 24)]

[Figure 16: four panels, vertical axis 0 to 1. Index R75p (no. of days, 30 to 70); Index R90p (no. of days, 10 to 30); Index R95p (no. of days, 4 to 18); Index R99p (no. of days, 0 to 9)]

[Figure 17: four panels, vertical axis 0 to 1. Index R75pTOT (0.6 to 0.8); Index R90pTOT (0.25 to 0.55); Index R95pTOT (0.15 to 0.4); Index R99pTOT (0 to 0.25)]

[Figure 18: two panels, vertical axis 0 to 1. Index SDII (3 to 5); Index Prec90p (8 to 13)]

Table 1.

Number  Name
1       Lund
2       Bolmen
3       Hano
4       Boras
5       Varberg
6       Ungsberg
7       Saffle
8       Soderkoping
9       Stockholm
10      Malung
11      Vattholma
12      Myskelasen
13      Harnosand
14      Rosta
15      Pitea
16      Stensele
17      Haparanda
18      Kvikkjokk
19      Pajala
20      Karesuando

Table 2.

                 Season
Model     S1   S2   S3   S4
k = 1      1    0    1    6
k = 2     20   20   20   20
k = 3     20   20   20   20

Table 3.

Model                l = 1   l = 2   l = 3
k = 1                2.49    3.49    4.49
k = 2                -       3.91    4.91
k = 3                -       -       5.11
Observed mean value  2.56    3.97    5.23
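In Table 3 the rows for k = 1 and k = 2 increase by exactly one per column once l ≥ k. One reading consistent with this pattern, and it is an assumption because this section does not define the entries, is that each entry is a mean dry-spell length conditional on the spell having lasted at least l days under a Markov chain of order k. Once a spell is k days long, the chain's state is "k dry days", so the remaining length is geometric and $E[L \mid L \ge l] = l + q/(1-q)$, where q is the probability of another dry day. The sketch below checks this memory property numerically. The function name is hypothetical, and the per-day continuation probabilities for d < k are treated as given inputs, which simplifies the full order-k chain.

```python
def conditional_mean_spell(q, l, max_len=10_000):
    """E[L | L >= l] for a spell length L >= 1 (hypothetical helper).

    q[d - 1] is P(spell lasts beyond day d | it has lasted d days) for
    d = 1..k; for d > k the last value q[k - 1] is reused, mimicking the
    memory of an order-k chain. Infinite sums are truncated at max_len.
    """
    if l < 1:
        raise ValueError("l must be at least 1")
    survival = 1.0          # P(L >= n), starting from n = 1
    num = den = 0.0
    for n in range(1, max_len + 1):
        cont = q[min(n, len(q)) - 1]
        if n >= l:
            p_n = survival * (1.0 - cont)   # P(L = n)
            num += n * p_n
            den += p_n
        survival *= cont
    return num / den


# Hypothetical continuation probabilities for an order-2 chain:
# from l = 2 onwards consecutive values differ by one day.
for l in (1, 2, 3):
    print(l, conditional_mean_spell([0.5, 0.7], l))
```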

Table 4.

Station     σ      CI for σ        ξ        CI for ξ         θ      u (mm)  ρ
Lund        5.91   (4.93, 7.03)    0.076    (-0.041, 0.236)  0.935  15      0.1362
Bolmen      6.44   (5.56, 7.41)    -0.0002  (-0.095, 0.116)  0.921  15      0.2008
Hano        5.29   (3.044, 8.737)  0.458    (0.115, 1.05)    0.977  25      0.1649
Boras       7.63   (7.01, 8.28)    -0.011   (-0.067, 0.053)  0.794  10      0.1982
Varberg     5.48   (4.687, 6.378)  0.106    (0.001, 0.236)   0.926  15      0.1206
Ungsberg    5.768  (4.622, 7.115)  0.245    (0.089, 0.445)   0.925  15      0.1843
Saffle      6.62   (5.96, 7.329)   0.099    (0.027, 0.183)   0.857  10      0.1809
Soderkoping 6.259  (4.32, 8.884)   0.297    (0.1, 0.649)     0.984  25      0.1678
Stockholm   5.597  (4.827, 6.453)  0.135    (0.033, 0.259)   0.903  10      0.1523
Malung      6.355  (5.676, 7.095)  0.08     (0.004, 0.17)    0.86   10      0.2280
Vattholma   4.964  (3.521, 6.784)  0.334    (0.098, 0.667)   0.984  20      0.1709
Myskelasen  6.854  (5.962, 7.844)  0.019    (-0.072, 0.13)   0.849  10      0.2311
Harnosand   7.863  (7.053, 8.742)  0.087    (0.011, 0.175)   0.832  10      0.2068
Rosta       6.276  (5.453, 7.19)   0.032    (-0.062, 0.145)  0.876  10      0.2116
Pitea       5.937  (4.429, 7.822)  0.19     (0.004, 0.456)   0.96   20      0.2010
Stensele    7.66   (6.098, 9.5)    0.041    (-0.11, 0.236)   0.915  15      0.2249
Haparanda   5.628  (4.405, 7.07)   -0.073   (-0.196, 0.125)  0.984  18      0.1871
Kvikkjokk   5.66   (5.01, 6.36)    0.04     (-0.04, 0.137)   0.864  10      0.2526
Pajala      5.033  (3.705, 6.728)  0.356    (0.153, 0.646)   0.966  18      0.2385
Karesuando  5.303  (4.117, 6.754)  0.12     (-0.037, 0.34)   0.922  15      0.2206
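The abstract describes the marginal distribution of precipitation amounts as the empirical distribution below a threshold u with a fitted generalized Pareto (GP) distribution above it, and temporal dependence introduced through a Gaussian copula. A minimal standard-library sketch of such a composite CDF and of a normal-score transform $\Phi^{-1}(F(y))$, the kind of transform a Gaussian copula requires, is shown below. It assumes that σ and ξ in Table 4 are the GP scale and shape for excesses over u, in the parametrization $G(y) = 1 - (1 + \xi y/\sigma)^{-1/\xi}$. The function names and the sample amounts are hypothetical.

```python
import bisect
import math
from statistics import NormalDist


def gpd_cdf(y, sigma, xi):
    """CDF of the generalized Pareto distribution for an excess y."""
    if y <= 0:
        return 0.0
    if abs(xi) < 1e-12:                 # exponential limit xi -> 0
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0:                          # beyond the upper endpoint when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def composite_cdf(x, sorted_sample, u, sigma, xi):
    """Empirical CDF up to u, GP tail above u (hypothetical helper)."""
    n = len(sorted_sample)
    if x <= u:
        return bisect.bisect_right(sorted_sample, x) / n
    f_u = bisect.bisect_right(sorted_sample, u) / n
    return f_u + (1.0 - f_u) * gpd_cdf(x - u, sigma, xi)


def normal_score(x, sorted_sample, u, sigma, xi, eps=1e-9):
    """Map an amount to a standard normal score via the composite CDF."""
    p = composite_cdf(x, sorted_sample, u, sigma, xi)
    p = min(max(p, eps), 1.0 - eps)     # avoid infinite scores at 0 and 1
    return NormalDist().inv_cdf(p)


# Hypothetical wet-day amounts (mm) with the Lund GP parameters from Table 4.
sample = sorted([0.4, 1.2, 2.5, 3.1, 6.0, 9.8, 14.0, 17.5, 22.0])
print(composite_cdf(30.0, sample, u=15.0, sigma=5.91, xi=0.076))
print(normal_score(3.1, sample, u=15.0, sigma=5.91, xi=0.076))
```

The composite is continuous at u by construction: just above the threshold the tail term starts from the empirical value $F(u)$.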

Table 5.

m    θ
0    0.9144
1    0.8836
2    0.8425
3    0.8322
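Table 5 lists θ for m = 0 to 3, and Figure 11 counts clusters with more than one observation as a function of m. Assuming θ is a runs estimate of the extremal index, that is, the number of clusters divided by the number of exceedances, with two exceedances in the same cluster when at most m non-exceedances separate them (our reading, not a definition given in this section), the computation can be sketched as follows. Under this convention m = 0 merges only exceedances on consecutive days, which agrees with θ < 1 already at m = 0. The function names are hypothetical.

```python
def exceedance_clusters(x, u, m):
    """Group the indices i with x[i] > u into clusters (hypothetical helper).

    Two successive exceedances share a cluster if at most m
    non-exceedances lie between them (m = 0 joins only consecutive days).
    """
    idx = [i for i, v in enumerate(x) if v > u]
    clusters = []
    for i in idx:
        if clusters and i - clusters[-1][-1] - 1 <= m:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


def runs_extremal_index(x, u, m):
    """Runs estimate of the extremal index: clusters / exceedances."""
    clusters = exceedance_clusters(x, u, m)
    n_exc = sum(len(c) for c in clusters)
    if n_exc == 0:
        raise ValueError("no observations exceed the threshold")
    return len(clusters) / n_exc


# Hypothetical daily amounts (mm) and threshold u = 15 mm.
x = [0, 20, 21, 0, 0, 19, 0, 25]
for m in range(3):
    clusters = exceedance_clusters(x, 15, m)
    multi = sum(len(c) > 1 for c in clusters)   # quantity plotted in Figure 11
    print(m, runs_extremal_index(x, 15, m), multi)
```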

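Table 6.

Index     Description                                      Formula
R10mm     Heavy precipitation days                         $\sum_i \mathbf{1}\{Z_i > 10\}$
R20mm     Very heavy precipitation days                    $\sum_i \mathbf{1}\{Z_i > 20\}$
RX1day    Highest 1-day precipitation amount               $\max_i Z_i$
RX5day    Highest 5-day precipitation amount               $\max_i \sum_{j=0}^{4} Z_{i+j}$
CDD       Max number of consecutive dry days               $\max\{j : \tau_j(X_i) = 0\}$
CWD       Max number of consecutive wet days               $\max\{j : w = \tau_j(X_i),\ w_k > 0\ \forall k\}$
R75p      Moderate wet days                                $\sum_i \mathbf{1}\{Z_i > q_{0.75}\}$
R90p      Above moderate wet days                          $\sum_i \mathbf{1}\{Z_i > q_{0.90}\}$
R95p      Very wet days                                    $\sum_i \mathbf{1}\{Z_i > q_{0.95}\}$
R99p      Extremely wet days                               $\sum_i \mathbf{1}\{Z_i > q_{0.99}\}$
R75pTOT                                                    $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.75}\} / \sum_i Z_i$
R90pTOT                                                    $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.90}\} / \sum_i Z_i$
R95pTOT                                                    $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.95}\} / \sum_i Z_i$
R99pTOT                                                    $\sum_i Z_i \mathbf{1}\{Z_i > q_{0.99}\} / \sum_i Z_i$
SDII      Simple daily intensity index                     $\sum_i Y_i / \sum_i \mathbf{1}\{Y_i > 0\}$
Prec90p   90% quantile of thinned amount of precipitation  $F_Y^{-1}(0.9)$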
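The definitions in Table 6 translate directly into code. The sketch below computes most of the indices for one series of daily amounts, under assumptions of ours: a single list `z` with 0 on dry days stands in for both $Z_i$ and $Y_i$, a day counts as wet when its amount is positive, CDD and CWD are the longest runs of dry and wet days, and the quantiles $q_p$ are supplied by the caller (for example, estimated from a reference period). Prec90p is omitted because the thinned amounts are not defined in this section. The function names and the sample numbers are hypothetical.

```python
from itertools import groupby


def longest_run(flags):
    """Length of the longest run of True values in a sequence of booleans."""
    return max((sum(1 for _ in grp) for key, grp in groupby(flags) if key), default=0)


def precipitation_indices(z, q):
    """Table 6 style indices for daily amounts z (mm, 0 on dry days).

    q maps a percentile p (e.g. 75) to the threshold q_p in mm.
    """
    if len(z) < 5:
        raise ValueError("need at least five days for RX5day")
    wet = [v > 0 for v in z]
    total = sum(z)
    n_wet = sum(wet)
    out = {
        "R10mm": sum(v > 10 for v in z),
        "R20mm": sum(v > 20 for v in z),
        "RX1day": max(z),
        "RX5day": max(sum(z[i:i + 5]) for i in range(len(z) - 4)),
        "CDD": longest_run([not w for w in wet]),
        "CWD": longest_run(wet),
        "SDII": total / n_wet if n_wet else 0.0,
    }
    for p, q_p in sorted(q.items()):
        out[f"R{p}p"] = sum(v > q_p for v in z)
        out[f"R{p}pTOT"] = sum(v for v in z if v > q_p) / total if total > 0 else 0.0
    return out


# Hypothetical ten-day record and thresholds.
print(precipitation_indices([0, 0, 12, 25, 0, 3, 0, 0, 0, 11], {75: 11.5, 90: 20.0}))
```

Computing these indices for each simulated year and for each observed year gives two samples whose distributions can then be compared, in the manner of Figures 14 to 18.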
