Hindawi Publishing Corporation
Journal of Probability and Statistics, Volume 2013, Article ID 797014, 15 pages
http://dx.doi.org/10.1155/2013/797014

Research Article
Estimation of Extreme Values by the Average Conditional Exceedance Rate Method

A. Naess,1 O. Gaidai,2 and O. Karpa3

1 Department of Mathematical Sciences and CeSOS, Norwegian University of Science and Technology, 7491 Trondheim, Norway
2 Norwegian Marine Technology Research Institute, 7491 Trondheim, Norway
3 Centre for Ships and Ocean Structures (CeSOS), Norwegian University of Science and Technology, 7491 Trondheim, Norway

Correspondence should be addressed to A. Naess; [email protected]

Received 18 October 2012; Revised 22 December 2012; Accepted 9 January 2013

Academic Editor: A. Thavaneswaran

Copyright © 2013 A. Naess et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper details a method for extreme value prediction on the basis of a sampled time series. The method is specifically designed to account for statistical dependence between the sampled data points in a precise manner. In fact, if properly used, the new method will provide statistical estimates of the exact extreme value distribution provided by the data in most cases of practical interest. It avoids the problem of having to decluster the data to ensure independence, which is a requisite component in the application of, for example, the standard peaks-over-threshold method. The proposed method also targets the use of subasymptotic data to improve prediction accuracy. The method will be demonstrated by application to both synthetic and real data. From a practical point of view, it seems to perform better than the POT and block extremes methods, and, with an appropriate modification, it is directly applicable to nonstationary time series.
1. Introduction
Extreme value statistics, even in applications, are generally based on asymptotic results. This is done either by assuming that the epochal extremes, for example, yearly extreme wind speeds at a given location, are distributed according to the generalized (asymptotic) extreme value distribution with unknown parameters to be estimated on the basis of the observed data [1, 2]. Or it is assumed that the exceedances above high thresholds follow a generalized (asymptotic) Pareto distribution with parameters that are estimated from the data [1–4]. The major problem with both of these approaches is that the asymptotic extreme value theory itself cannot be used in practice to decide to what extent it is applicable for the observed data. And since the statistical tests to decide this issue are rarely precise enough to completely settle this problem, the assumption that a specific asymptotic extreme value distribution is the appropriate distribution for the observed data is based more or less on faith or convenience.

On the other hand, one can reasonably assume that in most cases long time series obtained from practical measurements do contain values that are large enough to provide useful information about extreme events that are truly asymptotic. This cannot be strictly proved in general, of course, but the accumulated experience indicates that asymptotic extreme value distributions do provide reasonable, if not always very accurate, predictions when based on measured data. This is amply documented in the vast literature on the subject, and good references to this literature are [2, 5, 6]. In an effort to improve on the current situation, we have tried to develop an approach to the extreme value prediction problem that is less restrictive and more flexible than the ones based on asymptotic theory. The approach is based on two separate components which are designed to improve on two important aspects of extreme value prediction based on observed data. The first component has the capability to accurately capture and display the effect of statistical dependence in the data, which opens for the opportunity of using all the available data in the analysis. The second component is then constructed so as to make it possible to incorporate to a certain extent also the subasymptotic part of the data into the estimation of extreme values, which is of some importance for accurate estimation. We have used the proposed method on a wide variety of estimation problems, and our experience is that
it represents a very powerful addition to the toolbox of methods for extreme value estimation. Needless to say, what is presented in this paper is by no means considered a closed chapter. It is a novel method, and it is to be expected that several aspects of the proposed approach will see significant improvements.
2. Cascade of Conditioning Approximations

In this section, a sequence of nonparametric distribution functions will be constructed that converges to the exact extreme value distribution for the time series considered. This constitutes the core of the proposed approach.

Consider a stochastic process $Z(t)$, which has been observed over a time interval, $(0, T)$ say. Assume that values $X_1, \ldots, X_N$, which have been derived from the observed process, are allocated to the discrete times $t_1, \ldots, t_N$ in $(0, T)$. This could be simply the observed values of $Z(t)$ at each $t_j$, $j = 1, \ldots, N$, or it could be average values or peak values over smaller time intervals centered at the $t_j$'s. Our goal in this paper is to accurately determine the distribution function of the extreme value $M_N = \max\{X_j;\ j = 1, \ldots, N\}$. Specifically, we want to estimate $P(\eta) = \mathrm{Prob}(M_N \le \eta)$ accurately for large values of $\eta$. An underlying premise for the development in this paper is that a rational approach to the study of the extreme values of the sampled time series is to consider exceedances of the individual random variables $X_j$ above given thresholds, as in classical extreme value theory. The alternative approach of considering the exceedances by upcrossing of given thresholds by a continuous stochastic process has been developed in [7, 8] along lines similar to those adopted here. The approach taken in the present paper seems to be the appropriate way to deal with the recorded data time series of, for example, the hourly or daily largest wind speeds observed at a given location.

From the definition of $P(\eta)$ it follows that

$$P(\eta) = \mathrm{Prob}(M_N \le \eta) = \mathrm{Prob}\{X_N \le \eta, \ldots, X_1 \le \eta\} = \mathrm{Prob}\{X_N \le \eta \mid X_{N-1} \le \eta, \ldots, X_1 \le \eta\} \cdot \mathrm{Prob}\{X_{N-1} \le \eta, \ldots, X_1 \le \eta\} = \prod_{j=2}^{N} \mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta, \ldots, X_1 \le \eta\} \cdot \mathrm{Prob}(X_1 \le \eta). \qquad (1)$$
In general, the variables $X_j$ are statistically dependent. Hence, instead of assuming that all the $X_j$ are statistically independent, which leads to the classical approximation

$$P(\eta) \approx P_1(\eta) := \prod_{j=1}^{N} \mathrm{Prob}(X_j \le \eta), \qquad (2)$$

where $:=$ means "by definition", the following one-step memory approximation will, to a certain extent, account for the dependence between the $X_j$'s,

$$\mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta, \ldots, X_1 \le \eta\} \approx \mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta\}, \qquad (3)$$

for $2 \le j \le N$. With this approximation, it is obtained that

$$P(\eta) \approx P_2(\eta) := \prod_{j=2}^{N} \mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta\} \cdot \mathrm{Prob}(X_1 \le \eta). \qquad (4)$$

By conditioning on one more data point, the one-step memory approximation is extended to

$$\mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta, \ldots, X_1 \le \eta\} \approx \mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta, X_{j-2} \le \eta\}, \qquad (5)$$

where $3 \le j \le N$, which leads to the approximation

$$P(\eta) \approx P_3(\eta) := \prod_{j=3}^{N} \mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta, X_{j-2} \le \eta\} \cdot \mathrm{Prob}\{X_2 \le \eta \mid X_1 \le \eta\} \cdot \mathrm{Prob}(X_1 \le \eta). \qquad (6)$$

For a general $k$, $2 \le k \le N$, it is obtained that

$$P(\eta) \approx P_k(\eta) := \prod_{j=k}^{N} \mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta, \ldots, X_{j-k+1} \le \eta\} \cdot \prod_{j=2}^{k-1} \mathrm{Prob}\{X_j \le \eta \mid X_{j-1} \le \eta, \ldots, X_1 \le \eta\} \cdot \mathrm{Prob}(X_1 \le \eta), \qquad (7)$$

where $P_N(\eta) = P(\eta)$.

It should be noted that the one-step memory approximation adopted above is not a Markov chain approximation [9–11], nor do the $k$-step memory approximations lead to $k$th-order Markov chains [12, 13]. An effort to relinquish the Markov chain assumption to obtain an approximate distribution of clusters of extremes is reported in [14].
It is of interest to have a closer look at the values for $P_k(\eta)$ obtained by using (7) as compared to (2). Now, (2) can be rewritten in the form

$$P(\eta) \approx P_1(\eta) = \prod_{j=1}^{N} \left(1 - \alpha_{1j}(\eta)\right), \qquad (8)$$

where $\alpha_{1j}(\eta) = \mathrm{Prob}\{X_j > \eta\}$, $j = 1, \ldots, N$. Then the approximation based on assuming independent data can be written as

$$P(\eta) \approx \hat{P}_1(\eta) := \exp\left(-\sum_{j=1}^{N} \alpha_{1j}(\eta)\right), \quad \eta \to \infty. \qquad (9)$$

Alternatively, (7) can be expressed as

$$P(\eta) \approx P_k(\eta) = \prod_{j=k}^{N} \left(1 - \alpha_{kj}(\eta)\right) \cdot \prod_{j=1}^{k-1} \left(1 - \alpha_{jj}(\eta)\right), \qquad (10)$$

where $\alpha_{kj}(\eta) = \mathrm{Prob}\{X_j > \eta \mid X_{j-1} \le \eta, \ldots, X_{j-k+1} \le \eta\}$, for $j \ge k \ge 2$, denotes the exceedance probability conditional on $k-1$ previous nonexceedances. From (10) it is now obtained that

$$P(\eta) \approx \hat{P}_k(\eta) := \exp\left(-\sum_{j=k}^{N} \alpha_{kj}(\eta) - \sum_{j=1}^{k-1} \alpha_{jj}(\eta)\right), \quad \eta \to \infty, \qquad (11)$$

and $\hat{P}_k(\eta) \to \hat{P}_N(\eta)$ as $k \to N$, with $\hat{P}_N(\eta) = P(\eta)$ for $\eta \to \infty$.

For the cascade of approximations $\hat{P}_k(\eta)$ to have practical significance, it is implicitly assumed that there is a cut-off value $k_c$ satisfying $k_c \ll N$ such that effectively $\hat{P}_{k_c}(\eta) = \hat{P}_N(\eta)$. It may be noted that for $k$-dependent stationary data sequences, that is, for data where $X_i$ and $X_j$ are independent whenever $|i - j| > k$, then $P(\eta) = P_{k+1}(\eta)$ exactly, and, under rather mild conditions on the joint distributions of the data, $\lim_{N \to \infty} \hat{P}_1(\eta) = \lim_{N \to \infty} P(\eta)$ [15]. In fact, it can be shown that $\lim_{N \to \infty} \hat{P}_1(\eta) = \lim_{N \to \infty} P(\eta)$ is true for weaker conditions than $k$-dependence [16]. However, for finite values of $N$ the picture is much more complex, and purely asymptotic results should be used with some caution. Cartwright [17] used the notion of $k$-dependence to investigate the effect on extremes of correlation in sea wave data time series.
Returning to (11), extreme value prediction by the conditioning approach described above reduces to estimation of (combinations of) the $\alpha_{kj}(\eta)$ functions. In accordance with the previous assumption about a cut-off value $k_c$, for all $\eta$-values of interest, $N \gg k_c$, so that $\sum_{j=1}^{k-1} \alpha_{jj}(\eta)$ is effectively negligible compared to $\sum_{j=k}^{N} \alpha_{kj}(\eta)$. Hence, for simplicity, the following approximation is adopted, which is applicable to both stationary and nonstationary data,

$$P(\eta) \approx \hat{P}_k(\eta) = \exp\left(-\sum_{j=k}^{N} \alpha_{kj}(\eta)\right), \quad \eta \gg 1. \qquad (12)$$

Going back to the definition of $\alpha_{1j}(\eta)$, it follows that $\sum_{j=1}^{N} \alpha_{1j}(\eta)$ is equal to the expected number of exceedances of the threshold $\eta$ during the time interval $(0, T)$. Equation (9) therefore expresses the approximation that the stream of exceedance events constitutes a (nonstationary) Poisson process. This opens for an understanding of (12) by interpreting the expression $\sum_{j=k}^{N} \alpha_{kj}(\eta)$ as the expected effective number of independent exceedance events provided by conditioning on $k-1$ previous observations.
3. Empirical Estimation of the Average Conditional Exceedance Rates

The concept of average conditional exceedance rate (ACER) of order $k$ is now introduced as follows:

$$\varepsilon_k(\eta) = \frac{1}{N - k + 1} \sum_{j=k}^{N} \alpha_{kj}(\eta), \quad k = 1, 2, \ldots. \qquad (13)$$

In general, this ACER function also depends on the number of data points $N$.

In practice, there are typically two scenarios for the underlying process $Z(t)$. Either we may consider $Z(t)$ to be a stationary process, or, in fact, even an ergodic process. The alternative is to view $Z(t)$ as a process that depends on certain parameters whose variation in time may be modelled as an ergodic process in its own right. For each set of values of the parameters, the premise is that $Z(t)$ can then be modelled as an ergodic process. This would be the scenario that can be used to model long-term statistics [18, 19].

For both these scenarios, the empirical estimation of the ACER function $\varepsilon_k(\eta)$ proceeds in a completely analogous way, by counting the total number of favourable incidents, that is, exceedances combined with the requisite number of preceding nonexceedances, for the total data time series, and then finally dividing by $N - k + 1$. This can be shown to apply for the long-term situation.

A few more details on the numerical estimation of $\varepsilon_k(\eta)$ for $k \ge 2$ may be appropriate. We start by introducing the following random functions:

$$A_{kj}(\eta) = \mathbf{1}\{X_j > \eta,\ X_{j-1} \le \eta, \ldots, X_{j-k+1} \le \eta\}, \quad j = k, \ldots, N,\ k = 2, 3, \ldots,$$
$$B_{kj}(\eta) = \mathbf{1}\{X_{j-1} \le \eta, \ldots, X_{j-k+1} \le \eta\}, \quad j = k, \ldots, N,\ k = 2, \ldots, \qquad (14)$$

where $\mathbf{1}\{\mathcal{A}\}$ denotes the indicator function of some event $\mathcal{A}$. Then,

$$\alpha_{kj}(\eta) = \frac{\mathrm{E}[A_{kj}(\eta)]}{\mathrm{E}[B_{kj}(\eta)]}, \quad j = k, \ldots, N,\ k = 2, \ldots, \qquad (15)$$

where $\mathrm{E}[\cdot]$ denotes the expectation operator. Assuming an ergodic process, then obviously $\varepsilon_k(\eta) = \alpha_{kk}(\eta) = \cdots = \alpha_{kN}(\eta)$, and by replacing ensemble means with corresponding time averages, it may be assumed that for the time series at hand

$$\varepsilon_k(\eta) = \lim_{N \to \infty} \frac{\sum_{j=k}^{N} a_{kj}(\eta)}{\sum_{j=k}^{N} b_{kj}(\eta)}, \qquad (16)$$

where $a_{kj}(\eta)$ and $b_{kj}(\eta)$ are the realized values of $A_{kj}(\eta)$ and $B_{kj}(\eta)$, respectively, for the observed time series. Clearly, $\lim_{\eta \to \infty} \mathrm{E}[B_{kj}(\eta)] = 1$. Hence, $\lim_{\eta \to \infty} \tilde{\varepsilon}_k(\eta)/\varepsilon_k(\eta) = 1$, where

$$\tilde{\varepsilon}_k(\eta) = \frac{\sum_{j=k}^{N} \mathrm{E}[A_{kj}(\eta)]}{N - k + 1}. \qquad (17)$$
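To make the counting behind (13)–(17) concrete, the following minimal sketch (Python/NumPy; the function and variable names are our own assumptions, not the authors' code) estimates the empirical ACER value of (20) below by counting exceedances of $\eta$ preceded by $k-1$ nonexceedances:

```python
import numpy as np

def acer_empirical(x, eta_levels, k):
    """Empirical ACER function: for each level eta, count exceedances
    X_j > eta preceded by k-1 nonexceedances, then divide by
    N - k + 1.  A sketch, not the authors' implementation."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    eps = np.empty(len(eta_levels))
    for i, eta in enumerate(eta_levels):
        below = x <= eta
        if k == 1:
            count = np.count_nonzero(~below)       # plain exceedance count
        else:
            hits = ~below[k - 1:]                  # X_j > eta for j = k..N
            for m in range(1, k):                  # k-1 preceding nonexceedances
                hits = hits & below[k - 1 - m : N - m]   # X_{j-m} <= eta
            count = np.count_nonzero(hits)
        eps[i] = count / (N - k + 1)
    return eps
```

Plotting acer_empirical(x, levels, k) for increasing k reveals the cascade of Section 2; the curves typically coalesce in the tail once k exceeds the effective memory of the series.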
The advantage of using the modified ACER function $\tilde{\varepsilon}_k(\eta)$ for $k \ge 2$ is that it is easier to use for nonstationary or long-term statistics than $\varepsilon_k(\eta)$. Since our focus is on the values of the ACER functions at the extreme levels, we may use any function that provides correct predictions of the appropriate ACER function at these extreme levels.

To see why (17) may be applicable for nonstationary time series, it is recognized that

$$P(\eta) \approx \exp\left(-\sum_{j=k}^{N} \alpha_{kj}(\eta)\right) = \exp\left(-\sum_{j=k}^{N} \frac{\mathrm{E}[A_{kj}(\eta)]}{\mathrm{E}[B_{kj}(\eta)]}\right) \approx \exp\left(-\sum_{j=k}^{N} \mathrm{E}[A_{kj}(\eta)]\right). \qquad (18)$$

If the time series can be segmented into $K$ blocks, such that $\mathrm{E}[A_{kj}(\eta)]$ remains approximately constant within each block, and such that $\sum_{j \in C_i} \mathrm{E}[A_{kj}(\eta)] \approx \sum_{j \in C_i} a_{kj}(\eta)$ for a sufficient range of $\eta$-values, where $C_i$ denotes the set of indices for block no. $i$, $i = 1, \ldots, K$, then $\sum_{j=k}^{N} \mathrm{E}[A_{kj}(\eta)] \approx \sum_{j=k}^{N} a_{kj}(\eta)$. Hence,

$$P(\eta) \approx \exp\left(-(N - k + 1)\,\hat{\varepsilon}_k(\eta)\right), \qquad (19)$$

where

$$\hat{\varepsilon}_k(\eta) = \frac{1}{N - k + 1} \sum_{j=k}^{N} a_{kj}(\eta). \qquad (20)$$
It is of interest to note what events are actually counted for the estimation of the various $\varepsilon_k(\eta)$, $k \ge 2$. Let us start with $\varepsilon_2(\eta)$. It follows from the definition of $\varepsilon_2(\eta)$ that $\varepsilon_2(\eta)(N-1)$ can be interpreted as the expected number of exceedances above the level $\eta$, satisfying the condition that an exceedance is counted only if it is immediately preceded by a nonexceedance. A reinterpretation of this is that $\varepsilon_2(\eta)(N-1)$ equals the average number of clumps of exceedances above $\eta$, for the realizations considered, where a clump of exceedances is defined as a maximum number of consecutive exceedances above $\eta$. In general, $\varepsilon_k(\eta)(N-k+1)$ then equals the average number of clumps of exceedances above $\eta$ separated by at least $k-1$ nonexceedances. If the time series analysed is obtained by extracting local peak values from a narrow band response process, it is interesting to note the similarity between the ACER approximations and the envelope approximations for extreme value prediction [7, 20]. For alternative statistical approaches to account for the effect of clustering on the extreme value distribution, the reader may consult [21–26]. In these works, the emphasis is on the notion of an extremal index, which characterizes the clumping or clustering tendency of the data and its effect on the extreme value distribution. In the ACER functions, these effects are automatically accounted for.
Now, let us look at the problem of estimating a confidence interval for $\varepsilon_k(\eta)$, assuming a stationary time series. If $R$ realizations of the requisite length of the time series are available, or if one long realization can be segmented into $R$ subseries, then the sample standard deviation $\hat{s}_k(\eta)$ can be estimated by the standard formula

$$\hat{s}_k(\eta)^2 = \frac{1}{R - 1} \sum_{r=1}^{R} \left(\hat{\varepsilon}_k^{(r)}(\eta) - \hat{\varepsilon}_k(\eta)\right)^2, \qquad (21)$$

where $\hat{\varepsilon}_k^{(r)}(\eta)$ denotes the ACER function estimate from realization no. $r$, and $\hat{\varepsilon}_k(\eta) = \sum_{r=1}^{R} \hat{\varepsilon}_k^{(r)}(\eta)/R$.

Assuming that realizations are independent, for a suitable number $R$, for example, $R \ge 20$, (21) leads to a good approximation of the 95% confidence interval $\mathrm{CI} = (\mathrm{CI}^-(\eta), \mathrm{CI}^+(\eta))$ for the value $\varepsilon_k(\eta)$, where

$$\mathrm{CI}^{\pm}(\eta) = \hat{\varepsilon}_k(\eta) \pm \frac{1.96\,\hat{s}_k(\eta)}{\sqrt{R}}. \qquad (22)$$
Alternatively, and which also applies to the nonstationary case, it is consistent with the adopted approach to assume that the stream of conditional exceedances over a threshold $\eta$ constitutes a Poisson process, possibly nonhomogeneous. Hence, the variance of the estimator $\hat{E}_k(\eta)$ of $\varepsilon_k(\eta)$, where

$$\hat{E}_k(\eta) = \frac{\sum_{j=k}^{N} A_{kj}(\eta)}{N - k + 1}, \qquad (23)$$

is $\mathrm{Var}[\hat{E}_k(\eta)] = \varepsilon_k(\eta)/(N - k + 1)$. Therefore, for high levels $\eta$, the approximate limits of a 95% confidence interval of $\varepsilon_k(\eta)$, and also $\tilde{\varepsilon}_k(\eta)$, can be written as

$$\mathrm{CI}^{\pm}(\eta) = \hat{\varepsilon}_k(\eta)\left(1 \pm \frac{1.96}{\sqrt{(N - k + 1)\,\hat{\varepsilon}_k(\eta)}}\right). \qquad (24)$$
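The Poisson-based band (24) is straightforward to evaluate from the empirical estimate alone; a small sketch (same hypothetical names as above):

```python
import numpy as np

def acer_poisson_ci(eps_hat, N, k):
    """Approximate 95% confidence limits per (24), based on the
    Poisson variance Var = eps_k / (N - k + 1).  A sketch."""
    eps_hat = np.asarray(eps_hat, dtype=float)
    half = 1.96 / np.sqrt((N - k + 1) * eps_hat)
    return eps_hat * (1.0 - half), eps_hat * (1.0 + half)
```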
4. Estimation of Extremes for the Asymptotic Gumbel Case

The second component of the approach to extreme value estimation presented in this paper was originally derived for a time series with an asymptotic extreme value distribution of the Gumbel type, cf. [27]. We have therefore chosen to highlight this case first, also because the extension of the asymptotic distribution to a parametric class of extreme value distribution tails that are capable of capturing to some extent subasymptotic behaviour is more transparent, and perhaps more obvious, for the Gumbel case. The reason behind the efforts to extend the extreme value distributions to the subasymptotic range is the fact that the ACER functions allow us to use not only asymptotic data, which is clearly an advantage since proving that observed extremes are truly asymptotic is really a nontrivial task.

The implication of the asymptotic distribution being of the Gumbel type on the possible subasymptotic functional forms of $\varepsilon_k(\eta)$ cannot easily be decided in any detail. However, using the asymptotic form as a guide, it is assumed that the behaviour of the mean exceedance rate in the tail is dominated by a function of the form $\exp\{-a(\eta - b)^c\}$ ($\eta \ge \eta_1$), where $a$, $b$, and $c$ are suitable constants, and $\eta_1$ is an appropriately chosen tail marker.
Hence, it will be assumed that

$$\varepsilon_k(\eta) = q_k(\eta) \exp\{-a_k(\eta - b_k)^{c_k}\}, \quad \eta \ge \eta_1, \qquad (25)$$

where the function $q_k(\eta)$ is slowly varying compared with the exponential function $\exp\{-a_k(\eta - b_k)^{c_k}\}$, and $a_k$, $b_k$, and $c_k$ are suitable constants that in general will be dependent on $k$. Note that the case $c_k = q_k(\eta) = 1$ corresponds to the asymptotic Gumbel distribution, which is then a special case of the assumed tail behaviour.
From (25) it follows that

$$\log\left|\log\left(\frac{\varepsilon_k(\eta)}{q_k(\eta)}\right)\right| = \log a_k + c_k \log(\eta - b_k). \qquad (26)$$

Therefore, under the assumptions made, a plot of $\log|\log(\varepsilon_k(\eta)/q_k(\eta))|$ versus $\log(\eta - b_k)$ will exhibit a perfectly linear tail behaviour.

It is realized that if the function $q_k(\eta)$ could be replaced by a constant value, say $q_k$, one would immediately be in a position to apply a linear extrapolation strategy for deep tail prediction problems. In general, $q_k(\eta)$ is not constant, but its variation in the tail region is often sufficiently slow to allow for its replacement by a constant, possibly by adjusting the tail marker $\eta_1$. The proposed statistical approach to the prediction of extreme values is therefore based on the assumption that we can write

$$\varepsilon_k(\eta) = q_k \exp\{-a_k(\eta - b_k)^{c_k}\}, \quad \eta \ge \eta_1, \qquad (27)$$

where $a_k$, $b_k$, $c_k$, and $q_k$ are appropriately chosen constants. In a certain sense, this is a minimal class of parametric functions that can be used for this purpose which makes it possible to achieve three important goals. Firstly, the parametric class contains the asymptotic form given by $c_k = q_k = 1$ as a special case. Secondly, the class is flexible enough to capture, to a certain extent, subasymptotic behaviour of any extreme value distribution that is asymptotically Gumbel. Thirdly, the parametric functions agree with a wide range of known special cases, of which a very important example is the extreme value distribution for a regular stationary Gaussian process, which has $c = 2$.

The viability of this approach has been successfully demonstrated by the authors for mean up-crossing rate estimation for extreme value statistics of the response processes related to a wide range of different dynamical systems, cf. [7, 8].
As to the question of finding the parameters $a$, $b$, $c$, $q$ (the subscript $k$, if it applies, is suppressed), the adopted approach is to determine these parameters by minimizing the following mean square error function, with respect to all four arguments,

$$F(a, b, c, q) = \sum_{j=1}^{n} w_j \left|\log \hat{\varepsilon}(\eta_j) - \log q + a(\eta_j - b)^{c}\right|^2, \qquad (28)$$

where $\eta_1 < \cdots < \eta_n$ denotes the levels where the ACER function has been estimated, and $w_j$ denotes a weight factor that puts more emphasis on the more reliably estimated $\hat{\varepsilon}(\eta_j)$. The choice of weight factor is to some extent arbitrary. We have previously used $w_j = (\log \mathrm{CI}^+(\eta_j) - \log \mathrm{CI}^-(\eta_j))^{-\theta}$ with $\theta = 1$ and $2$, combined with a Levenberg-Marquardt least squares optimization method [28]. This has usually worked well provided reasonable initial values for the parameters were chosen. Note that the form of $w_j$ puts some restriction on the use of the data. Usually, there is a level $\eta_j$ beyond which $w_j$ is no longer defined, that is, $\mathrm{CI}^-(\eta_j)$ becomes negative. Hence, the summation in (28) has to stop before that happens. Also, the data should be preconditioned by establishing the tail marker $\eta_1$ based on inspection of the empirical ACER functions.

In general, to improve robustness of results, it is recommended to apply a nonlinearly constrained optimization [29]. The set of constraints is written as

$$\log q - a(\eta_1 - b)^{c} \le 0,$$
$$0 < a < +\infty,$$
$$x_{\min} < b \le \eta_1,$$
$$0 < q < +\infty,$$
$$0 < c < 5. \qquad (29)$$

Here, the first nonlinear inequality constraint is evident, since under our assumption we have $\varepsilon_k(\eta_1) = q\exp\{-a(\eta_1 - b)^{c}\}$, and $\varepsilon_k(\eta) < 1$ by definition.

A Note of Caution. When the parameter $c$ is equal to 1.0 or close to it, that is, the distribution is close to the Gumbel distribution, the optimization problem becomes ill-defined or close to ill-defined. It is seen that when $c = 1.0$, there is an infinity of $(b, q)$ values that gives exactly the same value of $F(a, b, c, q)$. Hence, there is no well-defined optimum in parameter space. There are simply too many parameters. This problem is alleviated by fixing the $q$-value, and the obvious choice is $q = 1$.
Although the Levenberg-Marquardt method generally works well with four or, when appropriate, three parameters, we have also developed a more direct and transparent optimization method for the problem at hand. It is realized by scrutinizing (28) that if $b$ and $c$ are fixed, the optimization problem reduces to a standard weighted linear regression problem. That is, with both $b$ and $c$ fixed, the optimal values of $a$ and $\log q$ are found using closed form weighted linear regression formulas in terms of $y_j = \log \hat{\varepsilon}(\eta_j)$ and $x_j = (\eta_j - b)^{c}$. In that light, it can also be concluded that the best linear unbiased estimators (BLUE) are obtained for $w_j = \sigma_j^{-2}$, where $\sigma_j^2 = \mathrm{Var}[y_j]$ (empirical) [30, 31]. Unfortunately, this is not a very practical weight factor for the kind of problem we have here, because the summation in (28) then typically would have to stop at undesirably small values of $\eta_j$.
It is obtained that the optimal values of $a$ and $q$ are given by the relations

$$a^*(b, c) = -\frac{\sum_{j=1}^{n} w_j (x_j - \bar{x})(y_j - \bar{y})}{\sum_{j=1}^{n} w_j (x_j - \bar{x})^2}, \qquad \log q^*(b, c) = \bar{y} + a^*(b, c)\,\bar{x}, \qquad (30)$$

where $\bar{x} = \sum_{j=1}^{n} w_j x_j / \sum_{j=1}^{n} w_j$, with a similar definition of $\bar{y}$.

To calculate the final optimal set of parameters, one may use the Levenberg-Marquardt method on the function $\tilde{F}(b, c) = F(a^*(b, c), b, c, q^*(b, c))$ to find the optimal values $b^*$ and $c^*$, and then use (30) to calculate the corresponding $a^*$ and $q^*$.
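The two-stage optimization can be sketched as follows (Python with NumPy/SciPy; a bounded trust-region least squares routine stands in for the Levenberg-Marquardt method of the paper, and all names are our own assumptions):

```python
import numpy as np
from scipy.optimize import least_squares

def fit_acer_gumbel(eta, eps_hat, w, eta1):
    """Fit eps(eta) = q*exp(-a*(eta - b)^c), eta >= eta1, by
    minimizing (28); for fixed (b, c) the optimal a and log q are
    the closed-form weighted-regression values of (30).  A sketch."""
    y = np.log(eps_hat)

    def inner(b, c):                          # closed-form step, eq. (30)
        x = (eta - b) ** c
        xb = np.sum(w * x) / np.sum(w)        # weighted means
        yb = np.sum(w * y) / np.sum(w)
        a = -np.sum(w * (x - xb) * (y - yb)) / np.sum(w * (x - xb) ** 2)
        return a, yb + a * xb                 # a*, log q*

    def residuals(p):                         # sqrt(w)-scaled residuals of (28)
        b, c = p
        a, logq = inner(b, c)
        return np.sqrt(w) * (y - logq + a * (eta - b) ** c)

    # initial values assume eta1 > 0 (e.g. wind speeds); adjust otherwise
    sol = least_squares(residuals, x0=[0.5 * eta1, 1.0],
                        bounds=([-np.inf, 0.05], [eta1, 5.0]))
    b, c = sol.x
    a, logq = inner(b, c)
    return a, b, c, np.exp(logq)
```

The fitted curve is then inverted to read off return levels: solving $q\exp\{-a(\eta - b)^c\} = \varepsilon_0$ gives $\eta = b + ((\ln q - \ln \varepsilon_0)/a)^{1/c}$, with, for example, $\varepsilon_0 = 10^{-4}$ for the 100-year level of the synthetic case in Section 8.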
a simple construction of a confidence interval for
the predicted, deep tail extreme value given by a particularACER
function as provided by the fitted parametric curve, theempirical
confidence band is reanchored to the fitted curveby centering the
individual confidence intervals CI
0.95for the
point estimates of the ACER function on the fitted curve.Under
the premise that the specified class of parametriccurves fully
describes the behaviour of the ACER functions inthe tail,
parametric curves are fitted as described above to theboundaries of
the reanchored confidence band. These curvesare used to determine a
first estimate of a 95% confidenceinterval of the predicted extreme
value. To obtain a moreprecise estimate of the confidence interval,
a bootstrappingmethod would be recommended. A comparison of
estimatedconfidence intervals by both these methods will be
presentedin the section on extreme value prediction for synthetic
data.As a final point, it has been observed that the predicted
valueis not very sensitive to the choice of
1, provided it is chosen
with some care. This property is easily recognized by lookingat
the way the optimized fitting is done. If the tail marker isin the
appropriate domain of the ACER function, the optimalfitted curve
does not change appreciably by moving the tailmarker.
5. Estimation of Extremes for the General Case

For independent data in the general case, the ACER function $\varepsilon_1(\eta)$ can be expressed asymptotically as

$$\varepsilon_1(\eta) \approx \left[1 + \xi\left(a(\eta - b)\right)\right]^{-1/\xi}, \qquad (31)$$

where $a > 0$, $b$, and $\xi$ are constants. This follows from the explicit form of the so-called generalized extreme value (GEV) distribution, Coles [1].

Again, the implication of this assumption on the possible subasymptotic functional forms of $\varepsilon_k(\eta)$ in the general case is not a trivial matter. The approach we have chosen is to assume that the class of parametric functions needed for the prediction of extreme values for the general case can be modelled on the relation between the Gumbel distribution and the general extreme value distribution. While the extension of the asymptotic Gumbel case to the proposed class of subasymptotic distributions was fairly transparent, this is not equally so for the general case. However, using a similar kind of approximation, the behaviour of the mean exceedance rate in the subasymptotic part of the tail is assumed to follow a function largely of the form $[1 + \xi(a(\eta - b)^{c})]^{-1/\xi}$ ($\eta \ge \eta_1$), where $a > 0$, $b$, $c > 0$, and $\xi > 0$ are suitable constants, and $\eta_1$ is an appropriately chosen tail level. Hence, it will be assumed that [32]

$$\varepsilon_k(\eta) = q_k(\eta)\left[1 + \xi_k\left(a_k(\eta - b_k)^{c_k}\right)\right]^{-1/\xi_k}, \quad \eta \ge \eta_1, \qquad (32)$$

where the function $q_k(\eta)$ is weakly varying compared with the function $[1 + \xi_k(a_k(\eta - b_k)^{c_k})]^{-1/\xi_k}$, and $a_k > 0$, $b_k$, $c_k > 0$, and $\xi_k > 0$ are suitable constants that in general will be dependent on $k$. Note that the values $c_k = 1$ and $q_k(\eta) = 1$ correspond to the asymptotic limit, which is then a special case of the general expression given in (32). Another method to account for subasymptotic effects has recently been proposed by Eastoe and Tawn [33], building on ideas developed by Tawn [34], Ledford and Tawn [35], and Heffernan and Tawn [36]. In this approach, the asymptotic form of the marginal distribution of exceedances is kept, but it is modified by a multiplicative factor accounting for the dependence structure of exceedances within a cluster.
An alternative form to (32) would be to assume that

$$\varepsilon_k(\eta) = \left[1 + \xi_k\left(a_k(\eta - b_k)^{c_k} + d_k(\eta)\right)\right]^{-1/\xi_k}, \quad \eta \ge \eta_1, \qquad (33)$$

where the function $d_k(\eta)$ is weakly varying compared with the function $a_k(\eta - b_k)^{c_k}$. However, for estimation purposes, it turns out that the form given by (32) is preferable, as it leads to simpler estimation procedures. This aspect will be discussed later in the paper.
For practical identification of the ACER functions given by (32), it is expedient to assume that the unknown function $q_k(\eta)$ varies sufficiently slowly to be replaced by a constant. In general, $q_k(\eta)$ is not constant, but its variation in the tail region is assumed to be sufficiently slow to allow for its replacement by a constant. Hence, as in the Gumbel case, it is in effect assumed that $q_k(\eta)$ can be replaced by a constant for $\eta \ge \eta_1$, for an appropriate choice of tail marker $\eta_1$. For simplicity of notation, in the following we will suppress the index $k$ on the ACER functions, which will then be written as

$$\varepsilon(\eta) = q\left[1 + \tilde{a}(\eta - b)^{c}\right]^{-\gamma}, \quad \eta \ge \eta_1, \qquad (34)$$

where $\gamma = 1/\xi$ and $\tilde{a} = \xi a$.

For the analysis of data, first the tail marker $\eta_1$ is provisionally identified from visual inspection of the log plot $(\eta, \ln \hat{\varepsilon}(\eta))$. The value chosen for $\eta_1$ corresponds to the beginning of regular tail behaviour in a sense to be discussed below.
The optimization process to estimate the parameters is done relative to the log plot, as for the Gumbel case. The mean square error function to be minimized is in the general case written as

$$F(\tilde{a}, b, c, q, \gamma) = \sum_{j=1}^{n} w_j \left|\log \hat{\varepsilon}(\eta_j) - \log q + \gamma \log\left[1 + \tilde{a}(\eta_j - b)^{c}\right]\right|^2, \qquad (35)$$

where $w_j$ is a weight factor as previously defined.
An option for estimating the five parameters $\tilde{a}$, $b$, $c$, $q$, $\gamma$ is again to use the Levenberg-Marquardt least squares optimization method, which can be simplified also in this case by observing that if $\tilde{a}$, $b$, and $c$ are fixed in (35), the optimization problem reduces to a standard weighted linear regression problem. That is, with $\tilde{a}$, $b$, and $c$ fixed, the optimal values of $\gamma$ and $\log q$ are found using closed form weighted linear regression formulas in terms of $y_j = \log \hat{\varepsilon}(\eta_j)$ and $x_j = \log[1 + \tilde{a}(\eta_j - b)^{c}]$.

It is obtained that the optimal values of $\gamma$ and $\log q$ are given by relations similar to (30). To calculate the final optimal set of parameters, the Levenberg-Marquardt method may then be used on the function $\tilde{F}(\tilde{a}, b, c) = F(\tilde{a}, b, c, q^*(\tilde{a}, b, c), \gamma^*(\tilde{a}, b, c))$ to find the optimal values $\tilde{a}^*$, $b^*$, and $c^*$, and then the corresponding $\gamma^*$ and $q^*$ can be calculated. The optimal values of the parameters may, for example, also be found by a sequential quadratic programming (SQP) method [37].
6. The Gumbel Method

To offer a comparison of the predictions obtained by the method proposed in this paper with those obtained by other methods, we will use the predictions given by the two methods that seem to be most favored by practitioners, the Gumbel method and the peaks-over-threshold (POT) method, provided, of course, that the correct asymptotic extreme value distribution is of the Gumbel type.
The Gumbel method is based on recording epochal extreme values and fitting these values to a corresponding Gumbel distribution [38]. By assuming that the recorded extreme value data are Gumbel distributed, then representing the obtained data set of extreme values as a Gumbel probability plot should ideally result in a straight line. In practice, one cannot expect this to happen, but on the premise that the data follow a Gumbel distribution, a straight line can be fitted to the data. Due to its simplicity, a popular method for fitting this straight line is the method of moments, which is also reasonably stable for limited sets of data. That is, writing the Gumbel distribution of the extreme value $M$ as

$$\mathrm{Prob}(M \le \eta) = \exp\{-\exp(-\alpha(\eta - \beta))\}, \qquad (36)$$

it is known that the parameters $\alpha$ and $\beta$ are related to the mean value $m_M$ and standard deviation $s_M$ of $M$ as follows: $\beta = m_M - 0.57722\,\alpha^{-1}$ and $\alpha = 1.28255/s_M$ [39]. The estimates of $m_M$ and $s_M$ obtained from the available sample therefore provide estimates of $\alpha$ and $\beta$, which leads to the fitted Gumbel distribution by the moment method.
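The moment fit amounts to a few lines; a sketch (Python, our own names), including the quantile extraction used below:

```python
import numpy as np

def gumbel_moment_fit(extremes):
    """Method-of-moments fit of (36): alpha = 1.28255/s_M and
    beta = m_M - 0.57722/alpha.  A sketch."""
    m, s = np.mean(extremes), np.std(extremes, ddof=1)
    alpha = 1.28255 / s
    beta = m - 0.57722 / alpha
    return alpha, beta

def gumbel_quantile(p, alpha, beta):
    """p-fractile of (36): invert exp{-exp(-alpha*(eta - beta))} = p."""
    return beta - np.log(-np.log(p)) / alpha
```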
Typically, a specified quantile value of the fitted Gumbel distribution is then extracted and used in a design consideration. To be specific, let us assume that the requested quantile value is the $100(1 - \alpha)\%$ fractile, where $\alpha$ is usually a small number, for example, $\alpha = 0.1$. To quantify the uncertainty associated with the obtained $100(1 - \alpha)\%$ fractile value based on a sample of size $N$, the 95% confidence interval of this value is often used. A good estimate of this confidence interval can be obtained by using a parametric bootstrapping method [40, 41]. Note that the assumption that the initial extreme values are actually generated with good approximation from a Gumbel distribution cannot easily be verified with any accuracy in general, which is a drawback of this method. Compared with the POT method, the Gumbel method would also seem to use much less of the information available in the data. This may explain why the POT method has become increasingly popular over the past years, but the Gumbel method is still widely used in practice.
7. The Peaks-over-Threshold Method

7.1. The Generalized Pareto Distribution. The POT method for independent data is based on what is called the generalized Pareto (GP) distribution (defined below) in the following manner: it has been shown in [42] that asymptotically the excess values above a high level will follow a GP distribution if and only if the parent distribution belongs to the domain of attraction of one of the extreme value distributions. The assumption of a Poisson process model for the exceedance times combined with GP distributed excesses can be shown to lead to the generalized extreme value (GEV) distribution for corresponding extremes, see below. The expression for the GP distribution is

$$G(y) = G(y; a, c) = \mathrm{Prob}(Y \le y) = 1 - \left(1 + \frac{c\,y}{a}\right)_{+}^{-1/c}. \qquad (37)$$
Here $a > 0$ is a scale parameter and $c$ ($-\infty < c < \infty$) determines the shape of the distribution, with $(x)_{+} = \max(0, x)$.

The asymptotic result referred to above implies that (37) can be used to represent the conditional cumulative distribution function of the excess $Y = X - u$ of the observed variates $X$ over the threshold $u$, given that $X > u$ for $u$ sufficiently large [42]. The cases $c > 0$, $c = 0$, and $c < 0$ correspond to Fréchet (Type II), Gumbel (Type I), and reverse Weibull (Type III) domains of attraction, respectively; cf. Section 5 above.
For $c = 0$, which corresponds to the Gumbel extreme value distribution, the expression between the parentheses in (37) is understood in a limiting sense as the exponential distribution

$$G(y) = G(y; a, 0) = 1 - \exp\left(-\frac{y}{a}\right). \qquad (38)$$

Since the recorded data in practice are rarely independent, a declustering technique is commonly used to filter the data to achieve approximate independence [1, 2].
7.2. Return Periods. The return period $R$ of a given value $x_R$ of $X$, in terms of a specified length of time $\tau$, for example, a year, is defined as the inverse of the probability that the specified value will be exceeded in any time interval of length $\tau$. If $\lambda$ denotes the mean exceedance rate of the threshold $u$ per length of time $\tau$ (i.e., the average number of data points above the threshold $u$ per $\tau$), then the return period $R$ of the value of $X$ corresponding to the level $x_R = u + y_R$ is given by the relation

$$R = \frac{1}{\lambda\,\mathrm{Prob}(X > x_R)} = \frac{1}{\lambda\,\mathrm{Prob}(Y > y_R)}. \qquad (39)$$
Hence, it follows that

$$\mathrm{Prob}(Y \le y_R) = 1 - \frac{1}{\lambda R}. \qquad (40)$$

Invoking (37) for $c \ne 0$ leads to the result

$$x_R = u + \frac{a}{c}\left[(\lambda R)^{c} - 1\right]. \qquad (41)$$

Similarly, for $c = 0$, it is found that

$$x_R = u + a \ln(\lambda R), \qquad (42)$$

where $u$ is the threshold used in the estimation of $a$ and $c$.
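In code, (41)-(42) amount to the following sketch (names are our own; lam is the mean number of exceedances of the threshold per unit of the return period's time basis):

```python
import math

def pot_return_level(u, a, c, lam, R):
    """Return level x_R per (41)-(42): the GP case c != 0 and the
    exponential limit c = 0.  A sketch."""
    if abs(c) < 1e-12:
        return u + a * math.log(lam * R)            # eq. (42)
    return u + (a / c) * ((lam * R) ** c - 1.0)     # eq. (41)
```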
8. Extreme Value Prediction for Synthetic Data

In this section, we illustrate the performance of the ACER method and also the 95% CI estimation. We consider 20 years of synthetic wind speed data, amounting to 2000 data points, which is not much for detailed statistics. However, this case may represent a real situation when nothing but a limited data sample is available. In this case, it is crucial to provide extreme value estimates utilizing all data available. As we will see, the tail extrapolation technique proposed in this paper performs better than asymptotic methods such as POT or Gumbel.
The extreme value statistics will first be analyzed by application to synthetic data for which the exact extreme values can be calculated [43]. In particular, it is assumed that the underlying (normalized) stochastic process $Z(t)$ is stationary and Gaussian with mean value zero and standard deviation equal to one. It is also assumed that the mean zero up-crossing rate $\nu^{+}(0)$ is such that the product $\nu^{+}(0)T = 10^3$, where $T = 1$ year, which seems to be typical for the wind speed process. Using the Poisson assumption, the distribution of the yearly extreme value of $Z(t)$ is then calculated by the formula

$$F_{1\mathrm{yr}}(\eta) = \exp\{-\nu^{+}(\eta)\,T\} = \exp\left\{-10^3 \exp\left(-\frac{\eta^2}{2}\right)\right\}, \qquad (43)$$

where $\nu^{+}(\eta)$ is the mean up-crossing rate per year of the level $\eta$ by the scaled wind speed process. The 100-year return period value $\eta_{100\mathrm{yr}}$ is then calculated from the relation $F_{1\mathrm{yr}}(\eta_{100\mathrm{yr}}) = 1 - 1/100$, which gives $\eta_{100\mathrm{yr}} = 4.80$.
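This level is reproduced by direct inversion of (43); a quick check:

```python
import math

# Solve exp(-1e3 * exp(-eta^2 / 2)) = 1 - 1/100 for eta, per (43)
p = 1.0 - 1.0 / 100.0
eta_100yr = math.sqrt(-2.0 * math.log(-math.log(p) / 1.0e3))
print(round(eta_100yr, 2))   # 4.8
```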
The Monte Carlo simulated data to be used for the synthetic example are generated based on the observation that the peak events extracted from measurements of the wind speed process are usually separated by 3-4 days. This is done to obtain approximately independent data, as required by the POT method. In accordance with this, peak event data are generated from the extreme value distribution

$$F_{3\mathrm{d}}(\eta) = \exp\left\{-q \exp\left(-\frac{\eta^2}{2}\right)\right\}, \qquad (44)$$

where $q = \nu^{+}(0)\tilde{T} = 10$, which corresponds to $\tilde{T} = 3.65$ days, and $F_{1\mathrm{yr}}(\eta) = (F_{3\mathrm{d}}(\eta))^{100}$.
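Synthetic peak-event data of this kind can be drawn by inverse-transform sampling of (44); a sketch (the seed is arbitrary, and the draw is restricted to the range where the tail form is monotonic, i.e. $\eta \ge 0$):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_3day_maxima(n, q=10.0):
    """Inverse transform of F_3d(eta) = exp(-q*exp(-eta^2/2)):
    eta = sqrt(-2*ln(-ln(u)/q)).  A sketch."""
    u = rng.uniform(low=np.exp(-q), high=1.0, size=n)   # keep eta >= 0
    return np.sqrt(-2.0 * np.log(-np.log(u) / q))

data = sample_3day_maxima(2000)    # 20 years x 100 points per year
```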
Since the data points (i.e., the $\tilde{T} = 3.65$ days maxima) are independent, $\varepsilon_k(\eta)$ is independent of $k$. Therefore, we put $k = 1$. Since we have 100 data points from each year, the data amount to 2000 data points. For estimation of a 95% confidence interval for each estimated value of the ACER function $\hat{\varepsilon}_1(\eta)$ for the chosen range of $\eta$-values, the required standard deviation in (22) was based on 20 estimates of the ACER function using the yearly data. This provided a 95% confidence band on the optimally fitted curve based on 2000 data. From these data, the predicted 100-year return level is obtained from $\hat{\varepsilon}_1(\eta_{100\mathrm{yr}}) = 10^{-4}$. A nonparametric bootstrapping method was also used to estimate a 95% confidence interval based on 1000 resamples of size 2000.
The POT prediction of the 100-year return level was based on using maximum likelihood estimates (MLE) of the parameters in (37) for a specific choice of threshold. The 95% confidence interval was obtained from the parametrically bootstrapped PDF of the POT estimate for the given threshold. A sample of 1000 data sets was used. One of the unfortunate features of the POT method is that the estimated 100-year value may vary significantly with the choice of threshold. So also for the synthetic data. We have followed the standard recommended procedures for identifying a suitable threshold [1].

Note that in spite of the fact that the true asymptotic distribution of exceedances is the exponential distribution in (38), the POT method used here is based on adopting (37). The reason is simply that this is the recommended procedure [1], which is somewhat unfortunate but understandable. The reason being that the GP distribution provides greater flexibility in terms of curve fitting. If the correct asymptotic distribution of exceedances had been used on this example, poor results for the estimated return period values would be obtained. The price to pay for using the GP distribution is that the estimated parameters may easily lead to an asymptotically inconsistent extreme value distribution.
The 100-year return level predicted by the Gumbel method was based on using the method of moments for parameter estimation on the sample of 20 yearly extremes. This choice of estimation method is due to the small sample of extreme values. The 95% confidence interval was obtained from the parametrically bootstrapped PDF of the Gumbel prediction. This was based on a sample of 10,000 data sets of 20 yearly extremes. The results obtained by the method of moments were compared with the corresponding results obtained by using the maximum likelihood method. While there were individual differences, the overall picture was one of very good agreement.

In order to get an idea about the performance of the ACER, POT, and Gumbel methods, 100 independent 20-year MC simulations as discussed above were done. Table 1 compares predicted values and confidence intervals for a selection of 10 cases together with average values over the 100 simulated cases. It is seen that the average of the 100 predicted 100-year return levels is slightly better for the ACER method than for both the POT and the Gumbel methods. But more significantly, the range of predicted 100-year return levels by the ACER method is 4.34–5.36, while the same for the POT method is 4.19–5.87 and for the Gumbel method is 4.41–5.71.
Table 1: 100-year return level estimates and 95% CI (BCI = CI by bootstrap) for A = ACER, G = Gumbel, and P = POT. Exact value = 4.80.

Sim. no.   A est.   ACI            ABCI           G est.   GBCI           P est.   PBCI
1          5.07     (4.67, 5.21)   (4.69, 5.42)   4.41     (4.14, 4.73)   4.29     (4.13, 4.52)
10         4.65     (4.27, 4.94)   (4.37, 5.03)   4.92     (4.40, 5.58)   4.88     (4.42, 5.40)
20         4.86     (4.49, 5.06)   (4.44, 5.19)   5.04     (4.54, 5.63)   5.04     (4.48, 5.74)
30         4.75     (4.22, 5.01)   (4.33, 5.02)   4.75     (4.27, 5.32)   4.69     (4.24, 5.26)
40         4.54     (4.20, 4.74)   (4.27, 4.88)   4.80     (4.31, 5.39)   4.73     (4.19, 5.31)
50         4.80     (4.35, 5.05)   (4.42, 5.14)   4.91     (4.41, 5.50)   4.79     (4.31, 5.34)
60         4.84     (4.36, 5.20)   (4.48, 5.19)   4.85     (4.36, 5.43)   4.71     (4.32, 5.23)
70         5.02     (4.47, 5.31)   (4.62, 5.36)   4.96     (4.47, 5.53)   4.97     (4.47, 5.71)
80         4.59     (4.33, 4.81)   (4.38, 4.98)   4.76     (4.31, 5.31)   4.68     (4.15, 5.27)
90         4.84     (4.49, 5.11)   (4.60, 5.30)   4.77     (4.34, 5.32)   4.41     (4.23, 4.64)
100        4.62     (4.29, 5.05)   (4.45, 5.09)   4.79     (4.31, 5.41)   4.53     (4.05, 4.88)
Av.        4.82     (4.41, 5.09)   (4.48, 5.18)   4.84     (4.37, 5.40)   4.72     (4.27, 5.23)
Hence, in this case the ACER method performs consistently better than both these methods. It is also observed from the estimated 95% confidence intervals that the ACER method, as implemented in this paper, provides slightly higher accuracy than the other two methods. Lastly, it is pointed out that the confidence intervals of the 100-year return level estimated by the ACER method obtained by either the simplified extrapolated confidence band approach or by nonparametric bootstrapping are very similar, except for a slight mean shift. As a final comparison, the 100 bootstrapped confidence intervals obtained for the ACER and Gumbel methods missed the target value three times, while for the POT method this number was 18.
An example of the ACER plot and results obtained for one set of data is presented in Figure 1. The predicted 100-year value is 4.85 with a predicted 95% confidence interval (4.45, 5.09). Figure 2 presents POT predictions based on MLE for different thresholds, in terms of the number of data points above the threshold. The predicted value is 4.7 at 204 data points above the threshold, while the 95% confidence interval is (4.25, 5.28). The same data set as in Figure 1 was used. This was also used for the Gumbel plot shown in Figure 3. In this case the predicted value based on the method of moments (MM) is 4.75, with a parametric bootstrapped 95% confidence interval of (4.34, 5.27). Prediction based on the Gumbel-Lieblein BLUE method (GL), cf., for example, Cook [44], is 4.73, with a parametric bootstrapped 95% confidence interval equal to (4.35, 5.14).
9. Measured Wind Speed Data

In this section, we analyze real wind speed data, measured at two weather stations off the coast of Norway: at Nordøyan and at Hekkingen, see Figure 4. Extreme wind speed prediction is an important issue for design of structures exposed to the weather variations. Significant efforts have been devoted to the problem of predicting extreme wind speeds on the basis of measured data by various authors over several decades; see, for example, [45–48] for extensive references to previous work.
Figure 1: Synthetic data ACER $\hat{\varepsilon}_1(\eta)$: Monte Carlo simulation, optimized curve fit, empirical 95% confidence band, and optimized confidence band. Tail marker $\eta_1 = 2.3$.
Hourly maximum gust wind was recorded during the 13 years 1999–2012 at Nordøyan and the 14 years 1998–2012 at Hekkingen. The objective is to estimate a 100-year wind speed. Variation in the wind speed caused by seasonal variations in the wind climate during the year makes the wind speed a nonstationary process on the scale of months. Moreover, due to global climate change, yearly statistics may vary on the scale of years. The latter is, however, a slow process, and for the purpose of long-term prediction we assume here that within a time span of 100 years a quasi-stationary model of the wind speeds applies. This may not be entirely true, of course.

9.1. Nordøyan. Figure 5 highlights the cascade of ACER estimates $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_{96}$ for the case of 13 years of hourly data recorded at the Nordøyan weather station. Here, $\hat{\varepsilon}_{96}$ is considered to represent the final converged results. By converged, we mean that $\hat{\varepsilon}_k \approx \hat{\varepsilon}_{96}$ for $k > 96$ in the tail, so that there is no need to consider conditioning of an even higher order than 96.
Figure 2: The point estimate of the 100-year return period value based on 20 years of synthetic data, as a function of the number of data points above the threshold. The return level estimate is 4.7 at 204 data points above the threshold.
Figure 3: The point estimate of the 100-year return period value based on 20 years of synthetic data. Gumbel plot of yearly extremes; lines are fitted by the method of moments (solid line) and the Gumbel-Lieblein BLUE method (dash-dotted line). The return level estimate by the method of moments is 4.75, by the Gumbel-Lieblein BLUE method 4.73.
Figure 5 reveals a rather strong statistical dependence between consecutive data, which is clearly reflected in the effect of conditioning on previous data values. It is also interesting to observe that this effect is to some extent captured already by $\hat{\varepsilon}_2$, that is, by conditioning only on the value of the previous data point. Subsequent conditioning on more than one previous data point does not lead to substantial changes in ACER values, especially for tail values. On the other hand, to bring out fully the dependence structure of these data, it was necessary to carry the conditioning process to (at least) the 96th ACER function, as discussed above.
Figure 4: Wind speed measurement stations: Hekkingen Fyr (88690) and Nordøyan Fyr (75410).

Figure 5: Nordøyan wind speed statistics, 13 years of hourly data. Comparison between ACER estimates for different degrees of conditioning, $k = 1, 2, 4, 24, 48, 72, 96$. $\sigma = 6.01$ m/s.

However, from a practical point of view, the most important information provided by the ACER plot of Figure 5 is that
for the prediction of a 100-year value, one may use the first ACER function. The reason for this is that Figure 5 shows that all the ACER functions coalesce in the far tail. Hence, we may use any of the ACER functions for the prediction. Then, the obvious choice is to use the first ACER function, which allows us to use all the data in its estimation and thereby increase accuracy.
Figure 6: Nordøyan wind speed statistics, 13 years of hourly data. $\hat{\varepsilon}_1(\eta)$: optimized curve fit, empirical 95% confidence band, and optimized confidence band. Tail marker $\eta_1 = 12.5$ m/s $= 2.08\sigma$ ($\sigma = 6.01$ m/s).

Figure 7: The point estimate of the 100-year return level based on 13 years of hourly data as a function of the number of data points above threshold. $\sigma = 6.01$ m/s.

In Figure 6 are shown the results of parametric estimation of the return value and its 95% CI for 13 years of hourly
maxima. The predicted 100-year return speed is 51.85 m/s, with 95% confidence interval (48.4, 53.1) m/s. $R = 13$ years of data may not be enough to guarantee (22), since we required $R \ge 20$. Nevertheless, for simplicity, we use it here even with $R = 13$, accepting that it may not be very accurate.
Figure 7 presents POT predictions for different threshold numbers based on MLE. The POT prediction is 47.8 m/s at threshold number 161, while the bootstrapped 95% confidence interval is found to be (44.8, 52.7) m/s based on 10,000 generated samples. It is interesting to observe the unstable characteristics of the predictions over a range of threshold values, while they are quite stable on either side of this range, giving predictions that are more in line with the results from the other two methods.
Figure 8: Nordøyan wind speed statistics, 13 years of hourly data. Gumbel plot of yearly extremes. Lines are fitted by the method of moments (solid line) and the Gumbel-Lieblein BLUE method (dash-dotted line). $\sigma = 6.01$ m/s.

Table 2: Predicted 100-year return period levels for the Nordøyan Fyr weather station by the ACER method for different degrees of conditioning, and by the annual maxima and POT methods, respectively.

Method           Spec     100-yr level, m/s   95% CI, m/s
ACER             k = 1    51.85               (48.4, 53.1)
                 k = 2    51.48               (46.1, 54.1)
                 k = 4    52.56               (46.7, 55.7)
                 k = 24   52.90               (47.0, 56.2)
                 k = 48   54.62               (47.7, 57.6)
                 k = 72   53.81               (46.9, 58.3)
                 k = 96   54.97               (47.5, 60.5)
Annual maxima    MM       51.5                (45.2, 59.3)
                 GL       55.5                (48.0, 64.9)
POT                       47.8                (44.8, 52.7)

Figure 8 presents a Gumbel plot based on the 13 yearly extremes extracted from the 13 years of hourly data. The
Gumbel prediction based on the method of moments (MM) is 51.5 m/s, with a parametric bootstrapped 95% confidence interval equal to (45.2, 59.3) m/s, while prediction based on the Gumbel-Lieblein BLUE method (GL) is 55.5 m/s, with a parametric bootstrapped 95% confidence interval equal to (48.0, 64.9) m/s.
In Table 2, the 100-year return period values for the Nordøyan station are listed together with the predicted 95% confidence intervals for all methods.
9.2. Hekkingen. Figure 9 shows the cascade of estimated ACER functions $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_{96}$ for the case of 14 years of hourly data. As for Nordøyan, $\hat{\varepsilon}_{96}$ is used to represent the final converged results. Figure 9 also reveals a rather strong statistical dependence between consecutive data at moderate wind speed levels. This effect is again to some extent captured already by $\hat{\varepsilon}_2$, so that subsequent conditioning on more than one previous data point does not lead to substantial changes in ACER values, especially for tail values.
Figure 9: Hekkingen wind speed statistics, 14 years of hourly data. Comparison between ACER estimates for different degrees of conditioning, $k = 1, 2, 4, 24, 48, 72, 96$. $\sigma = 5.72$ m/s.
Figure 10: Hekkingen wind speed statistics, 14 years of hourly data. $\hat{\varepsilon}_1(\eta)$: optimized curve fit, empirical 95% confidence band, and optimized confidence band. Tail marker $\eta_1 = 23$ m/s $= 4.02\sigma$ ($\sigma = 5.72$ m/s).
Also, for the Hekkingen data, the ACER plot of Figure 9 indicates that the ACER functions coalesce in the far tail. Hence, for the practical prediction of a 100-year value, one may use the first ACER function.
In Figure 10 are shown the results of parametric estimation of the return value and its 95% CI for 14 years of hourly maxima. The predicted 100-year return speed is 60.47 m/s, with 95% confidence interval (53.1, 64.9) m/s. Equation (22) has been used also for this example, with $R = 14$.
Table 3: Predicted 100-year return period levels for the Hekkingen Fyr weather station by the ACER method for different degrees of conditioning, and by the annual maxima and POT methods, respectively.

Method           Spec     100-yr level, m/s   95% CI, m/s
ACER             k = 1    60.47               (53.1, 64.9)
                 k = 2    62.23               (53.3, 70.0)
                 k = 4    63.03               (53.0, 74.5)
                 k = 24   60.63               (51.3, 70.7)
                 k = 48   60.44               (51.3, 77.0)
                 k = 72   58.06               (51.2, 66.4)
                 k = 96   59.19               (52.0, 68.3)
Annual maxima    MM       58.10               (50.8, 67.3)
                 GL       60.63               (53.0, 70.1)
POT                       53.48               (48.9, 57.0)

Figure 11: The point estimate of the 100-year return level based on 14 years of hourly data as a function of the number of data points above threshold. $\sigma = 5.72$ m/s.

Figure 11 presents POT predictions for different threshold numbers based on MLE. The POT prediction is 53.48 m/s at threshold number 183, while the bootstrapped
95% confidence interval is found to be (48.9, 57.0) m/s based on 10,000 generated samples. It is interesting to observe the unstable characteristics of the predictions over a range of threshold values, while they are quite stable on either side of this range, giving predictions that are more in line with the results from the other two methods.
Figure 12 presents a Gumbel plot based on the 14 yearly extremes extracted from the 14 years of hourly data. The Gumbel prediction based on the method of moments (MM) is 58.10 m/s, with a parametric bootstrapped 95% confidence interval equal to (50.8, 67.3) m/s. Prediction based on the Gumbel-Lieblein BLUE method (GL) is 60.63 m/s, with a parametric bootstrapped 95% confidence interval equal to (53.0, 70.1) m/s.
In Table 3, the 100-year return period values for the Hekkingen station are listed together with the predicted 95% confidence intervals for all methods.
Figure 12: Hekkingen wind speed statistics, 14 years of hourly data. Gumbel plot of yearly extremes. Lines are fitted by the method of moments (solid line) and the Gumbel-Lieblein BLUE method (dash-dotted line). $\sigma = 5.72$ m/s.
10. Extreme Value Prediction for a Narrow-Band Process

In engineering mechanics, a classical extreme response prediction problem is the case of a lightly damped mechanical oscillator subjected to random forces. To illustrate this prediction problem, we will investigate the response process of a linear mechanical oscillator driven by a Gaussian white noise. Let $X(t)$ denote the displacement response; the dynamic model can then be expressed as $\ddot{X}(t) + 2\zeta\omega_0\dot{X}(t) + \omega_0^2 X(t) = W(t)$, where $\zeta$ is the relative damping, $\omega_0$ the undamped eigenfrequency, and $W(t)$ a stationary Gaussian white noise (of suitable intensity). By choosing a small value for $\zeta$, the response time series will exhibit narrow-band characteristics, that is, the spectral density of the response process $X(t)$ will assume significant values only over a narrow range of frequencies. This manifests itself by producing a strong beating of the response time series, which means that the size of the response peaks will change slowly in time, see Figure 13. A consequence of this is that neighbouring peaks are strongly correlated, and there is a conspicuous clumping of the peak values. Hence the problem with accurate prediction, since the usual assumption of independent peak values is then violated.
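A narrow-band series of this kind is easy to generate numerically; the sketch below integrates the oscillator with a semi-implicit Euler scheme (parameter values are illustrative, not those used for the paper's example, though the step size gives roughly 32 samples per response period, as in Figure 13):

```python
import numpy as np

rng = np.random.default_rng(1)

def narrow_band_response(zeta=0.01, omega0=2.0 * np.pi,
                         dt=1.0 / 32.0, n=100_000):
    """Simulate x'' + 2*zeta*omega0*x' + omega0^2*x = W(t) with a
    semi-implicit Euler scheme; W is discretized Gaussian white
    noise.  Returns the response normalized to unit variance."""
    x = np.zeros(n)
    v = 0.0
    for i in range(1, n):
        w = rng.normal() / np.sqrt(dt)            # white-noise sample
        v += (w - 2.0 * zeta * omega0 * v
              - omega0 ** 2 * x[i - 1]) * dt
        x[i] = x[i - 1] + v * dt
    return x / np.std(x)
```

Feeding the full series and its extracted peaks to the empirical ACER sketch of Section 3 reproduces the qualitative behaviour of Figures 14 and 15.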
Many approximations have been proposed to deal with this correlation problem, but no completely satisfactory solution has been presented. In this section, we will show that the ACER method solves this problem efficiently and elegantly in a statistical sense. In Figure 14 are shown some of the ACER functions for the example time series. It may be verified from Figure 13 that there are approximately 32 sample points between two neighbouring peaks in the time series. To illustrate a point, we have chosen to analyze the time series consisting of all sample points. Usually, in practice, only the time series obtained by extracting the peak values would be used for the ACER analysis. In the present case, the first ACER function is then based on assuming that all
Figure 13: Part of the narrow-band response time series of the linear oscillator, with fully sampled and peak values indicated.
the sampled data points are independent, which is obviously completely wrong. The second ACER function, which is based on counting each exceedance with an immediately preceding nonexceedance, is nothing but an upcrossing rate. Using this ACER function is largely equivalent to assuming independent peak values. It is now interesting to observe that the 25th ACER function can hardly be distinguished from the second ACER function. In fact, the ACER functions after the second do not change appreciably until one starts to approach the 32nd, which corresponds to hitting the previous peak value in the conditioning process. So, the important information concerning the dependence structure in the present time series seems to reside in the peak values, which may not be very surprising. It is seen that the ACER functions show a significant change in value as a result of accounting for the correlation effects in the time series. To verify the full dependence structure in the time series, it is necessary to continue the conditioning process down to at least the 64th ACER function. In the present case, there is virtually no difference between the 32nd and the 64th, which shows that the dependence structure in this particular time series is captured almost completely by conditioning on the previous peak value. It is interesting to contrast the method of dealing with the effect of sampling frequency discussed here with that of [49].
To illustrate the results obtained by extracting only the peak values from the time series, which would be the approach typically chosen in an engineering analysis, the ACER plots for this case are shown in Figure 15. By comparing results from Figures 14 and 15, it can be verified that they are in very close agreement, by recognizing that the second ACER function in Figure 14 corresponds to the first ACER function in Figure 15, and by noting that there is a factor of approximately 32 between corresponding ACER functions in the two figures. This is due to the fact that the time series of peak values contains about 32 times less data than the original time series.
Figure 14: Comparison between ACER estimates for different degrees of conditioning for the narrow-band time series, $k = 1, 2, 25, 32, 64$.
Figure 15: Comparison between ACER estimates for different degrees of conditioning based on the time series of the peak values, $k = 1, \ldots, 5$; cf. Figure 13.
11. Concluding Remarks

This paper studies a new method for extreme value prediction for sampled time series. The method is based on the introduction of a conditional average exceedance rate (ACER), which allows dependence in the time series to be properly and easily accounted for. Declustering of the data is therefore avoided, and all the data are used in the analysis. Significantly, the proposed method also aims at capturing to some extent the subasymptotic form of the extreme value distribution.

Results for wind speeds, both synthetic and measured, are used to illustrate the method. An estimation problem related to applications in mechanics is also presented. The validation of the method is done by comparison with exact results (when available), or with other widely used methods for extreme value statistics, such as the Gumbel and the peaks-over-threshold (POT) methods. Comparison of the various estimates indicates that the proposed method provides more accurate results than the Gumbel and POT methods.

Subject to certain restrictions, the proposed method also applies to nonstationary time series, but it cannot directly predict, for example, the effect of climate change in the form of long-term trends in the average exceedance rates extending beyond the data. This must be incorporated into the analysis by explicit modelling techniques.

As a final remark, it may be noted that the ACER method as described in this paper has a natural extension to higher dimensional distributions. The implication is that it is then possible to provide estimates of, for example, the exact bivariate extreme value distribution for a suitable set of data [50]. However, as is easily recognized, the extrapolation problem is not as simply dealt with as for the univariate case studied in this paper.
Acknowledgment
This work was supported by the Research Council of Norway (NFR) through the Centre for Ships and Ocean Structures (CeSOS) at the Norwegian University of Science and Technology.
References
[1] S. Coles, An Introduction to Statistical Modeling of Extreme Values, Springer Series in Statistics, Springer, London, UK, 2001.
[2] J. Beirlant, Y. Goegebeur, J. Teugels, and J. Segers, Statistics of Extremes, Wiley Series in Probability and Statistics, John Wiley & Sons, Chichester, UK, 2004.
[3] A. C. Davison and R. L. Smith, "Models for exceedances over high thresholds," Journal of the Royal Statistical Society, Series B, vol. 52, no. 3, pp. 393-442, 1990.
[4] R.-D. Reiss and M. Thomas, Statistical Analysis of Extreme Values, Birkhäuser, Basel, Switzerland, 3rd edition, 1997.
[5] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events, vol. 33 of Applications of Mathematics (New York), Springer, Berlin, Germany, 1997.
[6] M. Falk, J. Hüsler, and R.-D. Reiss, Laws of Small Numbers: Extremes and Rare Events, Birkhäuser, Basel, Switzerland, 2nd edition, 2004.
[7] A. Naess and O. Gaidai, "Monte Carlo methods for estimating the extreme response of dynamical systems," Journal of Engineering Mechanics, vol. 134, no. 8, pp. 628-636, 2008.
[8] A. Naess, O. Gaidai, and S. Haver, "Efficient estimation of extreme response of drag-dominated offshore structures by Monte Carlo simulation," Ocean Engineering, vol. 34, no. 16, pp. 2188-2197, 2007.
[9] R. L. Smith, "The extremal index for a Markov chain," Journal of Applied Probability, vol. 29, no. 1, pp. 37-45, 1992.
[10] S. G. Coles, "A temporal study of extreme rainfall," in Statistics for the Environment 2: Water Related Issues, V. Barnett and K. F. Turkman, Eds., chapter 4, pp. 61-78, John Wiley & Sons, Chichester, UK, 1994.
[11] R. L. Smith, J. A. Tawn, and S. G. Coles, "Markov chain models for threshold exceedances," Biometrika, vol. 84, no. 2, pp. 249-268, 1997.
[12] S. Yun, "The extremal index of a higher-order stationary Markov chain," The Annals of Applied Probability, vol. 8, no. 2, pp. 408-437, 1998.
[13] S. Yun, "The distributions of cluster functionals of extreme events in a dth-order Markov chain," Journal of Applied Probability, vol. 37, no. 1, pp. 29-44, 2000.
[14] J. Segers, "Approximate distributions of clusters of extremes," Statistics & Probability Letters, vol. 74, no. 4, pp. 330-336, 2005.
[15] G. S. Watson, "Extreme values in samples from m-dependent stationary stochastic processes," Annals of Mathematical Statistics, vol. 25, pp. 798-800, 1954.
[16] M. R. Leadbetter, G. Lindgren, and H. Rootzén, Extremes and Related Properties of Random Sequences and Processes, Springer Series in Statistics, Springer, New York, NY, USA, 1983.
[17] D. E. Cartwright, "On estimating the mean energy of sea waves from the highest waves in a record," Proceedings of the Royal Society of London, Series A, vol. 247, pp. 22-28, 1958.
[18] A. Naess, "On the long-term statistics of extremes," Applied Ocean Research, vol. 6, no. 4, pp. 227-228, 1984.
[19] G. Schall, M. H. Faber, and R. Rackwitz, "Ergodicity assumption for sea states in the reliability estimation of offshore structures," Journal of Offshore Mechanics and Arctic Engineering, vol. 113, no. 3, pp. 241-246, 1991.
[20] E. H. Vanmarcke, "On the distribution of the first-passage time for normal stationary random processes," Journal of Applied Mechanics, vol. 42, no. 1, pp. 215-220, 1975.
[21] M. R. Leadbetter, "Extremes and local dependence in stationary sequences," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol. 65, no. 2, pp. 291-306, 1983.
[22] T. Hsing, "On the characterization of certain point processes," Stochastic Processes and Their Applications, vol. 26, no. 2, pp. 297-316, 1987.
[23] T. Hsing, "Estimating the parameters of rare events," Stochastic Processes and Their Applications, vol. 37, no. 1, pp. 117-139, 1991.
[24] M. R. Leadbetter, "On high level exceedance modeling and tail inference," Journal of Statistical Planning and Inference, vol. 45, no. 1-2, pp. 247-260, 1995.
[25] C. A. T. Ferro and J. Segers, "Inference for clusters of extreme values," Journal of the Royal Statistical Society, Series B, vol. 65, no. 2, pp. 545-556, 2003.
[26] C. Y. Robert, "Inference for the limiting cluster size distribution of extreme values," The Annals of Statistics, vol. 37, no. 1, pp. 271-310, 2009.
[27] A. Naess and O. Gaidai, "Estimation of extreme values from sampled time series," Structural Safety, vol. 31, no. 4, pp. 325-334, 2009.
[28] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, London, UK, 1981.
[29] W. Forst and D. Hoffmann, Optimization: Theory and Practice, Springer Undergraduate Texts in Mathematics and Technology, Springer, New York, NY, USA, 2010.
[30] N. R. Draper and H. Smith, Applied Regression Analysis, Wiley Series in Probability and Statistics: Texts and References Section, John Wiley & Sons, New York, NY, USA, 3rd edition, 1998.
[31] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, Wiley Series in Probability and Statistics: Texts, References, and Pocketbooks Section, Wiley-Interscience, New York, NY, USA, 3rd edition, 2001.
[32] A. Naess, Estimation of Extreme Values of Time Series with Heavy Tails, Preprint Statistics No. 14/2010, Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway, 2010.
[33] E. F. Eastoe and J. A. Tawn, "Modelling the distribution of the cluster maxima of exceedances of subasymptotic thresholds," Biometrika, vol. 99, no. 1, pp. 43-55, 2012.
[34] J. A. Tawn, "Discussion of paper by A. C. Davison and R. L. Smith," Journal of the Royal Statistical Society, Series B, vol. 52, no. 3, pp. 393-442, 1990.
[35] A. W. Ledford and J. A. Tawn, "Statistics for near independence in multivariate extreme values," Biometrika, vol. 83, no. 1, pp. 169-187, 1996.
[36] J. E. Heffernan and J. A. Tawn, "A conditional approach for multivariate extreme values," Journal of the Royal Statistical Society, Series B, vol. 66, no. 3, pp. 497-546, 2004.
[37] Numerical Algorithms Group, NAG Toolbox for Matlab, NAG, Oxford, UK, 2010.
[38] E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York, NY, USA, 1958.
[39] K. V. Bury, Statistical Models in Applied Science, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1975.
[40] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, vol. 57 of Monographs on Statistics and Applied Probability, Chapman and Hall, New York, NY, USA, 1993.
[41] A. C. Davison and D. V. Hinkley, Bootstrap Methods and Their Application, vol. 1 of Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, UK, 1997.
[42] J. Pickands III, "Statistical inference using extreme order statistics," The Annals of Statistics, vol. 3, pp. 119-131, 1975.
[43] A. Naess and P. H. Clausen, "Combination of the peaks-over-threshold and bootstrapping methods for extreme value prediction," Structural Safety, vol. 23, no. 4, pp. 315-330, 2001.
[44] N. J. Cook, The Designer's Guide to Wind Loading of Building Structures, Butterworths, London, UK, 1985.
[45] N. J. Cook, "Towards better estimation of extreme winds," Journal of Wind Engineering and Industrial Aerodynamics, vol. 9, no. 3, pp. 295-323, 1982.
[46] A. Naess, "Estimation of long return period design values for wind speeds," Journal of Engineering Mechanics, vol. 124, no. 3, pp. 252-259, 1998.
[47] J. P. Palutikof, B. B. Brabson, D. H. Lister, and S. T. Adcock, "A review of methods to calculate extreme wind speeds," Meteorological Applications, vol. 6, no. 2, pp. 119-132, 1999.
[48] O. Perrin, H. Rootzén, and R. Taesler, "A discussion of statistical methods used to estimate extreme wind speeds," Theoretical and Applied Climatology, vol. 85, no. 3-4, pp. 203-215, 2006.
[49] M. E. Robinson and J. A. Tawn, "Extremal analysis of processes sampled at different frequencies," Journal of the Royal Statistical Society, Series B, vol. 62, no. 1, pp. 117-135, 2000.
[50] A. Naess, A Note on the Bivariate ACER Method, Preprint Statistics No. 01/2011, Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway, 2011.
-
Applied Ocean Research 28 (2006) 1-8
www.elsevier.com/locate/apor
Numerical methods for calculating the crossing rate of high and extreme response levels of compliant offshore structures subjected to random waves
A. Naess,a,b H.C. Karlsen,b and P.S. Teigenc
a Centre for Ships and Ocean Structures, Norwegian University of Science and Technology, A. Getz vei 1, NO-7491 Trondheim, Norway
b Department of Mathematical Sciences, Norwegian University of Science and Technology, A. Getz vei 1, NO-7491 Trondheim, Norway
c Statoil Research Centre, Rotvoll, Trondheim, Norway
Corresponding author: A. Naess; [email protected]; tel.: +47 73 59 70 53; fax: +47 73 59 35 24.
Received 16 December 2005; received in revised form 3 April 2006; accepted 8 April 2006. Available online 5 June 2006.
doi:10.1016/j.apor.2006.04.001
Abstract
The focus of the paper is on methods for calculating the mean upcrossing rate of stationary stochastic processes that can be represented as second order stochastic Volterra series. This is the current state-of-the-art representation of the horizontal motion response of, e.g., a tension leg platform in random seas. Until recently, there has been no method available for accurately calculating the mean level upcrossing rate of such response processes. Since the mean upcrossing rate is a key parameter for estimating the large and extreme responses, it is clearly of importance to develop methods for its calculation. The paper describes in some detail a numerical method for calculating the mean level upcrossing rate of a stochastic response process of the type considered. Since no approximations are made, the only source of inaccuracy is in the numerical calculation, which can be controlled. In addition to this exact method, two approximate methods are also discussed.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Second order stochastic Volterra model; Mean crossing rate; Extreme response; Slow drift response; Method of steepest descent
1. Introduction
The problem of calculating the extreme response of compliant offshore structures like tension leg platforms or moored spar buoys in random seas has been a challenge for many years, and, in fact, it still represents a challenge. Starting with the state-of-the-art representation of the horizontal excursions of moored, floating offshore structures in random seas as a second order stochastic Volterra series, we shall in this paper develop a general method for estimating the extreme response of such structures. Even though the Volterra series model was formulated more than 30 years ago, it is not until quite recently that general numerical methods have become available that allow accurate calculation of the probability distribution and, perhaps more importantly, the mean upcrossing rate of the total response process. This last quantity is the crucial parameter for estimating extreme responses.
During the 1980s significant efforts were directed towards developing methods for calculating the response statistics of compliant offshore structures subjected to random waves. The list of contributions is long. To mention but a few, which also contain references to other work focussing on response statistics, see [1-6]. However, none of these works succeeded in developing a general method that made it possible to calculate the exact statistical distribution of the total response process, not to mention the much harder problem of calculating the mean upcrossing rate.
A general method for solving the first problem of calculating the exact statistical distribution was presented in [7]. Then it took almost a decade before a general method for solving the second problem was outlined [8]. This method was developed further in [9]. While the method is mathematically sound, initial efforts to carry out the requisite calculations have revealed that some care is needed in setting up the numerical algorithms. The work presented in this paper is part of continued efforts to set up a robust and accurate numerical procedure. It should be emphasized that while the discussion in this paper is limited to the long-crested seas case, the method described also covers the short-crested seas case, cf. [6]. A notable recent paper presents a discussion and comparison of some approximate procedures for calculating the mean upcrossing rate [10].
2. The response process
The response process Z(t) considered here is assumed to be represented as a second order stochastic Volterra series. This would apply to the state-of-the-art representation of, e.g., the surge response of a large volume, compliant offshore structure in random waves. This response would consist of a combination of the wave frequency component Z_1(t) and the slow-drift component Z_2(t), that is, Z(t) = Z_1(t) + Z_2(t). Naess [6] describes the standard representation of the two response components leading to a second order Volterra series model for the total response. To alleviate the statistical analysis of the response process, it has been shown [2,6] that the slow-drift response Z_2(t) can be expressed as
Z_2(t) = \sum_{j=1}^{N} \lambda_j \{ W_{2j-1}(t)^2 + W_{2j}(t)^2 \}.   (1)
Here W_j(t), j = 1, ..., 2N, are real stationary Gaussian N(0,1) processes. The coefficients \lambda_j are obtained by solving the eigenvalue problem (assumed nonsingular)

Q u_j = \lambda_j u_j   (2)

to find the eigenvalues \lambda_j and orthonormal eigenvectors u_j, j = 1, ..., N, of the N x N matrix Q = (Q_{ij}), where

Q_{ij} = H_2(\omega_i, -\omega_j) \, \tfrac{1}{2} [S_X(\omega_i) S_X(\omega_j)]^{1/2} \Delta\omega.   (3)

Here H_2(\cdot, \cdot) denotes the quadratic transfer function between the waves and the surge response, cf. [2,6], S_X(\cdot) denotes the one-sided spectral density of the waves, and 0 < \omega_1 < \cdots < \omega_N is a suitable discretization of the frequency axis. The stochastic processes W_j(t) can be represented as follows (i = \sqrt{-1}):
W_{2j-1}(t) + i W_{2j}(t) = \sqrt{2} \sum_{k=1}^{N} u_j(k) B_k e^{i \omega_k t}   (4)
where u_j(k) denotes the kth component of u_j and {B_k} is a set of independent, complex Gaussian N(0,1) variables with independent and identically distributed real and imaginary parts. The representation can be arranged so that W_{2j}(t) becomes the Hilbert transform of W_{2j-1}(t), cf. [6]. For each fixed t, {W_j(t)} becomes a set of independent Gaussian variables.
Having achieved the desired representation of the quadratic response Z_2(t), it can then be shown that the linear response can be expressed as

Z_1(t) = \sum_{j=1}^{2N} \beta_j W_j(t).   (5)

The (real) parameters \beta_j are given by the relations

\beta_{2j-1} + i \beta_{2j} = \sum_{k=1}^{N} H_1(\omega_k) [S_X(\omega_k) \Delta\omega]^{1/2} u_j(k)   (6)
where H_1(\omega) denotes the linear transfer function between the waves and the surge response. Based on the representations given by Eqs. (1) and (5), [11] describes how to calculate the statistical moments of the response process Z(t), while a general and accurate numerical method for calculating the PDF of Z(t) is given in [7]. However, for important prediction purposes, like extreme response estimation, the crucial quantity is the mean rate of level upcrossings by the response process.
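As a concrete illustration of Eqs. (1)-(6), the sketch below (ours, not from the paper) simulates one realization of Z(t) = Z_1(t) + Z_2(t). It assumes a uniform frequency grid and user-supplied callables H1, H2 and S_X for the transfer functions and the wave spectrum (hypothetical names), and it uses numpy's eigh, which presumes the assembled matrix Q is Hermitian.

import numpy as np

def simulate_response(H1, H2, S_X, omega, t, seed=0):
    """One realization of Z(t) = Z1(t) + Z2(t) per Eqs. (1)-(6) (sketch)."""
    rng = np.random.default_rng(seed)
    N = len(omega)
    dw = omega[1] - omega[0]                      # uniform grid assumed
    Sw = np.array([S_X(w) for w in omega])
    # Eq. (3): Q_ij = H2(w_i, -w_j) (1/2) [S_X(w_i) S_X(w_j)]^(1/2) dw
    Q = np.array([[H2(wi, -wj) for wj in omega] for wi in omega])
    Q = Q * 0.5 * np.sqrt(np.outer(Sw, Sw)) * dw
    # Eq. (2): eigenvalues lambda_j and eigenvectors u_j (columns of U)
    lam, U = np.linalg.eigh(Q)
    # Eq. (4): W_{2j-1}(t) + i W_{2j}(t) with complex N(0,1) variables B_k
    B = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    phase = np.exp(1j * np.outer(omega, t))       # shape (N, len(t))
    Wc = np.sqrt(2.0) * (U.T * B) @ phase         # row j: W_{2j-1} + i W_{2j}
    # Eq. (1): slow-drift response Z2 = sum_j lambda_j |Wc_j|^2
    Z2 = lam @ np.abs(Wc) ** 2
    # Eq. (6): beta_{2j-1} + i beta_{2j}; Eq. (5): linear response Z1
    H1w = np.array([H1(w) for w in omega])
    beta_c = (H1w * np.sqrt(Sw * dw)) @ U
    Z1 = np.real(np.conj(beta_c) @ Wc)
    return Z1 + Z2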
3. The mean crossing rate
Let N^+_Z(\zeta) denote the rate of upcrossings of the level \zeta by Z(t), cf. [12], and let \nu^+_Z(\zeta) = E[N^+_Z(\zeta)], that is, \nu^+_Z(\zeta) denotes the mean rate of upcrossings of the level \zeta. As discussed in [9], under suitable regularity conditions on the response process, which can be adopted here, the following formula can be used:

\nu^+_Z(\zeta) = \int_0^{\infty} s \, f_{Z \dot Z}(\zeta, s) \, ds   (7)

where f_{Z \dot Z}(\cdot, \cdot) denotes the joint PDF of Z(0) and \dot Z(0) = dZ(t)/dt |_{t=0}. Eq. (7) is often referred to as the Rice formula [13]. \nu^+_Z(\zeta) is assumed throughout to be finite.
Calculating the mean crossing rate of a stochastic process represented as a second order stochastic Volterra series directly from Eq. (7) has turned out to be very difficult, due to the difficulties of calculating the joint PDF f_{Z \dot Z}(\cdot, \cdot). However, this can be circumvented by invoking the concept of the characteristic function.
Denote the characteristic function of the joint variable (Z, \dot Z) by M_{Z \dot Z}(\cdot, \cdot), or, for simplicity of notation, by M(\cdot, \cdot). Then

M(u, v) = M_{Z \dot Z}(u, v) = E[\exp(iuZ + iv\dot Z)].   (8)

Assuming that M(\cdot, \cdot) is an integrable function, that is, M(\cdot, \cdot) \in L^1(R^2), it follows that

f_{Z \dot Z}(\zeta, s) = \frac{1}{(2\pi)^2} \int_{R^2} M(u, v) \exp(-iu\zeta - ivs) \, du \, dv.   (9)

By substituting from Eq. (9) back into Eq. (7), the mean crossing rate is formally expressed in terms of the characteristic function, but this is not a very practical expression.
The solution to this is obtained by considering the characteristic function as a function of two complex variables. It can then often be shown that this new function becomes holomorphic in suitable regions of C^2, where C denotes the complex plane. As shown in detail in [14], under suitable conditions, the use of complex function theory allows the derivation of two alternative expressions for the crossing rate. Here we shall focus on one of these alternatives, viz.

\nu^+_Z(\zeta) = -\frac{1}{(2\pi)^2} \int_{ia-\infty}^{ia+\infty} \int_{ib-\infty}^{ib+\infty} \frac{1}{w^2} M(z, w) \, e^{-iz\zeta} \, dz \, dw   (10)

where 0 < a < a_1 for some positive constant a_1, and b_0 < b < 0 for some negative constant b_0.
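Before specializing to the Volterra model, Eq. (7) can be sanity-checked in the one case where everything is available in closed form: a stationary Gaussian process with Z and \dot Z independent, for which the Rice formula reduces to \nu^+_Z(\zeta) = (\sigma_{\dot Z}/2\pi\sigma_Z) \exp(-\zeta^2/2\sigma_Z^2). The sketch below (ours, not from the paper) evaluates Eq. (7) by simple quadrature and compares it with the closed form.

import numpy as np

def rice_rate_numeric(zeta, sZ=1.0, sD=1.0, s_max=10.0, n=4000):
    """Eq. (7) by the trapezoid rule, for Gaussian Z ~ N(0, sZ^2), Zdot ~ N(0, sD^2)."""
    s = np.linspace(0.0, s_max * sD, n)
    f_joint = (np.exp(-zeta**2 / (2 * sZ**2)) / (sZ * np.sqrt(2 * np.pi))
               * np.exp(-s**2 / (2 * sD**2)) / (sD * np.sqrt(2 * np.pi)))
    return np.trapz(s * f_joint, s)

def rice_rate_exact(zeta, sZ=1.0, sD=1.0):
    """Closed-form Gaussian crossing rate, for comparison."""
    return sD / (2 * np.pi * sZ) * np.exp(-zeta**2 / (2 * sZ**2))

for zeta in (0.0, 1.0, 2.0, 3.0):
    print(zeta, rice_rate_numeric(zeta), rice_rate_exact(zeta))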
To actually carry out the calculations, the joint characteristic function needs to be known. It has been shown [8] that for the second order stochastic Volterra series, it can be given in closed form. To this end, consider the multidimensional Gaussian vectors W = (W_1, \ldots, W_n)' (' denotes transposition) and \dot W = (\dot W_1, \ldots, \dot W_n)', where n = 2N. It is obtained that the covariance matrix of (W', \dot W')' is given by

\Gamma = \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{pmatrix}   (11)

where \Gamma_{11} = I = the n x n identity matrix, \Gamma_{12} = (r_{ij}) = (E[W_i \dot W_j]), \Gamma_{21} = (E[\dot W_i W_j]) and \Gamma_{22} = (s_{ij}) = (E[\dot W_i \dot W_j]); i, j = 1, \ldots, n. Note that r_{ij} = -r_{ji} and \Gamma_{12} = -\Gamma_{21}. It follows from Eq. (4) that the entries of the covariance matrix can be expressed in terms of the eigenvectors u_j, cf. [2]. Let
R_{ij} = \sum_{k=1}^{N} (i\omega_k) u_i(k) \overline{u_j(k)}.   (12)

Then it can be shown that

r_{2i-1,2j-1} = r_{2i,2j} = \Re(R_{ij})   (13)

while

r_{2i-1,2j} = -r_{2i,2j-1} = \Im(R_{ij})   (14)

where \Re(z) denotes the real part and \Im(z) the imaginary part of z. Similarly, let

S_{ij} = \sum_{k=1}^{N} \omega_k^2 u_i(k) \overline{u_j(k)}.   (15)

Then

s_{2i-1,2j-1} = s_{2i,2j} = \Re(S_{ij})   (16)

while

s_{2i-1,2j} = -s_{2i,2j-1} = \Im(S_{ij}).   (17)

By this, the covariance matrix is completely specified.
It is convenient to introduce a new set of eigenvalues \tilde\lambda_j, j = 1, \ldots, n, defined by \tilde\lambda_{2i-1} = \tilde\lambda_{2i} = \lambda_i, i = 1, \ldots, N. Let \Lambda = diag(\tilde\lambda_1, \ldots, \tilde\lambda_n) be the diagonal matrix with the parameters \tilde\lambda_j on the diagonal, and let \beta = (\beta_1, \ldots, \beta_n)'. It can now be shown that [8]

M(u, v) = \exp\{ -\tfrac{1}{2} \ln(\det(A)) - \tfrac{1}{2} v^2 \beta' V \beta + \tfrac{1}{2} t' A^{-1} t \}   (18)

where

A = I - 2iu\Lambda - 2iv(\Lambda\Gamma_{21} + \Gamma_{12}\Lambda) + 4v^2 \Lambda V \Lambda   (19)

V = \Gamma_{22} - \Gamma_{21}\Gamma_{12}   (20)

t = (iu I + iv\Gamma_{12} - 2v^2 \Lambda V) \beta.   (21)
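The closed form lends itself directly to implementation. The sketch below (ours) follows Eqs. (18)-(21) as reconstructed above; the inputs lam (the doubled eigenvalues), beta, and the covariance blocks are assumed precomputed via Eqs. (12)-(17), and the careful tracking of the branch of the complex logarithm of det(A), which the full method requires, is glossed over here.

import numpy as np

def char_function(u, v, lam, beta, G12, G22):
    """M(u, v) for the second order Volterra model, Eqs. (18)-(21) (sketch)."""
    n = len(lam)
    I = np.eye(n)
    Lam = np.diag(lam)
    G21 = -G12                                   # Gamma_21 = -Gamma_12
    V = G22 - G21 @ G12                          # Eq. (20)
    A = (I - 2j * u * Lam                        # Eq. (19)
         - 2j * v * (Lam @ G21 + G12 @ Lam)
         + 4 * v**2 * Lam @ V @ Lam)
    t = (1j * u * I + 1j * v * G12 - 2 * v**2 * Lam @ V) @ beta   # Eq. (21)
    sign, logdet = np.linalg.slogdet(A)          # det(A) = sign * exp(logdet)
    # Eq. (18); note t' A^{-1} t is the bilinear form, not a Hermitian one
    return np.exp(-0.5 * (logdet + np.log(sign))
                  - 0.5 * v**2 * beta @ V @ beta
                  + 0.5 * t @ np.linalg.solve(A, t))

The arithmetic is generic in u and v, so the same function can be evaluated at the complex arguments needed when M is continued onto the contours of Eq. (10).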
4. Numerical calculation
Previous efforts to carry out numerical calculation of the mean crossing rate using Eq. (10) have been reported in [9]. These initial investigations indicated that the method had the potential to provide very accurate numerical results. We shall rewrite Eq. (10) as follows:

\nu^+_Z(\zeta) = -\frac{1}{(2\pi)^2} \int_{ia-\infty}^{ia+\infty} \frac{1}{w^2} I(\zeta, w) \, dw   (22)

where

I = I(\zeta, w) = \int_{ib-\infty}^{ib+\infty} M(z, w) \, e^{-iz\zeta} \, dz = \int_{ib-\infty}^{ib+\infty} \exp\{-iz\zeta + \ln M(z, w)\} \, dz.   (23)
A numerical calculation of the mean crossing rate can start by calculating the function I(\zeta, w) for specified values of \zeta and w. However, a direct numerical integration of Eq. (23) is made difficult by the oscillatory term \exp\{-i\Re(z)\zeta\}. This problem can be avoided by invoking the method of steepest descent, also called the saddle point method. For this purpose, we write

g(z) = g(z; w) = -iz\zeta + \ln M(z, w) = \phi(x, y) + i\psi(x, y)   (24)

where z = x + iy. \phi(x, y) and \psi(x, y) become real harmonic functions when g(z) is holomorphic. The idea is to identify the saddle point of the surface \phi(x, y) closest to the integration line from ib - \infty to ib + \infty. By shifting this integration line to a new integration contour that passes through the saddle point, and then follows the path of steepest descent away from the saddle point, it can be shown that the function \psi(x, y) stays constant, and therefore the oscillatory term in the integral degenerates to a constant. This is a main advantage of the method of steepest descent for numerical calculations. It can be shown that the integral does not change its value as long as the function g(z) is a holomorphic function in the region bounded by the two integration contours and if the integrals vanish along the contour segments required to close the region.
If z_s denotes the identified saddle point, where g'(z_s) = 0, the steepest descent path away from the saddle point will follow the direction given by -\overline{g'(z)}, for z \neq z_s, cf. [15]. Typically, the singular points of the function g will be around the imaginary axis, which indicates that the direction of the paths of steepest descent emanating from the saddle point will typically not deviate substantially from a direction orthogonal to the imaginary axis. This provides a guide for setting up a numerical integration procedure based on the path of steepest descent. First the saddle point z_s is identified. Then the path of steepest descent starting at z_s and going right is approximated by the sequence of points \{z_j\}_{j=0}^{\infty} calculated as follows:

z_0 = z_s, \quad z_1 = z_s + h   (25)

\Delta z_j = -\frac{\overline{g'(z_j)}}{|g'(z_j)|} h, \quad j = 1, 2, \ldots   (26)

z_{j+1} = z_j + \Delta z_j, \quad j = 1, 2, \ldots   (27)

where h is a small positive constant.
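The recursion of Eqs. (25)-(27), together with its left-going mirror image in Eqs. (28)-(30) below, is straightforward to code. In this sketch (ours), g_prime stands for g'(z; w); the toy check uses g(z) = -z^2, whose saddle point at z = 0 has its steepest-descent path along the real axis.

import numpy as np

def descent_path(g_prime, z_s, h=1e-2, K=200, direction=+1):
    """Trace the steepest-descent path from the saddle point z_s.

    direction=+1 follows Eqs. (25)-(27) (to the right of z_s);
    direction=-1 follows Eqs. (28)-(30) (to the left).
    Returns the path points and the step taken at each point.
    """
    z = z_s + direction * h                      # Eq. (25)/(28): step off the saddle
    points, steps = [z], []
    for _ in range(K):
        gp = g_prime(z)
        dz = -np.conj(gp) / abs(gp) * h          # Eq. (26)/(29)
        z = z + dz                               # Eq. (27)/(30)
        points.append(z)
        steps.append(dz)
    return np.array(points), np.array(steps)

# Toy check: for g(z) = -z**2, g'(z) = -2z, the saddle is at z = 0 and the
# traced path should run out along the real axis.
pts, _ = descent_path(lambda z: -2 * z, 0.0 + 0.0j)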
Similarly, the path of steepest descent going left is approximated by the sequence \{z_{-j}\}_{j=1}^{\infty} calculated by

z_{-1} = z_s - h   (28)

\Delta z_{-j} = -\frac{\overline{g'(z_{-j})}}{|g'(z_{-j})|} h, \quad j = 1, 2, \ldots   (29)

z_{-j-1} = z_{-j} + \Delta z_{-j}, \quad j = 1, 2, \ldots   (30)

A numerical estimate \hat I of I can be obtained as follows:

\hat I = \hat I_+ + \hat I_-   (31)

where

\hat I_+ = \frac{h}{2} \exp\{g(z_s)\} + \sum_{j=1}^{K} \Delta z_j \exp\{g(z_j)\}   (32)

and

\hat I_- = \frac{h}{2} \exp\{g(z_s)\} - \sum_{j=1}^{K} \Delta z_{-j} \exp\{g(z_{-j})\}   (33)

for a suitably large integer K. A numerical estimate \hat\nu^+_Z(\zeta) of the mean crossing rate can now be obtained by the sum

\hat\nu^+_Z(\zeta) = -\frac{1}{(2\pi)^2} \Re\left\{ \sum_{j=-L}^{L} \frac{1}{w_j^2} \hat I(\zeta, w_j) \Delta w_j \right\}   (34)

where the discretization points w_j are chosen to follow the negative real axis from a suitably large negative number up to a point at -\delta, where 0 < \delta < a, then follow a semi-circle in the lower half plane to \delta on the positive real axis, and finally follow this axis to a suitably large positive number. Since the numerical estimate does not necessarily have an imaginary part that is exactly equal to zero, the real part operator has been applied.
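Tying the pieces together, the sketch below (ours) assembles \hat I from the two traced paths per Eqs. (31)-(33) and then forms the contour sum of Eq. (34). It reuses descent_path from the sketch above and assumes the contour points w_j and increments \Delta w_j have been laid out along the dented contour just described.

import numpy as np

def I_hat(g, g_prime, z_s, h=1e-2, K=200):
    """Eqs. (31)-(33): numerical estimate of I(zeta, w) for one fixed w."""
    zR, dzR = descent_path(g_prime, z_s, h, K, direction=+1)
    zL, dzL = descent_path(g_prime, z_s, h, K, direction=-1)
    I_plus = 0.5 * h * np.exp(g(z_s)) + sum(
        dz * np.exp(g(z)) for z, dz in zip(zR[:-1], dzR))      # Eq. (32)
    I_minus = 0.5 * h * np.exp(g(z_s)) - sum(
        dz * np.exp(g(z)) for z, dz in zip(zL[:-1], dzL))      # Eq. (33)
    return I_plus + I_minus

def nu_hat(w_pts, dw, I_vals):
    """Eq. (34): crossing-rate estimate from the contour sum over w_j."""
    return -np.real(np.sum(I_vals / w_pts**2 * dw)) / (2 * np.pi) ** 2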
Generally, the CPU time required to carry out the computations above can be quite long, depending on the size of the problem, which is related to the number N of eigenvalues. It is therefore of interest to see if approximating formulas are accurate enough. The first such approximation we shall have a look at is the Laplace approximation for the inner integral over the saddle point [15]. The simplest version of this approximation, adapted to the situation at hand, leads to the result

\tilde I = \tilde I(\zeta, w) = \sqrt{\frac{2\pi}{-\partial^2 g(z_s; w)/\partial x^2}} \exp\{g(z_s; w)\}   (35)

which can be substituted directly into Eq. (34), leading to an approximation of \nu^+_Z(\zeta), which is denoted by \tilde\nu^+_Z(\zeta).
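In code, Eq. (35) is essentially a one-liner once the second derivative of g along the real direction at the saddle point is available, e.g., from a central finite difference (a sketch, ours):

import numpy as np

def laplace_I(g_at_saddle, g_xx_at_saddle):
    """Eq. (35): Laplace approximation of the inner integral."""
    return np.sqrt(2 * np.pi / (-(g_xx_at_saddle + 0j))) * np.exp(g_at_saddle)

def g_xx(g, z_s, h=1e-4):
    """Central difference for d^2 g / dx^2 at z_s along the real direction."""
    return (g(z_s + h) - 2 * g(z_s) + g(z_s - h)) / h**2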
This approximation can be exploited in the following way: (1) the full method is used for an inner interval of w-values, which contribute significantly to the integral in Eq. (22); (2) a Laplace approximation is then used in an outer interval of w-values where the contribution is less than significant. Of course, the level of significance is chosen according to some suitable criterion. By this procedure, the CPU time was reduced by a factor of about 3. This method will be referred to as the hybrid method, and the corresponding approximation of \nu^+_Z(\zeta) is denoted by \bar\nu^+_Z(\zeta).
A simple approximation proposed in [16,17] is worth a closer scrutiny. It is based on the widely adopted simplifying assumption that the displacement process is independent of the velocity process. This leads to an alternative approximation of \nu^+_Z(\zeta), which we shall denote by \breve\nu^+_Z(\zeta). It is given by the formula

\breve\nu^+_Z(\zeta) = \nu^+_Z(\zeta_{ref}) \frac{f_Z(\zeta)}{f_Z(\zeta_{ref})}   (36)

where f_Z denotes the marginal PDF of the surge response, and \zeta_{ref} denotes a suitable reference level, typically the mean response. Here, \zeta_{ref} has been chosen as the point where f_Z assumes its maximum, which corresponds