Mar 27, 2015
Applied Hydrology
RSLAB-NTU
Lab for Remote Sensing Hydrology and Spatial Modeling
Frequency Analysis
Professor Ke-Sheng Cheng
Dept. of Bioenvironmental Systems Engineering
National Taiwan University
General interpretation of hydrological frequency analysis
Hydrological frequency analysis is the work of determining the magnitudes of a hydrological variable that correspond to given probabilities of exceedance. Frequency analysis can be conducted for many hydrological variables, including floods, rainfalls, and droughts.
The work can be better understood by treating the variable of interest as a random variable.
Let $X$ represent the hydrological (random) variable under investigation. A value $x_c$ associated with some event is chosen such that the event is said to occur if $X$ assumes a value exceeding $x_c$. Each time a random experiment (or trial) is conducted, the event may or may not occur. If the trials are independent and the probability $p = P(X > x_c)$ is the same in every trial, each trial is a Bernoulli trial with success probability $p$.
We are interested in the number of Bernoulli trials up to and including the one in which the first success occurs. This can be described by the geometric distribution.
Geometric distribution
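The geometric distribution gives the probability that the first success occurs on the $x$-th of a sequence of independent and identical Bernoulli trials:
$$f_X(x; p) = p(1-p)^{x-1}, \quad x = 1, 2, 3, \ldots$$
$$E[X] = \frac{1}{p}, \qquad Var[X] = \frac{q}{p^2}, \quad q = 1 - p$$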
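The two moments can be checked numerically by summing the probability mass function directly. This is a minimal Python sketch; the exceedance probability p = 0.01 is an illustrative assumption and the helper names are hypothetical.

```python
def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        return 0.0
    return p * (1.0 - p) ** (x - 1)


def geometric_moments(p, max_trials=10_000):
    """Approximate E[X] and Var[X] by truncating the pmf sums at max_trials."""
    xs = range(1, max_trials + 1)
    mean = sum(x * geometric_pmf(x, p) for x in xs)
    second_moment = sum(x * x * geometric_pmf(x, p) for x in xs)
    return mean, second_moment - mean ** 2


# Assumed probability that X exceeds x_c in a single trial (e.g., one year).
p = 0.01
mean, var = geometric_moments(p)
# mean should be very close to 1/p = 100 and var to q/p**2 = 9900
```

With p = 0.01 the first exceedance takes 100 trials on average, which is why the average recurrence interval of an event is the reciprocal of its per-trial exceedance probability.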
Average number of trials to achieve the first success.
Recurrence interval vs return period
The frequency factor equation
It is apparent that calculation of $x_T$ involves determining the type of distribution for $X$ and estimating its mean and standard deviation. The former can be done by goodness-of-fit (GOF) tests, and the latter is accomplished by parametric point estimation.
1. Collecting the required data.
2. Determining an appropriate distribution.
3. Estimating the mean and standard deviation.
4. Calculating $x_T$ using the general (frequency factor) equation.
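The four steps can be sketched in Python under the assumption that the general equation is the frequency factor equation $x_T = \mu + K_T\sigma$, with the textbook frequency factors for the normal distribution (a standard normal quantile) and the Gumbel (EV-1) distribution (method-of-moments form). The sample values and function names are hypothetical, and step 2 is represented only by choosing which frequency-factor function to pass in.

```python
import math
from statistics import NormalDist, mean, stdev

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def k_normal(T):
    """Normal frequency factor: standard normal quantile with exceedance probability 1/T."""
    return NormalDist().inv_cdf(1.0 - 1.0 / T)


def k_gumbel(T):
    """Gumbel (EV-1) frequency factor, method-of-moments form."""
    return -math.sqrt(6.0) / math.pi * (EULER_GAMMA + math.log(math.log(T / (T - 1.0))))


def design_value(sample, T, k_func):
    """Steps 3 and 4: estimate mean and SD, then x_T = mean + K_T * SD."""
    return mean(sample) + k_func(T) * stdev(sample)


# Step 1 (hypothetical data): annual maximum rainfall depths in mm.
annual_max = [182.0, 245.5, 210.3, 301.2, 167.8, 228.4, 275.0, 199.6, 256.1, 220.9]
# Step 2 is represented by choosing k_gumbel (e.g., after a GOF test).
x100 = design_value(annual_max, 100, k_gumbel)
```

For T = 100 the sketch gives K_T of about 2.33 for the normal and about 3.14 for the Gumbel distribution, so the choice made in step 2 can change the design value noticeably.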
Data series used for frequency analysis
Complete duration series: A complete duration series consists of all the observed data.
Partial duration series: A partial duration series is a series of data selected so that their magnitudes are greater than a predefined base value. If the base value is selected so that the number of values in the series equals the number of years of record, the series is called an “annual exceedance series”.
Extreme value series: An extreme value series is a data series that includes the largest or smallest values occurring in each of the equally long time intervals of the record. If the time interval is taken as one year and the largest values are used, the result is an “annual maximum series”.
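The difference between an annual exceedance series and an annual maximum series can be seen on a toy record. In this Python sketch the records are hypothetical (year, value) pairs; the exceedance series takes the N largest values (N = number of years), which corresponds to placing the base value just below the N-th largest value and ignores ties. Independence between selected values is not checked.

```python
def annual_maximum_series(records):
    """Largest value observed in each year; records are (year, value) pairs."""
    maxima = {}
    for year, value in records:
        maxima[year] = max(value, maxima.get(year, float("-inf")))
    return [maxima[y] for y in sorted(maxima)]


def annual_exceedance_series(records):
    """Partial duration series with as many values as there are years of record."""
    n_years = len({year for year, _ in records})
    values = sorted((value for _, value in records), reverse=True)
    return values[:n_years]


# Hypothetical record: several events per year.
records = [(2001, 50), (2001, 120), (2001, 95), (2002, 60),
           (2002, 70), (2003, 130), (2003, 110)]
ams = annual_maximum_series(records)     # one value per year
aes = annual_exceedance_series(records)  # N largest values overall
```

Here the annual exceedance series holds two values from 2003 and none from 2002, whereas the annual maximum series always holds exactly one value per year.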
Data independence: why is it important?
Techniques for goodness-of-fit tests
A good reference for a detailed discussion of GOF tests is:
Goodness-of-fit Techniques. Edited by R.B. D’Agostino and M.A. Stephens, 1986.
Probability plotting
Chi-square test
Kolmogorov-Smirnov test
Moment-ratios diagram method
L-moments based GOF tests
Rainfall frequency analysis
Consider the event total rainfall at a location. What is a storm event?
Parameters related to the partition of storm events:
Minimum inter-event time
A threshold value for rainfall depth
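One way the two partition parameters can be applied is sketched below in Python. The rules are assumptions, since the slides name the parameters but not the exact procedure: the input is an hourly series, an hour with zero rain is dry, a dry spell of at least the minimum inter-event time separates two events, and events whose total depth is below the threshold are discarded. The function name is hypothetical.

```python
def partition_storm_events(rain, min_inter_event_time, depth_threshold):
    """Split an hourly rainfall series into storm events.

    Returns a list of (start_hour, end_hour, total_depth, duration_hours).
    """
    spans = []
    start = end = None   # first and last wet hour of the current event
    dry_run = 0          # length of the current dry spell in hours
    for t, r in enumerate(rain):
        if r > 0:
            if start is None:
                start = t
            elif dry_run >= min_inter_event_time:
                spans.append((start, end))   # long dry spell: close the event
                start = t
            end = t
            dry_run = 0
        else:
            dry_run += 1
    if start is not None:
        spans.append((start, end))
    events = []
    for s, e in spans:
        depth = sum(rain[s:e + 1])
        if depth >= depth_threshold:         # drop events that are too small
            events.append((s, e, depth, e - s + 1))
    return events


# Hypothetical hourly depths (mm).
rain = [0, 2, 5, 0, 0, 1, 0, 0, 0, 0, 3, 4, 0]
events = partition_storm_events(rain, min_inter_event_time=3, depth_threshold=2)
```

Raising the minimum inter-event time to 5 hours merges the two events in this series into one, which shows how strongly the event definition depends on this parameter.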
Total depths of storm events
The total rainfall depth $D$ of a storm event varies with its storm duration $t_r$. [A bivariate distribution for $(D, t_r)$.]
For a given storm duration $t_r$, the total depth $D(t_r)$ is considered a random variable, and its magnitudes corresponding to specific exceedance probabilities are estimated. [Conditional distribution]
In general, $E[D(t_{r1})] \ge E[D(t_{r2})]$ if $t_{r1} \ge t_{r2}$.
Probabilistic Interpretation of the Design Storm Depth
Random Sample For Estimation of Design Storm Depth
The design storm depth of a specified duration with return period $T$ is the value of $D(t_r)$ whose probability of exceedance equals $1/T$.
Estimation of the design storm depth requires collecting a random sample of size n, i.e., {x1, x2, …, xn}.
A random sample is a collection of independently observed and identically distributed (IID) data.
Annual Maximum Series
Data in an annual maximum series are considered IID and therefore form a random sample.
For a given design duration $t_r$, we continuously move a window of size $t_r$ along the time axis and select, in each year, the maximum total depth within the window.
Determination of the annual maximum rainfall is NOT based on the real storm duration; instead, a design duration which is artificially picked is used for this purpose.
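The moving-window selection can be sketched as a running sum in Python. The sketch assumes hourly depths grouped by year and, as a simplification, keeps every window inside a single year; the function name and data are hypothetical.

```python
def annual_max_depth(hourly_by_year, tr):
    """Annual maximum tr-hour rainfall depth from a moving window of tr hours.

    hourly_by_year maps year -> list of hourly depths for that year.
    """
    series = {}
    for year, rain in sorted(hourly_by_year.items()):
        if len(rain) < tr:
            continue                      # not enough data for one window
        window = sum(rain[:tr])           # depth in the first window
        best = window
        for i in range(tr, len(rain)):
            window += rain[i] - rain[i - tr]   # slide the window by one hour
            best = max(best, window)
        series[year] = best
    return series


# Hypothetical hourly depths (mm) for two years.
data = {2001: [0, 3, 5, 1, 0, 7, 0], 2002: [2, 2, 2, 0]}
ams_2h = annual_max_depth(data, tr=2)
```

Because the window length is the design duration rather than an observed storm duration, the same record yields a different annual maximum series for each $t_r$.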
Fitting A Probability Distribution to Annual Maximum Series
How do we fit a probability distribution to a random sample?
What type of distribution should be adopted?
What are the parameter values for the distribution?
How good is our fit?
Chi-square GOF test
Kolmogorov-Smirnov GOF test
The chi-square test compares the empirical histogram against the theoretical histogram.
In contrast, the K-S test compares the empirical cumulative distribution function (ECDF) against the theoretical CDF.
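The histogram comparison of the chi-square test can be sketched in Python for a fitted normal distribution. Using k equal-probability classes built from the fitted normal is an assumption (one common choice); the degrees of freedom subtract the two estimated parameters. The sample is hypothetical and far too small for a real test.

```python
from statistics import NormalDist, mean, stdev


def chi_square_gof_normal(sample, k):
    """Chi-square GOF statistic for a fitted normal using k equal-probability classes.

    Class boundaries are quantiles of the fitted normal, so every class has an
    expected count of n / k. Two parameters (mean, SD) are estimated from the
    sample, so the degrees of freedom are k - 1 - 2.
    """
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    bounds = [fitted.inv_cdf(i / k) for i in range(1, k)]   # ascending boundaries
    observed = [0] * k
    for x in sample:
        observed[sum(x > b for b in bounds)] += 1           # class index of x
    expected = n / k
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, k - 1 - 2


sample = [-2.0, -1.5, -1.0, -0.5, -0.2, 0.2, 0.5, 1.0, 1.5, 2.0]
stat, df = chi_square_gof_normal(sample, k=4)
```

The statistic is then compared with the upper quantile of the chi-square distribution with `df` degrees of freedom; in practice the classes should be chosen so that each expected count is reasonably large.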
In order to measure the difference between $F_n(x)$ and $F(x)$, ECDF statistics based on the vertical distances between $F_n(x)$ and $F(x)$ have been proposed.
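The largest vertical distance $D_n$ only needs to be checked at the order statistics, on both sides of each jump of the ECDF. A minimal Python sketch, with a hypothetical function name and a toy uniform example:

```python
def ks_statistic(sample, cdf):
    """D_n = max |F_n(x) - F(x)|, evaluated at the order statistics.

    The ECDF jumps from (i - 1)/n to i/n at the i-th smallest value x_(i), so the
    largest vertical distance is found by checking both sides of every jump.
    """
    xs = sorted(sample)
    n = len(xs)
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return max(d_plus, d_minus)


# Example: three observations tested against the uniform(0, 1) CDF F(x) = x.
d_n = ks_statistic([0.1, 0.4, 0.7], lambda x: x)
```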
Hypothesis test using $D_n$
Values of $D_{n,\alpha}$ for the Kolmogorov-Smirnov test
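For large n, the tabulated critical values are commonly approximated by $D_{n,\alpha} \approx \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}/\sqrt{n}$, which gives about $1.36/\sqrt{n}$ at $\alpha = 0.05$. The Python sketch below uses that large-sample approximation (not the exact table) and hypothetical function names.

```python
import math


def ks_critical_value(n, alpha=0.05):
    """Large-sample approximation D_{n,alpha} ~ sqrt(-0.5 * ln(alpha / 2)) / sqrt(n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def ks_reject(d_n, n, alpha=0.05):
    """Reject H0 (the sample comes from F) when D_n exceeds D_{n,alpha}."""
    return d_n > ks_critical_value(n, alpha)
```

For small n the exact tabulated values should be used. The tabulated values also assume a fully specified F; when the parameters of F are estimated from the same sample, the test becomes conservative.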
GOF test using L-moment-ratios diagram (LMRD)
Concept of identifying appropriate distributions using moment-ratio diagrams (MRD).
Product-moment-ratio diagram (PMRD)
L-moment-ratio diagram (LMRD)
Two-parameter distributions: normal, Gumbel (EV-1), etc.
Three-parameter distributions: log-normal, Pearson type III, GEV, etc.
Moment ratios are unique properties of probability distributions, and the sample moment ratios of ordinary skewness and kurtosis have been used to select probability distributions.
The L-moments uniquely define the distribution if the mean of the distribution exists, and the L-skewness and L-kurtosis are much less biased than the ordinary skewness and kurtosis.
A two-parameter distribution with a location and a scale parameter plots as a single point on the LMRD, whereas a three-parameter distribution with location, scale and shape parameters plots as a curve on the LMRD, and distributions with more than one shape parameter generally are associated with regions on the diagram.
However, the theoretical points or curves of the various probability distributions on the LMRD cannot account for the uncertainties induced by estimating parameters from random samples.
Ordinary (or product) moment-ratios diagram (PMRD)
The ordinary (or product) moment ratios diagram
Sample estimates of product moment ratios
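The sample product moment ratios can be sketched in Python using the common moment-ratio form: skewness $\sqrt{b_1} = m_3/m_2^{3/2}$ and kurtosis $b_2 = m_4/m_2^2$, where $m_k$ is the $k$-th sample central moment with divisor $n$. Since the slide's own formulas are not reproduced here, this particular (biased) form is an assumption.

```python
def product_moment_ratios(sample):
    """Sample skewness sqrt(b1) = m3 / m2**1.5 and kurtosis b2 = m4 / m2**2.

    m_k is the k-th central moment computed with divisor n.
    """
    n = len(sample)
    xbar = sum(sample) / n
    m2, m3, m4 = (sum((x - xbar) ** k for x in sample) / n for k in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2


skew, kurt = product_moment_ratios([1, 2, 3, 4, 5])   # symmetric toy sample
```

For a normal population the targets are 0 and 3; the symmetric toy sample gives zero skewness but a kurtosis of 1.7, a reminder of how far small-sample values can fall from the population values.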
(D'Agostino and Stephens, 1986; figure shows 90% and 95% contours)
Even though the joint distribution of the ordinary sample skewness and sample kurtosis is asymptotically normal, this asymptotic property is a poor approximation in small and moderate-size samples, particularly when the underlying distribution is even moderately skewed.
Scattering of sample moment ratios of the normal distribution
(100,000 random samples)
L-moments and the L-moment ratios diagram
L-moment-ratio diagram of various distributions
Sample estimates of L-moment ratios (probability weighted moment estimators)
Sample estimates of L-moment ratios (plotting-position estimators)
Hosking and Wallis (1997) indicated that $\tilde{t}_r$ is not an unbiased estimator of $\tau_r$, but its bias tends to zero in large samples.
$t_r$ and $\tilde{t}_r$ are respectively referred to as the probability-weighted-moment estimator and the plotting-position estimator of the L-moment ratio $\tau_r$.
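Both estimators can be sketched in Python. The PWM version uses the unbiased probability-weighted moments $b_r$ described by Hosking and Wallis (1997); the plotting-position version replaces the rank weights with $p_{j:n}^r$, where $p_{j:n} = (j - 0.35)/n$ is an assumed, commonly used choice rather than the slide's own formula. In both cases $\ell_2 = 2b_1 - b_0$, $\ell_3 = 6b_2 - 6b_1 + b_0$, $\ell_4 = 20b_3 - 30b_2 + 12b_1 - b_0$, and $t_r = \ell_r/\ell_2$.

```python
def _l_ratios_from_pwm(b0, b1, b2, b3):
    """Convert probability-weighted moments b0..b3 into (t3, t4)."""
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2


def l_ratios_pwm(sample):
    """PWM estimators (t3, t4) using the unbiased b_r (requires n >= 4)."""
    x = sorted(sample)          # x[j - 1] is the j-th smallest value
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((j - 1) * x[j - 1] for j in range(2, n + 1)) / (n * (n - 1))
    b2 = sum((j - 1) * (j - 2) * x[j - 1] for j in range(3, n + 1)) / (n * (n - 1) * (n - 2))
    b3 = sum((j - 1) * (j - 2) * (j - 3) * x[j - 1]
             for j in range(4, n + 1)) / (n * (n - 1) * (n - 2) * (n - 3))
    return _l_ratios_from_pwm(b0, b1, b2, b3)


def l_ratios_plotting_position(sample, a=0.35):
    """Plotting-position estimators with p_(j:n) = (j - a) / n (a = 0.35 is assumed)."""
    x = sorted(sample)
    n = len(x)
    p = [(j - a) / n for j in range(1, n + 1)]
    b = [sum(pj ** r * xj for pj, xj in zip(p, x)) / n for r in range(4)]
    return _l_ratios_from_pwm(*b)


t3, t4 = l_ratios_pwm([1, 2, 3, 4, 5])                         # evenly spaced toy sample
pp3, pp4 = l_ratios_plotting_position(list(range(1, 1001)))    # large toy sample
```

For evenly spaced data the PWM estimates of L-skewness and L-kurtosis are both exactly zero, matching the uniform distribution, while the plotting-position estimate is only approximately zero, in line with its small but nonzero bias.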
Establishing acceptance region for L-moment ratios
The standard normal and standard Gumbel distributions (zero mean and unit standard deviation) are used to illustrate the construction of acceptance regions on the L-moment-ratio diagram.
The L-moment ratios ($\tau_3$, $\tau_4$) of the normal and Gumbel distributions are (0, 0.1226) and (0.1699, 0.1504), respectively.
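These theoretical values follow from closed-form expressions: for the normal distribution $\tau_3 = 0$ and $\tau_4 = 30\pi^{-1}\arctan\sqrt{2} - 9$; for the Gumbel distribution $\tau_3 = 2\ln 3/\ln 2 - 3$ and $\tau_4 = 16 - 10\ln 3/\ln 2$. A short Python check (function names are hypothetical):

```python
import math


def normal_l_ratios():
    """(tau3, tau4) of the normal distribution."""
    return 0.0, 30.0 * math.atan(math.sqrt(2.0)) / math.pi - 9.0


def gumbel_l_ratios():
    """(tau3, tau4) of the Gumbel (EV-1) distribution."""
    log2_3 = math.log(3.0) / math.log(2.0)
    return 2.0 * log2_3 - 3.0, 16.0 - 10.0 * log2_3


tau_normal = normal_l_ratios()
tau_gumbel = gumbel_l_ratios()
```

Both points are independent of location and scale, which is why each two-parameter distribution plots as a single point on the LMRD.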
Stochastic simulation of the normal and Gumbel distributions
For each of the standard normal and standard Gumbel distributions, a total of 100,000 random samples were generated for each of the sample sizes n = 20, 30, 40, 50, 60, 75, 100, 150, 250, 500, and 1,000.
For each of the 100,000 samples, sample L-skewness and L-kurtosis were calculated using the probability-weighted-moment estimator and the plotting-position estimator.
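A scaled-down version of this simulation can be sketched in Python with the standard library. Assumptions: only n = 50 and 2,000 replicates (instead of 100,000), only the PWM estimator, and standard Gumbel variates generated as $-\ln E$ with $E \sim \text{Exp}(1)$, then rescaled to zero mean and unit standard deviation. All names are hypothetical.

```python
import math
import random
from statistics import mean

EULER_GAMMA = 0.5772156649


def l_ratios_pwm(sample):
    """Sample L-skewness t3 and L-kurtosis t4 from unbiased PWM estimators b0..b3."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((j - 1) * x[j - 1] for j in range(2, n + 1)) / (n * (n - 1))
    b2 = sum((j - 1) * (j - 2) * x[j - 1] for j in range(3, n + 1)) / (n * (n - 1) * (n - 2))
    b3 = sum((j - 1) * (j - 2) * (j - 3) * x[j - 1]
             for j in range(4, n + 1)) / (n * (n - 1) * (n - 2) * (n - 3))
    l2 = 2 * b1 - b0
    return (6 * b2 - 6 * b1 + b0) / l2, (20 * b3 - 30 * b2 + 12 * b1 - b0) / l2


def standard_gumbel(rng):
    """Gumbel (EV-1) variate with zero mean and unit SD."""
    return (-math.log(rng.expovariate(1.0)) - EULER_GAMMA) * math.sqrt(6.0) / math.pi


def simulate_l_ratios(draw, n, replicates, seed=1):
    """Draw `replicates` samples of size n with draw(rng); return their (t3, t4) pairs."""
    rng = random.Random(seed)
    return [l_ratios_pwm([draw(rng) for _ in range(n)]) for _ in range(replicates)]


normal_pairs = simulate_l_ratios(lambda rng: rng.gauss(0.0, 1.0), n=50, replicates=2000)
gumbel_pairs = simulate_l_ratios(standard_gumbel, n=50, replicates=2000)
normal_center = (mean(t for t, _ in normal_pairs), mean(k for _, k in normal_pairs))
gumbel_center = (mean(t for t, _ in gumbel_pairs), mean(k for _, k in gumbel_pairs))
```

The centers of the two simulated clouds should lie near (0, 0.1226) and (0.1699, 0.1504); plotting `normal_pairs` and `gumbel_pairs` reproduces the kind of scatter shown on the following slides.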
Scattering of sample L-moment ratios: normal distribution
(100,000 random samples)
Normal distribution!
(100,000 random samples)
Normal distribution?
(100,000 random samples)
Non-normal distribution!
95% acceptance region
99% acceptance region
Scattering of sample L-moment ratios: Gumbel distribution
(100,000 random samples)
For both distribution types, the joint distribution of sample L-skewness and L-kurtosis seems to resemble a bivariate normal distribution for a larger sample size (n = 100).
However, for sample size n = 20, the joint distribution of sample L-skewness and L-kurtosis seems to differ from the bivariate normal. In particular, for the Gumbel distribution, the sample L-moment ratios from both estimators are positively skewed.
For smaller sample sizes (n = 20 and 50), the cloud of sample L-moment ratios estimated by the plotting-position method appears to be centered away from ($\tau_3$, $\tau_4$), an indication of biased estimation.
However, for sample size n = 100, the bias is almost unnoticeable, suggesting that the bias of the plotting-position L-moment-ratio estimator is negligible for larger sample sizes.
In contrast, the cloud of sample L-moment ratios estimated by the probability-weighted-moment method appears to be centered almost exactly at ($\tau_3$, $\tau_4$).
Bias of sample L-skewness and L-kurtosis - Normal distribution
Bias of sample L-skewness and L-kurtosis - Gumbel distribution
Mardia test for bivariate normality of sample L-skewness and L-kurtosis
It appears that the assumption of a bivariate normal distribution for the sample L-skewness and L-kurtosis of both distributions is valid for moderate to large sample sizes. However, for random samples of the normal distribution with sample size $n \le 30$, the bivariate normal assumption may not be adequate. Similarly, the bivariate normal assumption for the sample L-skewness and L-kurtosis of the Gumbel distribution may not be adequate for sample size $n \le 60$.
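The Mardia test examines bivariate normality through multivariate skewness $b_{1,p}$ and kurtosis $b_{2,p}$. The Python sketch below uses the standard textbook definitions (assumed here, since the slides' own formulas are not reproduced) for p = 2: with $d_{ij} = (x_i - \bar{x})^T S^{-1} (x_j - \bar{x})$ and $S$ the covariance matrix with divisor N, $b_1 = N^{-2}\sum_i\sum_j d_{ij}^3$ and $b_2 = N^{-1}\sum_i d_{ii}^2$. The function name is hypothetical.

```python
import math


def mardia_bivariate(points):
    """Mardia's multivariate skewness b1 and kurtosis b2 for (x, y) pairs.

    Uses the covariance matrix with divisor N. Under bivariate normality (p = 2),
    N * b1 / 6 is approximately chi-square with p(p+1)(p+2)/6 = 4 degrees of
    freedom, and b2 is approximately normal with mean p(p+2) = 8 and variance
    8p(p+2)/N = 64/N.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    c = [(x - mx, y - my) for x, y in points]           # centered data
    sxx = sum(u * u for u, _ in c) / n
    syy = sum(v * v for _, v in c) / n
    sxy = sum(u * v for u, v in c) / n
    det = sxx * syy - sxy ** 2
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det     # entries of S^{-1}

    def form(u, v):
        """Bilinear form u^T S^{-1} v."""
        return u[0] * (ixx * v[0] + ixy * v[1]) + u[1] * (ixy * v[0] + iyy * v[1])

    b1 = sum(form(u, v) ** 3 for u in c for v in c) / n ** 2
    b2 = sum(form(u, u) ** 2 for u in c) / n
    skew_stat = n * b1 / 6.0                              # ~ chi-square, 4 df
    kurt_z = (b2 - 8.0) / math.sqrt(64.0 / n)             # ~ N(0, 1)
    return b1, b2, skew_stat, kurt_z


b1, b2, skew_stat, kurt_z = mardia_bivariate([(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)])
```

The skewness term costs $O(N^2)$, so applying this pure-Python sketch to 100,000 simulated (L-skewness, L-kurtosis) pairs would be impractically slow; a vectorized implementation or a random subsample would be needed.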
Establishing acceptance regions for LMRD-based GOF tests
For moderate to large sample sizes, the sample L-skewness and L-kurtosis of both the normal and Gumbel distributions have asymptotic bivariate normal distributions.
Using this property, the acceptance region of a GOF test based on sample L-skewness and L-kurtosis can be determined by the equiprobable density contour of the bivariate normal distribution that encloses a probability of $100(1-\alpha)\%$.
The probability density function of a multivariate normal distribution is generally expressed by
$$f(X) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right]$$
The probability density function depends on the random vector $X$ only through the quadratic form
$$Q(X) = (X-\mu)^T \Sigma^{-1} (X-\mu),$$
which has a chi-square distribution with $p$ degrees of freedom.
Therefore, probability density contours of a multivariate normal distribution can be expressed by
$$Q(X) = (X-\mu)^T \Sigma^{-1} (X-\mu) = c$$
for any constant $c > 0$. For a bivariate normal distribution ($p = 2$), the above equation represents an equiprobable ellipse, and a set of equiprobable ellipses can be constructed by assigning $\chi^2_{2,\alpha}$ to $c$ for various values of $\alpha$.
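Consequently, the $100(1-\alpha)\%$ acceptance region of a GOF test based on the sample L-skewness and L-kurtosis is expressed by
$$(X-\mu)^T \Sigma^{-1} (X-\mu) \le \chi^2_{2,\alpha}$$
where $\chi^2_{2,\alpha}$ is the upper $\alpha$ quantile of the $\chi^2_2$ distribution (chi-square with 2 degrees of freedom) at significance level $\alpha$.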
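For 2 degrees of freedom the chi-square distribution has $P(Q > c) = e^{-c/2}$, so its upper $\alpha$ quantile has the closed form $\chi^2_{2,\alpha} = -2\ln\alpha$ (about 5.99 for $\alpha = 0.05$). The Python sketch below uses this to test whether a point lies inside the ellipse; the function names and the (var1, var2, cov12) parameterization of the covariance matrix are assumptions.

```python
import math


def chi2_2_upper(alpha):
    """Upper-alpha quantile of chi-square with 2 df: solves exp(-c / 2) = alpha."""
    return -2.0 * math.log(alpha)


def in_acceptance_region(x, mu, cov, alpha=0.05):
    """True if (x - mu)^T Sigma^{-1} (x - mu) <= chi2_{2,alpha}.

    x and mu are (L-skewness, L-kurtosis) pairs; cov is (var1, var2, cov12).
    """
    var1, var2, cov12 = cov
    det = var1 * var2 - cov12 ** 2
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form using the explicit inverse of a 2 x 2 covariance matrix.
    q = (var2 * d1 * d1 - 2.0 * cov12 * d1 * d2 + var1 * d2 * d2) / det
    return q <= chi2_2_upper(alpha)
```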
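For a bivariate normal random vector $X = (X_1, X_2)^T$, the density contour $(X-\mu)^T \Sigma^{-1} (X-\mu) = c$ can also be expressed as
$$\frac{1}{1-\rho^2}\left[\frac{(X_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(X_1-\mu_1)(X_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(X_2-\mu_2)^2}{\sigma_2^2}\right] = c$$
However, the expected values and covariance matrix of the sample L-skewness and L-kurtosis are unknown and can only be estimated from random samples generated by stochastic simulation.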
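Thus, in constructing the equiprobable ellipses, the population parameters $\mu$, $\Sigma$, and $\rho$ must be replaced by their sample estimates $\bar{x}$, $S$, and $r$, respectively.
The Hotelling’s $T^2$ statistic:
$$T^2 = (X-\bar{x})^T S^{-1} (X-\bar{x}) = \frac{1}{1-r^2}\left[\frac{(X_1-\bar{x}_1)^2}{s_1^2} - \frac{2r(X_1-\bar{x}_1)(X_2-\bar{x}_2)}{s_1 s_2} + \frac{(X_2-\bar{x}_2)^2}{s_2^2}\right]$$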
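The Hotelling’s $T^2$ is distributed as a multiple of an F-distribution, i.e.,
$$T^2 \sim \frac{2(N^2-1)}{N(N-2)}\,F(2, N-2),$$
where $N$ is the number of (simulated) samples from which $\bar{x}$ and $S$ are estimated. For large $N$,
$$\frac{2(N^2-1)}{N(N-2)}\,F_{2,N-2} \approx 2F_{2,N-2} \approx \chi^2_2.$$
Therefore, the distribution of the Hotelling’s $T^2$ can be well approximated by the chi-square distribution with 2 degrees of freedom.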
Thus, if the sample L-moment ratios $X_n$ of a random sample of size $n$ fall outside the corresponding ellipse, i.e.,
$$T_n^2 = (X_n - \bar{x})^T S^{-1} (X_n - \bar{x}) > \chi^2_{2,\alpha},$$
the null hypothesis that the random sample originated from a normal or Gumbel distribution is rejected.
Variation of 95% acceptance regions with respect to sample size n
(100,000 random samples; 95% acceptance regions shown for n = 20, 50, and 100; annotation: “Non-normal distribution!”)
What if n = 36?
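Empirical relationships between parameters of acceptance regions and sample size
Since the 95% acceptance regions of the proposed GOF tests depend on the sample size $n$, it is worth investigating whether empirical relationships between the 95% acceptance region and the sample size can be established. Such empirical relationships can be established using the following regression model:
$$\theta(n) = a + \frac{b}{n} + \frac{c}{n^2}$$
where $\theta(n)$ denotes a parameter of the acceptance region (a mean, variance, or correlation coefficient of the sample L-skewness and L-kurtosis) and $a$, $b$, and $c$ are regression coefficients.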
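Fitting a model of the form $\theta(n) = a + b/n + c/n^2$ is ordinary least squares in the regressors $1$, $1/n$, and $1/n^2$. The plain-Python sketch below solves the normal equations; $\theta(n)$ is our label for whichever ellipse parameter is being modeled, and the coefficients used to generate the demonstration data are hypothetical, not the values fitted in the study.

```python
def fit_inverse_poly(ns, values):
    """Least-squares fit of theta(n) = a + b/n + c/n**2; returns (a, b, c)."""
    rows = [(1.0, 1.0 / n, 1.0 / n ** 2) for n in ns]
    # Normal equations (X^T X) beta = X^T y for the three coefficients.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    y = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        y[col], y[piv] = y[piv], y[col]
        for k in range(col + 1, 3):
            f = A[k][col] / A[col][col]
            for j in range(col, 3):
                A[k][j] -= f * A[col][j]
            y[k] -= f * y[col]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        beta[i] = (y[i] - sum(A[i][j] * beta[j] for j in range(i + 1, 3))) / A[i][i]
    return tuple(beta)


def theta(n, coeffs):
    """Evaluate the fitted relationship at sample size n."""
    a, b, c = coeffs
    return a + b / n + c / n ** 2


sizes = [20, 30, 40, 50, 60, 75, 100, 150, 250, 500, 1000]
true_coeffs = (0.17, -0.2, 0.5)          # hypothetical, for demonstration only
values = [theta(n, true_coeffs) for n in sizes]
fitted = fit_inverse_poly(sizes, values)
theta_36 = theta(36, fitted)             # interpolate to a size that was not simulated
```

Once one such relationship is fitted for each ellipse parameter, the acceptance region for an unsimulated sample size such as n = 36 follows by evaluating the fitted curves at that n.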
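Empirical relationships between the sample size and parameters of the bivariate distribution of sample L-skewness and L-kurtosis
$$T^2 = \frac{1}{1-r^2}\left[\frac{(X_1-\bar{x}_1)^2}{s_1^2} - \frac{2r(X_1-\bar{x}_1)(X_2-\bar{x}_2)}{s_1 s_2} + \frac{(X_2-\bar{x}_2)^2}{s_2^2}\right]$$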
Example
Suppose that a random sample of size n = 44 is available, and the plotting-position sample L-skewness and L-kurtosis are calculated as $(\tilde{t}_3, \tilde{t}_4) = (0.214, 0.116)$. We want to test whether the sample originated from the Gumbel distribution.
$$T^2 = \frac{1}{1-r^2}\left[\frac{(X_1-\bar{x}_1)^2}{s_1^2} - \frac{2r(X_1-\bar{x}_1)(X_2-\bar{x}_2)}{s_1 s_2} + \frac{(X_2-\bar{x}_2)^2}{s_2^2}\right]$$
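From the regression models for the plotting-position estimators, the means of the sample L-skewness and L-kurtosis, their variances, and their correlation coefficient ($\bar{x}_1$, $\bar{x}_2$, $s_1^2$, $s_2^2$, and $r$) are found to be 0.1784, 0.1369, 0.005119, 0.002924, and 0.6039, respectively. The Hotelling’s $T^2$ is then calculated as 0.9908.
The value of $T^2$ is much smaller than the threshold value $\chi^2_{2,0.05} = 5.99$.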
The null hypothesis that the random sample originated from the Gumbel distribution is not rejected.
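The arithmetic of this example can be reproduced with a few lines of Python. The sketch treats 0.005119 and 0.002924 as variances (consistent with the reported $T^2$) and uses $\chi^2_{2,0.05} = -2\ln 0.05$; the function name is hypothetical.

```python
import math


def hotelling_t2_bivariate(x1, x2, m1, m2, var1, var2, r):
    """T^2 = (z1^2 - 2 r z1 z2 + z2^2) / (1 - r^2), with z_i = (x_i - m_i) / s_i."""
    z1 = (x1 - m1) / math.sqrt(var1)
    z2 = (x2 - m2) / math.sqrt(var2)
    return (z1 ** 2 - 2.0 * r * z1 * z2 + z2 ** 2) / (1.0 - r ** 2)


# Sample L-moment ratios (n = 44) and regression-based Gumbel parameters.
t2 = hotelling_t2_bivariate(x1=0.214, x2=0.116, m1=0.1784, m2=0.1369,
                            var1=0.005119, var2=0.002924, r=0.6039)
threshold = -2.0 * math.log(0.05)        # chi-square (2 df) upper 5% point, about 5.99
reject_gumbel = t2 > threshold
```

With these rounded inputs the sketch gives $T^2 \approx 0.990$, which agrees with the reported 0.9908 up to rounding and leads to the same decision.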
95% acceptance regions of L-moments-based GOF test for the normal distribution
Acceptance ellipses correspond to various sample sizes (n = 20, 30, 40, 50, 60, 75, 100, 150, 250, 500, and 1,000).
95% acceptance regions of L-moments-based GOF test for the Gumbel distribution
Acceptance ellipses correspond to various sample sizes (n = 20, 30, 40, 50, 60, 75, 100, 150, 250, 500, and 1,000).
Validity check of the LMRD acceptance regions
The sample-size-dependent acceptance regions established using the empirical relationships described above are further checked for validity. This is done by stochastically generating 10,000 random samples for each of the standard normal and Gumbel distributions, with sample sizes n = 20, 30, 40, 50, 60, 75, 100, 150, 250, 500, and 1,000.
For the sample-size-dependent 95% acceptance regions to be valid, the rejection rate should be very close to the level of significance ($\alpha$ = 0.05), or equivalently, the acceptance rate should be very close to 0.95.
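A scaled-down version of this validity check can be sketched in Python. Assumptions: only the normal distribution, n = 50, the PWM estimator, and 2,000 samples per run; the mean vector and covariance are estimated directly from a calibration run rather than from the regression relationships, and an independent run then counts how often $T^2 \le \chi^2_{2,0.05}$. All names are hypothetical.

```python
import math
import random


def l_ratios_pwm(sample):
    """Sample L-skewness t3 and L-kurtosis t4 from unbiased PWM estimators b0..b3."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((j - 1) * x[j - 1] for j in range(2, n + 1)) / (n * (n - 1))
    b2 = sum((j - 1) * (j - 2) * x[j - 1] for j in range(3, n + 1)) / (n * (n - 1) * (n - 2))
    b3 = sum((j - 1) * (j - 2) * (j - 3) * x[j - 1]
             for j in range(4, n + 1)) / (n * (n - 1) * (n - 2) * (n - 3))
    l2 = 2 * b1 - b0
    return (6 * b2 - 6 * b1 + b0) / l2, (20 * b3 - 30 * b2 + 12 * b1 - b0) / l2


def simulate_normal_l_ratios(n, replicates, seed):
    """(t3, t4) pairs for `replicates` standard normal samples of size n."""
    rng = random.Random(seed)
    return [l_ratios_pwm([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(replicates)]


def acceptance_rate(n=50, replicates=2000, alpha=0.05):
    # 1) Calibration run: estimate the mean vector and covariance of (t3, t4).
    calib = simulate_normal_l_ratios(n, replicates, seed=1)
    m1 = sum(t3 for t3, _ in calib) / replicates
    m2 = sum(t4 for _, t4 in calib) / replicates
    s11 = sum((t3 - m1) ** 2 for t3, _ in calib) / (replicates - 1)
    s22 = sum((t4 - m2) ** 2 for _, t4 in calib) / (replicates - 1)
    s12 = sum((t3 - m1) * (t4 - m2) for t3, t4 in calib) / (replicates - 1)
    det = s11 * s22 - s12 ** 2
    limit = -2.0 * math.log(alpha)   # upper-alpha point of chi-square with 2 df
    # 2) Validation run: independent samples, count those inside the ellipse.
    fresh = simulate_normal_l_ratios(n, replicates, seed=2)
    accepted = 0
    for t3, t4 in fresh:
        d1, d2 = t3 - m1, t4 - m2
        t2 = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det  # Hotelling T^2
        accepted += t2 <= limit
    return accepted / replicates


rate = acceptance_rate()
```

An acceptance rate close to 0.95 supports the bivariate normal approximation at this sample size; repeating the check for small n is where departures would be expected to show up.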
Acceptance rate of the validity check for sample-size-dependent 95% acceptance regions of sample L-skewness and L-kurtosis pairs, based on 10,000 random samples for each sample size n.
End of this session.