Statistical Methods

Analysis of Hydrologic Data

Estimation of Design Discharge and Water Level

Estimation of both flood discharge and high water level is necessary for bank protection design, and careful estimation is important for all sites with erodible banks. This section describes the methods of assessing flood discharge and water level at the site under consideration. The design discharge and water level are determined for a selected probability of exceedance or return period.

The design discharge and water level arising from floods should be selected after due consideration of the following:

1) the maximum historical discharge as recorded at the site, or as calculated on the basis of recorded water level at the site, or as calculated on the basis of measured discharge at other points on the river from which the corresponding site discharge can reasonably be inferred;

2) the discharge derived from a frequency analysis using a probability of exceedance or return period which is appropriate to the importance and value of the protection work;

3) the maximum historical water level as recorded at the site, or as inferred from observed or recorded water level at other points on the river from which the level can reasonably be transferred to the site in question;

4) the water level derived from a frequency analysis using a probability of exceedance or return period which is appropriate to the importance and value of the protection work.

In estimating high flows, primary reliance should be placed on careful field investigations, local enquiries and searches of historical records. Data so obtained should be compared with recorded data for hydrometric stations, and supplemented by analytical procedures using stage-discharge curves. At most hydrometric gauging stations a reasonably stable relationship exists between water level and discharge. At some sites, however, the stage-discharge curve may be quite unstable because of aggradation or degradation of the channel bed or backwater effects from downstream, and may change drastically during major floods. A persistent trend of rising or lowering of the curve indicates progressive channel aggradation or degradation. The stage corresponding to a design flood that exceeds any recorded flow is obtained by extrapolating the stage-discharge relationship.

The most commonly used method for estimating design discharge and water level examines the observed discharges and water levels to arrive at suitable estimates. The method, known as frequency analysis, is founded on statistical analysis of discharge and water level records. For locations where records of stream flow are available, or where flows from another basin can be transposed to the design location, the design flood magnitude and water level can be estimated directly from those records by means of frequency analysis.

Frequency Analysis

The frequency of a hydrologic event, such as the annual peak flow, is the probability that a given value will be equaled or exceeded in any year. This is more appropriately called the exceedance probability, P(F). The reciprocal of the exceedance probability is the return period T in years, i.e., $T = 1/P(F)$. The length of record should be sufficient to justify extrapolating the frequency relationship. For example, it might be reasonable to estimate a 50-year flood on the basis of a 30-year record, but to estimate a 100-year flood on the basis of a 10-year record would normally be absurd (Neill 1973). Viessman and Lewis (1996) noted that, as a general rule, caution is needed when working with short records and when estimating hydrologic events with return periods greater than twice the record length.

Frequency analysis can be conducted in two ways: one is the analytical approach and the other is the graphical technique in which flood magnitudes are usually plotted against probability of exceedance.

In the following sections, procedures are given mostly for discharge frequency analysis; similar procedures can be followed for water level frequency analysis.

Analytical Frequency Analysis

Analytical frequency analysis is based on fitting theoretical probability distributions to given data. Numerous distributions have been suggested on the basis of their ability to fit the plotted data from streams (Linsley et al. 1988). The Log-Pearson Type III (LP3) Distribution has been adopted by United States federal agencies for flood analysis. The first asymptotic distribution of extreme values (EV1), commonly called the Gumbel Distribution, has been widely used and is recommended in the United Kingdom. The EV1 Distribution was found to fit peak flow data for several rivers in Bangladesh (Bari and Saleque 1995).

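Extreme Value Distributions: Distributions of the extreme values selected from sets of samples of any probability distribution converge to one of three forms of Extreme Value Distribution, called Type I, II, and III, respectively, when the number of selected extreme values is large. The three limiting forms are special cases of a single distribution called the Generalized Extreme Value (GEV) Distribution (Chow et al. 1988). The cumulative distribution function for the GEV is

$$F(x) = \exp\left[-\left(1 - k\,\frac{x-u}{\alpha}\right)^{1/k}\right] \qquad (1)$$

where $k$, $u$, and $\alpha$ are parameters to be determined. For the EVI Distribution x is unbounded, while for EVII, x is bounded from below, and for EVIII, x is bounded from above. The EVI and EVII Distributions are also known as the Gumbel and Frechet Distributions, respectively.

The Extreme Value Type I (EVI) cumulative distribution function is

$$F(x) = \exp\left[-\exp\left(-\frac{x-u}{\alpha}\right)\right], \qquad -\infty \le x \le \infty \qquad (2)$$

The parameters are estimated by

$$\alpha = \frac{\sqrt{6}\,s}{\pi} \quad \text{and} \quad u = \bar{x} - 0.5772\,\alpha \qquad (3)$$

Eq (2) can be expressed as

$$F(x) = \exp\left[-\exp(-y)\right] \qquad (4)$$

where y is the reduced variate defined as

$$y = \frac{x-u}{\alpha} \qquad (5)$$

Solving Eq (4) for y:

$$y = -\ln\left[\ln\left(\frac{1}{F(x)}\right)\right] \qquad (6)$$

Noting that the probability of occurrence of an event is the inverse of its return period T, we can write

$$\frac{1}{T} = P(X \ge x_T) = 1 - F(x_T)$$

so

$$F(x_T) = \frac{T-1}{T}$$

and substituting for $F(x_T)$ into Eq (6)

$$y_T = -\ln\left[\ln\left(\frac{T}{T-1}\right)\right] \qquad (7)$$

For a given return period, $x_T$ is related to $y_T$ by Eq (5), or

$$x_T = u + \alpha\,y_T \qquad (8)$$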
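The chain from Eq (3) to Eq (8) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are assumptions, and the mean and standard deviation passed in the usage lines are the sample statistics of the Old Brahmaputra example that follows. Because the code uses the full-precision value of pi, its results can differ slightly from hand-rounded figures.

```python
import math


def gumbel_parameters(mean, std):
    """Moment estimates of Eq (3): alpha = sqrt(6)*s/pi, u = mean - 0.5772*alpha."""
    alpha = math.sqrt(6.0) * std / math.pi
    u = mean - 0.5772 * alpha
    return alpha, u


def reduced_variate(T):
    """Eq (7): y_T = -ln[ln(T/(T-1))], defined for T > 1."""
    return -math.log(math.log(T / (T - 1.0)))


def gumbel_quantile(mean, std, T):
    """Eq (8): x_T = u + alpha * y_T."""
    alpha, u = gumbel_parameters(mean, std)
    return u + alpha * reduced_variate(T)


if __name__ == "__main__":
    # Sample mean and standard deviation (m3/s) taken from the example data.
    for T in (5, 10, 25, 50):
        print(T, round(reduced_variate(T), 3),
              round(gumbel_quantile(2867.53, 804.54, T)))
```

Keeping the parameter estimation separate from the quantile calculation makes it easy to substitute parameters estimated by another method.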

Example 1

Using the EVI Distribution, a model is developed for frequency analysis of the annual peak flow data of the Old Brahmaputra River at Mymensingh, and the peak flows for return periods of 5, 10, 25 and 50 years are calculated.

Annual peak discharges (m3/s) of the Old Brahmaputra River at Mymensingh for the period 1964-98

| Year | Peak flow | Year | Peak flow | Year | Peak flow |
|------|-----------|------|-----------|------|-----------|
| 1964 | 2830 | 1978 | 2770 | 1989 | 2180 |
| 1965 | 3230 | 1979 | 2630 | 1990 | 2060 |
| 1966 | 3490 | 1980 | 3340 | 1991 | 2900 |
| 1967 | 3000 | 1981 | 2690 | 1992 | 1490 |
| 1968 | 2810 | 1982 | 2470 | 1993 | 2060 |
| 1969 | 2770 | 1983 | 2370 | 1994 | 1065 |
| 1970 | 3250 | 1984 | 4780 | 1995 | 3187 |
| 1974 | 3820 | 1985 | 3070 | 1996 | 2369 |
| 1975 | 3060 | 1986 | 1930 | 1997 | 1973 |
| 1976 | 3210 | 1987 | 3230 | 1998 | 3267 |
| 1977 | 3550 | 1988 | 4910 |      |      |

Sample size, n = 32

Max = 4910; Ave, $\bar{x}$ = 2867.53

Min = 1065; Std, s = 804.54

Skew, Cs = 0.372

Note that data for 1971, 1972 and 1973 are missing. When a fairly long record has a short gap, it may be justifiable to estimate the missing data by correlation with a nearby station; otherwise it is preferable to consolidate the various recorded sequences as if they formed a continuous record (Neill 1973). The latter approach is used in this example.
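The summary statistics quoted with the data can be checked with a short script. The sketch below is illustrative; the function name is an assumption, and the skewness is assumed to be the bias-adjusted sample coefficient $C_s = n\sum(x_i-\bar{x})^3 / [(n-1)(n-2)s^3]$, with s computed using n - 1 in the denominator.

```python
import math

# Annual peak flows (m3/s), Old Brahmaputra at Mymensingh, 1964-98 (1971-73 missing).
PEAK_FLOWS = [
    2830, 3230, 3490, 3000, 2810, 2770, 3250, 3820, 3060, 3210, 3550,
    2770, 2630, 3340, 2690, 2470, 2370, 4780, 3070, 1930, 3230, 4910,
    2180, 2060, 2900, 1490, 2060, 1065, 3187, 2369, 1973, 3267,
]


def sample_statistics(values):
    """Return (n, mean, standard deviation, skewness coefficient)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation with n - 1 in the denominator.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Bias-adjusted sample skewness coefficient (assumed form).
    skew = n * sum((v - mean) ** 3 for v in values) / ((n - 1) * (n - 2) * std ** 3)
    return n, mean, std, skew


if __name__ == "__main__":
    n, mean, std, skew = sample_statistics(PEAK_FLOWS)
    print(n, round(mean, 2), round(std, 2), round(skew, 3))
```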

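For the given data, $\bar{x}$ = 2867.53 m3/s and s = 804.54 m3/s. Substituting in Eq (3) yields

$$\alpha = \frac{\sqrt{6}\,s}{\pi} \approx 627.62$$

and

$$u = \bar{x} - 0.5772\,\alpha = 2867.53 - 0.5772 \times 627.62 = 2505.27$$

The probability model is

$$F(x) = \exp\left[-\exp\left(-\frac{x - 2505.27}{627.62}\right)\right]$$

To determine the values of $x_T$ for various values of T, it is convenient to use the reduced variate $y_T$.

For T = 5 years, Eq (7) gives

$$y_5 = -\ln\left[\ln\left(\frac{5}{5-1}\right)\right] = 1.50$$

and Eq (8) yields $x_5$ = 2505.27 + 627.62 × 1.5 = 3446.7 m3/s.

Similarly for other values of T, the $y_T$ and $x_T$ values are found as follows:

T = 10 years, $y_{10}$ = 2.25, $x_{10}$ = 3918 m3/s

T = 25 years, $y_{25}$ = 3.20, $x_{25}$ = 4513 m3/s

T = 50 years, $y_{50}$ = 3.90, $x_{50}$ = 4954 m3/s

Frequency Analysis using Frequency Factors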

Calculating the magnitudes of extreme events by the method outlined in the above example requires that the probability distribution function be invertible, that is, given a value of T or $F(x_T) = (T-1)/T$, the corresponding value of $x_T$ can be determined. Some probability distribution functions are not readily invertible, like the Normal and Pearson Type III Distributions. Thus an alternative method based on the frequency factor is used for calculating the magnitudes of extreme events. Chow (1951) has shown that most frequency functions can be generalized to

$$x_T = \bar{x} + K_T\,s \qquad (9)$$

where $x_T$ is a flood of specified probability or return period T, $\bar{x}$ is the mean of the flood series, s is the standard deviation of the series, and $K_T$ is the frequency factor, which is a function of the return period and the type of probability distribution, as well as of the coefficient of skewness for skewed distributions such as LP3.

In the event that the variable analyzed is $y = \log x$, for example as in the Lognormal and LP3 Distributions, the same method is applied to the statistics of the logarithms of the data using $y_T = \bar{y} + K_T\,s_y$, and the required value of $x_T$ is found by taking the antilog of $y_T$.

The frequency factor of Eq (9), proposed by Chow (1951), is applicable to many probability distributions used in hydrologic frequency analysis. The K-T relationship can be expressed in mathematical terms or by a table.

Normal Distribution: From Eq (9) the frequency factor can be expressed as

$$K_T = \frac{x_T - \mu}{\sigma} \qquad (10)$$

Thus, $K_T$ for the Normal Distribution is the same as the standard normal variable z. The value of z, and hence $K_T$, can be obtained from Table 2.

Lognormal Distribution: The recommended procedure for use of the Lognormal Distribution is to convert the data series to logarithms and compute:

1) $y_i = \log x_i$

2) Compute the mean $\bar{y}$ and standard deviation $s_y$

3) Compute $y_T = \bar{y} + K_T\,s_y$, where $K_T = z$. So, $K_T$ can be taken from Table 2.

4) Finally compute $x_T = 10^{y_T}$

Log-Pearson Type III (LP3) Distribution: The recommended procedure for use of the LP3 Distribution is to convert the data series to logarithms and compute:

1) $y_i = \log x_i$

2) Compute the mean $\bar{y}$ and standard deviation $s_y$

3) Compute the coefficient of skewness $C_s = \dfrac{n \sum (y_i - \bar{y})^3}{(n-1)(n-2)\,s_y^3}$

4) Compute $y_T = \bar{y} + K_T\,s_y$ (11)

where $K_T$ is taken from Table 3.

5) Finally compute $x_T = 10^{y_T}$

Table 3 gives values of the frequency factor for the LP3 Distribution for various values of return period and coefficient of skewness, Cs. When Cs = 0, the frequency factor is equal to the standard normal variable z (Table 2).

Extreme Value I (EVI) Distribution: Chow (1951) derived the following expression for the frequency factor for the EVI Distribution

$$K_T = -\frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\left[\ln\left(\frac{T}{T-1}\right)\right]\right\} \qquad (12)$$

When $x_T = \mu$, Eq (9) (in population terms) gives $K_T = 0$, and Eq (12) gives T = 2.33 years. This is the return period of the mean of the EVI Distribution.

Table of frequency factors for the EVI Distribution, given in Table 4, is taken from Haan (1977). The values computed from the above equation are equivalent to an infinite sample size in Table 4.
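Eq (12) and its inverse are easy to evaluate directly. The following is a minimal sketch with assumed function names; it corresponds to the infinite-sample-size row of Table 4 and does not reproduce the finite-sample values in the rest of that table.

```python
import math

SQRT6_OVER_PI = math.sqrt(6.0) / math.pi


def evi_frequency_factor(T):
    """Eq (12): K_T = -(sqrt(6)/pi) * {0.5772 + ln[ln(T/(T-1))]}, for T > 1."""
    return -SQRT6_OVER_PI * (0.5772 + math.log(math.log(T / (T - 1.0))))


def return_period_from_factor(K):
    """Invert Eq (12): recover T from a given frequency factor K."""
    y = 0.5772 + K / SQRT6_OVER_PI          # reduced variate y_T
    non_exceedance = math.exp(-math.exp(-y))  # F(x_T) from Eq (4)
    return 1.0 / (1.0 - non_exceedance)


if __name__ == "__main__":
    for T in (5, 25, 50, 100):
        print(T, round(evi_frequency_factor(T), 3))
    # Return period of the mean (K = 0), about 2.33 years.
    print(round(return_period_from_factor(0.0), 2))
```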

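Example 2

For illustration, the 5 and 50 years return period annual maximum discharges (m3/s) for the Old Brahmaputra River near Mymensingh are calculated using the Lognormal, Log-Pearson Type III and EVI Distributions.

| Year | Peak flow | y = log Q | Year | Peak flow | y = log Q | Year | Peak flow | y = log Q |
|------|-----------|-----------|------|-----------|-----------|------|-----------|-----------|
| 1964 | 2830 | 3.451786 | 1978 | 2770 | 3.442479 | 1989 | 2180 | 3.338456 |
| 1965 | 3230 | 3.509202 | 1979 | 2630 | 3.419955 | 1990 | 2060 | 3.313867 |
| 1966 | 3490 | 3.542825 | 1980 | 3340 | 3.523746 | 1991 | 2900 | 3.462397 |
| 1967 | 3000 | 3.477121 | 1981 | 2690 | 3.429752 | 1992 | 1490 | 3.173186 |
| 1968 | 2810 | 3.448706 | 1982 | 2470 | 3.392696 | 1993 | 2060 | 3.313867 |
| 1969 | 2770 | 3.442479 | 1983 | 2370 | 3.374748 | 1994 | 1065 | 3.027349 |
| 1970 | 3250 | 3.511883 | 1984 | 4780 | 3.679427 | 1995 | 3187 | 3.503382 |
| 1974 | 3820 | 3.582063 | 1985 | 3070 | 3.487138 | 1996 | 2369 | 3.374565 |
| 1975 | 3060 | 3.485721 | 1986 | 1930 | 3.285557 | 1997 | 1973 | 3.295127 |
| 1976 | 3210 | 3.506505 | 1987 | 3230 | 3.509202 | 1998 | 3267 | 3.514149 |
| 1977 | 3550 | 3.550228 | 1988 | 4910 | 3.691081 |      |      |          |

Ave, $\bar{y}$ = 3.4394; Std, $s_y$ = 0.1326; Skew, Cs = -0.9303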

Lognormal Distribution: For T = 50 years, 1/T = 0.02, and Table 2 is entered to obtain z = 2.054 by interpolation, corresponding to the tabular value F(z) = 1 - 0.02 = 0.98. Note that the same value of the frequency factor can be obtained from Table 3 with Cs = 0.

$K_{50} = z = 2.054$, $y_{50}$ = 3.4394 + 2.054 × 0.1326 = 3.7118, $Q_{50} = 10^{3.7118} \approx 5150$ m3/s

Log-Pearson Type III (LP3) Distribution: For Cs = -0.9303, Table 3 gives $K_{50}$ = 1.532, so $y_{50}$ = 3.4394 + 1.532 × 0.1326 = 3.6425, $Q_{50} = 10^{3.6425} \approx 4390$ m3/s

Extreme Value I (EVI) Distribution: Eq (12) gives $K_{50}$ = 2.592 (however, Table 4 gives $K_{50}$ = 3.007 for n = 32 years), so $Q_{50}$ = 2867.5 + 2.592 × 804.54 = 4953 m3/s.
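The log-space calculations of this example can be sketched as follows. The function names are assumptions made for illustration. The standard normal variate is computed with Python's `statistics.NormalDist` rather than read from Table 2, and the LP3 frequency factor 1.532 is simply the value quoted in the text, since Table 3 is not reproduced in code.

```python
import math
from statistics import NormalDist

# Annual peak flows (m3/s), Old Brahmaputra at Mymensingh, 1964-98 (1971-73 missing).
PEAK_FLOWS = [
    2830, 3230, 3490, 3000, 2810, 2770, 3250, 3820, 3060, 3210, 3550,
    2770, 2630, 3340, 2690, 2470, 2370, 4780, 3070, 1930, 3230, 4910,
    2180, 2060, 2900, 1490, 2060, 1065, 3187, 2369, 1973, 3267,
]


def log_statistics(flows):
    """Mean, standard deviation and skewness of y = log10(Q)."""
    logs = [math.log10(q) for q in flows]
    n = len(logs)
    mean = sum(logs) / n
    std = math.sqrt(sum((y - mean) ** 2 for y in logs) / (n - 1))
    skew = n * sum((y - mean) ** 3 for y in logs) / ((n - 1) * (n - 2) * std ** 3)
    return mean, std, skew


def quantile_from_logs(mean_log, std_log, K):
    """y_T = mean + K * std in log space, then take the antilog."""
    return 10.0 ** (mean_log + K * std_log)


mean_log, std_log, skew_log = log_statistics(PEAK_FLOWS)

# Lognormal: K_T is the standard normal variate with F(z) = 1 - 1/T.
z50 = NormalDist().inv_cdf(1.0 - 1.0 / 50.0)
q50_lognormal = quantile_from_logs(mean_log, std_log, z50)

# LP3: K_50 = 1.532 is the Table 3 value quoted in the text for Cs = -0.9303.
q50_lp3 = quantile_from_logs(mean_log, std_log, 1.532)

if __name__ == "__main__":
    print(round(mean_log, 4), round(std_log, 4), round(skew_log, 4))
    print(round(z50, 3), round(q50_lognormal), round(q50_lp3))
```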

Graphical Frequency Analysis

The frequency of an event can be obtained by use of a probability plot, which is a plot of event magnitude versus probability. As a check that a probability distribution fits a set of hydrologic data, the data are plotted on specially designed probability paper that linearizes the distribution function. The plotted data are then fitted with a straight line for interpolation and extrapolation purposes. Determining the probability to assign to a data point is commonly referred to as determining the plotting position.

Plotting Positions: Plotting position refers to the probability value assigned to each piece of data to be plotted. If n is the total number of values to be plotted and m is the rank of a value in a list ordered by descending magnitude, the exceedance probability of the mth largest value is, for large n,

$$P(X \ge x_m) = \frac{m}{n}$$

However, this simple formula (known as California's formula) produces a probability of 100% for m = n, which implies that the smallest sample value is the smallest possible value. A value of 100% cannot be plotted on many probability papers (Haan 1977). To overcome this limitation other formulas have been proposed. Several plotting position formulas are given below.

Plotting position formulas

| Formula | Plotting position |
|---------|-------------------|
| California | m/n |
| Hazen | (m - 0.5)/n |
| Beard | 1 - (0.5)^(1/n) |
| Weibull | m/(n + 1) |
| Gringorten | (m - 0.44)/(n + 0.12) |
| Chegodayev | (m - 0.3)/(n + 0.4) |
| Blom | (m - 3/8)/(n + 1/4) |
| Tukey | (3m - 1)/(3n + 1) |

The technique in all cases is to arrange the data in increasing or decreasing order of magnitude and to assign order number m to the ranked values. The most efficient formula for computing plotting positions for an unspecified distribution, and the one now commonly used for most sample data, is

$$P = \frac{m}{n+1}$$

When m is ranked from lowest to highest, P is an estimate of the probability of values being equal to or less than the ranked value, that is, $P(X \le x)$; when the rank is from highest to lowest, P is $P(X \ge x)$.

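Example 3

As an example, a probability plotting analysis of the annual maximum discharges (m3/s) of the Old Brahmaputra near Mymensingh is performed. The plotted data are also compared with the best-fit EVI Distribution.

| Rank m | Peak flow Q | Plotting position P* | Rank m | Peak flow Q | Plotting position P | Rank m | Peak flow Q | Plotting position P |
|--------|-------------|----------------------|--------|-------------|---------------------|--------|-------------|---------------------|
| 1 | 4910 | 0.017 | 12 | 3187 | 0.3604 | 23 | 2470 | 0.702 |
| 2 | 4780 | 0.049 | 13 | 3070 | 0.391 | 24 | 2370 | 0.733 |
| 3 | 3820 | 0.0797 | 14 | 3060 | 0.422 | 25 | 2369 | 0.765 |
| 4 | 3550 | 0.111 | 15 | 3000 | 0.453 | 26 | 2180 | 0.7958 |
| 5 | 3490 | 0.142 | 16 | 2900 | 0.484 | 27 | 2060 | 0.827 |
| 6 | 3340 | 0.173 | 17 | 2830 | 0.5156 | 28 | 2060 | 0.858 |
| 7 | 3267 | 0.204 | 18 | 2810 | 0.547 | 29 | 1973 | 0.889 |
| 8 | 3250 | 0.235 | 19 | 2770 | 0.578 | 30 | 1930 | 0.920 |
| 9 | 3230 | 0.267 | 20 | 2770 | 0.610 | 31 | 1490 | 0.951 |
| 10 | 3230 | 0.298 | 21 | 2690 | 0.640 | 32 | 1065 | 0.983 |
| 11 | 3210 | 0.329 | 22 | 2630 | 0.671 |    |      |       |

Sample size n = 32

Ave = 2867.531

Std dev = 804.5443, Skew = 0.37198

* Gringorten, P = (m - 0.44)/(n + 1 - 0.88)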

First the data are ranked from largest (m = 1) to smallest (m = n = 32), as shown in the table. Gringorten's plotting formula (b = 0.44 in the general form P = (m - b)/(n + 1 - 2b)) was used since the data are being fitted to the EVI Distribution. For example, for m = 1, the exceedance probability P(Q ≥ 4910 m3/s) = (m - 0.44)/(n + 1 - 0.88) = (1 - 0.44)/(32 + 0.12) = 0.56/32.12 = 0.017. Similarly all the plotting positions are calculated and plotted on EVI paper (Fig 6). The plotted points represent the empirical distribution obtained using the 32 observed peak flows.

Several points on the best-fit EVI line are calculated using Eq (9) as follows:

| T (years) | P(Q ≥ q) | K_T | Q_T (m3/s) |
|-----------|----------|-----|------------|
| 5 | 0.20 | 0.719 | 3446 |
| 25 | 0.04 | 2.044 | 4511 |
| 50 | 0.02 | 2.592 | 4952 |
| 100 | 0.01 | 3.137 | 5390 |

A straight line is drawn through the calculated points to obtain the best-fit EVI Distribution line. In this example the plotted points show good-fit with EVI Distribution.
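The ranking and plotting-position step of this example can be sketched as below. The function name and the use of a single parameter b in the general form (m - b)/(n + 1 - 2b) are choices made for this illustration; b = 0.44 gives the Gringorten formula, b = 0 the Weibull formula, b = 0.5 the Hazen formula and b = 3/8 the Blom formula.

```python
# Annual peak flows (m3/s), Old Brahmaputra at Mymensingh, 1964-98 (1971-73 missing).
PEAK_FLOWS = [
    2830, 3230, 3490, 3000, 2810, 2770, 3250, 3820, 3060, 3210, 3550,
    2770, 2630, 3340, 2690, 2470, 2370, 4780, 3070, 1930, 3230, 4910,
    2180, 2060, 2900, 1490, 2060, 1065, 3187, 2369, 1973, 3267,
]


def plotting_positions(values, b=0.44):
    """Rank values from largest (m = 1) to smallest (m = n) and return
    a list of (m, value, P) with P = (m - b) / (n + 1 - 2b),
    the estimated exceedance probability of the m-th largest value."""
    ranked = sorted(values, reverse=True)
    n = len(ranked)
    return [(m, q, (m - b) / (n + 1 - 2 * b))
            for m, q in enumerate(ranked, start=1)]


if __name__ == "__main__":
    for m, q, p in plotting_positions(PEAK_FLOWS):
        # The return period of each point is the reciprocal of P.
        print(m, q, round(p, 3), round(1.0 / p, 2))
```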

Goodness-of-fit Tests

The goodness of fit of a probability distribution can be tested by comparing the theoretical and sample values of the relative frequency or the cumulative frequency function. In the case of the relative frequency function, the $\chi^2$ test is used, and with the cumulative frequency function the Kolmogorov-Smirnov test is used.

Chi-Square Test: The test statistic is given by

$$\chi_c^2 = \sum_{i=1}^{k} \frac{n\,[f_s(x_i) - p(x_i)]^2}{p(x_i)} \qquad (13)$$

where k is the number of intervals; the sample value of the relative frequency of interval i is $f_s(x_i) = n_i/n$; and the theoretical value of the relative frequency function (also called the incremental probability function) is $p(x_i) = F(x_i) - F(x_{i-1})$. It may be noted that $n f_s(x_i) = n_i$ is the observed number of occurrences in interval i, and $n\,p(x_i)$ is the corresponding expected number of occurrences in interval i.

To describe the $\chi^2$ test, the $\chi^2$ probability distribution must be defined. A $\chi^2$ distribution with $\nu = k - l - 1$ degrees of freedom (l is the number of parameters used in fitting the proposed distribution) is the distribution for the sum of squares of $\nu$ independent standard normal random variables $z_i$. The critical values of the $\chi^2$ distribution function are tabulated in Table 5, from Haan (1977). A confidence level is chosen for the test; it is often expressed as $1 - \alpha$, where $\alpha$ is termed the significance level.

[Fig. 6 EVI probability plot for annual peak flows of Old Brahmaputra, Mymensingh. Axes: exceedance probability and return period (years) versus discharge (m3/s).]

Example 4

The chi-square test is used to determine whether the EVI Distribution adequately fits the Old Brahmaputra River annual peak flow data.

Thirty-two peak flow observations are divided into six class intervals. The number (frequency) of observations $n_i$ in each class is counted. The observed (sample) relative frequency $f_s(x_i)$ is calculated with n = 32. For example, for the second class interval $f_s(x_2)$ = 8/32 = 0.25. The observed cumulative frequency is found by summing the relative frequencies.

To fit the EVI Distribution, the parameters $\alpha$ and u are calculated as before ($\alpha$ = 627.62, u = 2505.27, $\bar{x}$ = 2867.5 and s = 804.54 m3/s). The theoretical cumulative frequency corresponding to the upper limit of each class interval is calculated by finding the reduced variate y from Eq (5) and then F(x) from Eq (4). For example, for the second class interval

$$y_2 = \frac{2500 - 2505.27}{627.62} = -0.0084, \qquad F(2500) = \exp\left[-\exp(0.0084)\right] = 0.36479$$

and $p(x_2) = P(1750 \le X \le 2500) = F(2500) - F(1750) = 0.36479 - 0.03575 = 0.32904$

The value of 0.32904 is entered under the expected relative frequency corresponding to the class interval 1750-2500 in the table below. The contribution of this class to Eq (13) is 32 × (0.25 - 0.32904)²/0.32904 = 0.60757.

The calculation is repeated for the other class intervals and the contributions are summed to obtain $\chi^2$ = 2.35. This is the computed $\chi^2$ value.

To test the goodness of fit, this is compared with the critical $\chi^2$ value obtained from tabular values, as shown below.

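| Lower limit | Upper limit | Num of obs. n_i | Obs frequency f_s(x_i) | Obs cum frequency F_s(x_i) | Reduced variate y_i | Expected cum frequency F(x_i) | Expected relative frequency p(x_i) | Chi square |
|------|------|----|---------|---------|---------|---------|---------|---------|
| 1000 | 1750 | 2 | 0.0625 | 0.0625 | -1.2033 | 0.03575 | 0.03575 | 0.64050 |
| 1750 | 2500 | 8 | 0.25 | 0.3125 | -0.0084 | 0.36479 | 0.32904 | 0.60757 |
| 2500 | 3250 | 14 | 0.4375 | 0.75 | 1.1866 | 0.73693 | 0.37214 | 0.36734 |
| 3250 | 4000 | 6 | 0.1875 | 0.9375 | 2.3815 | 0.91174 | 0.17481 | 0.02948 |
| 4000 | 4750 | 1 | 0.03125 | 0.96875 | 3.5765 | 0.97242 | 0.06068 | 0.45676 |
| 4750 | 5500 | 1 | 0.03125 | 1.0 | 4.7716 | 0.99157 | 0.01915 | 0.24465 |
| Total |  | 32 | 1.00 |  |  |  |  | Computed chi square = 2.3463 |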

For a confidence level of 90%, from Table 5, the critical chi square for $\nu = k - l - 1 = 6 - 2 - 1 = 3$ degrees of freedom is $\chi^2$ = 6.25. Since the computed chi-square value of 2.35 is less than the critical value of 6.25, the data fit the EVI Distribution adequately.
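The chi-square computation of this example can be sketched as follows. The class limits, observed counts and EVI parameters are copied from the example; the function names are assumptions. As in the table, the cumulative probability below the first class is taken as zero, and the critical value 6.25 is the Table 5 figure quoted above rather than a computed quantity.

```python
import math

ALPHA, U = 627.62, 2505.27            # EVI parameters from the example
CLASS_LIMITS = [1000, 1750, 2500, 3250, 4000, 4750, 5500]
OBSERVED = [2, 8, 14, 6, 1, 1]        # observed counts n_i from the table
CRITICAL_CHI2 = 6.25                  # Table 5, nu = 3, 90% confidence level


def evi_cdf(x, alpha=ALPHA, u=U):
    """Eq (2): F(x) = exp[-exp(-(x - u)/alpha)]."""
    return math.exp(-math.exp(-(x - u) / alpha))


def chi_square_statistic(limits, observed):
    """Eq (13): sum over classes of n * (f_s - p)^2 / p."""
    n = sum(observed)
    chi2 = 0.0
    previous_cdf = 0.0  # the table treats F below the first class as 0
    for upper, n_i in zip(limits[1:], observed):
        cdf = evi_cdf(upper)
        p = cdf - previous_cdf          # expected relative frequency
        chi2 += n * (n_i / n - p) ** 2 / p
        previous_cdf = cdf
    return chi2


chi2 = chi_square_statistic(CLASS_LIMITS, OBSERVED)

if __name__ == "__main__":
    print(round(chi2, 4), chi2 < CRITICAL_CHI2)
```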

Kolmogorov-Smirnov Test: The theoretical and sample values of the cumulative frequency are compared with the Kolmogorov-Smirnov (K-S) test. The test statistic D is based on deviations of the sample distribution function P(x) from the completely specified continuous hypothetical distribution function Po(x), such that:

$$D = \max\,\lvert P(x) - P_o(x) \rvert$$

Developed by Kolmogorov in 1933 (Kite 1988), the test requires that the value of D computed from the sample distribution be less than the tabulated value of D (Table 6) at the required confidence level. The Kolmogorov-Smirnov test for Gumbel's Extremal Distribution gives better results in Bangladesh.

Bankfull Discharge

The bankfull discharge of a river may be defined as the discharge which is just contained within the banks of the river. This is the state of maximum velocity in the channel, and therefore of maximum competence for the transport of sediment load.

Bankfull discharge is assumed to be a major determinant of the size and shape of a river channel, but it is difficult to measure in the field, and a wide variety of field procedures exist for this measurement. Quoting return periods for bankfull discharge is difficult because over a dozen methods are available for estimating it, and the frequency of its occurrence appears to vary with climatic regime.

Dominant Discharge Analysis

The dominant discharge is the flow doing most geomorphic work and it is, therefore, the channel forming discharge. It probably does not correspond to bankfull flow on any river. The dominant or channel forming flow represents an alternative benchmark criterion to bankfull flow when analyzing channel form and process. To estimate the dominant discharge the following steps are followed:

1. Obtain the long-term (30 years or more) distribution of flows for the gauging station. [Sketch: frequency F versus discharge Q]

2. Split this into discrete classes of equal interval. For the Brahmaputra, initially try a 5,000 m3/s class interval, and check the sensitivity of the results to this choice of interval. Find the mid-point of each class.

3. Obtain the most reliable sediment rating curve for the gauging station. Ideally this should be for total load, but a suspended load curve may be used provided that suspended load makes up most of the total load, as is usually the case. [Sketch: sediment transport rate Qs versus discharge Q]

4. Use the sediment rating curve to find the sediment transport rate Qsi (tons/sec) for the mid-point discharge Qi of each flow class.

5. Multiply the sediment transport rate for each discharge class by the frequency of that class to obtain the total sediment load transported by that flow during the period; plot this as a histogram. [Sketch: total Qs versus Q, with the mode marked as Qd]

6. From the histogram, identify the mode. This corresponds to the dominant discharge. Determine the magnitude of Qd = dominant discharge and use the flow duration curve (discharge versus exceedance probability) to establish its return period.
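Steps 2 to 6 above can be sketched in code. Everything specific in this sketch is hypothetical: the function name, the power-law form Qs = a Q^b assumed for the sediment rating curve, its coefficients, and the short synthetic flow series in the usage lines. A real analysis would use the gauged flow record and the measured rating curve for the station.

```python
def dominant_discharge(flows, class_width, rating_a, rating_b):
    """Return (Qd, loads), where loads maps each class mid-point discharge
    to frequency * sediment transport rate at that mid-point.

    The rating curve Qs = rating_a * Q ** rating_b is an assumed form.
    """
    # Step 2: count the number of flows falling in each equal-width class.
    counts = {}
    for q in flows:
        index = int(q // class_width)
        counts[index] = counts.get(index, 0) + 1

    # Steps 4 and 5: transport rate at the class mid-point times frequency.
    loads = {}
    for index, frequency in counts.items():
        mid_point = (index + 0.5) * class_width
        transport_rate = rating_a * mid_point ** rating_b
        loads[mid_point] = frequency * transport_rate

    # Step 6: the mode of the load histogram is the dominant discharge.
    qd = max(loads, key=loads.get)
    return qd, loads


if __name__ == "__main__":
    # Synthetic flows (m3/s) for illustration only, not measured data.
    flows = [1000] * 10 + [6000] * 6 + [12000] * 2 + [16000]
    qd, loads = dominant_discharge(flows, 5000.0, 1e-4, 1.5)
    print(qd, {q: round(load, 1) for q, load in sorted(loads.items())})
```

Rerunning the function with a different `class_width` is a simple way to carry out the sensitivity check mentioned in step 2.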


Table 2 Cumulative probability of the Standard Normal Distribution

Table 3 Frequency factors for Pearson Type III Distribution


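Table 4 Frequency factors for Extreme Value I Distribution (columns are return periods in years)

| Sample size (n) | 5 | 10 | 15 | 20 | 25 | 50 | 75 | 100 | 1000 |
|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 15 | 0.967 | 1.703 | 2.117 | 2.410 | 2.632 | 3.321 | 3.721 | 4.005 | 6.265 |
| 20 | 0.919 | 1.625 | 2.023 | 2.302 | 2.517 | 3.179 | 3.563 | 3.836 | 6.006 |
| 25 | 0.888 | 1.575 | 1.963 | 2.235 | 2.444 | 3.088 | 3.463 | 3.729 | 5.842 |
| 30 | 0.866 | 1.541 | 1.922 | 2.188 | 2.393 | 3.026 | 3.393 | 3.653 | 5.727 |
| 35 | 0.851 | 1.516 | 1.891 | 2.152 | 2.354 | 2.979 | 3.341 | 3.598 |       |
| 40 | 0.838 | 1.495 | 1.866 | 2.126 | 2.326 | 2.943 | 3.301 | 3.554 | 5.576 |
| 45 | 0.829 | 1.478 | 1.847 | 2.104 | 2.303 | 2.913 | 3.268 | 3.520 |       |
| 50 | 0.820 | 1.466 | 1.831 | 2.086 | 2.283 | 2.889 | 3.241 | 3.491 | 5.478 |
| 55 | 0.813 | 1.455 | 1.818 | 2.071 | 2.267 | 2.869 | 3.219 | 3.467 |       |
| 60 | 0.807 | 1.446 | 1.806 | 2.059 | 2.253 | 2.852 | 3.200 | 3.446 |       |
| 65 | 0.801 | 1.437 | 1.796 | 2.048 | 2.241 | 2.837 | 3.183 | 3.429 |       |
| 70 | 0.797 | 1.430 | 1.788 | 2.038 | 2.230 | 2.824 | 3.169 | 3.413 | 5.359 |
| 75 | 0.792 | 1.423 | 1.780 | 2.029 | 2.220 | 2.812 | 3.155 | 3.400 |       |
| 80 | 0.788 | 1.417 | 1.773 | 2.020 | 2.212 | 2.802 | 3.145 | 3.387 |       |
| 85 | 0.785 | 1.413 | 1.767 | 2.013 | 2.205 | 2.793 | 3.135 | 3.376 |       |
| 90 | 0.782 | 1.409 | 1.762 | 2.007 | 2.198 | 2.785 | 3.125 | 3.367 |       |
| 95 | 0.780 | 1.405 | 1.757 | 2.002 | 2.193 | 2.777 | 3.116 | 3.357 |       |
| 100 | 0.779 | 1.401 | 1.752 | 1.998 | 2.187 | 2.770 | 3.109 | 3.349 | 5.261 |
| ∞ | 0.719 | 1.305 | 1.635 | 1.866 | 2.044 | 2.592 | 2.911 | 3.137 | 4.936 |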

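Table 5 $\chi^2$ Distribution: values of $\chi^2_{\alpha,\nu}$, where $\alpha$ is the upper-tail probability and $\nu$ is the degrees of freedom (DOF)

| DOF | 0.005 | 0.010 | 0.025 | 0.05 | 0.10 | 0.25 | 0.50 | 0.75 | 0.90 | 0.95 | 0.975 | 0.99 | 0.995 |
|-----|-------|-------|-------|------|------|------|------|------|------|------|-------|------|-------|
| 1 | 7.88 | 6.63 | 5.02 | 3.84 | 2.71 | 1.32 | 0.455 | 0.102 | 0.0158 | 0.0039 | 0.0010 | 0.0002 | 0.0000 |
| 2 | 10.6 | 9.21 | 7.38 | 5.99 | 4.61 | 2.77 | 1.39 | 0.575 | .211 | .103 | .0506 | .0201 | .0100 |
| 3 | 12.8 | 11.3 | 9.35 | 7.81 | 6.25 | 4.11 | 2.37 | 1.21 | .584 | .352 | .216 | .115 | .072 |
| 4 | 14.9 | 13.3 | 11.1 | 9.49 | 7.78 | 5.39 | 3.36 | 1.92 | 1.06 | .711 | .484 | .297 | .207 |
| 5 | 16.7 | 15.1 | 12.8 | 11.1 | 9.24 | 6.63 | 4.35 | 2.67 | 1.61 | 1.15 | .831 | .554 | .412 |
| 6 | 18.5 | 16.8 | 14.4 | 12.6 | 10.6 | 7.84 | 5.35 | 3.45 | 2.20 | 1.64 | 1.24 | .872 | .676 |
| 7 | 20.3 | 18.5 | 16.0 | 14.1 | 12.0 | 9.04 | 6.35 | 4.25 | 2.83 | 2.17 | 1.69 | 1.24 | .989 |
| 8 | 22.0 | 20.1 | 17.5 | 15.5 | 13.4 | 10.2 | 7.34 | 5.07 | 3.49 | 2.73 | 2.18 | 1.65 | 1.34 |
| 9 | 23.6 | 21.7 | 19.0 | 16.9 | 14.7 | 11.4 | 8.34 | 5.90 | 4.17 | 3.33 | 2.70 | 2.09 | 1.73 |
| 10 | 25.2 | 23.2 | 20.5 | 18.3 | 16.0 | 12.5 | 9.34 | 6.74 | 4.87 | 3.94 | 3.25 | 2.56 | 2.16 |
| 11 | 26.8 | 24.7 | 21.9 | 19.7 | 17.3 | 13.7 | 10.3 | 7.58 | 5.58 | 4.57 | 3.82 | 3.05 | 2.60 |
| 12 | 28.3 | 26.2 | 23.3 | 21.0 | 18.5 | 14.8 | 11.3 | 8.44 | 6.30 | 5.23 | 4.40 | 3.57 | 3.07 |
| 13 | 29.8 | 27.7 | 24.7 | 22.4 | 19.8 | 16.0 | 12.3 | 9.30 | 7.04 | 5.89 | 5.01 | 4.11 | 3.57 |
| 14 | 31.3 | 29.1 | 26.1 | 23.7 | 21.1 | 17.1 | 13.3 | 10.2 | 7.79 | 6.57 | 5.63 | 4.66 | 4.07 |
| 15 | 32.8 | 30.6 | 27.5 | 25.0 | 22.3 | 18.2 | 14.3 | 11.0 | 8.55 | 7.26 | 6.26 | 5.23 | 4.60 |
| 16 | 34.3 | 32.0 | 28.8 | 26.3 | 23.5 | 19.4 | 15.3 | 11.9 | 9.31 | 7.96 | 6.91 | 5.81 | 5.14 |
| 17 | 35.7 | 33.4 | 30.2 | 27.6 | 24.8 | 20.5 | 16.3 | 12.8 | 10.1 | 8.67 | 7.56 | 6.41 | 5.70 |
| 18 | 37.2 | 34.8 | 31.5 | 28.9 | 26.0 | 21.6 | 17.3 | 13.7 | 10.9 | 9.39 | 8.23 | 7.01 | 6.26 |
| 19 | 38.6 | 36.2 | 32.9 | 30.1 | 27.2 | 22.7 | 18.3 | 14.6 | 11.7 | 10.1 | 8.91 | 7.63 | 6.84 |
| 20 | 40.0 | 37.6 | 34.2 | 31.4 | 28.4 | 23.8 | 19.3 | 15.5 | 12.4 | 10.9 | 9.59 | 8.26 | 7.43 |
| 21 | 41.4 | 38.9 | 35.5 | 32.7 | 29.6 | 24.9 | 20.3 | 16.3 | 13.2 | 11.6 | 10.3 | 8.90 | 8.03 |
| 22 | 42.8 | 40.3 | 36.8 | 33.9 | 30.8 | 26.0 | 21.3 | 17.2 | 14.0 | 12.3 | 11.0 | 9.54 | 8.64 |
| 23 | 44.2 | 41.6 | 38.1 | 35.2 | 32.0 | 27.1 | 22.3 | 18.1 | 14.8 | 13.1 | 11.7 | 10.2 | 9.26 |
| 24 | 45.6 | 43.0 | 39.4 | 36.4 | 33.2 | 28.2 | 23.3 | 19.0 | 15.7 | 13.8 | 12.4 | 10.9 | 9.89 |
| 25 | 46.9 | 44.3 | 40.6 | 37.7 | 34.4 | 29.3 | 24.3 | 19.9 | 16.5 | 14.6 | 13.1 | 11.5 | 10.5 |
| 26 | 48.3 | 45.6 | 41.9 | 38.9 | 35.6 | 30.4 | 25.3 | 20.8 | 17.3 | 15.4 | 13.8 | 12.2 | 11.2 |
| 27 | 49.6 | 47.0 | 43.2 | 40.1 | 36.7 | 31.5 | 26.3 | 21.7 | 18.1 | 16.2 | 14.6 | 12.9 | 11.8 |
| 28 | 51.0 | 48.3 | 44.5 | 41.3 | 37.9 | 32.6 | 27.3 | 22.7 | 18.9 | 16.9 | 15.3 | 13.6 | 12.5 |
| 29 | 52.3 | 49.6 | 45.7 | 42.6 | 39.1 | 33.7 | 28.3 | 23.6 | 19.8 | 17.7 | 16.0 | 14.3 | 13.1 |
| 30 | 53.7 | 50.9 | 47.0 | 43.8 | 40.3 | 34.8 | 29.3 | 24.5 | 20.6 | 18.5 | 16.8 | 15.0 | 13.8 |
| 40 | 66.8 | 63.7 | 59.3 | 55.8 | 51.8 | 45.6 | 39.3 | 33.7 | 29.1 | 26.5 | 24.4 | 22.2 | 20.7 |
| 50 | 79.5 | 76.2 | 71.4 | 67.5 | 63.2 | 56.3 | 49.3 | 42.9 | 37.7 | 34.8 | 32.4 | 29.7 | 28.0 |
| 60 | 92.0 | 88.4 | 83.3 | 79.1 | 74.4 | 67.0 | 59.3 | 52.3 | 46.5 | 43.2 | 40.5 | 37.5 | 35.5 |
| 70 | 104.2 | 100.4 | 95.0 | 90.5 | 85.5 | 77.6 | 69.3 | 61.7 | 55.3 | 51.7 | 48.8 | 45.4 | 43.3 |
| 80 | 116.3 | 112.3 | 106.6 | 101.9 | 96.6 | 88.1 | 79.3 | 71.1 | 64.3 | 60.4 | 57.2 | 53.5 | 51.2 |
| 90 | 128.3 | 124.1 | 118.1 | 113.1 | 107.6 | 98.6 | 89.3 | 80.6 | 73.3 | 69.1 | 65.6 | 61.8 | 59.2 |
| 100 | 140.2 | 135.8 | 129.6 | 124.3 | 118.5 | 109.1 | 99.3 | 90.1 | 82.4 | 77.9 | 74.2 | 70.1 | 67.3 |

Source: Catherine M. Thompson, Table of percentage points of the $\chi^2$ distribution, Biometrika, Vol. 32 (1941), by permission of the author and publisher.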

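Table 6 Kolmogorov-Smirnov Distribution (columns are significance levels)

| Sample size (n) | 0.20 | 0.15 | 0.10 | 0.05 | 0.01 |
|-----|------|------|------|------|------|
| 1 | .900 | .925 | .950 | .975 | .995 |
| 2 | .684 | .726 | .776 | .842 | .929 |
| 3 | .565 | .597 | .642 | .708 | .829 |
| 4 | .494 | .525 | .564 | .624 | .734 |
| 5 | .446 | .474 | .510 | .563 | .669 |
| 6 | .410 | .436 | .470 | .521 | .618 |
| 7 | .381 | .405 | .438 | .486 | .577 |
| 8 | .358 | .381 | .411 | .457 | .543 |
| 9 | .339 | .360 | .388 | .432 | .514 |
| 10 | .322 | .342 | .368 | .409 | .486 |
| 11 | .307 | .326 | .352 | .391 | .468 |
| 12 | .295 | .313 | .338 | .375 | .450 |
| 13 | .284 | .302 | .325 | .361 | .433 |
| 14 | .274 | .292 | .314 | .349 | .418 |
| 15 | .266 | .283 | .304 | .338 | .404 |
| 16 | .258 | .274 | .295 | .328 | .391 |
| 17 | .250 | .266 | .286 | .318 | .380 |
| 18 | .244 | .259 | .278 | .309 | .370 |
| 19 | .237 | .252 | .272 | .301 | .361 |
| 20 | .231 | .246 | .264 | .294 | .352 |
| 25 | .21 | .22 | .24 | .264 | .32 |
| 30 | .19 | .20 | .22 | .242 | .29 |
| 35 | .18 | .19 | .21 | .23 | .27 |
| 40 |  |  |  | .21 | .25 |
| 50 |  |  |  | .19 | .23 |
| 60 |  |  |  | .17 | .21 |
| 70 |  |  |  | .16 | .19 |
| 80 |  |  |  | .15 | .18 |
| 90 |  |  |  | .14 |  |
| 100 |  |  |  | .14 |  |
| Asymptotic formula | 1.07/√n | 1.14/√n | 1.22/√n | 1.36/√n | 1.63/√n |

Source: Z.W. Birnbaum, Journal of the American Statistical Association 47:425-441, 1952.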
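As an illustration of how Table 6 is used, the Kolmogorov-Smirnov statistic D can be computed for the Old Brahmaputra peak flows against the EVI Distribution fitted in the examples. This is a sketch under stated assumptions: the function names are hypothetical, the sample distribution function is taken as the step function i/n, and the EVI parameters are the values estimated from the same data, whereas the test strictly assumes a completely specified distribution. The critical value of about 0.24 is an approximate reading of Table 6 for n = 32 at the 0.05 significance level.

```python
import math

# Annual peak flows (m3/s), Old Brahmaputra at Mymensingh, 1964-98 (1971-73 missing).
PEAK_FLOWS = [
    2830, 3230, 3490, 3000, 2810, 2770, 3250, 3820, 3060, 3210, 3550,
    2770, 2630, 3340, 2690, 2470, 2370, 4780, 3070, 1930, 3230, 4910,
    2180, 2060, 2900, 1490, 2060, 1065, 3187, 2369, 1973, 3267,
]
ALPHA, U = 627.62, 2505.27  # EVI parameters from the examples


def evi_cdf(x, alpha=ALPHA, u=U):
    """EVI cumulative distribution function, Eq (2)."""
    return math.exp(-math.exp(-(x - u) / alpha))


def ks_statistic(values, cdf):
    """D = max |P(x) - Po(x)|, checking both sides of each step of the
    sample distribution function P(x) = i/n."""
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        theoretical = cdf(x)
        d = max(d, abs(i / n - theoretical), abs((i - 1) / n - theoretical))
    return d


d_statistic = ks_statistic(PEAK_FLOWS, evi_cdf)

if __name__ == "__main__":
    # Compare with the approximate Table 6 value for n = 32, 0.05 level.
    print(round(d_statistic, 3), d_statistic < 0.24)
```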
