
Statistical–Dynamical Approach for Streamflow Modeling at Malakal, Sudan, on the White Nile River

Paul Block1 and Balaji Rajagopalan2

Abstract: The upper White Nile Basin above Malakal, Sudan, is considered to be one of the most complicated and diverse hydrologic settings on Earth. Accurately depicting and predicting the streamflow at Malakal is essential for water managers considering Nile Basin-wide initiatives and potential large-scale projects. Dynamical, statistical, and combination models are assessed for their ability to predict monthly streamflow at Malakal. The dynamical model represents a lumped parameter, average-monthly water balance, whereas the statistical model incorporates a nonparametric approach based on local polynomial regression, utilizing principal components of precipitation and temperature. The combination of dynamical and statistical models through linear regression produces model weights of 0.44 and 0.59, respectively, implying a relatively balanced influence. Evaluation of the combination model demonstrates significant overall skill (correlation coefficients equal to 0.83), outperforming either individual model for the validation periods selected. Peak streamflow analyses of timing and quantity also exhibit superior performance by the combination model. An ensemble approach, practical for planning and management from a probabilistic standpoint, is additionally demonstrated.

DOI: 10.1061/(ASCE)1084-0699(2009)14:2(185)

CE Database subject headings: Dynamic models; Statistics; Streamflow; Sudan; Africa; Nile River.

Introduction

The upper White Nile Basin above Malakal, Sudan, is considered to be one of the most complicated and diverse hydrologic settings on Earth. The White Nile River originates in headstreams above Lake Victoria and continues nearly 2,900 km to Malakal, traversing other lakes and the swamps of the Sudd in southern Sudan, draining a total of nearly 1.5 million km² (Shahin 1985). Fig. 1 illustrates the upper White Nile Basin, including the three equatorial lakes (Victoria, Kyoga, and Albert) and three swamp regions (Bahr el Jebel, Bahr el Ghazal, and Sobat) of primary interest; Malakal is located along the northern boundary of the basin. Both the lakes and the swamps in this basin exhibit nonlinear and discrete behavior. Additionally, the swamps impose a regulating effect, expanding during times of high inflow, allowing for increased levels of evaporation, and releasing at a more moderate rate unique to each swamp (Sutcliffe and Parks 1987). Alan Moorehead, in The White Nile, writing about the Sudd, said “there is no more formidable swamp in the world” (Moorehead 1971). Precipitation and evapotranspiration, along with the nonlinearities inherent in the system, are the driving forces behind the region’s hydrologic response.

Understanding and modeling streamflow within the Nile Basin is critical for effective water resources management, yet not trivial due to its size and composition, including ten riparian countries that share its water. To this end, accurate White Nile streamflow simulations and forecasts are highly desired. Yates and Strzepek (1998) created a dynamical model of the White Nile system, driven by common hydrometeorological inputs, capable of generating monthly flow scenarios under varying climatic conditions. The model is predominantly physically based, yet rather complex and fairly inflexible. To improve upon the dynamical model forecast capabilities, it is of interest whether combination with another model may bolster predictions. Recent developments in multimodel combination (Krishnamurti et al. 1999, 2000; Hagedorn et al. 2005; Rajagopalan et al. 2002; Regonda et al. 2006) suggest that combining outputs from different models tends to perform better than any single model. A statistically based modeling framework is proposed in this work for combination with the dynamical model, due to its simplicity in nature, with the hope of alleviating drawbacks associated with dynamical models, and capturing alternative streamflow features. The need for improved streamflow simulations is further motivated by a separate project focusing on hydropower implications along the Blue Nile in Ethiopia, with downstream ramifications in Sudan and Egypt (Block and Rajagopalan 2007).

1Postdoctoral Research Scientist, International Research Institute for Climate and Society, Columbia Univ., Lamont Campus, 61 Rte. 9W, Palisades, NY 10964. E-mail: [email protected]

2Associate Professor, Dept. of Civil, Environmental, and Architectural Engineering, Univ. of Colorado-Boulder, ECOT 441, 428 UCB, Boulder, CO 80309.

Note. Discussion open until July 1, 2009. Separate discussions must be submitted for individual papers. The manuscript for this paper was submitted for review and possible publication on May 31, 2007; approved on April 28, 2008. This paper is part of the Journal of Hydrologic Engineering, Vol. 14, No. 2, February 1, 2009. ©ASCE, ISSN 1084-0699/2009/2-185–196/$25.00.

This paper begins with a description of the data utilized, followed by background on contributing factors to Malakal streamflow variability. The three modeling frameworks, dynamical, statistical, and combination, are then presented. Model validation methods are subsequently described, followed by the results for Malakal streamflow assessment. The paper concludes with a summary and discussion of the results.

Data

Streamflow at Malakal has been recorded monthly since 1912, and the records are publicly available through 1995. Numerous sources provide these data, including the National Center for Atmospheric Research’s (NCAR) ds552.1 data set (Bodo 2001). In an attempt to diminish climate change influences, the data set is restricted to 1912–1990.

Input data for the dynamical and statistical models are derived from the Climate Research Unit’s (CRU) TS 2.0 and CL 2.0 data sets, obtained from the University of East Anglia (New et al. 2002; Mitchell et al. 2004). The first set includes monthly values for precipitation, mean daily temperature, diurnal temperature range, vapor pressure, and cloud cover; average monthly wind speeds were obtained from the second data set. Both sets are established on a 0.5° by 0.5° grid for 1901–2000, based on station data and anomalies. Although the historical station data within the region are sparse and spotty, the CRU precipitation data have been shown to be strongly correlated with other data sets, including the Climate Prediction Center’s merged analysis precipitation and the University of Delaware precipitation, for nearby regions (Block and Rajagopalan 2007).

Malakal Streamflow Variability

Monthly and annual time series for the recorded streamflow at Malakal, 1912–1990, are illustrated in Fig. 2. In Fig. 2(a), the solid line connects the monthly averages over the 79 year period, indicating high flow months in the later portion of the calendar year, specifically September through December. The boxes indicate the monthly variability in streamflow, with the box covering the 25th to 75th percentiles, the horizontal line within the box representing the median, and whiskers extending to the 5th and 95th percentiles. The dashed line presents average monthly precipitation over the same time period, showing that precipitation distinctly leads streamflow, predominantly due to the regulating effect of the swamps in southern Sudan. From Fig. 2(b), it is evident that there have been two significantly high flow periods: the first between 1917 and 1918, and the second from 1963 to 1966. The 1964 event represents the highest annual flow on record. Both of these epochs are associated with periods of above normal precipitation in the upper White Nile region, as well as other neighboring equatorial regions (Block and Rajagopalan 2007). An additional factor coincident with the 1960s event was the revision of the water resources treaty between Uganda and Egypt for the relatively newly constructed Owen Falls Dam (1954) on Lake Victoria (Reynolds 2005).

Fig. 1. Equatorial lakes and swamps in the upper White Nile Basin (hatched). Triangles (six) indicate approximate location of selected precipitation and temperature data utilized in the study.

The interannual and interdecadal variability in streamflow at Malakal and precipitation in the basin has been investigated by numerous researchers (Sutcliffe 1974; Kite 1981; WMO 1981; Shahin 1985; Sutcliffe and Parks 1987; Conway and Hulme 1993; Camberlin 1997; Mohamed et al. 2005). The summer rains near Malakal are part of the larger east African monsoon, resulting from a northward shift of the Intertropical Convergence Zone (ITCZ), a direct result of solar heating and warming of the surface (Griffiths 1972; Gamachu 1977). The main band of precipitation lies just south of the ITCZ. Malakal lies close to the northernmost extent of the ITCZ, and thereby receives the majority of its precipitation during July–September. Other parts of the basin southward of Malakal receive precipitation in a more even month-to-month distribution, or alternatively in two distinct seasons, due to the southward shifting of the ITCZ in the later calendar months of the year. Both the Atlantic and Indian Oceans act as contributing sources (Block and Rajagopalan 2007). Simultaneous with the shifting of the ITCZ, high-pressure systems in the South Atlantic and Indian Oceans, coupled with the Arabian and the Sudan thermal lows, allow for the influx of moisture into the basin (Seleshi and Zanke 2004).

Interannual variability is attributable to numerous climatic forcings, not the least of which is the El Niño–Southern Oscillation phenomenon (Camberlin 1995; Nicholson and Kim 1997; Mutai and Ward 2000; Ntale and Gan 2004). It has also been suggested that precipitation in El Niño years is controlled by the Indian Ocean, whereas precipitation in La Niña years is directed by the Atlantic Ocean (Nicholson and Kim 1997). Other important factors include complex topography, the extent to which the ITCZ shifts, the equatorial lakes, and the Indian Ocean.

Modeling Framework

Numerous studies have developed dynamically based models of the Nile Basin, most recently for the purpose of studying potential climate change effects (Gleick 1991; Conway 1996; Strzepek et al. 1996; Yates and Strzepek 1998). Other models, such as the Nile Decision Support System (Georgakakos 2004), have been created in an effort to model basin hydrology and existing projects to assess basin-wide development scenarios. The goal of these types of models is essentially the same: to accurately depict or predict hydrologic conditions at various points of interest. Creating these models is not trivial, and often takes a significant amount of time for building and parameterization. The motivation of this study is to determine if a statistical model, much simpler in nature, may prove to be on par with existing dynamical models, and if a multimodel combination further enhances predictive capabilities.

For this work, two modeling frameworks are assessed, independently and in combination, for estimation of monthly Malakal streamflow, namely a dynamically based approach and a nonparametric local polynomial statistical approach. The nonparametric approach has been selected in lieu of traditional parametric techniques due to the complex, inherent nonlinearities within the basin. Fig. 3 clearly presents the nonlinear relationship between Malakal streamflow and basin-wide average precipitation and temperature, two dominant factors in determining streamflow quantity. It is the capability of the nonparametric technique to capture these nonlinearities that makes it so attractive.

Dynamical Model

The dynamical model utilized in this study is a modification of the full Nile Basin-wide model developed by Yates and Strzepek (1998) to assess streamflow variations under climatic change. It is a simple, lumped parameter, average-monthly water balance model incorporating six subbasins above Malakal, each associated with one of the equatorial lakes or swamps (see Fig. 2 in Yates and Strzepek (1998)). Three major components constitute the model: soil moisture accounting, evaluation of potential evapotranspiration, and a reservoir storage scheme for both lakes and swamps. Fig. 3 in Yates and Strzepek (1998) depicts the water balance component of the model.

The model uses precipitation, temperature, vapor pressure, cloud cover, and wind inputs to produce monthly runoff and evapotranspiration aggregated within each subbasin. Direct precipitation and evaporation over the lakes and swamps are also included. The outlet works for each lake are unique, and based on nonlinear lake level–discharge relationships. The discharge from each swamp is a function of the swamp depth, recharge coefficient, lateral spread, and lagging of upstream inflows.

Fig. 2. Malakal time series, 1912–1990: (a) average monthly streamflow (solid line) and precipitation (dashed line) values; (b) annual streamflow (solid line) with historic average (dashed line)

The model may be run for a user-defined number of years, with outputs including lake levels and flow rates at points within each subbasin. For the purposes of this study, only the streamflow rate at Malakal is reported.

Local Polynomial Statistical Model

Nonparametric statistical methods are becoming increasingly popular for modeling in hydroclimatological studies (Lall 1995; Regonda et al. 2005; Prairie et al. 2007; Grantz et al. 2007; Block and Rajagopalan 2007). These methods provide an attractive alternative for addressing the drawbacks of traditional linear regression, including meeting normality requirements, the potentially large influence of a small number of outliers, and the inability to capture nonlinear relationships between the dependent and independent variables. Generally, regression models take the following simple form:

$$Y = f(x) + e \qquad (1)$$

where $x$ represents a vector of regression variables (independent variables); $f$ = function; $Y$ = dependent variable; and $e$ = error, often assumed to be normally distributed with a mean of zero and variance $\sigma_e^2$. Traditional linear regression involves fitting a linear function $f$ to the entire data. In the nonparametric approach, estimation of the function $f$ is performed “locally” at the point to be estimated. This local estimation provides the ability to capture features (i.e., nonlinearities) that might be present locally, without granting outliers any undue influence in the overall fit (Block and Rajagopalan 2007). Several nonparametric methods for regression and probability density function estimation exist; for an overview of these methods and their applications to hydroclimatology, see Lall (1995).

In this work, the local polynomial-based nonparametric approach (Loader 1999) is proposed for its ease in understanding, implementation, and successful past applications. The methodology is described in the following algorithm (Block and Rajagopalan 2007). For a point of interest where an estimation of the function is desired, say $x_{pl}$ (subscript p represents “predictive,” and l represents “local”):

1. $K$ ($= \alpha N$) nearest neighbors are identified in proximity to $x_{pl}$. The neighbors can be obtained using either the Euclidean or Mahalanobis distance. The parameter $\alpha$ describes the size of the neighborhood and is within the (0, 1] range. $N$ represents the total number of data points. Clearly, if $\alpha$ takes a value of 1, the number of neighbors selected includes all data points.

2. A polynomial of order $P$ is fit to the $K$ nearest neighbors, using a weighted least-squares method. The fitted polynomial is used to obtain the estimate of the dependent variable, $Y_{pl}$. The local error standard deviation, $\sigma_{pl}$, can be obtained from regression theory (Helsel and Hirsch 1995).
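The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `local_poly_predict`, the tricube weight kernel, and the omission of polynomial cross terms are all choices made here for brevity, since the text only specifies that closer points receive higher weight.

```python
import numpy as np

def local_poly_predict(X, y, x0, alpha=0.5, degree=1):
    """Estimate the response at x0 from its K = alpha * N nearest neighbors.

    X: (N,) or (N, d) predictors; y: (N,) responses; x0: point of interest.
    The tricube kernel and the absence of cross terms are assumptions.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    n, d = X.shape

    # Step 1: identify the K nearest neighbors by Euclidean distance.
    n_coef = 1 + degree * d
    k = min(n, max(int(round(alpha * n)), n_coef))
    dist = np.sqrt(((X - x0) ** 2).sum(axis=1))
    idx = np.argsort(dist)[:k]
    dk = dist[idx]

    # Weights decay with distance (tricube kernel, assumed here).
    h = dk.max() * 1.0001 if dk.max() > 0 else 1.0
    w = (1.0 - (dk / h) ** 3) ** 3

    # Step 2: weighted least-squares fit of a polynomial centered at x0,
    # so the fitted intercept is the estimate at x0.
    Z = X[idx] - x0
    A = np.column_stack([np.ones(k)] + [Z ** p for p in range(1, degree + 1)])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return float(coef[0])
```

For exactly linear data the local fit with `degree=1` reproduces the global linear regression line, whatever the value of `alpha`.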

It is worth noting that for $\alpha = 1$ and $P = 1$, this approach still differs from traditional linear regression, due to the weighting scheme. Linear regression utilizes ordinary least squares for optimization of predictor coefficients, which effectively weights each point equally. The local polynomial approach uses a weighted least-squares scheme, weighting observed data points closer to $x_{pl}$ higher. This gives more influence to local points, and little to none to points far from the desired prediction point. For a perfectly linear relationship, the results of the local polynomial approach are indistinguishable from the traditional linear regression results, which makes it the more general and flexible of the two approaches.

Fig. 3. Surface plot of Malakal streamflow as a function of basin-wide average precipitation and temperature for 1912–1990

The advantages of this approach over the dynamical model are clear: the model is a great deal simpler to create and code, takes less processing time, requires significantly fewer inputs, and is not required to match intermediary flows in the system, only streamflow at Malakal.

Generalized Cross-Validation Skill Score

The optimal values of the two parameters $K$ (or $\alpha$) and $P$ must be estimated from the historical data, and may be obtained using the generalized cross-validation (GCV) score function, given in the following:

$$\mathrm{GCV}(\alpha, P) = \frac{\sum_{t=1}^{N} e_t^2 / N}{\left(1 - \frac{m}{N}\right)^2} \qquad (2)$$

where $e_t$ = model residual (difference between observed and model-estimated values of the dependent variable), and $m$ = number of regression variables in the fitted polynomial. The GCV function penalizes overfitting (large numbers of regression variables) and is a very good estimate of the predictive risk (Craven and Wahba 1979).

For each combination of $\alpha$ and $P$, the model is fitted, as described in the algorithm presented above, and the GCV score is computed; the combination providing the minimum GCV score is selected as the optimal one.
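A minimal sketch of this grid search follows, assuming a single predictor to keep the example short. The helper names (`local_fit`, `gcv_score`, `select_by_gcv`), the tricube weights, and the reading of m as the number of fitted polynomial coefficients are illustrative assumptions rather than details taken from the study.

```python
import numpy as np

def local_fit(x, y, x0, alpha, degree):
    """Local polynomial estimate at x0 for a 1-D predictor (tricube weights assumed)."""
    n = len(x)
    k = min(n, max(int(round(alpha * n)), degree + 2))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]
    dmax = d[idx].max()
    h = dmax * 1.0001 if dmax > 0 else 1.0
    w = (1.0 - (d[idx] / h) ** 3) ** 3
    # Columns 1, (x - x0), (x - x0)^2, ...; the intercept is the estimate at x0.
    A = np.vander(x[idx] - x0, degree + 1, increasing=True)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return coef[0]

def gcv_score(x, y, alpha, degree):
    """GCV score: mean squared residual divided by (1 - m/N)^2."""
    n = len(y)
    fitted = np.array([local_fit(x, y, xi, alpha, degree) for xi in x])
    resid = y - fitted
    m = degree + 1  # assumed interpretation of m for a single predictor
    return (np.sum(resid ** 2) / n) / (1.0 - m / n) ** 2

def select_by_gcv(x, y, alphas, degrees):
    """Return the (alpha, degree) pair with the minimum GCV score, plus all scores."""
    scores = {(a, p): gcv_score(x, y, a, p) for a in alphas for p in degrees}
    best = min(scores, key=scores.get)
    return best, scores
```

Subset selection of predictors, described next in the text, would add a third loop over candidate variable combinations around the same score function.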

The GCV function can also be used to select the best subset from a suite of regression variables. This process involves including different combinations of the regression variables, along with varying $\alpha$ and $P$ values, calculating the GCV, and selecting the combination of regression variables, $\alpha$, and $P$ that provides the minimum GCV score as the best parameter combination. The use of GCV for subset selection is fairly recent (Regonda et al. 2005, 2006) and has been shown to be quite effective. The application of this method in the present research is described in the following.

Identification of Best Variables for Local Polynomial Model

Potential regression variables for the local polynomial statistical model were limited to inputs utilized in the dynamical model, including precipitation, temperature, and potential evapotranspiration. The latter was eventually eliminated, as it is strongly correlated with both precipitation and temperature. To capture the variability of the entire upper White Nile Basin, six representative precipitation and temperature locations were chosen throughout, mimicking the dynamical model subbasins, as illustrated by the triangles in Fig. 1, totaling twelve variables. Not surprisingly, the variables demonstrated multicollinearity (i.e., they were strongly correlated with one another), even though the spatial region considered is quite large. To combat this, principal component analyses (PCA) were performed separately on the precipitation and temperature data. PCA methods, widely used in climate research, decompose a space–time random field (e.g., a multivariate data set such as the monthly precipitation at the six locations in the basin) into orthogonal space and time patterns using eigendecomposition (von Storch and Zwiers 1999). The space–time patterns (also called “modes”) are ordered according to the percentage of variance captured. Typically, the first few modes capture most of the variance present in the data. This is analogous to a dimension reduction technique, where a large multivariate data set is effectively represented by a few modes (i.e., smaller dimension). The temporal patterns are also referred to as principal components (PCs). As the PCs are orthogonal, they can be analyzed independently and combined to reconstruct the original data.

Fig. 4. Eigenspectrums from principal component analyses for (a) precipitation; (b) temperature

The mathematical formulation is as follows:

$$[PC] = [E][X] \qquad (3)$$

where $X$ = data matrix; $E$ = matrix of eigenvectors; and $PC$ = corresponding matrix of the principal components.

The variance explained by each mode, also known as the eigenspectrum, from the PCA of precipitation and temperature for the 1912–1990 period is shown in Fig. 4. Clearly the leading two modes capture most of the data variance. Thus, the twelve PCs, six each from precipitation and temperature, form the potential variable set used in the above-described GCV framework for selecting the best subset.
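The decomposition can be illustrated with a short NumPy sketch. It assumes the data matrix is arranged with time along rows and locations along columns, so the product in Eq. (3) appears here as anomalies times eigenvectors; the function name `pca` and the use of the covariance matrix of anomalies are assumptions made for this example.

```python
import numpy as np

def pca(X):
    """Principal component analysis by eigendecomposition.

    X: (n_times, n_locations) array, e.g. monthly precipitation at six sites.
    Returns the PCs (n_times, n_locations), the eigenvector matrix, and the
    fraction of variance explained by each mode (the eigenspectrum).
    """
    X = np.asarray(X, dtype=float)
    anomalies = X - X.mean(axis=0)
    cov = np.cov(anomalies, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # returned in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder by variance captured
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pcs = anomalies @ eigvecs                   # time series of each mode
    return pcs, eigvecs, eigvals / eigvals.sum()
```

PCs for a held-out period can be formed the same way, by subtracting the fitting-period mean and multiplying by the fitting-period eigenvector matrix.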

Combination Model

A combination model, or multimodel, approach is also undertaken for evaluation of its potential ability to capture Malakal streamflow. Previous researchers have indicated that multimodel techniques may produce more robust results than single model approaches (Morel-Seytoux et al. 1993; Balmaseda et al. 1994; Regonda et al. 2006). Additionally, the National Weather Service currently uses a statistical–dynamical multimodel approach for long-lead forecasts of atmospheric variables (Ed O’Lenic, personal communication, May 23, 2006). For this work, we propose the optimal combination of dynamical and statistical models using a simple linear regression, with an assumed y intercept of zero, over the entire 1912–1990 period. The choice of linear regression is based on the ability to easily assess the relative influence of each model and gauge its general contribution. Eq. (4) presents this relationship:

$$SF_M = \beta_1 SF_D + \beta_2 SF_S + e \qquad (4)$$

where $SF_D$ = streamflow at Malakal as determined by the dynamical model; $SF_S$ = streamflow as determined by the statistical model; $e$ = model error; $\beta_1$ and $\beta_2$ = optimal coefficients; and $SF_M$ = combination model streamflow estimation.
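A zero-intercept regression of this form can be fitted by ordinary least squares on a two-column design matrix with no column of ones. The sketch below is illustrative only; the function name `fit_combination` and the synthetic series used to exercise it are hypothetical and are not data from the study.

```python
import numpy as np

def fit_combination(sf_dyn, sf_stat, sf_obs):
    """Fit SF_M = b1 * SF_D + b2 * SF_S with the intercept fixed at zero.

    Returns the coefficient pair (b1, b2) and the combined estimate.
    """
    A = np.column_stack([sf_dyn, sf_stat])      # no intercept column
    beta, *_ = np.linalg.lstsq(A, np.asarray(sf_obs, dtype=float), rcond=None)
    return beta, A @ beta

# Synthetic check: observations built from known weights are recovered.
rng = np.random.default_rng(0)
sf_dyn = rng.uniform(500.0, 2000.0, size=60)
sf_stat = rng.uniform(500.0, 2000.0, size=60)
sf_obs = 0.44 * sf_dyn + 0.59 * sf_stat
beta, sf_comb = fit_combination(sf_dyn, sf_stat, sf_obs)
```

Because the intercept is fixed at zero, the two coefficients need not sum to one, and their relative size gives a direct reading of each model's contribution.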

Table 1. Correlation Coefficients among Dynamical, Statistical, and Combination Model Streamflow Predictions and Observations

| Period | Dynamical | Statistical | Combination | Statistical predicted | Combination predicted |
|---|---|---|---|---|---|
| 1912–1990 | 0.75 | 0.77 | 0.83 | – | – |
| 1912–1950, 1971–1990 (C) | 0.70 | 0.79 | 0.82 | – | – |
| 1951–1970 (V) | 0.83 | – | – | 0.70 | 0.83 |
| 1912–1970 (C) | 0.79 | 0.75 | 0.83 | – | – |
| 1971–1990 (V) | 0.64 | – | – | 0.86 | 0.83 |

Note: C = calibration period; and V = validation period.

Fig. 5. Observed (solid line) and combination model (dashed line) streamflow at Malakal, 1951–1970

Model Calibration and Validation

The dynamical model was calibrated using average monthly parameters for each input variable over 1948–1973 by minimizing errors between mean modeled streamflow and mean observed streamflow (Yates and Strzepek 1998). The authors claim that this calibration aptly captures the hydrologic features of the basin. All validation periods presented in this study are based on the aforementioned calibration period. Further details are available in Yates and Strzepek (1998).

The statistical model is validated through a cross-validation approach by dropping a fraction of the data (i.e., the validation period), obtaining the best subset of model variables, fitting a model on the remaining data (i.e., the fitting period), and then predicting the monthly streamflow for the validation period. The necessary PCs for the validation period are obtained by multiplying the variable (precipitation or temperature) by the eigenvectors from the PCA of the fitting period, consistent with Eq. (3).

To test the statistical model robustly, validation was performed in three different ways: (1) dropping each year individually (mimicking a prediction-type assessment); (2) validating on 1951–1970; and (3) validating on 1971–1990. The period 1951–1970 represents a sharp increase followed by a decline in Malakal streamflow, including the wettest year on record, whereas the 1971–1990 period portrays a steady decline. An examination of annual streamflow in Fig. 2(b) visually illustrates these patterns. The GCV approach described earlier was utilized to obtain the optimal model parameters $\alpha$ and $P$, and the best subset of variables for the fitting periods. Using GCV, the best subset of variables includes the first two PCs of both precipitation and temperature as well as the first PC for each lagged by 1 month. These six predictors constitute the optimal set for all validation periods. Inclusion of the lagged PCs as predictors is not surprising, due to the regulating characteristics of the swamps in the northern half of the study area. The best order of polynomial $P$ was found to be 1 for all periods, and $\alpha$ was found to be 0.20 when dropping each year individually, 0.25 for 1951–1970, and 0.3 for 1971–1990.

The predictions resulting from the dynamical, statistical, and combination models are evaluated by correlation coefficients between the model predictions and the observed streamflow values.

Fig. 6. Observed (solid line) and combination model (dashed line) streamflow at Malakal, 1971–1990

Results

Correlation coefficients between streamflow predictions from the dynamical, statistical, and combination models and historical observations are presented in Table 1. Over the entire 1912–1990 period, the dynamical and statistical models are quite comparable, with correlation coefficients of 0.75 and 0.77, respectively. Not surprisingly, the combination model produces a higher correlation coefficient of 0.83, by taking advantage of the capabilities of both models. The combination model coefficients, $\beta_1$ and $\beta_2$, as presented in Eq. (4), are 0.44 and 0.59 for the dynamical and statistical models, respectively, implying a relatively balanced influence, with slightly greater weight originating from the statistical model.

The two 20 year validation periods also prove insightful. The combination model produces a strong correlation coefficient of 0.83 for both validation periods, giving credence to the ability of the model to perform robustly throughout varying climatic trends. This is not the case, however, for the dynamical and statistical models independently. During the 1951–1970 validation period, the correlation coefficient for the dynamical model is quite high (0.83), whereas that for the statistical model is lower (0.70). The 1971–1990 validation period, however, illustrates a reversal, as the statistical model produces a high correlation coefficient (0.86), with the dynamical model significantly lower (0.64). These two epochs clearly demonstrate the advantages of the combination approach, as it tempers the inadequacies of the individual models, but is able to capture the positive features of both. Not unexpectedly, the optimal combination model coefficients utilized in the two validation periods also reflect the ability of the dynamical and statistical models to capture streamflow during the associated calibration periods. For 1951–1970, the combination model coefficients come to 0.55 and 0.5, favoring the dynamical model; coefficients for 1971–1990 are 0.34 and 0.69, favoring the statistical model.

Figs. 5 and 6 illustrate the observed and combination model predicted monthly streamflow at Malakal over the two 20 year validation periods. The combination model results mimic the observed values quite well, with the exception of grossly underestimating the 1964 peak and overestimating the 1987 peak. Figs. 7 and 8 show fewer years, 1962–1966, including the wettest year in the record, and 1988–1990, respectively, with the dynamical and statistical model results also included. The 1988–1990 figure is enlightening, illustrating the ability of the combination model to closely approximate the peak in the first 2 years, even though the two independent models deviate significantly; it fails, however, to adequately capture the 1990 peak.

Fig. 7. Observed (solid line), combination model (dashed line), dynamical model (dotted line), and statistical model (dash dot line) streamflow at Malakal, 1962–1965

A peak streamflow analysis reveals the ability of the dynamical, statistical, and combination models to predict both the timing and quantity of the highest monthly streamflow in each year. Fig. 9 illustrates the capacity of the models to predict the month in which the peak streamflow occurs. Clearly, the statistical model histogram is more tightly grouped around the highest observed month than the dynamical model histogram, thereby assisting the combination model in better forecasting the peak timing. Virtually all (97%) of the combination model predictions of peak streamflow lie within 1 month of the observed peak month, with a strong majority falling into the correct month. Fig. 10 indicates each model’s capacity to capture the peak streamflow quantity. Although the dynamical model tends to better reproduce higher peaks, the statistical model (and therefore the combination model) generally appears to undersimulate peak streamflow. This is a direct result of the nature of statistical modeling and limited data availability; when the statistical model is forecasting a wet year, and the fitting period includes few significant wet years, prediction will prove difficult. Correlation coefficient skill scores for the dynamical, statistical, and combination models are 0.47, 0.2, and 0.47, respectively, favoring the dynamical model, whereas root mean square errors are 7,067, 6,312, and 5,673 for peak streamflow quantity, favoring the statistical model. Overall, the combination model is clearly superior, and appears to alleviate timing and quantity deficiencies evident in the two individual models.
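The peak timing comparison can be expressed compactly. The sketch below assumes the monthly series are arranged as one row per calendar year and treats the month offset as a plain difference without wrap-around across the year boundary; both choices, and the function names, are assumptions made for illustration.

```python
import numpy as np

def peak_month_offsets(obs, pred):
    """Predicted minus observed peak month index for each year.

    obs, pred: arrays of shape (n_years, 12) of monthly streamflow.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return pred.argmax(axis=1) - obs.argmax(axis=1)

def fraction_within(offsets, months=1):
    """Share of years whose predicted peak lies within `months` of the observed peak."""
    return float(np.mean(np.abs(offsets) <= months))
```

A histogram of the offsets returned by `peak_month_offsets` corresponds to the kind of timing summary described for Fig. 9.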

An ensemble approach, in lieu of the deterministic results described thus far, may also be advantageous. This process involves use of the statistical model errors ($\sigma_{me}$) to create random normal deviates, with mean zero and variance $\sigma_{me}^{2}$, which are added to the predicted streamflow values from the statistical model, thus providing ensembles and the associated probability density function. This has been used successfully in predicting seasonal rainfall and streamflow (see, e.g., Grantz et al. 2007; Singhrattna et al. 2005; Regonda et al. 2006). The combination model ensemble is generated using Eq. (4), with monthly stochastic predictions from the statistical model. Combination model coefficients remain unchanged. Fig. 11 illustrates the combination model ensemble for the 1988–1990 period. Ensembles are shown as box plots in Fig. 11, with the box covering the 25th and 75th percentile, the horizontal line inside the box representing the median, whiskers extending to the 5th and 95th percentile, and outliers shown as circles.
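The ensemble procedure described above can be sketched in Python as follows. Several details are assumptions made for illustration only: the spread of the deviates is taken as the sample standard deviation of the statistical model errors, Eq. (4) is assumed to be a linear combination with an optional intercept, and the function names, number of members, and random seed are hypothetical.

```python
import numpy as np

def combination_ensemble(q_dyn, q_stat, stat_errors, w_dyn, w_stat,
                         intercept=0.0, n_members=100, seed=0):
    """Return an (n_members, n_months) array of combination model flows."""
    rng = np.random.default_rng(seed)
    q_dyn = np.asarray(q_dyn, dtype=float)
    q_stat = np.asarray(q_stat, dtype=float)
    # Assumed: the deviate spread is the sample standard deviation of the errors.
    sigma_me = np.std(np.asarray(stat_errors, dtype=float), ddof=1)
    # Random normal deviates with mean zero and variance sigma_me**2.
    deviates = rng.normal(0.0, sigma_me, size=(n_members, q_stat.size))
    q_stat_ens = q_stat + deviates
    # Assumed linear form of Eq. (4); coefficients are identical for every member.
    return intercept + w_dyn * q_dyn + w_stat * q_stat_ens

def box_plot_summary(ensemble):
    """5th, 25th, 50th, 75th, and 95th percentiles for each month."""
    levels = [5, 25, 50, 75, 95]
    values = np.percentile(ensemble, levels, axis=0)
    return dict(zip(levels, values))
```

With the weights reported in the abstract, one might call `combination_ensemble(q_dyn, q_stat, stat_errors, w_dyn=0.44, w_stat=0.59)`; only the statistical term is perturbed, so the ensemble spread scales with the statistical model weight.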

The rank probability skill score (RPSS), a measure of the skill of ensemble forecasts, is a widely used probabilistic measure for comparison with prediction by climatology forecasts (Wilks 1995; Saunders and Fletcher 2004). The general form of the rank probability score (RPS) equation for any year takes the form:

$$\mathrm{RPS} = \sum_{m=1}^{R} \left( CP_{F,m} - CP_{O,m} \right)^{2} \qquad (5)$$

where $R$ = number of categories; $CP_{F,m}$ = cumulative predicted probability for the forecast ensemble (through category $m$); and $CP_{O,m}$ = cumulative observed probability (also through category $m$). This study incorporates three categories of equal size (e.g., below normal, near normal, or above normal streamflow), such that the climatological probability of being in each category is 1/3; for the category that was observed the probability is one, and zero elsewhere. A perfect forecast results in RPS equaling zero.
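For reference, a short worked instance of the rank probability score under the three-category setup just described. The forecast probabilities (0.1, 0.3, 0.6) are hypothetical values chosen only for illustration. If the observed flow falls in the above normal category, the cumulative observed probabilities are (0, 0, 1) and the cumulative forecast probabilities are (0.1, 0.4, 1), so that

$$\mathrm{RPS}_{\mathrm{forecast}} = (0.1 - 0)^{2} + (0.4 - 0)^{2} + (1 - 1)^{2} = 0.17$$

whereas a climatological forecast, with cumulative probabilities (1/3, 2/3, 1), gives

$$\mathrm{RPS}_{\mathrm{climatology}} = \left(\tfrac{1}{3} - 0\right)^{2} + \left(\tfrac{2}{3} - 0\right)^{2} + (1 - 1)^{2} = \tfrac{5}{9} \approx 0.56$$

The hypothetical forecast therefore scores closer to zero, the value of a perfect forecast, than climatology does.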

Fig. 8. Observed (solid line), combination model (dashed line), dynamical model (dotted line), and statistical model (dash dot line) streamflow at Malakal, 1988–1990

The RPSS is then defined as


$$\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}_{\mathrm{forecast}}}{\mathrm{RPS}_{\mathrm{climatology}}} \qquad (6)$$

RPSS values range from +1 to −∞. A value of +1 represents perfect skill, or a perfect forecast, whereas negative values represent poor skill; any value above zero represents an improved forecast over climatology. The RPSS is calculated for each year separately. For the ensemble forecast shown in Fig. 11, a median RPSS skill score of 0.65 results. This indicates a skillful ensemble forecast and gives clear indication of the improvement over climatology. The ensemble approach provides a framework for probability density function assessment and evaluation of threshold exceedance probabilities, especially useful for probabilistic prediction of flood or drought conditions (e.g., Block and Rajagopalan 2007; Grantz et al. 2007).
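A minimal Python sketch of Eqs. (5) and (6) for a single ensemble forecast is given below. It assumes that the category probabilities are estimated as the fraction of ensemble members falling below, within, and above two tercile bounds; the function names and the `lower` and `upper` bounds are hypothetical and would in practice come from the historical record.

```python
import numpy as np

def rps(category_probs, observed_category):
    """Rank probability score, Eq. (5), for one forecast."""
    probs = np.asarray(category_probs, dtype=float)
    cp_forecast = np.cumsum(probs)
    obs = np.zeros_like(probs)
    obs[observed_category] = 1.0  # probability one for the observed category
    cp_observed = np.cumsum(obs)
    return float(np.sum((cp_forecast - cp_observed) ** 2))

def ensemble_category_probs(members, lower, upper):
    """Fraction of members below, within, and above the tercile bounds."""
    members = np.asarray(members, dtype=float)
    below = np.mean(members < lower)
    above = np.mean(members > upper)
    return np.array([below, 1.0 - below - above, above])

def rpss(members, observed_flow, lower, upper):
    """Rank probability skill score, Eq. (6), relative to climatology."""
    probs = ensemble_category_probs(members, lower, upper)
    # Category index: 0 = below normal, 1 = near normal, 2 = above normal.
    category = int(observed_flow >= lower) + int(observed_flow > upper)
    climatology = np.full(3, 1.0 / 3.0)
    return 1.0 - rps(probs, category) / rps(climatology, category)
```

Applying `rpss` to each forecast year separately and taking the median of the results would mirror the summary statistic reported for Fig. 11.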

Summary and Discussion

Dynamical, statistical, and combination models are assessed for their ability to predict monthly streamflow at Malakal, Sudan. The statistical model incorporates a nonparametric approach based on local polynomial regression, utilizing principal components of precipitation and temperature throughout the upper White Nile Basin. The GCV function is employed for determining the best subset (six) of regression variables out of a suite of twelve potential ones. The combination model is a simple linear regression of the outputs from the dynamical and statistical models; evaluation of correlation coefficients for the combination model demonstrates significant overall skill, outperforming either of the other two models independently. Peak streamflow analyses of timing and quantity also exhibit superior performance by the combination model. An ensemble approach to the combination model, using random normal deviates of the statistical model error, is also illustrated for 1988–1990, and demonstrates a framework for planning and management from a probabilistic standpoint.

Fig. 9. Peak flow timing histogram for (a) dynamical; (b) statistical; and (c) combination models compared to observed flows over 1912–1990. Zero indicates modeled peak flow months are identical to observed peak flow months, ±1 indicates model prediction differs by 1 month, etc.

Fig. 10. Peak flow quantity histogram for (a) dynamical; (b) statistical; and (c) combination models compared to observed flows over 1912–1990. Diagonal line represents perfect predictions.

The combination model could easily be transformed into a prediction tool if forecasts of precipitation and temperature are available, making it attractive to basin managers and decision makers. Other options include implementing precipitation and temperature scenarios from general circulation models for assessing potential climate change effects.

Fig. 11. Box plots of monthly ensemble predictions for 1988–1990, including observed flows (solid line) and mean combination model predictions (dashed line)

Other aspects also warrant further attention, including threshold exceedance probability evaluation and extending the model to incorporate a larger portion of the Nile Basin. By applying the ensemble forecasting framework to the model, threshold exceedances may be set to give indication of risk levels of wet or dry (or flood or drought) conditions in a month or season (Block and Rajagopalan 2007). Extending the model, particularly to the Aswan Dam in Egypt, would also be of value for comparison with existing basin-wide water systems models and evaluation of potential large-scale basin projects.
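One simple way to evaluate a threshold exceedance probability from an ensemble, offered here only as a sketch, is to count the fraction of members on either side of the threshold in each month. The function names and the (members by months) array layout are assumptions for illustration; the threshold values themselves would have to be chosen by the analyst.

```python
import numpy as np

def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members above the threshold, per month.

    ensemble: array of shape (n_members, n_months), an assumed layout.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    return np.mean(ensemble > threshold, axis=0)

def non_exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members below the threshold, per month."""
    ensemble = np.asarray(ensemble, dtype=float)
    return np.mean(ensemble < threshold, axis=0)
```

A high value from `exceedance_probability` against a flood threshold, or from `non_exceedance_probability` against a drought threshold, would flag elevated risk for that month.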


Acknowledgments

This research study was partially funded by a grant from the U.S. Agency for International Development (USAID) through the International Food Policy Research Institute (IFPRI), and forms part of the first writer’s Ph.D. dissertation at the University of Colorado, Boulder. The writers wish to express their appreciation to the editors and three anonymous reviewers for their insightful comments and suggestions, undeniably improving the quality of this paper.

References

Balmaseda, M., Anderson, D., and Davey, M. (1994). “ENSO prediction using a dynamical ocean model coupled to statistical atmospheres.” Tellus, Ser. A, 46A(4), 497–511.

Block, P., and Rajagopalan, B. (2007). “Interannual variability and ensemble forecast of Upper Blue Nile Basin kiremt season precipitation.” J. Hydrometeor., 8(3), 327–343.

Bodo, B. (2001). “Monthly discharge data for world rivers.” <dss.ucar.edu/datasets/ds552.1> (Sept. 19, 2005).

Camberlin, P. (1995). “June–September rainfall in north-eastern Africa and atmospheric signals over the tropics: A zonal perspective.” Int. J. Climatol., 15(7), 773–783.

Camberlin, P. (1997). “Rainfall anomalies in the source region of the Nile and their connection with the Indian summer monsoon.” J. Clim., 10(6), 1380–1392.

Conway, D. (1996). “The impacts of climate variability and future climate change in the Nile Basin on water resources in Egypt.” Int. J. Water Resour. Dev., 12(3), 277–296.

Conway, D., and Hulme, M. (1993). “Recent fluctuations in precipitation and runoff over the Nile sub-basins and their impact on Nile discharge.” Clim. Change, 25(2), 127–151.

Craven, P., and Wahba, G. (1979). “Smoothing noisy data with spline functions.” Numer. Math., 31(4), 377–403.

Gamachu, D. (1977). Aspects of climate and water budget in Ethiopia, Addis Ababa University Press, Addis Ababa, Ethiopia.

Georgakakos, A. (2004). “Decision support systems for integrated water resources management with an application to the Nile Basin.” Proc., Int. Federation for Automatic Control Workshop on Modeling and Control for Participatory Planning and Managing Water Systems, Venice, Italy, Sept. 29 and Oct. 1, Elsevier, New York.

Gleick, P. (1991). “The vulnerability of runoff in the Nile basin to climate changes.” Environ. Prof., 13(1), 66–73.

Grantz, K., Rajagopalan, B., Clark, M., and Zagona, E. (2007). “Seasonal shifts in the North American monsoon.” J. Clim., 20(9), 1923–1935.

Griffiths, J. (1972). Ethiopian highlands. World survey of climatology, H. Landsberg, ed., Vol. 10, Elsevier, Amsterdam, The Netherlands, 369–388.

Hagedorn, R., Doblas-Reyes, F. J., and Palmer, T. N. (2005). “The rationale behind the success of multi-model ensembles in seasonal forecasting. Part I: Basic concept.” Tellus, Ser. A, 57A(3), 219–233.

Helsel, D., and Hirsch, R. (1995). Statistical methods in water resources, Elsevier Science, Amsterdam, The Netherlands.

Kite, G. (1981). “Recent changes in the levels of Lake Victoria.” Hydrol. Sci. Bull., 26(3), 233–243.

Krishnamurti, T. N., et al. (1999). “Improved weather and seasonal climate forecasts from multi-model superensemble.” Science, 285(5433), 1548–1550.

Krishnamurti, T. N., et al. (2000). “Multimodel ensemble forecasts for weather and seasonal climate.” J. Clim., 13(23), 4196–4216.

Lall, U. (1995). “Recent advances in nonparametric function estimation: Hydrologic applications.” Rev. Geophys., 33(S1), 1093–1102.

Loader, C. (1999). Local regression and likelihood, Springer, New York.

Mitchell, T., Carter, T., Jones, P., Hulme, M., and New, M. (2004). “A comprehensive set of high-resolution grids of monthly climate for Europe and the globe: The observed record (1901–2000) and 16 scenarios (2001–2100).” Tyndall Working Paper No. 55, Tyndall Centre, UEA, Norwich, U.K.

Mohamed, Y., van den Hurk, B., Savenije, H., and Bastiaanssen, W. (2005). “Hydroclimatology of the Nile: Results from a regional climate model.” Hydrology Earth Syst. Sci., 2, 319–364.

Moorehead, A. (1971). The White Nile, Harper & Row, New York.

Morel-Seytoux, H., Fahmy, H., and Lamagat, J. (1993). “A composite hydraulic and statistical flow-routing method.” Water Resour. Res., 29(2), 413–418.

Mutai, C., and Ward, M. (2000). “East African rainfall and the tropical circulation/convection on intraseasonal to interannual timescales.” J. Clim., 13(22), 3915–3939.

New, M., Lister, D., Hulme, M., and Makin, I. (2002). “A high-resolution data set of surface climate over global land areas.” Clim. Res., 21(1), 1–25.

Nicholson, S., and Kim, J. (1997). “The relationship of the El Niño–southern oscillation to the African rainfall.” Int. J. Climatol., 17(2), 117–135.

Ntale, H., and Gan, T. (2004). “East African rainfall anomaly patterns in association with El Niño/southern oscillation.” J. Hydrol. Eng., 9(4), 257–268.

Prairie, J., Rajagopalan, B., Lall, U., and Fulp, T. (2007). “A stochastic nonparametric technique for space-time disaggregation of streamflows.” Water Resour. Res., 43, W03432.

Rajagopalan, B., Lall, U., and Zebiak, S. (2002). “Categorical climate forecasts through regularization and optimal combination of multiple GCM ensembles.” Mon. Weather Rev., 130(7), 1792–1811.

Regonda, S., Rajagopalan, B., Clark, M., and Zagona, E. (2006). “A multimodel ensemble forecast framework: Application to spring seasonal flows in the Gunnison River Basin.” Water Resour. Res., 42, W09404.

Regonda, S., Rajagopalan, B., Lall, U., Clark, M., and Moon, Y. (2005). “Local polynomial method for ensemble forecast of time series.” Nonlinear Processes Geophys., 12, 397–406.

Reynolds, C. (2005). “Low water levels observed on Lake Victoria.” Rep., Production Estimates and Crop Assessment Division of the USDA Foreign Agricultural Service, <http://www.fas.usda.gov/pecad/highlights/2005/09/uganda_26sep2005/> (Sept. 26, 2005).

Saunders, M., and Fletcher, C. (2004). “Verification of spring 2004 UK city temperature seasonal forecasts.” University College, London, <http://forecast.mssl.ucl.ac.uk/docs/Spring2004TempVerification.pdf> (Mar. 15, 2006).

Seleshi, Y., and Zanke, U. (2004). “Recent changes in rainfall and rainy days in Ethiopia.” Int. J. Climatol., 24(8), 973–983.

Shahin, M. (1985). Hydrology of the Nile Basin, Elsevier, Amsterdam, The Netherlands.

Singhrattna, N., Rajagopalan, B., Clark, M., and Krishna Kumar, K. (2005). “Seasonal forecasting of Thailand summer monsoon rainfall.” Int. J. Climatol., 25(5), 649–664.

Strzepek, K., Yates, D., and El Quosy, D. (1996). “Vulnerability assessment of water resources in Egypt to climatic change in the Nile Basin.” Clim. Res., 6(2), 89–95.

Sutcliffe, J. (1974). “A hydrological study of the southern Sudd region of the Upper Nile.” Hydrol. Sci. Bull., 19(2), 237–255.

Sutcliffe, J., and Parks, Y. (1987). “Hydrologic modeling of the Sudd and Jonglei Canal.” J. Hydrol. Sci., 32(2), 143–159.

von Storch, H., and Zwiers, F. W. (1999). Statistical analysis in climate research, Cambridge University Press, Cambridge, Mass.

Wilks, D. (1995). Statistical methods in atmospheric science: An introduction, Academic, San Diego.

World Meteorological Organization (WMO). (1981). Hydrometeorological survey of the catchments of Lakes Victoria, Kyoga, and Mobutu Sese Seko, Geneva.

Yates, D., and Strzepek, K. (1998). “Modeling the Nile basin under climate change.” J. Hydrol. Eng., 3(2), 98–108.
