Journal of Hydrology 372 (2009) 80-93

Methods to improve neural network performance in daily flows prediction

C.L. Wu, K.W. Chau *, Y.S. Li
Dept. of Civil and Structural Engineering, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, People's Republic of China

* Corresponding author. Tel.: +852 27666014. E-mail address: [email protected] (K.W. Chau).

Article history: Received 2 April 2008; received in revised form 25 February 2009; accepted 30 March 2009. This manuscript was handled by K. Georgakakos, Editor-in-Chief, with the assistance of Taha Ouarda, Associate Editor.

doi:10.1016/j.jhydrol.2009.03.038. © 2009 Elsevier B.V. All rights reserved.

Summary

In this paper, three data-preprocessing techniques, moving average (MA), singular spectrum analysis (SSA), and wavelet multi-resolution analysis (WMRA), were coupled with the artificial neural network (ANN) to improve the estimate of daily flows. Six models, including the original ANN model without data preprocessing, were set up and evaluated. The five new models were ANN-MA, ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2. The ANN-MA was derived from the raw ANN model combined with the MA. The ANN-SSA1, ANN-SSA2, ANN-WMRA1 and ANN-WMRA2 were generated by using the original ANN model coupled with SSA and WMRA in terms of two different means. Two daily flow series from different watersheds in China (Lushui and Daning) were used in the six models for three prediction horizons (i.e., 1-, 2-, and 3-day-ahead forecasts). The poor performance of ANN forecast models was mainly due to the existence of the lagged prediction. The ANN-MA, among the six models, performed best and eradicated the lag effect. The performances of the ANN-SSA1 and ANN-SSA2 were similar, and the performances of the ANN-WMRA1 and ANN-WMRA2 were also similar. However, the models based on the SSA presented better performance than the models based on the WMRA at all forecast horizons, which meant that the SSA is more effective than the WMRA in improving the ANN performance in the current study. Based on an overall consideration including the model performance and the complexity of modeling, the ANN-MA model was optimal, then the ANN model coupled with SSA, and finally the ANN model coupled with WMRA.

Keywords: Daily flows prediction; Artificial neural network; Lagged prediction; Moving average; Singular spectral analysis; Wavelet multi-resolution analysis

Introduction

Artificial neural networks (ANNs) have gained significant attention in the past two decades and have been widely used for hydrological forecasting. The ASCE (2000) and Dawson and Wilby (2001) give good state-of-the-art reviews on ANN modeling in hydrology. Many studies focused on streamflow predictions have proven that ANN is superior to traditional regression techniques and time-series models including autoregressive (AR) and autoregressive moving average (ARMA) models (Raman and Sunilkumar, 1995; Jain et al., 1999; Thirumalaiah and Deo, 2000; Abrahart and See, 2002; Castellano-Méndez et al., 2004; Kisi, 2003, 2005). Besides, ANN has also been compared with the nonlinear prediction (NLP) method, which is derived from chaotic time series (Farmer and Sidorowich, 1987). Laio et al. (2003) carried out a comparison of ANN and NLP for flood predictions and found that ANN performed slightly better at long forecast times while the situation was reversed for shorter times. Sivakumar et al. (2002) found that ANN was worse than NLP in short-term river flow prediction.

The ANN is able to capture the dynamics of the flow series by using previously observed flow values as inputs during the forecasting of daily flows from the flow data alone. As a consequence, the high autocorrelation of the flow data often introduces lagged predictions for the ANN model. The issue of lagged predictions in the ANN model has been mentioned by some researchers (Dawson and Wilby, 1999; Jain and Srinivasulu, 2004; de Vos and Rientjes, 2005; Muttil and Chau, 2006). De Vos and Rientjes (2005) suggested that an effective solution to the forecasting lag effect is to obtain new model inputs by moving average (MA) over the original discharge data.

As known, a natural flow series can be viewed as a quasi-periodic signal, which is contaminated by various noises at different flow levels. Cleaner signals used as model inputs will improve the model performance. Therefore, signal decomposition techniques for the purpose of data-preprocessing may be favorable. Two such techniques are known as singular spectral analysis (SSA) and wavelet multi-resolution analysis (WMRA). Briefly, the SSA decomposes a time series into a number of components with simpler structures, such as a slowly varying trend, oscillations and noise. The SSA uses basis functions characterized by a data-adaptive nature, which makes the approach suitable for the analysis of some nonlinear dynamics (Elsner and Tsonis, 1997). A time series in the WMRA breaks down into a series of linearly independent detail signals and one approximation signal by using the discrete wavelet transform with a specific wavelet function such as
the Haar wavelet. Mallat (1989) presented a complete theory for wavelet multi-resolution signal decomposition (also mentioned as the pyramid decomposition algorithm). Moreover, the continuous wavelet transform can conduct a local signal analysis at which the traditional Fourier analysis and SSA are, however, less effective (Howell and Mahrt, 1994); see Torrence and Compo (1998) for a practical guide. Nevertheless, signal analysis in the time-frequency space is not the point of concern in this study.

The techniques of SSA and WMRA have been successfully introduced to the field of hydrology (Lisi et al., 1995; Sivapragasam et al., 2001; Marques et al., 2006; Partal and Kisi, 2007). Sivapragasam et al. (2001) established a hybrid model of support vector
machine (SVM) in conjunction with the SSA for the forecasting of rainfall and runoff. A considerable improvement in the model performance was obtained in comparison with the original SVM model. However, the paper did not explicitly mention how to combine SVM with the SSA. The applications of WMRA to precipitation and discharge were presented in the work of Partal and Kisi (2007) and Partal and Cigizoglu (2008) respectively, where the WMRA was applied to each model input variable. Results from their studies indicated that the WMRA is highly promising for improvement of the model performance.

The objective of this study is to evaluate the effectiveness of the three data-preprocessing techniques of MA, SSA and WMRA in the improvement of the ANN model performance. To explore the SSA or WMRA, the ANN model is coupled with the components of SSA or WMRA in terms of two different methods. One is that the raw flow data is first decomposed, then a new flow series is obtained by a components filter, and finally the new flow series is used to generate the model inputs. This type of model is named ANN-SSA1 or ANN-WMRA1. The other method is the same as Partal and Cigizoglu (2008), based on which the model is named ANN-SSA2 or ANN-WMRA2. With the original ANN model and the ANN-MA, there are six models for the flow data forecasting in all. This paper is organized in the following manner. "Streamflow data" presents the two sets of streamflow data. "Methods" describes the modeling methods, including a brief introduction of ANN, MA, SSA, and WMRA, and how to construct the hybrid ANN models. The application of the forecast models to the flow data is presented in "Application of models to the flow data", where relevant points include decomposition of flows, the identification of the ANN architecture, implementation of ANN models, and forecasting results and discussion. "Conclusions" sheds light on the main conclusions of this study.

Streamflow data

Daily mean flow data from two rivers, the Lushui and the Daning, are used in this study. The two rivers are direct tributaries of the Yangtze River, and both are located in Hubei province, People's Republic of China. The flow data from the Lushui River were acquired at Tongcheng hydrology station, which is at the upper stream of the Lushui watershed (hereafter, the flow data are referred to as the Lushui series). The watershed has an area of 224 km². The data period covers a 5-year duration (January 1, 1984 to December 31, 1988). The flow data from the Daning River were collected at Wuxi hydrology station, which is at the upper and middle streams of the Daning watershed. The drainage area controlled by Wuxi station is 2001 km². The flow data span 20 years (from January 1, 1988 to December 31, 2007).

In the process of modeling of ANNs, the raw flow data are often partitioned into three parts: a training set, a cross-validation set and a testing set. The training set serves the model training and the testing set is used to evaluate the performances of models. The cross-validation set helps to implement an early stopping approach in order to avoid overfitting of the training data. The same data partition was adopted for the two daily flow series: the first half of the entire flow data as training set, the first half of the remaining data as cross-validation set, and the other half as testing set.

Table 1. Related information for the two rivers and the flow data.

Watershed and datasets   μ (m³)  Sx (m³)  Cv    Cs     Xmin (m³)  Xmax (m³)  Watershed area and data period
Lushui                                                                       Area: 224 km²; data period: January 1984 to December 1988
  Original data          4.63    8.49     0.55  7.38   0.02       134
  Training               4.41    8.56     0.52  7.84   0.02       128
  Cross-validation       5.78    8.35     0.69  3.84   0.05       63
  Testing                3.90    8.39     0.47  10.22  0.30       134
Wuxi                                                                         Area: 2001 km²; data period: January 1988 to December 2007
  Original data          61.9    112.6    0.55  7.20   6.0        2230
  Training               60.6    95.6     0.63  5.90   7.6        1530
  Cross-validation       60.7    132.2    0.46  8.35   6.0        2230
  Testing                66.0    122.1    0.54  6.30   10.1       1730

Table 1 presents related information about the two rivers and some descriptive statistics of the original data and the three data subsets, including mean (μ), standard deviation (Sx), coefficient of variation (Cv), skewness coefficient (Cs), minimum (Xmin), and maximum (Xmax). As shown in Table 1, the training set cannot fully include the cross-validation or testing set. Due to the weak extrapolation ability of ANN, it is suggested that all data be scaled to the interval [-0.9, 0.9] rather than [-1, 1] when the ANN employs the hyperbolic tangent functions as transfer functions in the hidden layer and output layer.

Fig. 1 plots the estimated autocorrelation functions (ACF), average mutual information (AMI), and partial autocorrelation functions (PACF) from lag 0 to lag 30 days for the two flow series. The AMI measures the general dependence of two variables (Fraser and Swinney, 1986) whereas the ACF and PACF show the dependence from the perspective of linearity. The first-order autocorrelation of each flow series is large (0.59 for Lushui, and 0.7 for Wuxi). The rapid decaying pattern of the PACF confirms the dominance of an autoregressive process, relative to the moving-average process revealed by the ACF.
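The half/quarter/quarter partition and the [-0.9, 0.9] scaling described above can be sketched as follows; this is an illustrative sketch, and the function names (`partition`, `scale`) are not from the paper.

```python
def partition(series):
    """First half -> training; next quarter -> cross-validation; rest -> testing."""
    n = len(series)
    half = n // 2
    quarter = half + (n - half) // 2
    return series[:half], series[half:quarter], series[quarter:]

def scale(series, lo=-0.9, hi=0.9):
    """Linearly map the series to [lo, hi], suiting tanh transfer functions."""
    x_min, x_max = min(series), max(series)
    span = x_max - x_min
    return [lo + (hi - lo) * (x - x_min) / span for x in series]

flows = [4.2, 8.1, 2.7, 15.3, 6.6, 3.9, 5.0, 9.8]
train, cv, test = partition(flows)   # lengths 4, 2, 2
scaled = scale(flows)                # min maps to -0.9, max to 0.9
```

Scaling to [-0.9, 0.9] instead of [-1, 1] keeps targets away from the saturated tails of the tanh transfer function.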
Methods

Artificial neural networks

An ANN is a massively parallel-distributed information processing system with a highly flexible configuration, and so has an excellent nonlinearity-capturing ability. The feed-forward multilayer perceptron (MLP) is by far the most popular among the many ANN paradigms, and usually uses the technique of error back-propagation to train the network configuration. The architecture of the ANN consists of the number of hidden layers and the numbers of neurons in the input layer, hidden layers and output layer. ANNs with one hidden layer are commonly used in hydrologic modeling (Dawson and Wilby, 2001; de Vos and Rientjes, 2005) since these networks are considered to provide enough complexity to accurately simulate the nonlinear properties of the hydrologic process. A three-layer ANN is therefore chosen for the present study, which comprises the input layer with I nodes, the hidden layer with H nodes (neurons), and the output layer with one node. The hyperbolic tangent functions are used as transfer functions in the hidden layer and output layer. The purpose of network training is to optimize the weights w connecting neighboring layers and the bias θ of each neuron in the hidden layer and output layer. The Levenberg-Marquardt (LM) training algorithm is used here for adjusting the weights and biases.

[Fig. 1. Plots of ACF, AMI, and PACF of the flow data (panels 1 and 3 for Lushui; 2 and 4 for Wuxi), where the dashed lines stand for the 95% confidence bounds.]
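A minimal forward pass of such a three-layer tanh network can be sketched as below. This is an illustrative sketch only (random weights, LM training omitted); the names `mlp_forward`, `W1`, `b1`, `W2`, `b2` are not from the paper.

```python
import math
import random

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a three-layer MLP: I inputs -> H hidden (tanh) -> 1 output (tanh)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return math.tanh(sum(w * h for w, h in zip(W2, hidden)) + b2)

random.seed(0)
I, H = 6, 8  # e.g., six lagged flows in, eight hidden neurons
W1 = [[random.uniform(-0.5, 0.5) for _ in range(I)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-0.5, 0.5) for _ in range(H)]
b2 = 0.0
y = mlp_forward([0.1] * I, W1, b1, W2, b2)  # output lies in (-1, 1)
```

Because the output unit is also tanh, predictions are bounded in (-1, 1), which is why the flow data are rescaled before training.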
Moving average

The moving average method smoothes data by replacing each data point with the average of the K neighboring data points, where K may be called the length of the memory window. The method is based on the idea that any large irregular component at any point in time will exert a smaller effect if we average the point with its immediate neighbors (Newbold et al., 2003). The most common moving average method is the unweighted moving average, in which each value of the data carries the same weight in the smoothing process. For a time series {x1, x2, ..., xN}, the K-term unweighted moving average is written as

x̄_t = (1/K) Σ_{i=0}^{K-1} x_{t-i}, for t = K, ..., N

when the backward moving mode is adopted (Lee et al., 2000). The choice of the window length K is by a trial-and-error procedure of minimizing the ANN prediction error.
Singular spectrum analysis

The implementation of SSA can be referred to Vautard et al. (1992) and Elsner and Tsonis (1997). Four steps are summarized for the implementation. The first step is to construct the trajectory matrix. The trajectory matrix results from the method of delays. In the method of delays, the coordinates of the phase space approximate the dynamics of the system by using lagged copies of the time series. Therefore, the trajectory matrix can reflect the evolution of the time series with a careful choice of the (s, L) window. For the time series {x1, x2, ..., xN}, the trajectory matrix is denoted by

X = [ x_1            x_{1+s}        x_{1+2s}        ...  x_{1+(L-1)s}
      x_2            x_{2+s}        x_{2+2s}        ...  x_{2+(L-1)s}
      x_3            x_{3+s}        x_{3+2s}        ...  x_{3+(L-1)s}
      ...            ...            ...             ...  ...
      x_{N-(L-1)s}   x_{N-(L-2)s}   x_{N-(L-3)s}    ...  x_N ]          (1)

where L is the embedding dimension (also called the singular number in the context of SSA) and s is the lag (or delay) time. The matrix dimension is R × L where R = N - (L - 1)s. The next step is the singular value decomposition (SVD) of the trajectory matrix X. Let S = XᵀX (called the lagged-covariance matrix). With SVD, X can be written as X = DΓEᵀ where D and E are the left and right singular vectors of X, and Γ is a diagonal matrix of singular values. E consists of orthonormal columns, and its columns are also called the empirical orthonormal functions (EOFs). Substituting X into the definition of S yields S = EΓ²Eᵀ. Further, S = EΛEᵀ since Γ² = Λ, where Λ is a diagonal matrix consisting of the ordered eigenvalues λ1, λ2, ..., λL ≥ 0. Therefore, the right singular vectors of X are the eigenvectors of S (Elsner and Tsonis, 1997). In other words, the singular vectors E and the singular values of X can be respectively attained by calculating the eigenvectors and the square roots of the eigenvalues of S.

The first two steps form the decomposition stage of SSA, and the next two steps belong to the recovering stage. The third step is to calculate the principal components (a_i^k) by projecting the original time record onto the eigenvectors as follows:

a_i^k = Σ_{j=1}^{L} x_{i+(j-1)s} e_j^k, for i = 1, 2, ..., N - (L - 1)s          (2)

where e_j^k represents the jth component of the kth eigenvector. As known, each principal component is a filtered process of the original series with length N - (L - 1)s, not length N as desired, which poses a problem in real-time prediction.
ANN-SSA1 and ANN-SSA2
The raw flow data is first decomposed by SSA into L RCs, and then the raw flow data is filtered by selecting p (≤L) from all L RCs. A new flow series is generated by summing the selected p RCs. Finally, the new flow series is used to generate the model inputs. This type of model is hereafter referred to as ANN-SSA1.

Different from ANN-SSA1, the model inputs are first derived from the original flow data, and then each input variable series of the model is filtered by selecting p (≤L) RCs, following Partal and Cigizoglu (2008); this model is referred to as ANN-SSA2.

Wavelet multi-resolution analysis

Discrete wavelet transform (DWT)
In the DWT, the scale and location take the discrete values a = a0^m and b = n b0 a0^m, where a0 > 1 and b0 > 0 are fixed. The appropriate choices for a0 and b0 depend on the wavelet function. A common choice for them is a0 = 2 and b0 = 1. Now assuming a discrete time series xt, where xi occurs at the discrete time i, the DWT becomes

W_{m,n} = 2^{-m/2} Σ_{i=0}^{N-1} x_i ψ(2^{-m} i - n)          (5)

where W_{m,n} is the wavelet coefficient for the discrete wavelet function with scale a = 2^m and location b = 2^m n. In this study, the wavelet function is derived from the family of Daubechies wavelets with order 3.
Multi-resolution analysis (MRA)
The Mallat decomposition algorithm (Mallat, 1989) is employed in this study. According to Mallat's theory, the original discrete time series xt is decomposed into a series of linearly independent approximation and detail signals.

The process consists of a number of successive filtering steps as depicted in Fig. 2. Fig. 2a displays an entire MRA scheme, and Fig. 2b shows the filtering operation between two adjacent resolutions. The original signal xt is first decomposed into an approximation and an accompanying detail. The decomposition process is then iterated, with successive approximations being decomposed in turn, so that the finest-resolution original signal is transformed into many coarser-resolution components (Küçük and Ağıralioğlu, 2006). As shown in Fig. 2b, the approximation cA_{i+1} is achieved by letting cA_i pass through the low-pass filter H′ and downsampling by two (denoted as ↓2), whereas the detailed version cD_{i+1} is obtained by letting cA_i pass through the high-pass filter G′ and downsampling by two. The details are therefore the low-scale, high-frequency components whereas the approximations are the high-scale, low-frequency components. Finally, the original signal xt is decomposed into many detail components and one approximation component which denotes the coarsest resolution. Following this procedure, the raw flow data can be decomposed into m + 1 components if the m in the DWT is set.
ANNs integrated with data preprocessing techniques

To explore the capability of ANNs, five ANN models are generated with the aid of the above three data-preprocessing techniques. These data-preprocessing techniques are aimed at improving the mapping relationship between the inputs and output of the ANN model by smoothing the raw flow data. The six forecasting models are described as follows.

ANN
The original ANN model (hereafter referred to as ANN) directly employs the original flow data to generate model input/output pairs. It is used as the baseline model for the purpose of comparison with the other five proposed models.

ANN-MA
The moving average method first smoothes the original flow data, and then the smoothed data are used to form the model inputs. The model is hereafter referred to as ANN-MA.

[Fig. 2. Schematics of WMRA: (a) decomposition of xt at level 3; (b) the filtering operation between two adjacent resolutions.]

The ANN-WMRA1 and ANN-WMRA2 are established in combination with the WMRA instead of SSA. The idea behind the modelling is identical to the ANN-SSA1 and ANN-SSA2.
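Common to all six models is the step of turning a (raw or preprocessed) series into lagged input/output pairs; a minimal sketch, with `make_pairs` an assumed helper name rather than anything from the paper:

```python
def make_pairs(series, k):
    """Return ([Q_{t-k}, ..., Q_{t-1}], Q_t) pairs for t = k .. N-1 (0-based)."""
    return [(series[t - k:t], series[t]) for t in range(k, len(series))]

flows = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pairs = make_pairs(flows, k=3)
# first pair: ([1.0, 2.0, 3.0], 4.0)
```

For ANN-MA the same pairing is simply applied to the moving-averaged series instead of the raw one.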
Evaluation of model performances

The Pearson correlation coefficient (r) and the coefficient of determination (R² = r²) have been identified as inappropriate measures in hydrologic model evaluation by Legates and McCabe (1999). The coefficient of efficiency (CE) (Nash and Sutcliffe, 1970) is a good alternative to r or R² as a goodness-of-fit or relative error measure in that it is sensitive to differences in the observed and forecasted means and variances. Legates and McCabe (1999) also suggested that a complete assessment of model performance should include at least one absolute error measure (e.g., RMSE) as a necessary supplement to a relative error measure. Besides, the persistence index (PI) (Kitanidis and Bras, 1980) was adopted here for the purpose of checking the prediction lag effect. Three measures were therefore used in this study. They are formulated as:

CE = 1 - Σ_{i=1}^{n} (T_i - T̂_i)² / Σ_{i=1}^{n} (T_i - T̄)²
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (T_i - T̂_i)² )
PI = 1 - Σ_{i=1}^{n} (T_i - T̂_i)² / Σ_{i=1}^{n} (T_i - T_{i-l})²

In these equations, n is the number of observations, T̂_i stands for the forecasted flow, T_i represents the observed flow, T̄ denotes the average observed flow, and T_{i-l} is the flow estimate from a so-called persistence model (or naive model) that basically takes the last flow observation (at time i minus the lead time l) as the prediction. CE and PI values of 1 stand for perfect fits.
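The three measures can be sketched directly from the formulas above (a sketch in the text's notation, not the authors' code):

```python
def ce(T, That):
    """Nash-Sutcliffe coefficient of efficiency."""
    mean_T = sum(T) / len(T)
    sse = sum((t - f) ** 2 for t, f in zip(T, That))
    return 1 - sse / sum((t - mean_T) ** 2 for t in T)

def rmse(T, That):
    """Root mean squared error."""
    return (sum((t - f) ** 2 for t, f in zip(T, That)) / len(T)) ** 0.5

def pi(T, That, l=1):
    """Persistence index: compares the model against forecasting T[i] by T[i-l]."""
    sse = sum((T[i] - That[i]) ** 2 for i in range(l, len(T)))
    sse_naive = sum((T[i] - T[i - l]) ** 2 for i in range(l, len(T)))
    return 1 - sse / sse_naive

obs = [1.0, 2.0, 4.0, 3.0, 5.0]
assert ce(obs, obs) == 1.0 and pi(obs, obs) == 1.0 and rmse(obs, obs) == 0.0
```

A PI near zero means the model is no better than simply repeating the last observation, which is exactly the symptom of a lagged prediction.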
[Fig. 3. Singular spectrum for (a) Lushui and (b) Wuxi with different L (L = 10, 20, 30, 50).]

[Fig. 4. Singular spectrum for (a) Lushui and (b) Wuxi with different s (s = 1, 3, 5, 7, 10).]
Application of models to the flow data

Decomposition of daily flow data

Decomposition by SSA
Decomposition of the raw flow data by SSA requires identifying the parameter pair (s, L). The choice of L represents a compromise between information content and statistical confidence. The value of L should be able to clearly resolve different oscillations hidden in the original signal. In other words, some leading eigenvalues should be identified. Fig. 3 displays the sensitivities of the eigenvalue decomposition to the singular number L for Lushui and Wuxi. Results show that about five leading eigenvalues stand out for different L, which implies the leading eigenvalues are insensitive to L. These leading eigenvalues are associated with lower-frequency oscillations. For the convenience of the filtering operation later, L is set to a small value of five in the present study. Fig. 4 presents the sensitivities of the eigenvalue decomposition to the lag time s when L = 5. Results suggest that the eigenvalues can be distinguished when s = 1, which means that the original signal can be resolved distinctly. The final parameter pair (s, L) in SSA was therefore set as (1, 5) for the two studied flow data series.

Taking the flow data of Lushui as the example, Fig. 5 presents the five RCs and the original flow series excluding the testing data. The RC1 represents an obvious low-frequency oscillation, which exhibits a similar mode to the original flow series. The other RCs reflect high-frequency oscillations, part of which can be deleted so as to improve the mapping between the inputs and output of ANN models.

[Fig. 5. Reconstructed components (RCs) by SSA and the original flow series of Lushui.]
Fig. 6 depicts the AMI and cross-correlation function (CCF) between the RCs and the original flow data. The last plot in Fig. 6 denotes the average of AMI and CCF, which was generated by averaging the results in the plots of the five RCs. The average indicates an overall correlation that is either positive or negative. The best positive correlation occurs at lag 1. RC1 among all five RCs exhibits the best positive correlation with the original flow series. The correlation quickly shifts from positive to negative for the other RCs with increasing lag time. In essence, the positive or negative value of CCF may indicate that the RC makes a positive or negative contribution to the output of the model when the RC is used as an input of the model. Therefore, deleting RCs which have negative correlations with the model output (if the average AMI or CCF is positive) can improve the performance of the forecasting model.

[Fig. 6. Plots of AMI and CCF between the RCs and the raw flow data of Lushui.]

[Fig. 7. Discrete wavelet components (DWCs) and the original flow series of Lushui.]

[Fig. 8. Plots of AMI and CCF between the DWCs and the raw flow data of Lushui.]
This is the underlying reason that the ANN is coupled with the SSA or WMRA in this study.
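The CCF-based screening just described (compute the lag-1 cross-correlation of each component with the raw series, then keep components whose correlation agrees in sign with the average) can be sketched as below; the helper names are illustrative, not from the paper.

```python
def ccf_at_lag(x, y, lag=1):
    """Pearson correlation between y_t and x_{t-lag}."""
    xs, ys = x[:len(x) - lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (vx * vy)

def select_components(components, raw, lag=1):
    """Keep components whose lag-`lag` CCF matches the sign of the average CCF."""
    corrs = [ccf_at_lag(c, raw, lag) for c in components]
    avg = sum(corrs) / len(corrs)
    keep_sign = 1 if avg >= 0 else -1
    return [c for c, r in zip(components, corrs) if r * keep_sign > 0]
```

Summing the retained components then yields the filtered series used to build the ANN-SSA1 (or ANN-WMRA1) inputs.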
Decomposition by WMRA
The WMRA decomposes an original signal into many components at different scales (or frequencies). Each of the components plays a distinct role in the original flow series. The low-frequency component generally reflects the identity (periodicity and trend) of the signal whereas the high-frequency components uncover details (Küçük and Ağıralioğlu, 2006). An important issue in the WMRA is to choose the appropriate number of scales. The largest scale should be shorter than the size of the testing data. The sizes of the testing data are 550 days (1.25 years) for Lushui and 1826 days (5 years) for Wuxi. The largest scale m is therefore chosen as 8 and 10 for Lushui and Wuxi, respectively. Thus, the flow data of Lushui was decomposed at 8 wavelet resolution levels (2^1, 2^2, ..., 2^8 days), and the flow data of Wuxi was decomposed at 10 wavelet resolution levels (2^1, 2^2, ..., 2^10 days). Fig. 7 shows the original flow data of Lushui (excluding testing data) and nine wavelet components (eight detail components and one approximation component). For the purpose of distinction from the components of SSA, a wavelet component at some scale is expressed by DWC with the power of two. For instance, DWC1 stands for the component at the scale of 2^1 days and DWC2 represents the component at the scale of 2^2 days, whereas DWC9 denotes the approximation for the Lushui flow series. The approximation component at the right end of Fig. 7 is the residual which reflects the trend of the flow data. As revealed in Fig. 7, detail components at scales of 2^8 (256 days) and 2^7 (128 days) are characterized by notable periodicity, which partially exhibits the annual oscillation and semi-annual oscillation in the original flow series. Relatively weak periodic signals occur at scales of 16 days, 32 days and 64 days. Other high-frequency components tend to capture the details (or noises) of the original flow series. Hence, the inputs of the model can be filtered by deleting some high-frequency DWCs.

AMI and CCF between the DWCs and the original flow are presented in Fig. 8. The last plot in Fig. 8 describes the average plots of AMI and CCF, which was generated by averaging the plots of the 9 DWCs. The DWCs with lower frequencies, including 2^8, 2^9 and 2^10 days, always keep positive correlations with the original flow data within a long lag time. The approximation component DWC9 also exhibits a positive correlation with the original flow data within a long lag time. It can be seen from the other plots of DWCs that the correlation coefficient shifts between positive and negative values.
[Fig. 9. FNNP as a function of the embedding dimension L for (a) Lushui and (b) Wuxi when s = 1 and Rtol = 15.]

[Fig. 10. Validation RMSE (m³) of (a) ANN-SSA1 and (b) ANN-WMRA1 as a function of the number of retained RCs/DWCs, p (≤L), at 1-, 2- and 3-day prediction horizons (based on the Lushui flow data).]
and Wuxi when RP = 2%, and the values of L are 7 and 12 for
LushuiandWuxi when RP = 1%. The nal selection of model inputs were
sixfor Luishui (i.e., using Qt1, Qt2, Qt3, Qt4, Qt5 and Qt6 as
input topredict Qt) and 8 (i.e., using Qt1, Qt2, Qt3, Qt4, Qt5,
Qt6, Qt7 andQt8 as input to predict Qt) for Wuxi by trial and error
among threepotential model inputs.
The ensuing task is to optimize the size of the hidden layer
withthe chosen three inputs and one output. The optimal size H of
thehidden layer was found by systematically increasing the number
ofhidden neurons from 1 to 10 until the network performance on
thecross-validation set was no longer improved signicantly.
Theidentied ANN architecture was: 681 for Lushui and 891 forWuxi.
Note that the identied ANNmodel was used as the baselinemodel for
the following hybrid operation with
data-preprocessingtechniques.
Implementation of models
ANN-MA
Hydrology 372 (2009) 8093sents the component at the scale of 22
days whereas DWC9 de-notes the approximation for the Lushui ow
series. The approxima-tion component at the right end of Fig. 7 is
the residual whichreects the trend of the ow data. As revealed in
Fig. 7, detail com-ponents at scales of 28 (256 days) and 27 (128
days) are character-ized by notable periodicity, which partially
exhibits annualoscillation and semi-annual oscillation in the
original ow series.Relative weak periodic signals occur at scales
of 16 days, 32 daysand 64 days. Other high-frequency components
tend to capturethe details (or noises) of the original ow series.
Hence, the inputsof model can be ltered by deleting some
high-frequency DWCs.
AMI and CCF between DWCs and the original ow are presentedin
Fig. 8. The last plot in Fig. 8 describes the average plots of
AMIand CCF, which was generated by averaging the plots of 9
DWCs.The DWCs with lower frequencies including 28, 29 and 210 days
al-ways keep positive correlations with the original ow data
withina long lag time. The approximation component DWC9 also
exhibitsa positive correlation with the original ow data with a
long lagtime. It can be seen from other plots of DWCs that the
correlationcoefcient shifts between positive and negative
values.
Identication of the ANN architecture
Six models architectures need to be identied depending on theraw
or ltered ow data before models can be applied to the owprediction.
The ANN model is used as a paradigm to shed light onthe
procedure.
The architecture identication of the ANN model
includesdetermining model inputs and the number of nodes (or
neurons)in the hidden layer when there is one model output. The
selectionof appropriate model inputs is crucial in model
development.There is no any theoretic guide for the selection of
model inputsalthough a large number of methods have been reported
in litera-ture which was reviewed by Bowden et al. (2005). These
methodsappear very subjective. Sudheer et al. (2002) suggested that
thestatistical approach depending on cross-, auto- and
partial-auto-correlation of the observed data is a good alternative
to the trial-and-error method in identifying model inputs. The
statisticalmethod was also successfully applied to daily suspended
sedimentdata by Kisi (2008). The model input in this method is
mainlydetermined by the plot of PACF. The essence of this method is
toexamine the dependence between the input and output data
series.According to this method, the model inputs were originally
consid-ered to take previous 6 daily ows for Luishui and previous
13 dai-ly ows for Wuxi because the PACF within the condence
bandoccurs at lag 6 for Luishui and lag 13 for Wuxi (Fig. 1).
The false nearest neighbours (FNN) method (Kennel et al., 1992; Abarbanel et al., 1993) is another commonly used method to identify model inputs, from the perspective of the dynamics reconstruction of a system (Wang et al., 2006). The following outlines the basic concepts of the FNN algorithm. Suppose the point Yi = {xi, xi+s, xi+2s, ..., xi+(L-1)s} has a neighbour Yj = {xj, xj+s, xj+2s, ..., xj+(L-1)s}; the criterion under which Yj is viewed as a false neighbour of Yi is

    |x_{i+Ls} - x_{j+Ls}| / ||Yi - Yj|| > Rtol    (6)

where ||.|| stands for the distance in the Euclidean sense and Rtol is a threshold, commonly in the range 10-30. Eq. (6) is evaluated for all points i in the vector state space, and the percentage of points which have FNNs is calculated. The algorithm is repeated for increasing L until the percentage of FNNs drops to zero (or to some acceptably small number RP, such as RP = 1%); this L is the target L. Setting Rtol = 15 and s = 1, the percentage of FNNs (FNNP) as a function of L was calculated for the two flow series, as shown in Fig. 9. The resulting values of L are 6 and 8 for Lushui and Wuxi, respectively.

Implementation of models

ANN-MA

The window length K (see the moving average) can be determined by varying K from 1 to 10 for the identified ANN model. The target value of K is associated with the optimal network performance in terms of RMSE. The final K was 3, 5, and 7 at the 1-, 2-, and 3-day-ahead forecast horizons for both flow series.
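The FNN count of Eq. (6) can be sketched as follows. This is an illustrative implementation on a synthetic AR(2) series, not the study's code: the neighbour search is a brute-force scan, and Rtol = 15, s = 1 follow the settings quoted above.

```python
import numpy as np

def fnn_percentage(x, L, s=1, r_tol=15.0):
    """Percentage of false nearest neighbours at embedding dimension L (Eq. (6))."""
    n = len(x) - L * s                       # points for which x[i + L*s] exists
    # delay vectors Y_i = (x_i, x_{i+s}, ..., x_{i+(L-1)s})
    Y = np.column_stack([x[j * s: j * s + n] for j in range(L)])
    false_count = 0
    for i in range(n):
        d = np.linalg.norm(Y - Y[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        j = int(np.argmin(d))                # nearest neighbour in dimension L
        # Eq. (6): the neighbour is "false" if adding the next delayed
        # coordinate blows the distance up by more than r_tol
        if abs(x[i + L * s] - x[j + L * s]) / d[j] > r_tol:
            false_count += 1
    return 100.0 * false_count / n

rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(2, 500):                      # synthetic AR(2) stand-in for a flow series
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + rng.normal()

fnnp = {L: fnn_percentage(x, L) for L in range(1, 6)}
print(fnnp)
```

The target L is then the smallest embedding dimension at which the percentage falls below the chosen RP.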
ANN-SSA1 (or ANN-WMRA1)

According to the methodological procedure for the ANN-SSA1 and ANN-WMRA1, the remaining tasks are to sort out the contributing components from the RCs or DWCs and to determine the ANN-SSA1 architecture. The RCs from Lushui are used here to describe the implementation of the ANN-SSA1.
The determination of the effective RCs depends on the correlation coefficients between the RCs and the original flow data (Fig. 6). The procedure includes the following steps:

- Identify whether the average CCF (shown at the bottom-right corner of Fig. 6) is positive or negative. For 1-day-ahead prediction, the average CCF is positive (0.18) at lag 1.
- Sort the values of the CCF at lag 1 for all RCs in descending order (in ascending order if the average CCF is negative). For the 1-day-ahead prediction, the new order is RC1, RC2, RC3, RC4, and RC5, which is the same as the original order.
- Use the ANN model to conduct predictions in which the number p (<= L) of RCs generating the new model inputs systematically decreases from
Table 2
Effective components for the ANN-SSA2 and ANN-WMRA2 inputs at various forecasting horizons (based on the flow data of Lushui).

Model       Model input   1-day lead                   2-day lead           3-day lead
ANN-SSA2    Qt-1          1, 2 (a)                     1, 5                 1, 4
            Qt-2          1, 5                         1, 4                 1
            Qt-3          1, 4                         1                    1
            Qt-4          1                            1                    1
            Qt-5          1                            1                    1
            Qt-6          1                            1                    1
ANN-WMRA2   Qt-1          4, 8, 3, 6, 7, 2, 5, 9 (b)   8, 4, 6, 7, 5, 9     8, 6, 7, 4, 5, 9
            Qt-2          8, 4, 6, 7, 5, 9, 3          8, 6, 7, 4, 5, 9     8, 6, 7, 9, 4, 5
            Qt-3          8, 6, 7, 4, 5, 9, 1          8, 6, 7, 9, 4, 5     8, 6, 7, 9, 5
            Qt-4          8, 6, 7, 9, 4, 5, 1          8, 6, 7, 9, 5        8, 6, 7, 9
            Qt-5          8, 6, 7, 9, 5, 2, 4          8, 6, 7, 9           8, 6, 7, 9
            Qt-6          8, 6, 7, 9, 2, 5             8, 6, 7, 9           8, 6, 7, 9

(a) The numbers 1, 2 denote RC1 and RC2.
(b) The numbers 4, 8, 3, 6, 7, 2, 5, 9 stand for DWC4, DWC8, DWC3, DWC6, DWC7, DWC2, DWC5, and DWC9; the sequence of these numbers is in descending order of their correlation coefficients.
all five RCs at the beginning to only RC1 at the end. The target value of p is associated with the minimum RMSE amongst the five runs of the ANN model.
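The ranking-and-shrinking loop above can be sketched as follows. This is an illustrative assumption-laden version: a linear fit stands in for the ANN run, and the five "RCs" and the flow series are synthetic, so only the logic of sorting by the lag-1 CCF and decreasing p is shown.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
rcs = [rng.normal(size=n) for _ in range(5)]     # hypothetical stand-ins for RC1..RC5
flow = np.zeros(n)
# tomorrow's flow depends only on today's first two components (by construction)
flow[1:] = 1.5 * rcs[0][:-1] + 0.8 * rcs[1][:-1] + 0.1 * rng.normal(size=n - 1)

# rank components by their CCF at lag 1 with the flow, in descending order
cc = [np.corrcoef(rc[:-1], flow[1:])[0, 1] for rc in rcs]
order = np.argsort(cc)[::-1]

def val_rmse(x, y):
    # stand-in for one ANN run: fit y ~ x on the first half, score on the second
    half = len(x) // 2
    a, b = np.polyfit(x[:half], y[:half], 1)
    return float(np.sqrt(np.mean((a * x[half:] + b - y[half:]) ** 2)))

# shrink the retained set from all five components down to the top one
scores = {}
for p in range(5, 0, -1):
    filtered = np.sum([rcs[k] for k in order[:p]], axis=0)
    scores[p] = val_rmse(filtered[:-1], flow[1:])
best_p = min(scores, key=scores.get)
print("retain the top", best_p, "components")
```

Because the synthetic flow is built from the first two components, the validation RMSE is minimized when exactly those two are retained; with real RCs the loop would call the trained ANN instead of the linear fit.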
Fig. 10 shows the results of the RC and DWC filtering at all three prediction horizons. It can be seen from Fig. 10(a) that two components (RC1 and RC2) were retained for 1-day-ahead prediction, two components (RC1 and RC5, because the new order is RC1-RC5-RC2-RC4-RC3) for 2-day-ahead prediction, and only one component (RC1, the new order being RC1-RC4-RC5-RC3-RC2) for 3-day-ahead prediction. Fig. 10(b) shows that most of the DWCs are kept. For instance, only DWC1 (the detail at the 2^1-day scale) of all nine DWCs was deleted for 1-step-ahead prediction, two DWCs (DWC1 and DWC2) were deleted for 2-step-ahead prediction, and three DWCs (DWC1, DWC2, and DWC3) were removed for 3-step-ahead prediction. The values of vali-RMSE in Fig. 10 also show that the SSA is superior to the WMRA in improving the ANN performance.
Based on the retained RCs or DWCs, the number of nodes in the hidden layer of the ANN model was optimized again. The identified architectures of the ANN-SSA1 and ANN-WMRA1 were the same as the original ANN model (i.e., 6-8-1 for Lushui and 8-9-1 for Wuxi).
ANN-SSA2 (or ANN-WMRA2)

Implementation of the ANN-SSA2 or ANN-WMRA2 follows Partal and Kisi (2007) and Partal and Cigizoglu (2008), which can be referred to for details. The implementation is a three-step procedure: first, use the SSA or WMRA to decompose each input-variable series of the original ANN model; second, select the effective components for each input variable; finally, generate a new input-variable series by summing the selected effective components. The procedure is obviously time-consuming because it has to be repeated for each ANN input variable. There is no definite criterion for the selection of RCs or DWCs; a basic principle is to retain those components that make a positive contribution to the model output. A trial-and-error approach was therefore employed in the present study. The values of the CCF in Figs. 6 and 8 indicate the contribution of each component of each input variable to the ANN output. For instance, the values of the CCF at lag 1 denote the correlation coefficients between the components of Qt-1 and the output variable Qt. Table 2 lists the effective components of each input variable for the ANN-SSA2 and ANN-WMRA2 based on the Lushui flows. It can be seen that the SSA is more effective than the WMRA: most of the DWCs are retained for each input variable of the ANN-WMRA2, whereas only one or two RCs are kept for each input variable of the ANN-SSA2. With the new model inputs, the identified architectures of the ANN-SSA2 and ANN-WMRA2 were again the same as the original ANN model (i.e., 6-8-1 for Lushui and 8-9-1 for Wuxi).

[Fig. 11. Scatter plots and hydrographs of the results of the 1-day-ahead forecast by the ANN model using the Lushui data (1 and 3; RMSE 6.41, CE 0.42, PI 0.10) and the Wuxi data (2 and 4; RMSE 88.9, CE 0.47, PI 0.03).]

[Fig. 12. Representative detail of observed and forecasted discharges for the 1-day-ahead forecast, and the CCF between observed and forecasted discharges at the three forecast horizons, from the ANN model (1 and 3 for Lushui; 2 and 4 for Wuxi).]
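The decompose-select-sum construction of a new input series can be sketched with a minimal SSA (embedding, SVD, diagonal averaging). The window length, the toy signal, and the keep-if-positively-correlated rule below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def ssa_components(x, L=5):
    """Decompose x into L reconstructed components whose sum equals x."""
    n = len(x)
    K = n - L + 1
    # trajectory matrix: column i is x shifted by i (shape K x L)
    X = np.column_stack([x[i:i + K] for i in range(L)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    rcs = []
    for k in range(L):
        Xk = s[k] * np.outer(U[:, k], Vt[k])      # rank-1 piece of X
        rc = np.zeros(n)
        counts = np.zeros(n)
        for i in range(L):                         # diagonal averaging back to a series
            rc[i:i + K] += Xk[:, i]
            counts[i:i + K] += 1
        rcs.append(rc / counts)
    return rcs

t = np.arange(300)
x = np.sin(2 * np.pi * t / 50) + 0.3 * np.random.default_rng(3).normal(size=300)

rcs = ssa_components(x, L=5)
# keep only components positively correlated with the original series,
# then sum the retained components into the new model input
keep = [rc for rc in rcs if np.corrcoef(rc, x)[0, 1] > 0]
new_input = np.sum(keep, axis=0)
print(len(rcs), "RCs;", len(keep), "retained")
```

Because the decomposition is exact (the RCs sum back to the series), anything the selection step drops is, by construction, the part judged not to contribute to the output.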
Forecasting results and discussion
Fig. 11 shows the scatter plots and hydrographs of the results of the 1-day-ahead prediction of the ANN model using the flow data of Lushui and Wuxi. The ANN model seriously underestimates a number of the moderate and high flows. The low values of CE and PI suggest that a time lag may exist between the forecasted and observed flows. A representative detail of the hydrographs is presented in Figs. 12(1) and 12(2), in which the prediction lag effect is fairly obvious. Figs. 12(3) and 12(4) illustrate the lag values at the 1-, 2-, and 3-day-ahead forecast horizons for Lushui and Wuxi on the basis of the CCF between the forecasted and observed discharges. The value of the CCF at zero lag corresponds to the actual performance (i.e., the correlation coefficient) of the models. The lag at which the CCF is maximized expresses the mean lag in the model forecast. Accordingly, the lags were 1, 2, and 4 days for Lushui and 1, 2, and 3 days for Wuxi at the 1-, 2-, and 3-day-ahead horizons, respectively.

[Fig. 13. Scatter plots of observed and forecasted Lushui discharges for the 1-day-ahead forecast using (a) ANN-SSA1, (b) ANN-WMRA1, (c) ANN-SSA2, and (d) ANN-WMRA2.]

[Fig. 14. Representative detail of observed and forecasted Lushui discharges for the 1-day-ahead forecast using (a) ANN-SSA1, (b) ANN-WMRA1, (c) ANN-SSA2, and (d) ANN-WMRA2.]
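The lag diagnosis described above can be reproduced directly: compute the CCF between the forecasted and observed series over a range of lags and report the lag at which it peaks. In this sketch the "forecast" is deliberately the observation delayed by one day, so the CCF should peak at a one-day lag; the series is synthetic, and the sign convention (positive lag means the forecast trails the flow) is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
obs = np.cumsum(rng.normal(size=500))      # synthetic persistent series
forecast = np.roll(obs, 1)                 # "forecast" = yesterday's observation
obs, forecast = obs[1:], forecast[1:]      # drop the wrapped-around first sample

def ccf(o, f, lag):
    # correlation between the forecast shifted back by `lag` days and the
    # observation; a peak at a positive lag means the forecast trails the flow
    if lag > 0:
        return np.corrcoef(f[lag:], o[:-lag])[0, 1]
    if lag < 0:
        return np.corrcoef(f[:lag], o[-lag:])[0, 1]
    return np.corrcoef(f, o)[0, 1]

lags = list(range(-8, 9))
values = [ccf(obs, forecast, k) for k in lags]
mean_lag = lags[int(np.argmax(values))]
print("mean forecast lag:", mean_lag, "day(s)")
```

A model with no lag effect would instead place the maximum CCF at zero lag, which is exactly the behaviour reported for the ANN-MA below.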
The scatter plots of the 1-day-ahead predictions based on the Lushui flows using the ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2 are presented in Fig. 13. Each of the four models exhibits a noticeable improvement in performance compared with the ANN model. The most remarkable improvement, in terms of RMSE, CE and PI, comes from the ANN-SSA1 and ANN-SSA2. Fig. 14 shows a representative detail of the hydrographs of the four prediction models. Lagged predictions can clearly be found in the detail plots derived from the ANN-WMRA1 and ANN-WMRA2, in particular the latter. Figs. 15 and 16 present the scatter plots and detail parts of the hydrographs from the same four models based on the flows of Wuxi. A great improvement in the model performance can be seen when the four models are compared with the ANN model. In terms of RMSE, CE and PI, the ANN-SSA1 and ANN-SSA2 performed better than the ANN-WMRA1 and ANN-WMRA2. The detail plots in Fig. 16(a) and (c) also indicate that the ANN-SSA1 and ANN-SSA2 can reasonably approximate the flows of Wuxi. In contrast, the ANN-WMRA1 and ANN-WMRA2 underestimate quite a number of the peak flows. Furthermore, the lag effect is still visible in Fig. 16(b) and (d).

For the ANN-SSA1 and ANN-SSA2, the scatter plots with low spread, the low RMSE, and the high CE and PI indicate excellent model performance. The closely matched detail plots in Fig. 16(a) and (c) also show that the two models approximate the flows of Wuxi very well.

The ANN-MA simulation results of the 1-day-ahead prediction are presented in Fig. 17. Fig. 17(a), (c), and (e) depict the scatter plots, the hydrographs, and the CCF curves at the three forecasting horizons based on the flows of Lushui; Fig. 17(b), (d), and (f) show the same based on the flows of Wuxi. The results in Fig. 17(e) and (f) show that the issue of lagged prediction is completely eliminated by the MA, because the maximum CCF occurs at zero lag. Compared with the other five models (ANN, ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2), the ANN-MA model exhibits the best performance, including scatter plots with low spread, the lowest RMSE, and high CE and PI. The closely matched detail plots in Fig. 17(b) and (d) indicate that the ANN-MA model is fairly adequate in reproducing the observed flows of Lushui and Wuxi. In addition, the ANN-MA model also shows a great ability to capture the peak flows (depicted in Fig. 17(a)-(d)).

Tables 3 and 4 summarize the forecasting performance of all six models in terms of RMSE, CE, and PI at the three prediction horizons. The ANN model shows markedly inferior results compared with the other five models. The ANN-MA holds the best performance among all models at each prediction horizon. It can also be seen that the performances of the ANN-SSA1 and ANN-SSA2 are similar, as are those of the ANN-WMRA1 and ANN-WMRA2. However, the models based on SSA provide noticeably better performance than the models based on WMRA at each forecast horizon, which means that the SSA is more effective than the WMRA in improving the ANN performance in the current study.

[Fig. 15. Scatter plots of observed and forecasted Wuxi discharges for the 1-day-ahead forecast using (a) ANN-SSA1, (b) ANN-WMRA1, (c) ANN-SSA2, and (d) ANN-WMRA2.]

[Fig. 16. Representative detail of observed and forecasted Wuxi discharges for the 1-day-ahead forecast using (a) ANN-SSA1, (b) ANN-WMRA1, (c) ANN-SSA2, and (d) ANN-WMRA2.]

[Fig. 17. Forecast results from the ANN-MA model for Lushui ((a), (c), and (e)) and Wuxi ((b), (d), and (f)), where (a) and (b) denote scatter plots, (c) and (d) are representative details, and (e) and (f) show the CCF between forecasts and observed discharges at the three prediction levels.]

Table 3
Model performances at various forecasting horizons using the testing data of Lushui.

Model        RMSE (a)              CE                   PI
             1      2      3       1     2     3        1     2     3
ANN          6.41   7.50   8.27    0.42  0.20  0.03     0.10  0.26  0.36
ANN-MA       2.47   2.63   2.60    0.91  0.90  0.90     0.87  0.91  0.94
ANN-SSA1     3.77   3.85   3.98    0.80  0.79  0.77     0.69  0.81  0.85
ANN-SSA2     3.18   3.47   3.85    0.86  0.83  0.79     0.78  0.84  0.86
ANN-WMRA1    4.67   5.14   7.17    0.70  0.62  0.27     0.54  0.65  0.52
ANN-WMRA2    5.85   6.26   6.91    0.51  0.44  0.32     0.25  0.48  0.55

(a) The numbers 1, 2, and 3 denote 1-, 2-, and 3-day-ahead forecasts.

Table 4
Model performances at various forecasting horizons using the testing data of Wuxi.

Model        RMSE (a)              CE                   PI
             1      2      3       1     2     3        1     2     3
ANN          88.9   111.0  114.7   0.47  0.17  0.12     0.03  0.24  0.32
ANN-MA       29.4   39.4   41.5    0.94  0.90  0.88     0.89  0.90  0.91
ANN-SSA1     46.0   50.4   50.5    0.86  0.83  0.83     0.74  0.84  0.87
ANN-SSA2     48.1   45.3   50.5    0.84  0.86  0.83     0.72  0.87  0.87
ANN-WMRA1    69.6   82.3   92.2    0.68  0.55  0.43     0.41  0.59  0.56
ANN-WMRA2    78.7   86.9   94.1    0.58  0.49  0.41     0.24  0.54  0.54

(a) The numbers 1, 2, and 3 denote 1-, 2-, and 3-day-ahead forecasts.

Conclusions

In this study, the conventional ANN model was coupled with three different data-preprocessing techniques, i.e., MA, SSA, and WMRA. As a result, six ANN models (the original ANN model, ANN-MA, ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2) were proposed to forecast two daily flow series, of Lushui and Wuxi. To apply these models to the flow data, the memory length K of the MA, the lag time and embedding dimension (s, L) of the SSA, and the largest scale m of the WMRA needed to be decided in advance. The K for the 1-, 2-, and 3-day-ahead forecasts was 3, 5, and 7 days for each flow series, found by trial and error. The values of (s, L) were set to (1, 5) for each flow series by sensitivity analysis. The largest scale m of the WMRA was 8 and 10 for Lushui and Wuxi respectively, depending on the length of the testing data.
The results from the original ANN model were disappointing due to the existence of the prediction lag effect. The analysis of the CCF between predicted and observed flows revealed that the lags at the three prediction horizons were 1, 2, and 4 days for Lushui and 1, 2, and 3 days for Wuxi. All three data-preprocessing techniques could improve the ANN performance. The ANN-MA, among all six models, performed best and eradicated the lag effect. It could also be seen that the performances of the ANN-SSA1 and ANN-SSA2 were similar, as were those of the ANN-WMRA1 and ANN-WMRA2. However, the models based on SSA provided noticeably better performance than the models based on WMRA at each forecast horizon, which meant that the SSA was more effective than the WMRA in improving the ANN performance in the current study.

Under an overall consideration of model performance and modeling complexity, the ANN-MA model was optimal, followed by the ANN models coupled with SSA, and finally the ANN models coupled with WMRA.

References

Abarbanel, H.D.I., Brown, R., Sidorowich, J.J., Tsimring, L.S., 1993. The analysis of observed chaotic data in physical systems. Rev. Mod. Phys. 65 (4), 1331–1392.
Abrahart, R.J., See, L., 2002. Multi-model data fusion for river flow forecasting: an evaluation of six alternative methods based on two contrasting catchments. Hydrol. Earth Syst. Sci. 6 (4), 655–670.
ASCE Task Committee on Application of Artificial Neural Networks in Hydrology, 2000. Artificial neural networks in hydrology. II: Hydrologic applications. J. Hydrol. Eng., ASCE 5 (2), 124–137.
Bowden, G.J., Dandy, G.C., Maier, H.R., 2005. Input determination for neural network models in water resources applications: Part 1. Background and methodology. J. Hydrol. 301, 75–92.
Castellano-Méndez, M., González-Manteiga, W., Febrero-Bande, M., Manuel Prada-Sánchez, J., Lozano-Calderón, R., 2004. Modeling of the monthly and daily behavior of the runoff of the Xallas river using Box–Jenkins and neural networks methods. J. Hydrol. 296, 38–58.
Daubechies, I., 1992. Ten Lectures on Wavelets. CBMS-NSF Series in Applied Mathematics, vol. 61. SIAM Publication, Philadelphia, PA.
Dawson, C.W., Wilby, R.L., 1999. A comparison of artificial neural networks used for river flow forecasting. Hydrol. Earth Syst. Sci. 3, 529–540.
Dawson, C.W., Wilby, R.L., 2001. Hydrological modeling using artificial neural networks. Prog. Phys. Geogr. 25 (1), 80–108.
De Vos, N.J., Rientjes, T.H.M., 2005. Constraints of artificial neural networks for rainfall–runoff modeling: trade-offs in hydrological state representation and model evaluation. Hydrol. Earth Syst. Sci. 9, 111–126.
Elsner, J.B., Tsonis, A.A., 1997. Singular Spectrum Analysis: A New Tool in Time Series Analysis. Plenum Press, New York.
Farmer, J.D., Sidorowich, J.J., 1987. Predicting chaotic time series. Phys. Rev. Lett. 59 (4), 845–848.
Fraser, A.M., Swinney, H.L., 1986. Independent coordinates for strange attractors from mutual information. Phys. Rev. A 33 (2), 1134–1140.
Howell, J.F., Mahrt, L., 1994. An adaptive decomposition: application to turbulence. In: Wavelets in Geophysics. Academic Press, New York, pp. 107–128.
Jain, S.K., Das, A., Srivastava, D.K., 1999. Application of ANN for reservoir inflow prediction and operation. J. Water Resour. Plann. Manage. 125 (5), 263–271.
Jain, A., Srinivasulu, S., 2004. Development of effective and efficient rainfall–runoff models using integration of deterministic, real-coded genetic algorithms and artificial neural network techniques. Water Resour. Res. 40, W04302.
Kennel, M.B., Brown, R., Abarbanel, H.D.I., 1992. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A 45 (6), 3403–3411.
Kisi, O., 2003. River flow modeling using artificial neural networks. J. Hydrol. Eng. 9 (1), 60–63.
Kisi, O., 2005. Daily river flow forecasting using artificial neural networks and auto-regressive models. Turkish J. Eng. Environ. Sci. 29, 9–20.
Kisi, O., 2008. Constructing neural network sediment estimation models using a data-driven algorithm. Math. Comput. Simulat. 79, 94–103.
Kitanidis, P.K., Bras, R.L., 1980. Real-time forecasting with a conceptual hydrologic model, 2, applications and results. Water Resour. Res. 16 (6), 1034–1044.
Küçük, M., Ağıralioğlu, N., 2006. Regression technique for streamflow prediction. J. Appl. Stat. 33 (9), 943–960.
Laio, F., Porporato, A., Revelli, R., Ridolfi, L., 2003. A comparison of nonlinear flood forecasting methods. Water Resour. Res. 39 (5), 1129. doi:10.1029/2002WR001551.
Lee, C.F., Lee, J.C., Lee, A.C., 2000. Statistics for Business and Financial Economics, second ed. World Scientific, Singapore.
Legates, D.R., McCabe Jr., G.J., 1999. Evaluating the use of goodness-of-fit measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35 (1), 233–241.
Lisi, F., Nicolis, O., Sandri, M., 1995. Combining singular-spectrum analysis and neural networks for time series forecasting. Neural Process. Lett. 2 (4), 6–10.
Mallat, S.G., 1989. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intell. 11 (7), 674–692.
Marques, C.A.F., Ferreira, J., Rocha, A., Castanheira, J., Gonçalves, P., Vaz, N., Dias, J.M., 2006. Singular spectral analysis and forecasting of hydrological time series. Phys. Chem. Earth 31, 1172–1179.
Muttil, N., Chau, K.W., 2006. Neural network and genetic programming for modelling coastal algal blooms. Int. J. Environ. Pollut. 28 (3/4), 223–238.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models; part I: a discussion of principles. J. Hydrol. 10, 282–290.
Newbold, P., Carlson, W.L., Thorne, B.M., 2003. Statistics for Business and Economics, fifth ed. Prentice Hall, Upper Saddle River, NJ.
Partal, T., Cigizoglu, H.K., 2008. Estimation and forecasting of daily suspended sediment data using wavelet–neural networks. J. Hydrol. 358 (3–4), 317–331.
Partal, T., Kisi, O., 2007. Wavelet and neuro-fuzzy conjunction model for precipitation forecasting. J. Hydrol. 342 (1–2), 199–212.
Raman, H., Sunilkumar, N., 1995. Multivariate modeling of water resources time series using artificial neural networks. Hydrol. Sci. J. 40 (2), 145–163.
Sivakumar, B., Jayawardena, A.W., Fernando, T.M.K., 2002. River flow forecasting: use of phase-space reconstruction and artificial neural networks approaches. J. Hydrol. 265 (1), 225–245.
Sivapragasam, C., Liong, S.Y., Pasha, M.F.K., 2001. Rainfall and discharge forecasting with SSA–SVM approach. J. Hydroinformat. 3 (7), 141–152.
Sudheer, K.P., Gosain, A.K., Ramasastri, K.S., 2002. A data-driven algorithm for constructing artificial neural network rainfall–runoff models. Hydrol. Process. 16, 1325–1330.
Thirumalaiah, K., Deo, M.C., 2000. Hydrological forecasting using neural networks. J. Hydrol. Eng. 5 (2), 180–189.
Torrence, C., Compo, G.P., 1998. A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc. 79, 61–78.
Vautard, R., Yiou, P., Ghil, M., 1992. Singular-spectrum analysis: a toolkit for short, noisy and chaotic signals. Physica D 58, 95–126.
Wang, W., van Gelder, P.H.A.J.M., Vrijling, J.K., Ma, J., 2006. Forecasting daily streamflow using hybrid ANN models. J. Hydrol. 324, 383–399.