Predicting copper concentrations in acid mine drainage: a comparative analysis of five machine learning techniques

Getnet D. Betrie, Solomon Tesfamariam, Kevin A. Morin, and Rehan Sadiq

Received: 27 November 2011 / Accepted: 30 August 2012 / Published online: 15 September 2012
© Springer Science+Business Media B.V. 2012
Abstract Acid mine drainage (AMD) is a global problem that may have serious human health and environmental implications. Laboratory and field tests are commonly used for predicting AMD; however, this is challenging because its formation varies from site to site for a number of reasons. Furthermore, these tests are often conducted at small scale over a short period of time. Subsequently, extrapolation of these results to the large-scale setting of mine sites introduces large uncertainties for decision-makers. This study presents machine learning techniques to develop models to predict AMD quality using historical monitoring data of a mine site. The machine learning techniques explored in this study include artificial neural networks (ANN), support vector machine with polynomial (SVM-Poly) and radial basis function (SVM-RBF) kernels, model tree (M5P), and K-nearest neighbors (K-NN). Input variables (physico-chemical parameters) that influence drainage dynamics are identified and used to develop models to predict copper concentrations. For these selected techniques, the predictive accuracy and uncertainty were evaluated based on different statistical measures. The results showed that SVM-Poly performed best, followed by the SVM-RBF, ANN, M5P, and K-NN techniques. Overall, this study demonstrates that machine learning techniques are promising tools for predicting AMD quality.
Keywords Acid mine drainage · Acid rock drainage · Machine learning · Artificial neural network · Support vector machine · Model tree · K-nearest neighbors
Introduction
Acid mine drainage (AMD), also called acid rock drainage, is a major global pollution problem that adversely affects the surrounding environment (Gray 1996). AMD is produced when sulfide-bearing material is exposed to oxygen and water during mining activities (Morin and Hutt 1997; Price 2009). This exposure results in oxidation and other weathering processes, which change relatively insoluble chemical species in sulfide minerals into more easily dissolved free ionic species (e.g., Cu, As, and Zn) or secondary minerals (e.g., sulfates, carbonates, and hydroxides). Moreover, the oxidation of some sulfide minerals produces acid that may lower the drainage pH. This lower drainage pH could increase the rate of sulfide oxidation, the solubility of many products of sulfide oxidation, and the rate of weathering of other minerals.

Once AMD is produced, water can transport these toxic substances into the environment, where they contaminate
Environ Monit Assess (2013) 185:4171–4182
DOI 10.1007/s10661-012-2859-7
G. D. Betrie (*) · S. Tesfamariam · R. Sadiq
School of Engineering, UBC-Okanagan, Kelowna, BC, Canada
e-mail: [email protected]

K. A. Morin
Minesite Drainage Assessment Group, Vancouver, BC, Canada
water resources and soils. Exposure to these toxic substances may pose serious human and ecological risks (Azapagic 2004). The associated human health risks include increased chronic diseases and various types of cancer. The ecological risks range from the elimination of species and significantly reduced ecological stability to the bioaccumulation of metals in flora and fauna (Gray 1996).
Predicting future drainage chemistry is important to assess the potential environmental risks of AMD and implement appropriate mitigation measures. However, predicting the potential for AMD can be exceedingly challenging because its formation is highly variable from site to site, depending upon mineralogy and other operational and environmental factors (USEPA 1994). For this reason, laboratory tests, field tests, and a variety of predictive modeling approaches have been used for predicting the potential of mined materials to generate acid and contaminants (USEPA 1994; Maest et al. 2005; Price 2009). Laboratory and field tests are often undertaken for short periods of time with respect to the potential persistence period of AMD; hence, they may inadequately mimic the evolutionary nature of the acid generation process (USEPA 1994). Predictive modeling approaches have been used to overcome the uncertainties inherent in short-term testing and to avoid the prohibitive costs of very long-term testing.
Predictive models for AMD can be classified as empirical and deterministic models (USEPA 1994; Perkins et al. 1995; Maest et al. 2005; Price 2009). A summary of deterministic and empirical models commonly applied to evaluate mine drainage quality is presented in Table 1. Empirical models describe the time-dependent behavior of one or more variables of a mine waste geochemical system in terms of observed behavior trends. These empirical models are site specific and based on years of monitoring at a mine site. Thus, the AMD prediction accuracy of empirical models depends heavily on the quality of the available data. On the other hand, deterministic models describe the system in terms of the chemical and/or physical processes that are believed to control AMD. Deterministic models often require intensive site-specific studies and data, but collecting those data with sufficient accuracy is often difficult and expensive. In this study, the empirical modeling approach is investigated to make use of monitoring data collected at a mine site.
An example of an empirical model was provided by Morin and Hutt (2001). These researchers developed an empirical model, named the empirical drainage-chemistry model (EDCM), and applied it to the prediction of drainage quality using historical data from mine sites. The EDCM approach involves defining correlation equations, using linear least-squares fitting, between concentrations and other geochemical parameters, typically pH and sulfate.
In this paper, machine learning techniques have been explored to develop models that predict future drainage quality using existing data. This approach is widely applied to solve environmental and civil engineering problems (Reich 1997; William 2009). An example of using a machine learning technique to predict AMD was presented by Khandelwal and Singh (2005), although it did not involve the use of an existing mine-site database. These researchers compared an artificial neural network (ANN) to multivariate regression analysis (MVRA) for the prediction of mine water quality. They reported that ANN provided acceptable results compared with MVRA. However, this comparison lacked adequate performance evaluation measures.
Machine learning approaches are useful for developing predictive models, but their use requires insight into the formulation of the learning problem, selection of appropriate learning methods, and evaluation of modeling results to achieve the stated goal of the modeling activity (Reich and Barai 1999; Cherkassky et al. 2006). This paper compares the predictive accuracy and uncertainty of five selected machine learning techniques using rigorous statistical tests. The selected machine learning techniques are ANN, support vector machine with polynomial (SVM-Poly) and radial basis function (SVM-RBF) kernels, model tree (M5P), and
Table 1 Summary of deterministic and empirical models used for evaluating mine drainage quality

Deterministic model                      Empirical model
MINTEQ (Allison et al. 1991)             EDCM (Morin and Hutt 1993, 1994; Morin and Hutt 2001)
PHREEQC (Parkhurst and Appelo 1999)
MINTRAN (Walter et al. 1994)
WATAIL (Scharer et al. 1993)
K-nearest neighbors (K-NN). The prediction accuracy refers to the difference between observed and predicted values. On the other hand, predictive uncertainty refers to the variability of the overall error around the mean error. A detailed description of the machine learning techniques and the study approach is presented in the following sections.
Materials and methods
Machine learning techniques
Machine learning is an algorithm that estimates an unknown dependency between mine waste geochemical system inputs and its outputs from the available data. The available mine waste geochemical data are usually represented as a pair (x_i, y_i), which is called an example or an instance. The machine learning setting consists of input variables X, a mine waste geochemical system that returns an output Y for each input variable, and a machine learning algorithm that selects a mapping function \hat{Y} = f(X), which describes how the mine waste geochemical system behaves, as shown in Fig. 1. The goal of learning (training) is to select the best function that minimizes the error between the system output Y and the predicted output \hat{Y} based on example data. The example data used for training purposes are called a training dataset. The process of building a machine learning model follows the general principles adopted in modeling: study the problem, collect data, select the model structure, build the model, test the model, and iterate (Solomatine and Ostfeld 2008). There are various types of machine learning techniques, but ANN, support vector machine (SVM), M5P, and K-NN are explored in this study. These techniques are implemented using the WEKA 3.6.4 software (Bouckaert et al. 2010), and they are described in detail in the following sections.
Artificial neural network
ANN is a machine learning technique that consists of neurons with massively weighted interconnections (Bishop 1995). These neurons are arranged in input, hidden, and output layers, as displayed in Fig. 2. The task of the input layer is only to send the input signals to the hidden layer without performing any operations. The hidden and output layers multiply the input signals by a set of weights and either linearly or nonlinearly transform the results into output values. These weights are optimized during the ANN training (calibration) process to obtain reasonable prediction accuracy.
In this study, the multilayer perceptron is used, although there are various other types of ANN algorithms (Bishop 1995). The multilayer perceptron is a feedforward neural network, where signals always travel in the direction of the output layer. A typical multilayer perceptron with one hidden layer can be mathematically expressed by Eqs. 1–4. The outputs of the hidden layer (Z_j) are obtained by (1) summing the products of the inputs (X_i) and weight vectors (a_ij) plus a hidden-layer bias term (a_0j; see Eq. 1), and (2) transforming this sum using a transfer function g (see Eq. 2). The most widely used transfer functions are the logistic and hyperbolic tangent. Similarly, the outputs of the output layer (Y_k) are obtained by (1) summing the products of the hidden-layer outputs (Z_j) and weight vectors (b_jk) plus an output-layer bias
[Fig. 1 A machine learning algorithm using real system data to predict output]
term (b_0k; see Eq. 3), and (2) transforming this sum using the transfer function g (see Eq. 4).

u_j = \sum_{i=1}^{N_{inp}} X_i a_{ij} + a_{0j}    (1)

Z_j = g(u_j)    (2)

v_k = \sum_{j=1}^{N_{hid}} Z_j b_{jk} + b_{0k}    (3)

Y_k = g(v_k)    (4)
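As an illustrative sketch (not the study's WEKA implementation), the forward pass of Eqs. 1–4 can be written in a few lines of NumPy; the weight values below are random placeholders, and the hyperbolic tangent is used as the transfer function g:

```python
import numpy as np

def mlp_forward(X, A, a0, B, b0, g=np.tanh):
    """Forward pass of a one-hidden-layer perceptron (Eqs. 1-4).

    X  : (n_samples, N_inp) input matrix
    A  : (N_inp, N_hid) input-to-hidden weights a_ij
    a0 : (N_hid,) hidden-layer bias terms a_0j
    B  : (N_hid, N_out) hidden-to-output weights b_jk
    b0 : (N_out,) output-layer bias terms b_0k
    g  : transfer function (hyperbolic tangent here)
    """
    U = X @ A + a0   # Eq. 1: weighted sums u_j
    Z = g(U)         # Eq. 2: hidden-layer outputs Z_j
    V = Z @ B + b0   # Eq. 3: weighted sums v_k
    Y = g(V)         # Eq. 4: network outputs Y_k
    return Y

# Tiny demonstration with arbitrary random weights
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                   # 5 examples, 3 inputs
A, a0 = rng.normal(size=(3, 4)), np.zeros(4)  # 4 hidden neurons
B, b0 = rng.normal(size=(4, 1)), np.zeros(1)  # 1 output neuron
Y = mlp_forward(X, A, a0, B, b0)
print(Y.shape)  # (5, 1)
```

In training, the weights A, a0, B, and b0 would be optimized (e.g., by backpropagation) rather than fixed as here.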
Support vector machine
SVM was mainly developed by Vapnik and co-workers (Vapnik 1998; Cherkassky and Mulier 2007). Its principle is based on structural risk minimization, which overcomes the limitations of the traditional empirical risk minimization technique under limited training data. Structural risk minimization aims at minimizing a bound on the generalization error of a model instead of minimizing the error on the training dataset. The SVM algorithm was first developed for classification problems and then adapted to address regression problems. Since a regression problem is solved in this study, the basic idea of SVM regression is illustrated here.

A complete description of SVM regression is presented by Smola and Schölkopf (1998); a summary of it is given in this study. Given a
training dataset (x_i, y_i), where x_i is the ith input pattern and y_i is the corresponding target value, the goal of SVM regression is to find a function f(x) that has at most \varepsilon deviation from the actually obtained targets y_i for all training data and, at the same time, is as flat as possible (Vapnik 1995). The function f is represented using a linear function in the feature space

f(x) = \langle w, x \rangle + b \quad \text{with } w \in X, \; b \in \mathbb{R}    (5)

where \langle \cdot, \cdot \rangle denotes the dot product in X. In this case, flatness means seeking a small w. This can be ensured by minimizing the norm (i.e., \|w\|^2 = \langle w, w \rangle), provided a function f is known a priori that approximates all pairs (x_i, y_i) with \varepsilon precision. If such a function is not known a priori, it is possible to introduce slack variables \xi_i, \xi_i^* and allow for some errors. This minimization problem can be mathematically expressed as

\text{minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)    (6)

\text{subject to} \quad y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \quad \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0
where C > 0 determines the trade-off between the flatness of f and the amount up to which deviations larger than \varepsilon are tolerated. The constrained optimization problem is converted into an unconstrained one by introducing a Lagrange function. The Lagrange function is constructed from the objective function and the corresponding constraints by introducing a dual set of variables as follows:
L := \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) - \sum_{i=1}^{l} \alpha_i (\varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b) - \sum_{i=1}^{l} \alpha_i^* (\varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b) - \sum_{i=1}^{l} (\eta_i \xi_i + \eta_i^* \xi_i^*)    (7)
It follows from the saddle-point condition that the partial derivatives of L with respect to the primal
[Fig. 2 Multilayer perceptron neural networks]
variables (w, b, \xi_i, \xi_i^*) have to vanish for optimality. Substituting the results of this derivation into Eq. 7 yields the dual optimization problem:
\text{maximize} \quad -\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*)

\text{subject to} \quad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C]    (8)

Once the coefficients \alpha_i and \alpha_i^* are determined from Eq. 8, the desired vectors can be written as follows:

w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i, \quad \text{and therefore} \quad f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b    (9)
Nonlinear regression problems are very common in most engineering applications. In such cases, a nonlinear mapping is used to map the data into a higher-dimensional feature space through a function \Phi. The kernel function K(x_i, x) = \langle \Phi(x_i), \Phi(x) \rangle can assume any form. In this study, the SVM-Poly and SVM-RBF kernels are used. These kernels are presented in Eqs. 10 and 11.

\text{Polynomial kernel:} \quad K(x_i, x) = (\gamma \langle x_i, x \rangle + t)^d, \quad \gamma > 0    (10)

\text{Radial basis function kernel:} \quad K(x_i, x) = \exp(-\gamma \|x_i - x\|^2), \quad \gamma > 0    (11)

where \gamma, t, and d are kernel parameters.
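The two kernels above can be sketched with scikit-learn's SVR (a substitute for the study's WEKA implementation; the kernel and regularization parameter values below are arbitrary illustrations, not the study's tuned settings). In scikit-learn's parameterization, `coef0` plays the role of t in Eq. 10:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data (stand-in for the mine-site inputs)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 5.0, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Polynomial kernel: K(x_i, x) = (gamma*<x_i, x> + coef0)^d  (Eq. 10)
svm_poly = SVR(kernel="poly", degree=3, gamma=0.5, coef0=1.0, C=10.0, epsilon=0.05)
# RBF kernel: K(x_i, x) = exp(-gamma*||x_i - x||^2)          (Eq. 11)
svm_rbf = SVR(kernel="rbf", gamma=0.5, C=10.0, epsilon=0.05)

for model in (svm_poly, svm_rbf):
    model.fit(X, y)
    print(model.kernel, round(model.score(X, y), 3))
```

The `epsilon` argument corresponds to the \varepsilon-insensitive tube of Eq. 6, and `C` to the trade-off constant discussed after it.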
Model trees
M5P models are tree-based models for dealing with continuous-class learning problems using piecewise linear functions, originally developed by Quinlan (1992). A schematic representation of a model tree is depicted in Fig. 3. Given a training set T, this set is either associated with a leaf, or some test is chosen that splits T into subsets corresponding to the test outcomes, and the same process is applied recursively to the subsets. For a new input vector, (1) it is classified into one of the subsets, and (2) the corresponding model is run to produce the prediction. The steps to build M5P are building the initial tree, pruning, and smoothing.
In the tree-building procedure, a splitting criterion is determined at each node. The splitting criterion is based on treating the standard deviation of the class values that reach a node as a measure of the error at that node, and calculating the expected reduction in error as a result of testing each attribute at that node (Wang and Witten 1997). The attribute that maximizes the expected error reduction is chosen. The standard deviation reduction (SDR) is calculated using Eq. 12.

SDR = sd(T) - \sum_i \frac{|T_i|}{|T|} \, sd(T_i)    (12)

where T is the set of examples that reach the node and T_1, T_2, ... are the subsets that result from splitting the node according to the chosen attribute. The splitting process terminates if the output values of all the instances that reach the node vary only slightly, or if only a few instances remain.
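The SDR criterion of Eq. 12 is straightforward to compute; the following sketch (illustrative, not WEKA's internal code) shows that a split separating low-valued from high-valued targets scores higher than one that mixes them:

```python
import numpy as np

def sdr(parent, subsets):
    """Standard deviation reduction (Eq. 12) for a candidate split.

    parent  : array of target values reaching the node (T)
    subsets : list of arrays after the split (T_1, T_2, ...)
    """
    n = len(parent)
    return np.std(parent) - sum(len(t) / n * np.std(t) for t in subsets)

y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
# Separating the low values from the high ones removes most spread
good = sdr(y, [y[:3], y[3:]])
# A split that mixes the two groups removes far less
bad = sdr(y, [y[::2], y[1::2]])
print(good > bad)  # True
```

M5P evaluates this quantity for every candidate attribute and threshold, keeping the split with the largest SDR.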
The pruning procedure makes use of an estimate of the expected error that will be experienced at each node for the test data. First, the absolute difference between the predicted value and the actual class value is averaged over each of the training examples that reach that node. This average will underestimate the expected error for unseen cases; to compensate for this, it is multiplied by the following factor:

\frac{n + v}{n - v} \times pf    (13)

where n is the number of training instances that reach that node, v is the number of parameters in the model that represents the class value at that
[Fig. 3 Schematic representation of a model tree]
node, and pf is a pruning factor. The resulting linear model is simplified by dropping terms to minimize the estimated error calculated using the above multiplication factor, which may be enough to offset the inevitable increase in average error over the training instances. Terms are dropped one by one until the error estimate stops decreasing. Once a linear model is in place for each interior node, the tree is pruned back from the leaves, as long as the expected error decreases.
The smoothing process is used to compensate for the sharp discontinuities that will inevitably occur between adjacent linear models at the leaves of the pruned tree. This is a particular problem for models constructed from a small number of training instances. The smoothing procedure in M5P first uses the leaf model to compute the predicted value, and then filters that value along the path back to the root, smoothing it at each node by combining it with the value predicted by the linear model for that node. The formula used for smoothing is:

p' = \frac{np + kq}{n + k}    (14)

where p' is the prediction passed up to the next higher node, p is the prediction passed to this node from below, q is the value predicted by the model at this node, n is the number of training instances that reach the node below, and k is a constant.
K-nearest neighbors
The K-NN technique is an instance-based learning method, where training examples are stored and generalization is postponed until a prediction is made (Mitchell 1997). K-NN classifies an unknown input vector x_q by choosing the class of the nearest example x in the training set, as measured by the Euclidean distance. For real-valued target functions, the estimate is the mean value of the K nearest neighboring examples. However, this method is slow for a large test set because finding the member of the training set closest to an unknown test instance x_q involves calculating the distance from every member of the training set and selecting the smallest (Witten and Frank 2005). One way of improving this method is to consider weighted distances. Thus, the distance-weighted K-NN algorithm was used in this study. In this algorithm, each of the K neighbors x_i is weighted according to its distance from the query point x_q as follows:

\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}    (15)

where the weight w_i is a function of the distance d(x_q, x_i) between x_q and x_i. The most commonly used weight functions are provided in Eqs. 16–18; in this study, however, Eq. 17 is used.

\text{Linear:} \quad w_i = 1 - d(x_q, x_i)    (16)

\text{Inverse:} \quad w_i = d(x_q, x_i)^{-1}    (17)

\text{Inverse square:} \quad w_i = d(x_q, x_i)^{-2}    (18)
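A minimal sketch of the distance-weighted prediction of Eqs. 15 and 17 (an illustration, not the WEKA implementation; the guard against an exact match is an added practical detail, since the inverse weight is undefined at zero distance):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k=3):
    """Distance-weighted K-NN regression (Eqs. 15 and 17).

    Uses inverse-distance weights w_i = 1/d(x_q, x_i); an exact match
    returns the stored target directly to avoid division by zero.
    """
    d = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    idx = np.argsort(d)[:k]                        # the K nearest examples
    if d[idx[0]] == 0.0:                           # query equals a stored example
        return y_train[idx[0]]
    w = 1.0 / d[idx]                               # Eq. 17
    return np.sum(w * y_train[idx]) / np.sum(w)    # Eq. 15

X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
print(weighted_knn_predict(X, y, np.array([1.5])))
```

For the query 1.5, the two nearest examples (1.0 and 2.0) dominate the weighted average, while the farther example at 0.0 contributes little.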
Parameter selection for drainage quality
Modeling of AMD chemistry using machine learning methods requires defining the control variables that dictate the process. According to Morin and his co-workers (1994, 1997, 2010), the most important factors that control drainage chemistry are:

1. Geochemical production rates,
2. Infiltration of waters,
3. Elapsed time between infiltration events,
4. Residence time of water within rocks,
5. Internal temperatures and pore-gas concentrations of oxygen and carbon dioxide,
6. Particle size, and
7. Iron- and sulfur-oxidizing bacteria.
Geochemical production rates refer to the production rates of elements, acidity, and alkalinity under acidic and pH-neutral conditions in rock. Once produced, these reaction products are either flushed by flowing water or accumulate in the rocks. Whenever the products are flushed out of waste rocks, they strongly affect the drainage quality. The infiltration of water controls the amount of reaction products to be flushed. The importance of the elapsed time between infiltration events is that it provides the opportunity for reaction products to accumulate in the flow channel. It is worth noting that both the volume of infiltrating water and the elapsed time between infiltration events affect the concentrations and loadings observed in the basal seepage. The
residence time of water within waste rocks refers to the time required for infiltrated water to pass through the rocks. Thus, residence time determines when reaction products occur in the basal seepage. The internal temperatures and pore-gas concentrations of oxygen and carbon dioxide can affect the rates of pyrite oxidation and acid generation, and the amount of reaction products. For instance, higher temperatures, lower oxygen, and higher carbon dioxide could be associated with higher rates of pyrite oxidation and acid generation. Particle size is an important factor since it primarily affects the surface area exposed to weathering and oxidation. In addition, this factor affects the amount of water and air percolating into waste rocks. Iron- and sulfur-oxidizing bacteria affect oxidation rates since they catalyze the oxidation reaction.
In this study, the physico-chemical parameters monitored for over 25 years from waste rocks were obtained from the Island Copper Mine, British Columbia, Canada (Morin et al. 1995). The monitoring was routinely performed by collecting drainage samples at well-established stations, and these samples were analyzed by qualified personnel at the Island Copper Mine laboratory. The chemical parameters include pH, conductivity, alkalinity, acidity, sulfate, and metals. The physical parameters include flow rate, dissolved oxygen, and temperature. The amount of precipitation infiltrating into the waste rocks was estimated from climatic data. The climatic data, such as precipitation and minimum and maximum temperature, were obtained from Environment Canada. The evapotranspiration of the site was calculated using the Hargreaves method (Hargreaves and Riley 1985). The Hargreaves method uses minimum and maximum temperature, and solar radiation, to estimate evapotranspiration. The site's effective precipitation was then estimated as the difference between precipitation and evapotranspiration. Note that not all data obtained from the Island Copper Mine site were used in this study, because the available data were either collected over a short span of time or had many missing values.
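The effective-precipitation calculation described above can be sketched as follows. Note the coefficients here follow the widely cited Hargreaves-Samani form; the paper cites the Hargreaves and Riley (1985) variant, whose exact coefficients may differ, so this is an assumption for illustration only:

```python
import math

def hargreaves_et0(t_min, t_max, ra):
    """Daily reference evapotranspiration (mm/day), Hargreaves-Samani
    form (assumed; the paper's Hargreaves and Riley 1985 variant may
    use different coefficients).

    t_min, t_max : daily min/max air temperature (deg C)
    ra           : extraterrestrial solar radiation, expressed as
                   equivalent evaporation (mm/day)
    """
    t_mean = (t_min + t_max) / 2.0
    return 0.0023 * ra * (t_mean + 17.8) * math.sqrt(max(t_max - t_min, 0.0))

def effective_precipitation(precip_mm, t_min, t_max, ra):
    """Effective precipitation = precipitation - evapotranspiration,
    as described for the site water balance."""
    return precip_mm - hargreaves_et0(t_min, t_max, ra)

# Example day: 12 mm of rain, 8-18 deg C, Ra equivalent to 15 mm/day
print(round(effective_precipitation(12.0, 8.0, 18.0, 15.0), 2))
```

In the study, this difference was computed from the Environment Canada climate record to form the effective-precipitation input series.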
The input variables to the machine learning techniques should consist of all relevant variables that influence the AMD generation process. However, overlapping information among input variables should be avoided to simplify the task of the training algorithms. In order to make a parsimonious selection of inputs, the linear correlations between input and output variables were examined. It is worth noting that a nonlinear machine learning technique may be able to make use of more information than is revealed by this linear technique. The correlation between the copper concentrations and the other variables, with their time lags, is shown in Table 2. It shows that the current copper concentration is highly correlated with the previous copper concentrations (i.e., t-1 to t-5) and the other variables, except effective precipitation. While pH is negatively correlated with the current copper concentration, conductivity and acidity are positively correlated with it. Moreover, this table shows that the current concentration has strong correlations with pH, conductivity, and acidity at previous time states. Therefore, pH, conductivity, acidity, and previous copper concentrations were used as control variables, and the current copper concentrations were used as output.
A statistical summary of the input and output variables considered in this study is given in Table 3. These variables are pH, conductivity, acidity, and dissolved copper. The statistics of the data include the minimum, maximum, mean, standard deviation, and coefficient of variation. This table shows that the pH dataset distribution has the lowest variability, followed by conductivity, acidity, and copper. In addition, the variability of the independent variables (i.e., pH, acidity, conductivity, and effective precipitation) and the dependent variable (copper) are within a reasonable range.
Model development and validation
The dataset was divided into training and testing sets following the k-fold cross-validation method (Mitchell 1997). In the k-fold cross-validation method, the dataset is subdivided into k subsets, preferably of equal size. Next, k−1 subsets are used to train the machine learning models and the remaining subset is
Table 2 Correlation between copper concentration and other variables with time lags

Time (t, day)   pH      Conductivity (µS/cm)   Acidity (mg CaCO3/L)   Effective precipitation (mm)   Cu (mg/L)
t               -0.74   0.52                   0.81                   0.02                           1.00
t-1             -0.69   0.51                   0.78                   0.05                           0.94
t-2             -0.68   0.50                   0.76                   0.01                           0.90
t-3             -0.65   0.50                   0.74                   0.01                           0.89
t-4             -0.62   0.49                   0.72                   0.01                           0.86
t-5             -0.59   0.49                   0.71                   0.01                           0.84
used for testing the models. In this study, each subset has a size of 128 values, and tenfold cross-validation with stratification was repeated ten times. This exercise provided a total of 100 independent model errors for each machine learning technique. This method is computationally very intensive; however, the authors strongly believe that it provided reliable results.
Model evaluation
The prediction accuracy helps to evaluate the overall match between observed and predicted values for each machine learning technique. The predictive accuracy of each machine learning technique was evaluated using the root mean squared error (RMSE), mean absolute error (MAE), root relative squared error (RRSE), and relative absolute error (RAE), where a smaller value indicates a better technique. Moreover, a paired t test was used to determine whether the mean of the error estimates of one machine learning technique is significantly different from that of another technique. The equations of the error estimates are given in Eqs. 19–22:

RMSE = \sqrt{\frac{\sum_{i=1}^{n} (Y_o - Y_p)^2}{n}}    (19)

MAE = \frac{\sum_{i=1}^{n} |Y_o - Y_p|}{n}    (20)

RRSE = \sqrt{\frac{\sum_{i=1}^{n} (Y_o - Y_p)^2}{\sum_{i=1}^{n} (Y_o - \bar{Y}_p)^2}}    (21)

RAE = \frac{\sum_{i=1}^{n} |Y_o - Y_p|}{\sum_{i=1}^{n} |Y_o - \bar{Y}_p|}    (22)

where Y_o and Y_p represent the observed and predicted outputs, \bar{Y}_p represents the mean of the predicted output, and n represents the number of examples presented to the learning algorithms.
Predictive uncertainty refers to the variability of the overall error around the mean error. The predictive uncertainty of each machine learning technique was evaluated using the averaged error residuals of the models. Next, the averaged residuals of the five techniques were treated as a random variable, and 18 probability distributions were fitted using the @Risk software (Palisade Corporation Inc 2005).
Results and discussions
The performance of the five machine learning techniques for predicting copper concentrations in terms of
Table 3 Variables and summary of the data used in the study

Variable                   Min    Max    Mean     SD     CoV
pH^a                       3.56   6.46   4.53     0.45   9.93
Conductivity^a (µS/cm)     500    3,140  1,717.7  596.9  34.8
Acidity^a (mg CaCO3/L)     1.5    570    202.5    128.6  63.5
Cu^b (mg/L)                0.01   2.50   0.93     0.48   51.74

CoV coefficient of variation
^a Used as inputs in model development
^b Used as a model output
Table 4 Performance of models over testing sets

Models     MAE                  RMSE                 RAE (%)                  RRSE (%)
           Min   Mean  Max      Min   Mean  Max      Min    Mean   Max       Min    Mean   Max
ANN        0.07  0.17  0.56     0.09  0.22  0.62     11.67  41.05  269.37    13.73  43.19  232.81
SVM-Poly   0.06  0.14  0.42     0.08  0.18  0.46     9.06   32.47  152.07    10.83  36.11  135.91
SVM-RBF    0.06  0.15  0.38     0.09  0.20  0.41     11.32  35.59  218.01    15.16  38.86  188.86
K-NN       0.09  0.28  0.68     0.13  0.34  0.73     17.52  61.29  260.93    17.52  61.29  260.93
M5P        0.05  0.21  0.68     0.06  0.26  1.44     10.57  47.48  205.46    14.39  50.93  193.48

RMSE root mean squared error, MAE mean absolute error, RRSE root relative squared error, RAE relative absolute error, ANN artificial neural network, K-NN K-nearest neighbors, M5P model tree, SVM-Poly support vector machine with polynomial kernel, SVM-RBF support vector machine with radial basis function kernel
four evaluation methods is presented in Table 4. This table shows the best, mean, and worst performance of the five selected techniques. The best and worst values show the performance range of the techniques, whereas the mean value shows the average performance of the techniques over the testing datasets. These indicators are important for making decisions in environmental risk analysis. The comparison of the mean performances indicates that SVM-Poly is the best technique, followed by the SVM-RBF, ANN, M5P, and K-NN techniques on all
[Fig. 4 Scatter plots of the observed versus predicted copper concentrations (mg/L) for the ANN, M5P, SVM-Poly, SVM-RBF, and K-NN models]
evaluation methods. The K-NN technique was found to be the poorest-performing predictive model.

The observed and predicted copper concentrations for the different models are presented in Fig. 4. This figure shows that the overall predictions of SVM-Poly fit the ideal line (i.e., the diagonal line) best, followed by SVM-RBF, ANN, M5P, and K-NN. This confirms the conclusion made above on the performance of the techniques. Most of the predicted values of the M5P technique are below the ideal prediction line, whereas the K-NN predictions are above the ideal line. This implies that the K-NN technique overestimates, and M5P underestimates, the overall predictions. As can be seen in Fig. 4, SVM-Poly predicted the high values better, which is desirable as it is conservative for environmental risk analysis decision making. The high values are underpredicted by the SVM-RBF, ANN, and M5P techniques, and K-NN could not predict the higher values at all. This suggests that K-NN should not be used for decision making where the associated risk is high. It is interesting to note that there are a few outliers in Fig. 4. These outlier data are seen only once in the testing dataset. Subsequently, these values were either underestimated by the ML techniques if the observed data have high values (e.g., 2.5 mg/L) or overestimated if the observed data have low values (e.g., 0.96 mg/L). This indicates that the predictions of ML techniques for outlier data should be carefully analyzed.
A paired t test was used to determine whether the mean of the error estimates of one machine learning technique is significantly different from that of another technique. This t test is important to ensure that the obtained results are not due to a particular dataset used. The p values of the paired t test on the prediction error residuals of the five techniques, at a significance level of 0.05, are shown in Table 5. The test results show that the obtained differences are statistically significant, except between the SVM-Poly and SVM-RBF predictions. Although SVM-Poly performed better than SVM-RBF on all model evaluation methods, this test indicates that the difference is not statistically significant.
The predictive uncertainty of each machine learning technique was
evaluated using error residuals, computed as the difference between
measured and predicted copper concentrations. For each machine learning
technique, the residuals of 100 independent models were calculated and
averaged. Next, the averaged residuals of the five techniques were treated
as random variables and 18 probability distributions were fitted. The
lognormal probability distribution was the best fit to the residuals of
all five techniques (Fig. 5). The best technique is the one whose
residuals are represented by the narrowest, most symmetrical, and highest
probability distribution. SVM-Poly is the best in terms of predictive
uncertainty, followed by the SVM-RBF and ANN techniques. M5P shows a wider
range of predictive uncertainty, whereas K-NN showed the worst predictive
uncertainty.
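A minimal sketch of this uncertainty step, assuming absolute residuals (a
lognormal fit requires positive values) and synthetic data in place of the
study's 100 independently trained models:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical stand-in: absolute error residuals (mg/L) of 100
# independently trained models, each evaluated on 50 test cases.
runs = rng.lognormal(mean=-2.0, sigma=0.4, size=(100, 50))

# Average the residuals across the 100 runs, per test case.
avg_residuals = runs.mean(axis=0)

# Fit a lognormal distribution to the averaged residuals;
# floc=0 pins the distribution's lower bound at zero error.
shape, loc, scale = stats.lognorm.fit(avg_residuals, floc=0)
fitted = stats.lognorm(shape, loc=loc, scale=scale)

# A narrower (smaller-spread), higher-peaked fitted density implies
# lower predictive uncertainty for that technique.
spread = fitted.std()
```

Repeating the fit per technique and comparing the spreads reproduces the
kind of ranking reported here, where the narrowest distribution (SVM-Poly
in this study) marks the least uncertain technique.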
Summary and conclusions
This study evaluated the accuracy and uncertainty of five machine learning
techniques used for predicting AMD chemistry at minesites. The machine
learning techniques investigated include ANN with multilayer perceptrons,
SVM with polynomial (SVM-Poly) and radial basis function (SVM-RBF)
kernels, model tree with the M5P algorithm, and K-NN. Physico-chemical
Table 5 The p values of the paired t test on error residuals

            ANN    K-NN    M5P     SVM-Poly   SVM-RBF
ANN         1      0.000   0.009   0.000      0.005
K-NN               1       0.001   0.000      0.000
M5P                        1       0.000      0.001
SVM-Poly                           1          0.034
SVM-RBF                                       1

ANN artificial neural network, K-NN K-nearest neighbors, M5P model tree,
SVM-Poly support vector machine with polynomial kernel, SVM-RBF support
vector machine with radial basis function kernel
Fig. 5 The probability distribution of the error residuals of the five
techniques (curves for ANN, K-NN, M5P, SVM-Poly, and SVM-RBF; x-axis:
error residuals, 0-1.4; y-axis: probability density, 0-10)
Environ Monit Assess (2013) 185:4171-4182
parameters and the time lag that influence the drainage chemistry were
identified as important parameters and were used as inputs to the five
techniques. Although precipitation has been mentioned in the literature as
the most controlling parameter of AMD generation, it did not affect the
AMD process in this case study area. The prediction results show that the
identified input parameters represented the system dynamics. However, the
predicted results would likely improve if more parameters (e.g., flow
rate, internal gas concentrations, and temperature) were considered as
more data become available in the future.
The experimental results showed that the SVM-Poly kernel performed best in
terms of both the predictive accuracy and uncertainty evaluation methods.
The SVM with RBF kernel and ANN provided the next-best prediction results,
followed by M5P, while K-NN showed the worst performance on both
evaluation measures. These results indicate that the process of AMD
generation is highly nonlinear and could not be captured with techniques
that build local linear models, such as the M5P and K-NN techniques.
However, the SVM and ANN techniques have their own limitations. These
techniques take considerable time to train because they have parameters to
be optimized and the optimization is done heuristically. Another
limitation of ANN is that the function representing a given ANN model is
expressed through interconnection weights and threshold values and is not
easily understandable by decision-makers.
This study shows that machine learning techniques are promising tools for
predicting AMD chemistry. Their prediction results could be used to
evaluate and identify cost-effective AMD management alternatives for a
given minesite. In addition, these techniques could be integrated into
human and environmental risk assessment frameworks for sustainable mine
waste management.
Acknowledgments This research has been carried out as part of an NSERC-DG
(Discovery Grant) funded by the Natural Sciences and Engineering Research
Council of Canada (NSERC). We would also like to thank the Minesite
Drainage Assessment Group (MDAG) for providing valuable data to test
various models.
References
Allison, J.D., Brown, D.S., & Novo-Gradac, K.J. (1991). MINTEQA2/PRODEFA2
user's manual (version 3.0). A geochemical assessment model for
environmental systems. U.S. Environmental Protection Agency, Athens, GA.
EPA/600/3-91/021.
Azapagic, A. (2004). Developing a framework for sustainable development
indicators for the mining and minerals industry. Journal of Cleaner
Production, 12, 639-662.
Bishop, C.M. (1995). Neural networks for pattern recognition. Oxford:
Clarendon.
Bouckaert, R.R., Frank, E., Hall, M., Kirkby, R., Reutemann, P., Seewald,
A., et al. (2010). WEKA manual (version 3.6.4). Hamilton: University of
Waikato.
Cherkassky, V.S., & Mulier, F. (2007). Learning from data: concepts,
theory, and methods. New Jersey: Wiley.
Cherkassky, V., Krasnopolsky, V., Solomatine, D., & Valdes, J. (2006).
Computational intelligence in earth sciences and environmental
applications: issues and challenges. Neural Networks, 19, 113-121.
Gray, N.F. (1996). Field assessment of acid mine drainage contamination
in surface and ground water. Environmental Geology, 27, 358-361.
Hargreaves, G., & Riley, J. (1985). Agricultural benefits for Senegal
River basin. Journal of Irrigation and Drainage Engineering, ASCE, 111,
113-124.
Khandelwal, M., & Singh, T.N. (2005). Prediction of mine water quality by
physical parameters. Journal of Scientific and Industrial Research, 64,
564-570.
Maest, A.S., Kuipers, J.R., Travers, C.L., & Atkins, D.A. (2005).
Predicting water quality at hardrock mines: methods and models,
uncertainty and state-of-the-art. Montana: Kuipers & Associates.
Mitchell, T. (1997). Machine learning. New York: McGraw-Hill.
Morin, K.A., & Hutt, N.M. (1993). The use of routine monitoring data for
assessment and prediction of water chemistry. In Proceedings of the 17th
Annual Mine Reclamation Symposium of the Mining Association of British
Columbia, Port Hardy, BC, Canada.
Morin, K.A., & Hutt, N.M. (1994). An empirical technique for predicting
the chemistry of water seeping from mine-rock piles. In Proceedings of
the International Conference on the Abatement of Acidic Drainage,
Pittsburgh, PA, USA.
Morin, K.A., & Hutt, N.M. (1997). Environmental geochemistry of minesite
drainage: practical theory and case studies. Vancouver: MDAG Publishing.
Morin, K.A., & Hutt, N.M. (2001). Prediction of minesite-drainage
chemistry through closure using operational monitoring data. Journal of
Geochemical Exploration, 73, 123-130.
Morin, K.A., Hutt, N.M., & Horne, I.A. (1995). Prediction of future water
chemistry from Island Copper Mine's on-land dumps. In 19th Annual British
Columbia Mine Reclamation Symposium, Dawson Creek, BC, Canada.
Morin, K.A., Hutt, N.M., & Aziz, M.L. (2010). Twenty-three years of
monitoring minesite-drainage chemistry, during operation and after
closure: the Equity Silver Minesite, British Columbia, Canada. Available
at http://www.mdag.com/case_studies/MDAG-com Case Study 35 - 23 Years of
Minesite-Drainage Chemistry at Equity Silver Minesite.pdf. Accessed 13
August 2011.
Palisade Corporation Inc. (2005). Guide to using @RISK: advanced risk
analysis for spreadsheets. New York: Palisade Corporation.
Parkhurst, D.L., & Appelo, C.A.J. (1999). User's guide to PHREEQC
(version 2.18.0). A computer program for speciation, batch reaction,
one-dimensional transport and inverse geochemical calculations. U.S.
Geological Survey Water Resources Investigation, Report 99-4259.
Perkins, E.H., Nesbitt, H.W., Gunter, W.D., St-Arnaud, L.C., Mycroft,
J.R., et al. (1995). Critical review of geochemical processes and
geochemical models adaptable for prediction of acidic drainage from waste
rock. Canadian Mine Environment Neutral Drainage (MEND), Report 1.42.1.
Price, W.A. (2009). Prediction manual of drainage chemistry from sulphidic
geologic materials. Canadian Mine Environment Neutral Drainage (MEND),
Report 1.20.1.
Quinlan, J.R. (1992). Learning with continuous classes. In Proceedings of
the 5th Australian Joint Conference on Artificial Intelligence, Singapore.
Reich, Y. (1997). Machine learning techniques for civil engineering
problems. Microcomputers in Civil Engineering, 12, 295-310.
Reich, Y., & Barai, S.V. (1999). Evaluating machine learning models for
engineering problems. Artificial Intelligence in Engineering, 13(3),
257-272.
Scharer, J.M., Annable, W.K., & Nicholson, R.V. (1993). WATAIL user's
manual (version 1.0). A tailings basin model to evaluate transient water
quality of acid mine drainage. Institute of Groundwater Research,
University of Waterloo.
Smola, A.J., & Schölkopf, B. (1998). A tutorial on support vector
regression. Report 1998-030, Royal Holloway College, London.
Solomatine, D.P., & Ostfeld, A. (2008). Data-driven modelling: some past
experiences and new approaches. Journal of Hydroinformatics, 10(1), 3-22.
USEPA (1994). Technical document of acid mine drainage prediction.
Washington: Office of Solid Waste. Report EPA530-R-94-036.
Vapnik, V. (1995). The nature of statistical learning theory. New York:
Springer.
Vapnik, V. (1998). Statistical learning theory. New York: Wiley.
Walter, A.L., Frind, E.O., Blowes, D.W., Ptacek, C.J., Molson, J.W., et
al. (1994). Modelling of multicomponent reactive transport in groundwater
1. Model development and evaluation. Water Resources Research, 30(11),
3137-3148.
Wang, Y., & Witten, I. (1997). Inducing model trees for continuous
classes. In Poster Papers of the 9th European Conference on Machine
Learning.
William, W.H. (2009). Machine learning methods in the environmental
sciences: neural networks and kernels. Cambridge: Cambridge University
Press.
Witten, I.H., & Frank, E. (2005). Data mining: practical machine learning
tools and techniques. San Francisco: Morgan Kaufmann.