Predicting copper concentrations in acid mine drainage: a comparative analysis of five machine learning techniques

Getnet D. Betrie & Solomon Tesfamariam & Kevin A. Morin & Rehan Sadiq

Received: 27 November 2011 / Accepted: 30 August 2012 / Published online: 15 September 2012
© Springer Science+Business Media B.V. 2012

Abstract Acid mine drainage (AMD) is a global problem that may have serious human health and environmental implications. Laboratory and field tests are commonly used for predicting AMD; however, this is challenging since its formation varies from site to site for a number of reasons. Furthermore, these tests are often conducted at small scale over a short period of time. Subsequently, extrapolation of these results into the large-scale setting of mine sites introduces huge uncertainties for decision-makers. This study presents machine learning techniques to develop models to predict AMD quality using historical monitoring data of a mine site. The machine learning techniques explored in this study include artificial neural networks (ANN), support vector machine with polynomial (SVM-Poly) and radial basis function (SVM-RBF) kernels, model tree (M5P), and K-nearest neighbors (K-NN). Input variables (physico-chemical parameters) that influence drainage dynamics are identified and used to develop models to predict copper concentrations. For these selected techniques, the predictive accuracy and uncertainty were evaluated based on different statistical measures. The results showed that SVM-Poly performed best, followed by the SVM-RBF, ANN, M5P, and K-NN techniques. Overall, this study demonstrates that machine learning techniques are promising tools for predicting AMD quality.

Keywords Acid mine drainage · Acid rock drainage · Machine learning · Artificial neural network · Support vector machine · Model tree · K-nearest neighbors

    Introduction

Acid mine drainage (AMD), also called acid rock drainage, is a major pollution problem globally that is adversely affecting the surrounding environment (Gray 1996). AMD is produced when sulfide-bearing material is exposed to oxygen and water during mining activities (Morin and Hutt 1997; Price 2009). This exposure results in oxidation and other weathering processes, which change relatively insoluble chemical species in sulfide minerals into more easily dissolved free ionic species (e.g., Cu, As, and Zn) or secondary minerals (e.g., sulfates, carbonates, and hydroxides). Moreover, the oxidation of some sulfide minerals produces acid that may lower the drainage pH. This lower drainage pH could increase the rate of sulfide oxidation, the solubility of many products of sulfide oxidation, and the rate of weathering of other minerals.

Once AMD is produced, water can transport these toxic substances into the environment, where they contaminate

Environ Monit Assess (2013) 185:4171–4182
DOI 10.1007/s10661-012-2859-7

G. D. Betrie (*) · S. Tesfamariam · R. Sadiq
School of Engineering, UBC-Okanagan, Kelowna, BC, Canada
e-mail: [email protected]

K. A. Morin
Minesite Drainage Assessment Group, Vancouver, BC, Canada

water resources and soils. Exposure to these toxic substances may cause serious human and ecological risks (Azapagic 2004). The associated human health risks include increased chronic diseases and various types of cancers. The ecological risks range from the elimination of species and significant reduction of ecological stability to the bioaccumulation of metals in flora and fauna (Gray 1996).

Predicting future drainage chemistry is important to assess the potential environmental risks of AMD and to implement appropriate mitigation measures. However, predicting the potential for AMD can be exceedingly challenging because its formation is highly variable from site to site, depending upon mineralogy and other operational and environmental factors (USEPA 1994). For this reason, laboratory tests, field tests, and a variety of predictive modeling approaches have been used for predicting the potential of mined materials to generate acid and contaminants (USEPA 1994; Maest et al. 2005; Price 2009). Laboratory and field tests are often undertaken for short periods of time with respect to the potential persistence period of AMD; hence, they may inadequately mimic the evolutionary nature of the acid generation process (USEPA 1994). Predictive modeling approaches have been used to overcome the uncertainties inherent in short-term testing and avoid the prohibitive costs of very long-term testing.

Predictive models for AMD can be classified as empirical and deterministic models (USEPA 1994; Perkins et al. 1995; Maest et al. 2005; Price 2009). A summary of deterministic and empirical models commonly applied to evaluate mine drainage quality is presented in Table 1. Empirical models describe the time-dependent behavior of one or more variables of a mine waste geochemical system in terms of observed behavior trends. These empirical models are site specific and based on years of monitoring at a mine site. Thus, the AMD prediction accuracy of the empirical models depends heavily on the quality of available data. On the other hand, deterministic models describe the system in terms of chemical and/or physical processes that are believed to control AMD. Deterministic models often require intensive site-specific studies and data, but collecting those data with sufficient accuracy is often difficult and expensive. In this study, the empirical modeling approach is investigated to make use of monitoring data collected at a mine site.

An example of an empirical model was provided by Morin and Hutt (2001). These researchers developed an empirical model, named the empirical drainage-chemistry model (EDCM), and applied it for the prediction of drainage quality using historical data from mine sites. The EDCM approach involves defining correlation equations using linear least-squares fitting between concentrations and other geochemical parameters, typically pH and sulfate.

In this paper, machine learning techniques have been explored to develop models that predict future drainage quality using existing data. This approach is widely applied to solve environmental and civil engineering problems (Reich 1997; William 2009). An example of using a machine learning technique to predict AMD was presented by Khandelwal and Singh (2005), although it did not involve the use of an existing mine site database. These researchers compared an artificial neural network (ANN) to multivariate regression analysis (MVRA) for the prediction of mine water quality. They reported that ANN provided acceptable results compared with MVRA. However, this comparison lacked adequate performance evaluation measures.

Machine learning approaches are useful for developing predictive models, but their use requires insight into the learning problem formulation, selection of appropriate learning methods, and evaluation of modeling results to achieve the stated goal of the modeling activity (Reich and Barai 1999; Cherkassky et al. 2006). This paper compares the predictive accuracy and uncertainty of five selected machine learning techniques using rigorous statistical tests. The selected machine learning techniques are ANN, support vector machine with polynomial (SVM-Poly) and radial basis function (SVM-RBF) kernels, model tree (M5P), and

Table 1 Summary of deterministic and empirical models used for evaluating mine drainage quality

Deterministic model                     Empirical model
MINTEQ (Allison et al. 1991)            EDCM (Morin and Hutt 1993, 1994; Morin and Hutt 2001)
PHREEQC (Parkhurst and Appelo 1999)
MINTRAN (Walter et al. 1994)
WATAIL (Scharer et al. 1993)


K-nearest neighbors (K-NN). Prediction accuracy refers to the difference between observed and predicted values. On the other hand, predictive uncertainty refers to the variability of the overall error around the mean error. A detailed description of the machine learning techniques and the study approach is presented in the following sections.

    Materials and methods

    Machine learning techniques

A machine learning algorithm estimates an unknown dependency between mine waste geochemical system inputs and its outputs from the available data. The available mine waste geochemical data are usually represented as a pair (x_i, y_i), which is called an example or an instance. The machine learning setup consists of input variables X, a mine waste geochemical system that returns an output Y for each input variable, and a machine learning algorithm that selects a mapping function Ŷ = f(X), which describes how the mine waste geochemical system behaves, as shown in Fig. 1. The goal of learning (training) is to select the best function that minimizes the error between the system output Y and the predicted output Ŷ based on example data. The example data used for training purposes are called a training dataset. The process of building a machine learning model follows general principles adopted in modeling: study the problem, collect data, select model structure, build the model, test the model, and iterate (Solomatine and Ostfeld 2008). There are various types of machine learning techniques, but ANN, support vector machine (SVM), M5P, and K-nearest neighbors are explored in this study. These techniques are implemented using WEKA 3.6.4 software (Bouckaert et al. 2010), and they are described in detail in the following sections.

    Artificial neural network

ANN is a machine learning technique that consists of neurons with massively weighted interconnections (Bishop 1995). These neurons are arranged in input, hidden, and output layers, as displayed in Fig. 2. The task of the input layer is only to send the input signals to the hidden layer without performing any operations. The hidden and output layers multiply the input signals by a set of weights and either linearly or nonlinearly transform the results into output values. These weights are optimized during the ANN training (calibration) process to obtain reasonable prediction accuracy.

In this study, a multilayer perceptron is used, although there are various types of ANN algorithms (Bishop 1995). A multilayer perceptron is a feedforward neural network, where signals always travel in the direction of the output layer. A typical multilayer perceptron with one hidden layer can be mathematically expressed by Eqs. 1–4. The outputs of the hidden layer (Z_j) are obtained by (1) summing the products of the inputs (X_i) and weight vectors (a_ij) and a hidden layer bias term (a_0j; see Eq. 1), and (2) transforming this sum using a transfer function g (see Eq. 2). The most widely used transfer functions are the logistic and hyperbolic tangent. Similarly, the outputs of the output layer (Y_k) are obtained by (1) summing the products of the hidden layer outputs (Z_j) and weight vectors (b_jk) and an output layer bias

[Fig. 1 A machine learning algorithm using real system data to predict output: the algorithm minimizes the error between the AMD system output Y and the predicted output Ŷ for input data X]


term (b_0k; see Eq. 3), and (2) transforming this sum using the transfer function g (see Eq. 4).

$$u_j = \sum_{i=1}^{N_{inp}} X_i a_{ij} + a_{0j} \qquad (1)$$

$$Z_j = g(u_j) \qquad (2)$$

$$v_k = \sum_{j=1}^{N_{hid}} Z_j b_{jk} + b_{0k} \qquad (3)$$

$$Y_k = g(v_k) \qquad (4)$$
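The forward pass in Eqs. 1–4 can be sketched in plain Python as follows. This is a minimal illustration, not the fitted network from the study: the layer sizes, weight values, and the choice of the logistic transfer function are hypothetical.

```python
import math

def logistic(u):
    # logistic (sigmoid) transfer function g
    return 1.0 / (1.0 + math.exp(-u))

def mlp_forward(x, A, a0, B, b0):
    """One-hidden-layer perceptron forward pass (Eqs. 1-4).
    x: inputs X_i; A[i][j]: weights a_ij; a0[j]: hidden biases a_0j;
    B[j][k]: weights b_jk; b0[k]: output biases b_0k."""
    # Eq. 1: u_j = sum_i X_i a_ij + a_0j
    u = [sum(x[i] * A[i][j] for i in range(len(x))) + a0[j] for j in range(len(a0))]
    # Eq. 2: Z_j = g(u_j)
    z = [logistic(uj) for uj in u]
    # Eq. 3: v_k = sum_j Z_j b_jk + b_0k
    v = [sum(z[j] * B[j][k] for j in range(len(z))) + b0[k] for k in range(len(b0))]
    # Eq. 4: Y_k = g(v_k)
    return [logistic(vk) for vk in v]

# Hypothetical scaled inputs (e.g., pH, conductivity, acidity, lagged Cu)
x = [0.2, -0.5, 0.8, 0.1]
A = [[0.3, -0.1], [0.2, 0.4], [-0.5, 0.6], [0.1, 0.2]]   # 4 inputs x 2 hidden
a0 = [0.05, -0.02]
B = [[0.7], [-0.3]]                                       # 2 hidden x 1 output
b0 = [0.1]
y = mlp_forward(x, A, a0, B, b0)
```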

    Support vector machine

SVM was mainly developed by Vapnik and co-workers (Vapnik 1998; Cherkassky and Mulier 2007). Its principle is based on structural risk minimization, which overcomes the limitation of the traditional empirical risk minimization technique under limited training data. Structural risk minimization aims at minimizing a bound on the generalization error of a model instead of minimizing the error on the training dataset. The SVM algorithm was first developed for classification problems and then adapted to address regression problems. In this study, the basic idea of SVM regression is illustrated, since a regression problem is solved.

The complete description of SVM regression is well presented by Smola and Schölkopf (1998), and a summary of it is presented in this study. Given a training dataset (x_i, y_i), where x_i is the ith input pattern and y_i is the corresponding target value, the goal of SVM regression is to find a function f(x) that has at most ε deviation from the actually obtained targets y_i for all the training data and, at the same time, is as flat as possible (Vapnik 1995). The function f is represented using a linear function in the feature space

$$f(x) = \langle w, x \rangle + b \quad \text{with } w \in X,\; b \in \mathbb{R} \qquad (5)$$

where ⟨·,·⟩ denotes the dot product in X. In this case, flatness means seeking a small w. This can be ensured by minimizing the norm (i.e., $\|w\|^2 = \langle w, w \rangle$) under the assumption that a function f is known a priori to approximate all pairs (x_i, y_i) with ε precision. If such a function is not known a priori, it is possible to introduce slack variables ξ_i, ξ_i* and allow for some errors. This minimization problem can be mathematically expressed as

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \qquad (6)$$

$$\text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\; \xi_i^* \ge 0 \end{cases}$$

The constant C > 0 determines the tradeoff between the flatness of f and the amount up to which deviations larger than ε are tolerated. The constrained optimization problem is converted into an unconstrained optimization by introducing a Lagrange function. The Lagrange function is constructed from the objective function and the corresponding constraints by introducing a dual set of variables as follows:

$$L := \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) - \sum_{i=1}^{l} \alpha_i (\varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b) - \sum_{i=1}^{l} \alpha_i^* (\varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b) - \sum_{i=1}^{l} (\eta_i \xi_i + \eta_i^* \xi_i^*) \qquad (7)$$

It follows from the saddle point condition that the partial derivatives of L with respect to the primal

[Fig. 2 Multilayer perceptron neural networks: inputs X_1 … X_Ninp, hidden units Z_1 … Z_Nhid, and outputs Y_1 … Y_Nout, connected by weights a_ij and b_jk]


variables (w, b, ξ_i, ξ_i*) have to vanish for optimality. Substituting the results of this derivation into Eq. 7 yields the dual optimization problem:

$$\text{maximize} \quad -\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*)$$

$$\text{subject to} \quad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C] \qquad (8)$$

Once the coefficients α_i and α_i* are determined from Eq. 8, the desired vectors can be written as follows:

$$w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, x_i, \quad \text{and therefore} \quad f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b \qquad (9)$$

Nonlinear regression problems are very common in most engineering applications. In such cases, a nonlinear mapping kernel K is used to map the data into a higher-dimensional feature space by a function Φ. The kernel function, K(x_i, x) = ⟨Φ(x_i), Φ(x)⟩, can assume any form. In this study, the SVM-Poly and SVM-RBF kernels are used. These kernels are presented in Eqs. 10 and 11:

$$\text{Polynomial kernel:} \quad K(x_i, x) = (\gamma \langle x_i, x \rangle + t)^d, \quad \gamma > 0 \qquad (10)$$

$$\text{Radial basis function kernel:} \quad K(x_i, x) = \exp(-\gamma \|x_i - x\|^2), \quad \gamma > 0 \qquad (11)$$

where γ, t, and d are kernel parameters.
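The two kernels in Eqs. 10 and 11 translate directly into code. The parameter values for γ, t, and d below are illustrative, not the tuned values from the study.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(xi, x, gamma=1.0, t=1.0, d=3):
    # Eq. 10: K(x_i, x) = (gamma * <x_i, x> + t)^d
    return (gamma * dot(xi, x) + t) ** d

def rbf_kernel(xi, x, gamma=0.5):
    # Eq. 11: K(x_i, x) = exp(-gamma * ||x_i - x||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)

xi = [1.0, 2.0]
x = [0.5, 1.0]
k_poly = poly_kernel(xi, x)   # (1.0 * 2.5 + 1.0)^3 = 42.875
k_rbf = rbf_kernel(xi, x)     # exp(-0.5 * 1.25)
```

Note that the RBF kernel of any point with itself is 1, its maximum, and it decays toward 0 with distance.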

    Model trees

M5P is a tree-based model for dealing with continuous-class learning problems using piecewise linear functions, originally developed by Quinlan (1992). The schematic representation of a model tree is depicted in Fig. 3. Given a training set T, this set is either associated with a leaf, or some test is chosen that splits T into subsets corresponding to the test outcomes, and the same process is applied recursively to the subsets. For a new input vector, (1) it is classified into one of the subsets, and (2) the corresponding model is run to produce the prediction. The steps to build M5P are building the initial tree, pruning, and smoothing.

In the tree-building procedure, a splitting criterion for each node is determined. The splitting criterion is based on treating the standard deviation of the class values that reach a node as a measure of the error at that node, and calculating the expected reduction in error as a result of testing each attribute at that node (Wang and Witten 1997). The attribute that maximizes the expected error reduction is chosen. The standard deviation reduction (SDR) is calculated using Eq. 12.

$$\mathrm{SDR} = sd(T) - \sum_i \frac{|T_i|}{|T|} \times sd(T_i) \qquad (12)$$

where T is the set of examples that reach the node and T_1, T_2, … are the subsets that result from splitting the node according to the chosen attribute. The splitting process terminates if the output values of all the instances that reach the node vary only slightly, or if only a few instances remain.
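The SDR criterion of Eq. 12 can be sketched as follows; the node values and the two-way split are hypothetical.

```python
import statistics

def sdr(parent, subsets):
    """Standard deviation reduction (Eq. 12):
    SDR = sd(T) - sum_i (|T_i| / |T|) * sd(T_i)."""
    n = len(parent)
    return statistics.pstdev(parent) - sum(
        len(ti) / n * statistics.pstdev(ti) for ti in subsets
    )

# Hypothetical target values reaching a node, and a candidate split
T = [0.1, 0.2, 0.9, 1.0]
left, right = [0.1, 0.2], [0.9, 1.0]
gain = sdr(T, [left, right])   # large: the split separates low from high values
```

A split that leaves all examples in one subset yields zero reduction, so the builder prefers splits that produce homogeneous subsets.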

The pruning procedure makes use of an estimate of the expected error that will be experienced at each node for the test data. First, the absolute difference between the predicted value and the actual class value is averaged over each of the training examples that reach that node. This average will underestimate the expected error for unseen cases; to compensate for this, it is multiplied by the following factor:

$$\frac{(n + v) \times pf}{n - v} \qquad (13)$$

where n is the number of training instances that reach that node, v is the number of parameters in the model that represents the class value at that


node, and pf is a pruning factor. The resulting linear model is simplified by dropping terms to minimize the estimated error calculated using the above multiplication factor, which may be enough to offset the inevitable increase in average error over the training instances. Terms are dropped one by one until the error estimate stops decreasing. Once a linear model is in place for each interior node, the tree is pruned back from the leaves, so long as the expected error decreases.

The smoothing process is used to compensate for the sharp discontinuities that will inevitably occur between adjacent linear models at the leaves of the pruned tree. This is a particular problem for models constructed from a small number of training instances. The smoothing procedure in M5P first uses the leaf model to compute the predicted value, and then filters that value along the path back to the root, smoothing it at each node by combining it with the value predicted by the linear model for that node. The formula used for smoothing is:

$$p' = \frac{np + kq}{n + k} \qquad (14)$$

where p′ is the prediction passed up to the next higher node, p is the prediction passed to this node from below, q is the value predicted by the model at this node, n is the number of training instances that reach the node below, and k is a constant.
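Eq. 14 translates directly into code; note that the default value for the constant k below is an assumption, not a value stated in the text.

```python
def smooth(p, q, n, k=15.0):
    """M5P smoothing (Eq. 14): p' = (n*p + k*q) / (n + k).
    p: prediction passed up from below; q: this node's model prediction;
    n: number of training instances reaching the node below;
    k: constant (k = 15 is an assumed default, not given in the text)."""
    return (n * p + k * q) / (n + k)

# With few training instances (small n), the node model q dominates;
# with many instances (large n), the leaf prediction p dominates.
p_smoothed = smooth(p=1.0, q=0.0, n=15)   # (15*1 + 15*0) / 30 = 0.5
```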

    K-nearest neighbors

The K-NN technique is an instance-based learning method, where training examples are stored and generalization is postponed until a prediction is made (Mitchell 1997). K-NN classifies an unknown input vector x_q by choosing the class of the nearest example x in the training set, as measured by the Euclidean distance. For real-valued target functions, the estimate is the mean value of the K nearest neighboring examples. However, this method is slow for a large test set, because finding the member of the training set closest to an unknown test instance (x_q) involves calculating the distance from every member of the training set and selecting the smallest (Witten and Frank 2005). One way of addressing this limitation is to consider weighted distances. Thus, the distance-weighted K-NN algorithm was used in this study. In this algorithm, each of the K neighbors x_i is weighted according to its distance from the query point x_q as follows:

$$f(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i} \qquad (15)$$

where the weight w_i is a function of the distance d(x_q, x_i) between x_q and x_i. The most commonly used weight functions are provided in Eqs. 16–18; in this study, however, Eq. 17 is used.

$$\text{Linear:} \quad w_i = 1 - d(x_q, x_i) \qquad (16)$$

$$\text{Inverse:} \quad w_i = d(x_q, x_i)^{-1} \qquad (17)$$

$$\text{Inverse square:} \quad w_i = d(x_q, x_i)^{-2} \qquad (18)$$
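Eqs. 15 and 17 together give the distance-weighted prediction, which can be sketched as follows; the (pH, acidity) → Cu example pairs are hypothetical, scaled values.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_predict(query, examples, k=3):
    """Distance-weighted K-NN regression (Eqs. 15 and 17).
    examples: list of (x, y) pairs; weights w_i = 1 / d(x_q, x_i)."""
    neighbors = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    num = den = 0.0
    for x, y in neighbors:
        d = euclidean(query, x)
        if d == 0.0:
            return y          # exact match: return its target directly
        w = 1.0 / d           # Eq. 17: inverse-distance weight
        num += w * y
        den += w
    return num / den          # Eq. 15: weighted mean of neighbor targets

# Hypothetical (pH, acidity) -> Cu training examples
data = [([4.0, 0.2], 0.5), ([4.5, 0.3], 0.8), ([6.0, 0.1], 0.1), ([3.8, 0.5], 1.2)]
cu_hat = knn_predict([4.2, 0.25], data, k=3)
```

Because the prediction is a weighted mean, it always lies within the range of the neighbors' target values, which is one reason K-NN cannot extrapolate to the highest observed concentrations.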

    Parameter selection for drainage quality

Modeling of AMD chemistry using machine learning methods requires defining the control variables that dictate the process. According to Morin and his co-workers (1994, 1997, 2010), the most important factors that control drainage chemistry are:

1. Geochemical production rates,
2. Infiltration of waters,
3. Elapsed time between infiltration events,
4. Residence time of water within rocks,
5. Internal temperatures and pore-gas concentrations of oxygen and carbon dioxide,
6. Particle size, and
7. Iron- and sulfur-oxidizing bacteria.

Geochemical production rates refer to the production rates of elements, acidity, and alkalinity under acidic and pH-neutral conditions in the rock. Once produced, these reaction products are either flushed by flowing water or accumulate in the rocks. Whenever the products are flushed out of the waste rocks, they strongly affect the drainage quality. The infiltration of water controls the amount of reaction products to be flushed. The importance of the elapsed time between infiltration events is that it provides an opportunity for reaction products to accumulate in the flow channel. It is worth noting that both the volume of infiltrating water and the elapsed time between infiltration events affect the concentrations and loadings observed in the basal seepage. The


residence time of water within waste rocks refers to the time required for infiltrated water to pass through the rocks. Thus, residence time determines when reaction products appear in the basal seepage. The internal temperatures and pore-gas concentrations of oxygen and carbon dioxide can affect the rates of pyrite oxidation and acid generation, and the amount of reaction products. For instance, higher temperatures, lower oxygen, and higher carbon dioxide could be associated with higher rates of pyrite oxidation and acid generation. Particle size is an important factor, since it primarily affects the surface area exposed to weathering and oxidation. In addition, this factor affects the amount of water and air percolating into the waste rocks. Iron- and sulfur-oxidizing bacteria affect oxidation rates, since they catalyze the oxidation reaction.

In this study, physico-chemical parameters monitored for over 25 years from waste rocks were obtained from the Island Copper Mine, British Columbia, Canada (Morin et al. 1995). The monitoring was routinely performed by collecting drainage samples at well-established stations, and these samples were analyzed by qualified personnel at the Island Copper Mine laboratory. The chemical parameters include pH, conductivity, alkalinity, acidity, sulfate, and metals. The physical parameters include flow rate, dissolved oxygen, and temperature. The amount of precipitation infiltrating into the waste rocks was estimated from climatic data. The climatic data, such as precipitation and minimum and maximum temperature, were obtained from Environment Canada. The evapotranspiration of the site was calculated using the Hargreaves method (Hargreaves and Riley 1985). The Hargreaves method uses minimum and maximum temperature, and solar radiation, to estimate evapotranspiration. The site effective precipitation was then estimated as the difference between precipitation and evapotranspiration. Note that not all data obtained from the Island Copper Mine site were used in this study, because the available data were either collected over a short span of time or have many missing values.
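The effective-precipitation estimate described above can be sketched as follows, using the commonly cited form of the Hargreaves equation. The radiation and temperature inputs are hypothetical, and clamping negative effective precipitation at zero is an assumption not stated in the text.

```python
def hargreaves_et0(t_min, t_max, ra):
    """Hargreaves reference evapotranspiration (mm/day), common form:
    ET0 = 0.0023 * Ra * (Tmean + 17.8) * sqrt(Tmax - Tmin),
    with Ra the extraterrestrial radiation expressed in mm/day equivalent."""
    t_mean = (t_min + t_max) / 2.0
    return 0.0023 * ra * (t_mean + 17.8) * (t_max - t_min) ** 0.5

def effective_precipitation(precip, et0):
    # effective precipitation = precipitation - evapotranspiration
    # (clamped at zero here as an assumption)
    return max(precip - et0, 0.0)

# Hypothetical day: Tmin 8 C, Tmax 18 C, Ra = 12 mm/day, 6 mm of rain
et = hargreaves_et0(t_min=8.0, t_max=18.0, ra=12.0)
p_eff = effective_precipitation(precip=6.0, et0=et)
```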

The input variables to the machine learning techniques should consist of all relevant variables that influence the AMD generation process. However, overlapping information among input variables should be avoided to simplify the task of the training algorithms. In order to make a parsimonious selection of inputs, the linear correlations between input and output variables were examined. It is worth noting that a nonlinear machine learning technique could be able to make use of more information than is revealed by this linear technique. The correlation between the copper concentrations and the other variables with their time lags is shown in Table 2. It shows that the current copper concentration is highly correlated with the previous copper concentrations (i.e., t−1 to t−5) and the other variables, except effective precipitation. While pH is negatively correlated with the current copper concentration, conductivity and acidity are positively correlated with it. Moreover, this table shows that the current concentration has strong correlations with pH, conductivity, and acidity at previous time states. Therefore, pH, conductivity, acidity, and previous copper concentrations were used as control variables, and the current copper concentrations were used as output.
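The lagged-correlation screening behind Table 2 can be sketched as follows; the short Cu series is hypothetical.

```python
def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def lagged_correlation(series, target, lag):
    """Correlate target at time t with series at time t - lag (as in Table 2)."""
    n = len(series)
    return pearson(series[:n - lag], target[lag:])

# Hypothetical daily Cu record (mg/L); lag 1 pairs Cu(t-1) with Cu(t)
cu = [0.50, 0.55, 0.60, 0.58, 0.70, 0.75, 0.72, 0.80]
r0 = lagged_correlation(cu, cu, 0)   # a series against itself: 1.0
r1 = lagged_correlation(cu, cu, 1)
```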

The statistical summary of the input and output variables considered in this study is given in Table 3. These variables are pH, conductivity, acidity, and dissolved copper. The statistics of the data include the minimum, maximum, mean, standard deviation, and coefficient of variation. This table shows that the pH dataset distribution has the lowest variability, followed by conductivity, acidity, and copper. In addition, the variability of the independent variables (i.e., pH, acidity, conductivity, and effective precipitation) and the dependent variable (copper) is within a reasonable range.

    Model development and validation

The dataset was divided into training and testing sets following the k-fold cross-validation method (Mitchell 1997). In the k-fold cross-validation method, the dataset is subdivided into k subsets, preferably of equal size. Next, k−1 subsets are used to train the machine learning models and the remaining subset is

Table 2 Correlation between copper concentration and other variables with time lags

Time (t (day))  pH     Conductivity (μS/cm)  Acidity (mg CaCO3/L)  Effective precipitation (mm)  Cu (mg/L)
t               −0.74  0.52                  0.81                  0.02                          1.00
t−1             −0.69  0.51                  0.78                  0.05                          0.94
t−2             −0.68  0.50                  0.76                  0.01                          0.90
t−3             −0.65  0.50                  0.74                  0.01                          0.89
t−4             −0.62  0.49                  0.72                  0.01                          0.86
t−5             −0.59  0.49                  0.71                  0.01                          0.84


used for testing the models. In this study, each subset has a size of 128 values, and tenfold cross-validation with stratification was repeated ten times. This exercise provided a total of 100 independent model errors for each machine learning technique. This method is computationally very intensive; however, the authors strongly believe that it provided reliable results.
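The repeated k-fold scheme can be sketched as follows. Stratification is omitted for simplicity, and the dataset size of 1,280 is inferred from the stated subset size of 128 across 10 folds.

```python
import random

def k_fold_indices(n, k, seed):
    """Shuffle indices and split them into k (nearly) equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv(n, k=10, repeats=10):
    """Yield (train, test) index splits: k folds x repeats runs."""
    for r in range(repeats):
        folds = k_fold_indices(n, k, seed=r)
        for test in folds:
            test_set = set(test)
            train = [i for i in range(n) if i not in test_set]
            yield train, test

splits = list(repeated_cv(n=1280, k=10, repeats=10))   # 100 model evaluations
```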

    Model evaluation

Prediction accuracy helps to evaluate the overall match between observed and predicted values for each machine learning technique. The predictive accuracy of each technique was evaluated using the root mean squared error (RMSE), mean absolute error (MAE), root relative squared error (RRSE), and relative absolute error (RAE), where a smaller value indicates a better technique. Moreover, a paired t test was used to determine whether the mean of the error estimates of one machine learning technique is significantly different from that of another technique. The equations of the error estimates are given in Eqs. 19–22:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (Y_o - Y_p)^2}{n}} \qquad (19)$$

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |Y_o - Y_p|}{n} \qquad (20)$$

$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n} (Y_o - Y_p)^2}{\sum_{i=1}^{n} (Y_p - \bar{Y}_p)^2}} \qquad (21)$$

$$\mathrm{RAE} = \frac{\sum_{i=1}^{n} |Y_o - Y_p|}{\sum_{i=1}^{n} |Y_o - \bar{Y}_p|} \qquad (22)$$

where Y_o and Y_p represent the observed and predicted outputs, \bar{Y}_p represents the mean of the predicted output, and n represents the number of examples presented to the learning algorithms.
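The two absolute error measures (Eqs. 19 and 20) can be sketched as follows; the observed and predicted concentration values are hypothetical.

```python
def rmse(obs, pred):
    # Eq. 19: sqrt(mean of squared errors)
    n = len(obs)
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / n) ** 0.5

def mae(obs, pred):
    # Eq. 20: mean of absolute errors
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

obs = [0.50, 0.80, 1.20, 0.90]    # hypothetical observed Cu (mg/L)
pred = [0.55, 0.75, 1.10, 1.00]   # hypothetical predicted Cu (mg/L)
err_rmse = rmse(obs, pred)
err_mae = mae(obs, pred)          # (0.05 + 0.05 + 0.10 + 0.10) / 4 = 0.075
```

RMSE is never smaller than MAE on the same errors; the gap between them grows when a few large errors dominate, which is why both are reported.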

Predictive uncertainty refers to the variability of the overall error around the mean error. The predictive uncertainty of each machine learning technique was evaluated using the averaged error residuals of the models. Next, the averaged residuals of the five techniques were treated as random variables, and 18 probability distributions were fitted using @Risk software (Palisade Corporation Inc 2005).

    Results and discussions

The performance of the five machine learning techniques for predicting copper concentrations in terms of

Table 3 Variables and summary of the data used in the study

Variables                  Min   Max    Mean     SD     CoV
pH^a                       3.56  6.46   4.53     0.45   9.93
Conductivity^a (μS/cm)     500   3,140  1,717.7  596.9  34.8
Acidity^a (mg CaCO3/L)     1.5   570    202.5    128.6  63.5
Cu^b (mg/L)                0.01  2.50   0.93     0.48   51.74

CoV coefficient of variation
^a Used as inputs in model development
^b Used as a model output

Table 4 Performance of models over testing sets

           MAE                 RMSE                RAE (%)                  RRSE (%)
Models     Min   Mean  Max     Min   Mean  Max     Min    Mean   Max       Min    Mean   Max

ANN        0.07  0.17  0.56    0.09  0.22  0.62    11.67  41.05  269.37    13.73  43.19  232.81
SVM-Poly   0.06  0.14  0.42    0.08  0.18  0.46    9.06   32.47  152.07    10.83  36.11  135.91
SVM-RBF    0.06  0.15  0.38    0.09  0.20  0.41    11.32  35.59  218.01    15.16  38.86  188.86
K-NN       0.09  0.28  0.68    0.13  0.34  0.73    17.52  61.29  260.93    17.52  61.29  260.93
M5P        0.05  0.21  0.68    0.06  0.26  1.44    10.57  47.48  205.46    14.39  50.93  193.48

RMSE root mean squared error, MAE mean absolute error, RRSE root relative squared error, RAE relative absolute error, ANN artificial neural network, K-NN K-nearest neighbors, M5P model tree, SVM-Poly support vector machine with polynomial kernel, SVM-RBF support vector machine with radial basis function kernel


four evaluation methods is presented in Table 4. This table shows the best, mean, and worst performance of the selected five techniques. The best and worst values show the performance range of the techniques, whereas the mean value shows the average performance of the techniques over the testing datasets. These indicators are important for making decisions in environmental risk analysis. The comparison of the mean performances indicates that SVM-Poly is the best technique, followed by the SVM-RBF, ANN, M5P, and K-NN techniques on all

[Figure: five scatter panels (ANN, M5P, SVM-Poly, SVM-RBF, K-NN) of predicted value (mg/L) versus observed value (mg/L), each axis spanning 0.5–2.5 mg/L]

Fig. 4 Scatter plots of the observed and predicted copper concentrations


evaluation methods. The K-NN technique was found to be the poorest performing predictive model.

The observed and predicted copper concentrations using the different models are presented in Fig. 4. This figure shows that the overall predictions of SVM-Poly fit best to the ideal line (i.e., the diagonal line), followed by SVM-RBF, ANN, M5P, and K-NN. This confirms the conclusion made above on the performance of the techniques. Most of the predicted values of the M5P technique are below the ideal prediction line, whereas the K-NN predictions are above the ideal line. This implies that the K-NN technique overestimates and M5P underestimates the overall predictions. As can be seen in Fig. 4, SVM-Poly predicted the high values better, which is desirable as it is conservative for decision making in environmental risk analysis. In contrast, the high values are underpredicted by the SVM-RBF, ANN, and M5P techniques, and K-NN could not predict the higher values at all. This suggests that K-NN should not be used for decision making in which the associated risk is high. It is interesting to note that there are a few outliers in Fig. 4. These outlier data are seen only once in the testing dataset. Consequently, such values were either underestimated if the observed data have high values (e.g., 2.5 mg/L) or overestimated if the observed data have low values (e.g., 0.96 mg/L) by the ML techniques. This indicates that the predictions of ML techniques for outlier data should be carefully analyzed.
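The over/underestimation patterns read off Fig. 4 can also be checked numerically. A small sketch, not from the paper and with illustrative names, summarising the sign of the bias and the share of points falling below the 1:1 line:

```python
def bias_summary(obs, pred):
    """Mean bias and fraction of points below the 1:1 line.

    A positive mean bias indicates overall overestimation (as seen for
    K-NN in Fig. 4); a frac_under near 1.0 indicates the model mostly
    underpredicts (as seen for M5P)."""
    n = len(obs)
    mean_bias = sum(p - o for o, p in zip(obs, pred)) / n
    frac_under = sum(1 for o, p in zip(obs, pred) if p < o) / n
    return mean_bias, frac_under
```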

A paired t test was used to determine whether the mean of the error estimates of one machine learning technique is significantly different from that of another. This t test is important to ensure that the obtained results are not due to a particular dataset used. The p values, at a significance level of p = 0.05, of the paired t test on the prediction error residuals of the five techniques are shown in Table 5. The test results show that the differences are statistically significant, except between the SVM-Poly and SVM-RBF predictions. Although SVM-Poly performed better than SVM-RBF on all model evaluation methods, this test indicates that the difference is not statistically significant.
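The underlying computation of the paired t test can be sketched as follows. This is not from the paper, just the standard paired t statistic on the per-example differences of two models' error residuals:

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic and degrees of freedom for two models'
    per-example error residuals.

    For df around 100, |t| greater than roughly 1.98 corresponds to
    p < 0.05 (two-sided)."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1
```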

The predictive uncertainty of each machine learning technique was evaluated using error residuals. These error residuals were computed as the difference between the measured and predicted copper concentrations. For each machine learning technique, the residuals of 100 independent models were calculated and averaged.

Next, the averaged residuals of the five techniques were treated as random variables and 18 probability distributions were fitted. The lognormal probability distribution was the best fit to the residuals of the five techniques (Fig. 5). The best technique is the one whose residuals are represented by the narrowest, most symmetrical, and highest probability distribution. SVM-Poly is the best in terms of predictive uncertainty, followed by the SVM-RBF and ANN techniques. M5P shows a wider range of predictive uncertainty, whereas K-NN showed the worst predictive uncertainty.
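The paper performed the distribution fitting in @Risk; as a rough illustration only, a lognormal can be fitted by maximum likelihood in plain Python, since for lognormal data the parameters are just the mean and standard deviation of the logs (the small positive `shift` is an assumption to guard against zero-valued residuals, not something the paper describes):

```python
import math
import statistics

def fit_lognormal(residuals, shift=1e-6):
    """MLE fit of a lognormal distribution to positive residuals.

    Returns (mu, sigma) of the underlying normal on the log scale.
    A narrow sigma corresponds to the narrow, peaked residual
    distributions of the better-performing techniques."""
    logs = [math.log(r + shift) for r in residuals]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    return mu, sigma
```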

    Summary and conclusions

This study evaluated the accuracy and uncertainty of five machine learning techniques used for predicting AMD chemistry at mine sites. The machine learning techniques investigated include ANN with multilayer perceptrons, SVM with polynomial (SVM-Poly) and radial base function (SVM-RBF) kernels, model trees built with the M5P algorithm, and K-NN. Physico-chemical

Table 5 The p values of the paired t test on error residuals

           ANN    K-NN   M5P    SVM-Poly  SVM-RBF
ANN        1      0.000  0.009  0.000     0.005
K-NN              1      0.001  0.000     0.000
M5P                      1      0.000     0.001
SVM-Poly                        1         0.034
SVM-RBF                                   1

ANN artificial neural network, K-NN K-nearest neighbors, M5P model tree, SVM-Poly support vector machine with polynomial kernel, SVM-RBF support vector machine with radial base function kernel

[Figure: fitted probability density curves of the error residuals (0–1.4 mg/L) for the K-NN, M5P, SVM-Poly, SVM-RBF, and ANN techniques]

Fig. 5 The probability distributions of the error residuals of the five techniques


parameters and a time lag that influence the drainage chemistry were identified as important parameters and were used as inputs to the five techniques. Although precipitation has been mentioned in the literature as the most controlling parameter of AMD generation, it did not affect the AMD process for this case study area. The prediction results show that the identified input parameters represented the system dynamics. However, the predicted results would likely improve if more parameters (e.g., flow rate, internal gas concentrations, and temperature) were considered as more data become available in the future.

The experimental results showed that the SVM-Poly kernel performed best in terms of both the predictive accuracy and uncertainty evaluation methods. The SVM with the RBF kernel and the ANN provided the next best prediction results, followed by M5P, while K-NN showed the worst performance on both evaluation measures. These results indicate that the process of AMD generation is highly nonlinear and could not be captured with techniques that build local linear models, such as the M5P and K-NN techniques. However, the SVM and ANN techniques have their own limitations. These techniques take considerable time for model training since they have parameters to be optimized and the optimization is done heuristically. Another limitation of ANN is that the function representing a given ANN model is encoded in interconnection weights and threshold values and is not easily understandable by decision-makers.

This study shows that machine learning techniques are promising tools for predicting AMD chemistry. Their prediction results could be used to evaluate and identify cost-effective AMD management alternatives for a given mine site. In addition, these techniques could be integrated into human and environmental risk assessment frameworks for sustainable mine waste management.

Acknowledgments This research has been carried out as a part of an NSERC-DG (Discovery Grant) funded by the Natural Sciences and Engineering Research Council of Canada (NSERC). We would also like to thank the Minesite Drainage Assessment Group (MDAG) for providing valuable data to test the various models.

    References

Allison, J.D., Brown, D.S., & Novo-Gradac, K.J. (1991). MINTEQA2/PRODEFA2 user's manual (version 3.0). A geochemical assessment model for environmental systems. U.S. Environmental Protection Agency, Athens, GA. EPA/600/3-91/021.

Azapagic, A. (2004). Developing a framework for sustainable development indicators for the mining and minerals industry. Journal of Cleaner Production, 12, 639–662.

Bishop, C.M. (1995). Neural networks for pattern recognition. Oxford: Clarendon.

Bouckaert, R.R., Frank, E., Hall, M., Kirkby, R., Reutemann, P., Seewald, A., et al. (2010). WEKA manual (version 3.6.4). Hamilton: University of Waikato.

Cherkassky, V.S., & Mulier, F. (2007). Learning from data: concepts, theory, and methods. New Jersey: Wiley.

Cherkassky, V., Krasnopolsky, V., Solomatine, D., & Valdes, J. (2006). Computational intelligence in earth sciences and environmental applications: issues and challenges. Neural Networks, 19, 113–121.

Gray, N.F. (1996). Field assessment of acid mine drainage contamination in surface and ground water. Environmental Geology, 27, 358–361.

Hargreaves, G., & Riley, J. (1985). Agricultural benefits for Senegal River basin. Journal of Irrigation and Drainage Engineering, ASCE, 111, 113–124.

Khandelwal, M., & Singh, T.N. (2005). Prediction of mine water quality by physical parameters. Journal of Scientific and Industrial Research, 64, 564–570.

Maest, A.S., Kuipers, J.R., Travers, C.L., & Atkins, D.A. (2005). Predicting water quality at hardrock mines: methods and models, uncertainty and state-of-the-art. Montana: Kuipers & Associates.

Mitchell, T. (1997). Machine learning. New York: McGraw-Hill.

Morin, K.A., & Hutt, N.M. (1993). The use of routine monitoring data for assessment and prediction of water chemistry. In Proceedings of the 17th Annual Mine Reclamation Symposium of the Mining Association of British Columbia, Port Hardy, BC, Canada.

Morin, K.A., & Hutt, N.M. (1994). An empirical technique for predicting the chemistry of water seeping from mine-rock piles. In Proceedings of the International Conference on the Abatement of Acidic Drainage, Pittsburgh, PA, USA.

Morin, K.A., & Hutt, N.M. (1997). Environmental geochemistry of minesite drainage: practical theory and case studies. Vancouver: MDAG Publishing.

Morin, K.A., & Hutt, N.M. (2001). Prediction of minesite-drainage chemistry through closure using operational monitoring data. Journal of Geochemical Exploration, 73(200), 123–130.

Morin, K.A., Hutt, N.M., & Horne, I.A. (1995). Prediction of future water chemistry from Island Copper Mine's On-Land Dumps. In 19th Annual British Columbia Mine Reclamation Symposium, Dawson Creek, BC, Canada.

Morin, K.A., Hutt, N.M., & Aziz, M.L. (2010). Twenty-three years of monitoring minesite-drainage chemistry, during operation and after closure: the Equity Silver Minesite, British Columbia, Canada. Available at http://www.mdag.com/case_studies/MDAG-com Case Study 35-23 Years of Minesite-Drainage Chemistry at Equity Silver Minesite.pdf. Accessed 13 August 2011.

Palisade Corporation Inc. (2005). Guide to using @RISK: advanced risk analysis for spreadsheets. New York: Palisade Corporation.

Parkhurst, D.L., & Appelo, C.A.J. (1999). User's guide to PHREEQC (version 2.18.0). A computer program for speciation, batch reaction, one-dimensional transport and inverse geochemical calculations. U.S. Geological Survey Water-Resources Investigations Report 99-4259.

Perkins, E.H., Nesbitt, H.W., Gunter, W.D., St-Arnaud, L.C., Mycroft, J.R., et al. (1995). Critical review of geochemical processes and geochemical models adaptable for prediction of acidic drainage from waste rock. Canadian Mine Environment Neutral Drainage (MEND), Report 1.42.1.

Price, W.A. (2009). Prediction manual of drainage chemistry from sulphidic geologic materials. Canadian Mine Environment Neutral Drainage (MEND), Report 1.20.1.

Quinlan, J.R. (1992). Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Singapore.

Reich, Y. (1997). Machine learning techniques for civil engineering problems. Microcomputers in Civil Engineering, 12, 295–310.

Reich, Y., & Barai, S.V. (1999). Evaluating machine learning models for engineering problems. Artificial Intelligence in Engineering, 13(3), 257–272.

Scharer, J.M., Annable, W.K., & Nicholson, R.V. (1993). WATAIL user's manual (version 1.0). A tailings basin model to evaluate transient water quality of acid mine drainage. Institute of Groundwater Research, University of Waterloo.

Smola, A.J., & Schölkopf, B. (1998). A tutorial on support vector regression. Report 1998-030, Royal Holloway College, London.

Solomatine, D.P., & Ostfeld, A. (2008). Data-driven modelling: some past experiences and new approaches. Journal of Hydroinformatics, 10(1), 3–22.

USEPA (1994). Technical document of acid mine drainage prediction. Washington: Office of Solid Waste. Report EPA530-R-94-036.

Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.

Vapnik, V. (1998). Statistical learning theory. New York: Wiley.

Walter, A.L., Frind, E.O., Blowes, D.W., Ptacek, C.J., Molson, J.W., et al. (1994). Modelling of multicomponent reactive transport in groundwater 1. Model development and evaluation. Water Resources Research, 30(11), 3137–3148.

Wang, Y., & Witten, I. (1997). Inducing model trees for continuous classes. In Poster Papers of the 9th European Conference on Machine Learning.

William, W.H. (2009). Machine learning methods in the environmental sciences: neural networks and kernels. Cambridge: Cambridge University Press.

Witten, I.H., & Frank, E. (2005). Data mining: practical machine learning tools and techniques. San Francisco: Morgan Kaufmann.
