Hindawi Publishing Corporation
Journal of Probability and Statistics
Volume 2013, Article ID 753930, 7 pages
http://dx.doi.org/10.1155/2013/753930
Research Article
Nonlinear Survival Regression Using Artificial Neural Network
Akbar Biglarian,1 Enayatollah Bakhshi,1 Ahmad Reza Baghestani,2 Mahmood Reza Gohari,3 Mehdi Rahgozar,1 and Masoud Karimloo1
1 Department of Biostatistics, University of Social Welfare and Rehabilitation Sciences (USWRS), Tehran 1985713834, Iran
2 Department of Biostatistics, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran 1971653313, Iran
3 Hospital Management Research Center, Tehran University of Medical Sciences (TUMS), Tehran 1996713883, Iran
Correspondence should be addressed to Akbar Biglarian; abiglarian@gmail.com
Received 9 May 2012; Revised 21 November 2012; Accepted 23 November 2012
Academic Editor: Shein-Chung Chow
Copyright © 2013 Akbar Biglarian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Survival analysis methods deal with data that measure the waiting time until the occurrence of an event. One common method for analyzing this sort of data is Cox regression. Sometimes the underlying assumptions of the model do not hold, for example, the proportional hazards assumption of the Cox model. In model building, the choice of an appropriate model depends on the complexity and the characteristics of the data that affect the appropriateness of the model. One strategy that is now used frequently is the artificial neural network (ANN) model, which requires minimal assumptions. This study aimed to compare the predictions of the ANN and Cox models using simulated data sets in which the average censoring rate ranged from 20% to 80%, in both simple and complex models. All simulations and comparisons were performed in R 2.14.1.
1. Introduction
Many different parametric, nonparametric, and semiparametric regression methods are increasingly examined to explore the relationship between a response variable and a set of covariates. The choice of an appropriate method for modeling depends on the methodology of the survey and the nature of the outcome and explanatory variables.
A common research question in medical research is to determine whether a set of covariates is correlated with survival or failure times. Two major characteristics of survival data are censoring and violation of the normality assumption of ordinary least squares multiple regression. These two characteristics of the time variable are the reasons that straightforward multiple regression techniques cannot be used. Different parametric and semiparametric models in survival regression have been introduced, which model the survival or hazard function. Parametric models, for instance, exponential or Weibull, predict the survival function, while accelerated failure time models are parametric regression methods with the logarithm of failure time as the dependent variable [1, 2].
Choosing an appropriate model for the analysis of survival data depends on some conditions, which are called the underlying assumptions of the model. Sometimes these assumptions may not hold, for example, (a) lack of independence between consecutive waiting times to the occurrence of an event or nonproportionality of hazards in semiparametric models, and (b) lack of independence of censoring or the distribution of failure times in the case of parametric models [1–3].
Although the Cox regression model is an efficient strategy for analyzing survival data, when the assumptions of this model fail, assumption-free methods could be suitable.
Artificial neural network (ANN) models, which are completely nonparametric, have been used increasingly in different areas of science. Although analyzing data using ANN methodology is usually more complex than traditional approaches, ANN models are more flexible and efficient when the main aim is prediction or classification of an outcome using different explanatory variables [4–17].
Note that when several covariates and complex interactions are of concern, the best method is ANN; otherwise, depending on the model assumptions, simple regression models can be used appropriately.
In this study, simulated data sets with different rates of censoring were used to predict the outcome using ANN and traditional Cox regression models, and then the results of the predictions were compared.
2. Methods
2.1. Cox Regression Model. Suppose that $T$ denotes a continuous nonnegative random variable describing the failure time of an event (i.e., time-to-event) in a system. The probability density function of $t$, that is, the actual survival time, is $f(t)$. The survival function, $S(t)$, is the probability that the failure occurs later than time $t$. The related hazard function, $h(t)$, denotes the probability density of an event occurring around time $t$, given that it has not occurred prior to time $t$.
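As a quick numerical check of these definitions, the hazard can be computed as $h(t) = f(t)/S(t)$. The sketch below assumes an exponential failure time (the form used later in the simulation) with an arbitrary rate `lam`; the function names are illustrative and are not taken from the original analysis, which was carried out in R.

```python
import math

def exp_density(t, lam):
    """Density f(t) of an exponential failure time with rate lam."""
    return lam * math.exp(-lam * t)

def exp_survival(t, lam):
    """Survival function S(t) = P(T > t) for the same distribution."""
    return math.exp(-lam * t)

def hazard(t, lam):
    """Hazard h(t) = f(t) / S(t): event density given survival up to t."""
    return exp_density(t, lam) / exp_survival(t, lam)

lam = 0.5  # arbitrary illustrative rate
for t in (0.1, 1.0, 5.0):
    # For the exponential distribution the hazard does not depend on t.
    print(t, hazard(t, lam))
```

The constant hazard is what makes the exponential form convenient for simulation: the whole effect of the covariates can be placed in the single rate parameter.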
As we know, an inherent characteristic of survival data is censoring. Right-censored data, the most common form of censoring, occur when survival times are greater than some defined time point [1, 2]. The generated data used in this study contain right-censored observations.
The proportional hazards model, also called Cox regression, is a popular method for the analysis of survival data. This model is presented as
[…] complete and $\delta_i = 0$ if it is censored. $R(t_i)$ is defined as the risk set at time $t_i$.
To fit the Cox regression and estimate $\beta$, the partial likelihood in (3) is maximized using iteratively reweighted least squares to implement the Newton-Raphson method. However, in the high-dimensional case this approach cannot be used to estimate $\beta$; indeed, $\beta$ is not even unique.
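A minimal sketch of the objective being maximized, assuming the standard Cox partial log-likelihood $\sum_{i:\delta_i=1}\left[\beta' x_i - \log \sum_{j \in R(t_i)} \exp(\beta' x_j)\right]$ with no tied event times. The function name and the toy data are hypothetical and serve only to show how the risk set $R(t_i)$ enters the computation.

```python
import math

def cox_partial_loglik(beta, times, events, X):
    """Cox partial log-likelihood, assuming no tied event times.

    beta   : list of regression coefficients
    times  : observed times t_i
    events : delta_i (1 = event observed, 0 = right-censored)
    X      : list of covariate rows x_i
    """
    # Linear predictor beta'x_i for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects contribute only through risk sets
        # Risk set R(t_i): subjects still under observation at time t_i.
        risk = [math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i]
        loglik += eta[i] - math.log(sum(risk))
    return loglik

# Toy data (hypothetical values, for illustration only).
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
X = [[1.0], [0.0], [1.0], [0.0]]
print(cox_partial_loglik([0.0], times, events, X))
```

A Newton-Raphson fit would repeatedly update `beta` using the gradient and Hessian of this function; that step is omitted here to keep the sketch short.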
2.2. Neural Networks Model. An ANN consists of several layers. Layers are interconnected groups of artificial neurons. In addition, each layer has weights that indicate the amount of the effect of neurons on each other. Usually, an ANN model has three layers, called input, hidden (middle), and output. The input layer contains the predictors. The hidden layer contains unobservable nodes and applies a nonlinear transformation to the linear combination of the input layer. The value of each hidden node is a function of the predictors. The output layer contains the outcome, which is some function of the hidden units. In the hidden and output layers, the exact form of the function depends on the network type and user definition (based on the response variable).
There are different methods for learning in the NN. For example, in the multilayer perceptron (MLP), which is the most commonly used, learning is performed by minimizing the mean square error of the output with the back-propagation learning algorithm [16, 19, 20].
In this paper, we use the sigmoid function as the activation (transfer) function in both the hidden layer ($g_h$) and the output layer ($g_o$). The $i$th response $y_i$ for the predictor values $H$ […]
Note that $X_i'$ is the $i$th row of the input data matrix $X$, $H_{ih}$ is a nonlinear function of a linear combination of the input data, $\beta$ is the vector of weights from the hidden to the output units, and $\alpha$ is the matrix of weights from the input to the hidden units. Equations (4) and (5) together yield the MLP model.
With the sigmoid activation function, (5) can be written as below, which is a nonlinear regression:
$$
\begin{aligned}
y_i &= \left[1 + \exp\left(-H_i'\beta\right)\right]^{-1} + \varepsilon_i \\
&= \left[1 + \exp\left(-\beta_0 - \sum_{h=1}^{H-1} \beta_h \left[1 + \exp\left(-X_i'\alpha_h\right)\right]^{-1}\right)\right]^{-1} + \varepsilon_i \\
&= g\left(X_i, \beta, \alpha_1, \alpha_2, \ldots, \alpha_H\right) + \varepsilon_i,
\end{aligned}
\tag{6}
$$
where $\beta, \alpha_1, \alpha_2, \ldots, \alpha_H$ are unknown parameter vectors, $X_i$ is a vector of known constants, and $\varepsilon_i$ are residuals. The parameters (weights) can be estimated by optimizing some criterion function, such as maximizing the log-likelihood function or minimizing the sum of squared errors.
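The deterministic part of (6) can be sketched as a forward pass through one hidden layer. This is an illustrative reading of the equation, not the authors' R code: `mlp_predict`, the argument layout, and the convention that the first element of `x` is a constant 1 (so that each $\alpha_h$ carries its own bias) are assumptions.

```python
import math

def sigmoid(z):
    """Logistic function [1 + exp(-z)]^(-1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_predict(x, alpha, beta):
    """Forward pass of a single-hidden-layer sigmoid network.

    x     : covariate vector X_i, with a leading 1.0 for the bias
    alpha : list of hidden-unit weight vectors alpha_h (same length as x)
    beta  : [beta_0, beta_1, ...], output bias followed by one weight
            per hidden unit
    """
    # Hidden values: sigmoid of a linear combination of the inputs.
    hidden = [sigmoid(sum(a * v for a, v in zip(alpha_h, x))) for alpha_h in alpha]
    # Output: sigmoid of beta_0 plus the weighted hidden values.
    z = beta[0] + sum(b * h for b, h in zip(beta[1:], hidden))
    return sigmoid(z)

# Hypothetical weights for a network with two inputs and two hidden units.
x = [1.0, 0.0, 1.5]
alpha = [[0.1, 0.4, -0.3], [-0.2, 0.5, 0.8]]
beta = [0.05, 1.2, -0.7]
print(mlp_predict(x, alpha, beta))
```

Because the output activation is also a sigmoid, the prediction always lies strictly between 0 and 1, which is what allows it to be read as a probability in the likelihood-based criteria that follow.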
In an MLP framework, a serious problem is overfitting. To control overfitting, a penalty term is usually added to the optimization criterion. The penalized least squares criterion for parameter estimation (7) [1] adds to the sum of squared errors the penalty term $p_\lambda(\beta, \alpha_1, \alpha_2, \ldots, \alpha_H) = \lambda\left(\sum \beta_i^2 + \sum \alpha_{ij}^2\right)$. In the likelihood schema, which is often used in shrinkage methods, an adaptation of (7) is [8, 21]
$$
L = -\log \text{likelihood} + \lambda\left(\sum \beta_i^2 + \sum \alpha_{ij}^2\right). \tag{8}
$$
The penalty weight $\lambda$ regulates between over- and underfitting. A good value of $\lambda$ lies between 0.001 and 0.1 and is chosen by cross-validation [1, 8]. In this paper, we use (8) to obtain the parameter estimates. For an outcome (the response variable) with two classes, $\delta = (0, 1)$, $p_i$ is the probability of the event for the $i$th patient, and the error function is the cross-entropy error function
$$
E = -\sum_i \left[\delta_i \log(p_i) + (1 - \delta_i)\log(1 - p_i)\right]. \tag{9}
$$
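Equations (8) and (9) combine into a single objective: the cross-entropy error plus a weight-decay penalty. The following sketch evaluates that objective for given predictions and weights; the function names and the example numbers are hypothetical.

```python
import math

def cross_entropy(delta, p):
    """Equation (9): E = -sum[delta_i log p_i + (1 - delta_i) log(1 - p_i)]."""
    return -sum(d * math.log(q) + (1 - d) * math.log(1.0 - q)
                for d, q in zip(delta, p))

def weight_decay(beta, alpha, lam):
    """Penalty lam * (sum beta_i^2 + sum alpha_ij^2)."""
    return lam * (sum(b * b for b in beta)
                  + sum(a * a for row in alpha for a in row))

def penalized_error(delta, p, beta, alpha, lam):
    """Equation (8) with the cross-entropy as the negative log-likelihood."""
    return cross_entropy(delta, p) + weight_decay(beta, alpha, lam)

# Hypothetical event indicators, predicted probabilities, and weights.
delta = [1, 0, 1]
p = [0.8, 0.3, 0.6]
beta = [0.05, 1.2, -0.7]
alpha = [[0.1, 0.4], [-0.2, 0.5]]
print(penalized_error(delta, p, beta, alpha, lam=0.01))
```

A larger `lam` shrinks the weights toward zero and gives a smoother fit, while `lam = 0` recovers the unpenalized cross-entropy.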
An ANN can be formulated as a generalized linear model with nonlinear predictors [8–11]. Biganzoli et al. [8] introduced a method called partial logistic ANN, and Lisboa et al. [22] developed it to fit smooth estimates of the discrete time hazard. It is similar to the MLP [23] with an additional covariate, namely, time, as an input, and is given by
[…] hidden nodes, respectively; $b_h$ and $b$ denote the bias terms in the hidden and output layers, respectively. After the estimation of the network weights $w$, a single output node estimates conditional failure probability values from the connections with the middle units, and the survivorship is calculated from the estimated discrete time hazard by multiplying the conditional survival probabilities over the time intervals. Then the $-\log(\text{likelihood})$ statistic can be obtained as [8]
$$
E = -\sum_{p=1}^{\text{no. of patients}} \sum_{k=1}^{t_1} \left[\delta_{pk}\log\left(h_p(x_p, t_k)\right) + (1 - \delta_{pk})\log\left(1 - h_p(x_p, t_k)\right)\right], \tag{11}
$$
where $\delta$ is the censoring indicator function.
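The two computations described here, survivorship as a product of conditional survival probabilities and the error in (11), can be sketched as follows. The sketch assumes that the network has already produced a discrete hazard for each patient and time interval; the function names, the list-of-lists layout, and the numbers are hypothetical.

```python
import math

def survival_from_hazards(hazards):
    """S(t_k) = prod over j <= k of (1 - h_j) for discrete-time hazards."""
    surv, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h          # conditional probability of surviving interval j
        surv.append(s)
    return surv

def plann_error(hazards, indicators):
    """Equation (11), with hazards[p][k] and indicators[p][k] for patient p
    and time interval k (indicator 1 only in the interval of the event)."""
    total = 0.0
    for h_p, d_p in zip(hazards, indicators):
        for h, d in zip(h_p, d_p):
            total += d * math.log(h) + (1 - d) * math.log(1.0 - h)
    return -total

# Hypothetical predicted hazards for two patients.
hazards = [[0.1, 0.2, 0.5], [0.1, 0.2]]
# Patient 1 has the event in interval 3; patient 2 is censored after interval 2.
indicators = [[0, 0, 1], [0, 0]]
print(survival_from_hazards(hazards[0]))
print(plann_error(hazards, indicators))
```

Each patient contributes one term per interval at risk, so a censored patient adds only survival terms, which is how censoring enters the likelihood.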
2.3. Model Fitting. The ultimate goal of the learning process is to minimize the error of the network. In the training step, to fit the model with a fixed number of hidden nodes, we use the penalized likelihood
$$
E^{*} = E + \lambda\left(\sum \beta_i^2 + \sum \alpha_{ij}^2\right). \tag{12}
$$
Using this, we improve the convergence of the optimization and also control the overfitting problem [1, 8, 9, 16, 21].
To identify the number of hidden nodes and then select the model, the Bayesian Information Criterion (BIC) and the Network Information Criterion (NIC) [8, 23, 24], which is a generalization of the Akaike Information Criterion (AIC), are calculated:
$$
\begin{aligned}
\text{BIC} &= -2 \times \log \text{likelihood} + \log(N) \times P, \\
\text{NIC} &= 2E^{*} + 2P,
\end{aligned}
\tag{13}
$$
where $P$ is the number of estimated parameters and $N$ is the number of observations in the training set. The best model is the one with the smallest value of these criteria. In addition, to assess prediction accuracy in the validation (testing) group, we calculated classification accuracy and mean square error (MSE).
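The criteria in (13) can be applied to choose the number of hidden nodes as sketched below. The parameter count assumes one hidden layer with bias terms and a single output node; the log-likelihood values are invented for illustration and are not results from this study.

```python
import math

def n_parameters(n_inputs, n_hidden):
    """Weights and biases of a one-hidden-layer network with one output
    (assumed architecture; n_inputs includes time if it is an input)."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1)

def bic(log_likelihood, n_obs, n_params):
    """BIC = -2 * log likelihood + log(N) * P."""
    return -2.0 * log_likelihood + math.log(n_obs) * n_params

def nic(penalized_error, n_params):
    """NIC = 2 * E_star + 2 * P."""
    return 2.0 * penalized_error + 2.0 * n_params

# Hypothetical training log-likelihoods for networks with 2 inputs,
# keyed by the number of hidden nodes.
candidates = {2: -310.0, 3: -305.0, 5: -303.5}
scores = {h: bic(ll, 700, n_parameters(2, h)) for h, ll in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

With these made-up values the small gain in likelihood from extra hidden nodes does not offset the $\log(N) \times P$ term, so the smallest network is selected.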
The best model is selected as the one with the smallest value of MSE. The models considered had 2, 3, 4, 5, 10, 15, and 20 hidden nodes. The weight decay was set to 0.012, which was chosen based on an empirical study [25].
Finally, in order to compare the Cox and ANN predictions, classification accuracy and concordance indexes were calculated. All simulations and comparisons were performed in R 2.14.1.
3. Simulation
In order to compare the accuracy of the predictions by ANN and Cox regression, four different simulation schemes based on Monte Carlo simulation were used. In each scheme, the hazard at any time $t$ was considered to have exponential form [26], namely, $\lambda$ (Table 1). For each scheme, 1000 independent random observations were generated, and then survival times were generated based on the relationship between the exponential parameter and the independent variables. Afterward, the survival times were transformed into right-censored times. In this context, if the generated time $t_i$ is greater than the quantile of the exponential distribution with parameter $\lambda$, it is considered censored. This process was repeated 100 times. To assess the accuracy of predictions, each sample was randomly divided into two parts. The first part, the training group, consisted of 700 observations, and the remaining 300 observations were allocated to the second group, that is, the testing group. Furthermore, in all simulations, the average rates of censoring were set to 20%, 30%, 40%, 50%, 60%, 70%, and 80%. In addition, the models were considered with the main effects and without or with interaction terms, as simple and complex models, respectively.
In simulations 1 and 2, two covariates were used, which were generated randomly from binomial and standard normal distributions. The models of these simulations consisted of no and one interaction term, respectively. In simulation 3, three covariates were used, which were generated randomly from binomial and standard normal distributions, respectively. The model of this simulation had two interaction terms. In simulation 4, four covariates were used, which were generated randomly from binomial and standard normal distributions. The models of these simulations are complex and consist of two-, three-, and four-interaction terms (Table 1).
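One replication of such a scheme might look like the sketch below. Several details are assumptions rather than the paper's specification: the coefficients are hypothetical because Table 1 is not available here, the hazard is taken as the exponential of the linear predictor, and the censoring rule is read as censoring each subject at the quantile of its own exponential distribution that leaves the target fraction censored. The original study used R; Python's standard `random` module stands in for it.

```python
import math
import random

def simulate_scheme(n=1000, n_train=700, censor_rate=0.4,
                    coefs=(0.5, -0.5, 0.3), seed=1):
    """Generate right-censored exponential survival data and split it.

    coefs = (b1, b2, b12) are hypothetical coefficients for a binary
    covariate, a standard normal covariate, and their interaction.
    Returns (train, test); each row is (time, delta, x1, x2).
    """
    rng = random.Random(seed)
    b1, b2, b12 = coefs
    q = 1.0 - censor_rate                          # probability of an observed event
    data = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0        # binomial(1, 0.5) covariate
        x2 = rng.gauss(0.0, 1.0)                   # standard normal covariate
        lam = math.exp(b1 * x1 + b2 * x2 + b12 * x1 * x2)  # exponential hazard
        t = rng.expovariate(lam)                   # latent survival time
        c = -math.log(1.0 - q) / lam               # q-quantile of Exp(lam)
        if t > c:
            data.append((c, 0, x1, x2))            # right-censored at c
        else:
            data.append((t, 1, x1, x2))            # event observed
    rng.shuffle(data)                              # random split into two parts
    return data[:n_train], data[n_train:]

train, test = simulate_scheme()
rate = sum(1 for row in train + test if row[1] == 0) / 1000
print(len(train), len(test), round(rate, 3))
```

Repeating the call with different seeds and censoring rates gives the replications for one scheme; the simple model would drop the interaction coefficient.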
Table 3: Results of concordance indexes of the simulation study in the testing subset (300 cases with 100 replications).
Model*  Censoring (%)  ANN            Cox Reg
I       20             0.821 ± 0.028  0.820 ± 0.031
I       30             0.819 ± 0.027  0.820 ± 0.030
I       40             0.824 ± 0.028  0.823 ± 0.030
I       50             0.823 ± 0.027  0.823 ± 0.029
I       60             0.823 ± 0.027  0.822 ± 0.030
I       70             0.825 ± 0.027  0.822 ± 0.032
I       80             0.826 ± 0.030  0.823 ± 0.031
II      20             0.818 ± 0.029  0.817 ± 0.031
II      30             0.817 ± 0.034  0.815 ± 0.046
II      40             0.818 ± 0.032  0.818 ± 0.031
II      50             0.816 ± 0.035  0.814 ± 0.032
II      60             0.816 ± 0.031  0.813 ± 0.032
II      70             0.814 ± 0.033  0.811 ± 0.031
II      80             0.812 ± 0.034  0.808 ± 0.036
III     20             0.808 ± 0.030  0.798 ± 0.035
III     30             0.799 ± 0.033  0.790 ± 0.033
III     40             0.812 ± 0.043  0.795 ± 0.038
III     50             0.805 ± 0.028  0.795 ± 0.030
III     60             0.804 ± 0.028  0.790 ± 0.033
III     70             0.805 ± 0.028  0.791 ± 0.034
III     80             0.803 ± 0.028  0.790 ± 0.032
IV      20             0.764 ± 0.033  0.759 ± 0.036
IV      30             0.773 ± 0.029  0.762 ± 0.033
IV      40             0.777 ± 0.030  0.759 ± 0.035
IV      50             0.779 ± 0.035  0.764 ± 0.034
IV      60             0.812 ± 0.039  0.790 ± 0.036
IV      70             0.764 ± 0.036  0.744 ± 0.035
IV      80             0.766 ± 0.035  0.741 ± 0.037
*Four models with different rates of censoring.
Model selection was based on the BIC for the learning set and on the SSE criterion for the testing subset as a verification. The results in Table 2 show that the simple model performs well with fewer hidden nodes, whereas the complex model performs better with more hidden nodes. The MSE values confirm these results (Table 2).
In the next step, to compare the ANN and Cox regression predictions, concordance indexes were calculated from the classification accuracy table in the testing subset. The concordance index has been reported as a generalization of the area under the receiver operating characteristic curve for censored data [27, 28]. This index represents the proportion of cases that are classified correctly in the noncensored (event) and censored groups, and its values, from 0 to 1, indicate the accuracy of the models. The concordance indexes of the ANN and Cox regression models are reported in Table 3. The results of the simulation study for the simpler models showed no difference between the predictions of the Cox regression and NN models. However, NN predictions were better than Cox regression predictions in complex models with high rates of censoring.
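For reference, the pairwise form of the concordance index for censored data, in the sense of the generalization of the area under the ROC curve cited above [27, 28], can be sketched as follows. This is an assumed implementation for illustration; the source states that its indexes were obtained from the classification accuracy table, so the numbers in Table 3 need not come from this exact computation.

```python
def concordance_index(times, events, risk):
    """Pairwise concordance index for right-censored data.

    A pair (i, j) is usable when subject i has an observed event strictly
    before the observed time of subject j. The pair is concordant when the
    subject who failed earlier was given the higher predicted risk;
    ties in predicted risk count as one half.
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue                      # a censored time cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

# Hypothetical observed times, event indicators, and predicted risks.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.2]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to predictions that carry no ranking information, and 1 to a perfect ordering of the usable pairs.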
4. Conclusion
In this paper, we presented two approaches for modeling survival data with different degrees of censoring: Cox regression and neural network models. A Monte Carlo simulation study was performed to compare the predictive accuracy of the Cox and neural network models on simulated data sets.
In the simulation study, four different models were considered. The rate of censoring in each of these models ranged from 20% to 80%. These models were considered with the main effects and also with interaction terms. Then the predictive ability of these models was evaluated. As was seen, in simple models and with fewer censored cases, there was little difference between the ANN and Cox regression predictions. It seems that for simpler models the level of censoring has no effect on predictions, whereas predictions in more complex models depend on the level of censoring. The results showed that the NN model provided better predictions for more complex models, whereas for simpler models there was no difference in the results. This result is consistent with the finding of Xiang's study [26]. Therefore, the NN model is proposed in two cases: (1) high censoring (i.e., a censoring rate of 60% and higher) and/or (2) complex models (i.e., with many covariates and interaction terms). This result is useful in practical problems, which often involve many variables and many censored cases. For that reason, in these two cases, the ANN strategy can be used as an alternative to the traditional Cox model. Finally, there are some flexible alternative methods, such as piecewise exponential and grouped time models, which can be used for survival data and whose performance can then be compared with that of the ANN model.
Acknowledgment
The authors wish to express their special thanks to the referees for their valuable comments.
References
[1] E. T. Lee and J. W. Wang, Statistical Methods for Survival Data Analysis, Wiley Series in Probability and Statistics, Wiley-Interscience, Hoboken, NJ, USA, 3rd edition, 2003.
[2] V. Lagani and I. Tsamardinos, “Structure-based variable selection for survival data,” Bioinformatics, vol. 26, no. 15, pp. 1887–1894, 2010.
[3] M. H. Kutner, C. J. Nachtsheim, and J. Neter, Applied Linear Regression Models, McGraw-Hill/Irwin, New York, NY, USA, 4th edition, 2004.
[4] W. G. Baxt and J. Skora, “Prospective validation of artificial neural networks trained to identify acute myocardial infarction,” Lancet, vol. 347, pp. 12–15, 1996.
[5] B. A. Mobley, E. Schecheer, and W. E. Moore, “Prediction of coronary artery stenosis by artificial networks,” Artificial Intelligence in Medicine, vol. 18, pp. 187–203, 2000.
[6] D. West and V. West, “Model selection for medical diagnostic decision support system: breast cancer detection case,” Artificial Intelligence in Medicine, vol. 20, pp. 183–204, 2000.
[7] F. Ambrogi, N. Lama, P. Boracchi, and E. Biganzoli, “Selection of artificial neural network models for survival analysis with genetic algorithms,” Computational Statistics & Data Analysis, vol. 52, no. 1, pp. 30–42, 2007.
[8] E. Biganzoli, P. Boracchi, L. Mariani, and E. Marubini, “Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach,” Statistics in Medicine, vol. 17, pp. 1169–1186, 1998.
[9] E. Biganzoli, P. Boracchi, and E. Marubini, “A general framework for neural network models on censored survival data,” Neural Networks, vol. 15, pp. 209–218, 2002.
[10] E. M. Bignazoli and P. Borrachi, “The Partial Logistic Artificial Neural Network (PLANN): a tool for the flexible modelling of censored survival data,” in Proceedings of the European Conference on Emergent Aspects in Clinical Data Analysis (EACDA '05), 2005.
[11] E. M. Bignazoli, P. Borrachi, F. Amborgini, and E. Marubini, “Artificial neural network for the joint modeling of discrete cause-specific hazards,” Artificial Intelligence in Medicine, vol. 37, pp. 119–130, 2006.
[12] R. Bittern, A. Cuschieri, S. D. Dolgobrodov, et al., “An artificial neural network for analysing the survival of patients with colorectal cancer,” in Proceedings of the European Symposium on Artificial Neural Networks (ESANN '05), Bruges, Belgium, April 2005.
[13] K. U. Chen and C. J. Christian, “Using back-propagation neural network to forecast the production values of the machinery industry in Taiwan,” Journal of American Academy of Business, Cambridge, vol. 9, no. 1, pp. 183–190, 2006.
[14] C. L. Chia, W. Nick Street, and H. W. William, “Application of artificial neural network-based survival analysis on two breast cancer datasets,” in Proceedings of the American Medical Informatics Association Annual Symposium (AMIA '07), pp. 130–134, Chicago, Ill, USA, November 2007.
[15] A. Eleuteri, R. Tagliaferri, and L. Milano, “A novel neural network-based survival analysis model,” Neural Networks, vol. 16, pp. 855–864, 2003.
[16] B. D. Ripley and R. M. Ripley, “Neural networks as statistical methods in survival analysis,” in Clinical Applications of Artificial Neural Networks, pp. 237–255, Cambridge University Press, Cambridge, UK, 2001.
[17] B. Warner and M. Manavendra, “Understanding neural networks as statistical tools,” Amstat, vol. 50, no. 4, pp. 284–293, 1996.
[18] J. P. Klein and M. L. Moeschberger, Survival Analysis: Techniques for Censored and Truncated Data, Springer, New York, NY, USA, 2nd edition, 2003.
[19] J. W. Kay and D. M. Titterington, Statistics and Neural Networks, Oxford University Press, Oxford, UK, 1999.
[20] E. P. Goss and G. S. Vozikis, “Improving health care organizational management through neural network learning,” Health Care Management Science, vol. 5, pp. 221–227, 2002.
[21] R. M. Ripley, A. L. Harris, and L. Tarassenko, “Non-linear survival analysis using neural networks,” Statistics in Medicine, vol. 23, pp. 825–842, 2004.
[22] P. J. G. Lisboa, H. Wong, P. Harris, and R. Swindell, “A Bayesian neural network approach for modeling censored data with an application to prognosis after surgery for breast cancer,” Artificial Intelligence in Medicine, vol. 28, no. 1, pp. 1–25, 2003.
[23] C. M. Bishop, Neural Networks for Pattern Recognition, The Clarendon Press, Oxford University Press, New York, NY, USA, 1995.
[24] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 1996.
[25] N. Fallah, G. U. Hong, K. Mohammad, et al., “Nonlinear Poisson regression using neural networks: a simulation study,” Neural Computing and Applications, vol. 18, no. 8, pp. 939–943, 2009.
[26] A. Xiang, P. Lapuerta, A. Ryutov, et al., “Comparison of the performance of neural network methods and Cox regression for censored survival data,” Computational Statistics & Data Analysis, vol. 34, no. 2, pp. 243–257, 2000.
[27] F. E. Harrell, R. M. Califf, D. B. Pryor, et al., “Evaluating the yield of medical tests,” The Journal of the American Medical Association, vol. 247, pp. 2543–2546, 1982.
[28] F. E. Harrell, K. L. Lee, R. M. Califf, et al., “Regression modeling strategies for improved prognostic prediction,” Statistics in Medicine, vol. 3, pp. 143–152, 1984.
4 Conclusion
In this paper we presented two approaches for modeling ofsurvival data with different degrees of censoring Cox regres-sion and neural network models A Monte-Carlo simulationstudy was performed to compare predictive accuracy of Coxand neural network models in simulation data sets
In the simulation study four different models wereconsideredThe rate of censorship in each of thesemodelswasconsidered from 20 up 80These models were consideredwith the main effects and also with the interaction termsThen the ability of these models in prediction was evaluatedAs was seen in simple models and with less censored casesthere was little difference in ANN and Cox regressionmodelspredictions It seems that for simpler models the levels ofcensorship have no effect on predictions but the predictionsin more complex models depend on the levels of censorshipThe results showed that the NN model for more complexmodels was provided better predictions But for simplermodels predictions there was not any different in resultsThis result was consistent with the finding fromXiangrsquos study[26] Therefore NN model is proposed in two cases of (1)occurrence of high censorship (ie censoring rate of 60 andhigher) andor (2) in the complex models (ie with manycovariates and any interaction terms) This is a very goodresult and can be used in practical issues which often arefaced onwithmany numbers of variables and alsomany casesof censorship For that reason in these two cases the ANNstrategy can be used as an alternate of traditional Cox modelFinally it is mentioned that there are some flexible alternativemethods such as piecewise exponential and grouped timemodels which can be used for survival data and then its abilitycompared with ANNmodel
Acknowledgment
The authors wish to express their special thanks to refereesfor their valuable comments
References
[1] E T Lee and J W Wang Statistical Methods for SurvivalData Analysis Wiley Series in Probability and Statistics Wiley-Interscience Hoboken NJ USA 3rd edition 2003
[2] V Lagani and I Tsamardinos ldquoStructure-based variable selec-tion for survival datardquo Bioinformatics vol 26 no 15 pp 1887ndash1894 2010
[3] M H Kutner C J Nachtsheim and J Neter Applied LinearRegression Models McGraw-HillIrwin New York NY USA4th edition 2004
[4] WG Baxt and J Skora ldquoProspective validation of artificial neu-ral networks trained to identify acute myocardial infarctionrdquoLancet vol 347 pp 12ndash15 1996
Journal of Probability and Statistics 7
[5] B A Mobley E Schecheer and W E Moore ldquoPredictionof coronary artery stenosis by artificial networksrdquo ArtificialIntelligence in Medicine vol 18 pp 187ndash203 2000
[6] D West and V West ldquoModel selection for medical diagnosticdecision support system breast cancer detection caserdquoArtificialIntelligence in Medicine vol 20 pp 183ndash204 2000
[7] F Ambrogi N Lama P Boracchi and E Biganzoli ldquoSelectionof artificial neural network models for survival analysis withgenetic algorithmsrdquo Computational Statistics amp Data Analysisvol 52 no 1 pp 30ndash42 2007
[8] E Biganzoli P Boracchi LMariani andEMarubini ldquoFeed for-ward neural networks for the analysis of censored survival dataa partial logistic regression approachrdquo Statistics inMedicine vol17 pp 1169ndash1186 1998
[9] E Biganzoli P Boracchi and E Marubini ldquoA general frame-work for neural network models on censored survival datardquoNeural Networks vol 15 pp 209ndash218 2002
[10] E M Bignazoli and P Borrachi ldquoThe Partial Logistic ArtificialNeural Network (PLANN) a tool for the flexible modelling ofcensored survival datardquo in Proceedings of the European Confer-ence on EmergentAspects inClinical DataAnalysis (EACDA rsquo05)2005
[11] E M Bignazoli P Borrachi F Amborgini and E MarubinildquoArtificial neural network for the joint modeling of discretecause-specific hazardsrdquo Artificial Intelligence in Medicine vol37 pp 119ndash130 2006
[12] R Bittern A Cuschieri S D Dolgobrodov et al ldquoAn artificialneural network for analysing the survival of patients withcolorectal cancerrdquo in Proceedings of the European Symposium onArtificial Neural Networks (ESANN rsquo05) Bruges Belgium April2005
[13] K U Chen and C J Christian ldquoUsing back-propagation neuralnetwork to forecast the production values of the machineryindustry in Taiwanrdquo Journal of American Academy of BusinessCambridge vol 9 no 1 pp 183ndash190 2006
[14] C L Chia W Nick Street and H W William ldquoApplicationof artificial neural network-based survival analysis on twobreast cancer datasetsrdquo in Proceedings of the American MedicalInformatics Association Annual Symposium (AMIA rsquo07) pp130ndash134 Chicago Ill USA November 2007
[15] A Eleuteri R Tagliaferri and L Milano ldquoA novel neuralnetwork-based survival analysis modelrdquo Neural Networks vol16 pp 855ndash864 2003
[16] B D Ripley and R M Ripley ldquoNeural networks as statisticalmethods in survival analysisrdquo in Clinical Applications of Artifi-cial Neural Networks pp 237ndash255 Cambridge University PressCambridge UK 2001
[17] B Warner and M Manavendra ldquoUnderstanding neural net-works as statistical toolsrdquo Amstat vol 50 no 4 pp 284ndash2931996
[18] J P Klein andM LMoeschberger Survival Analysis Techniquesfor Censored andTruncatedData Springer NewYorkNYUSA2th edition 2003
[19] JW Kay andDM Titterington Statistics andNeural NetworksOxford University Press Oxford UK 1999
[20] E P Goss and G S Vozikis ldquoImproving health care organiza-tional management through neural network learningrdquo HealthCare Management Science vol 5 pp 221ndash227 2002
[21] R M Ripley A L Harris and L Tarassenko ldquoNon-linearsurvival analysis using neural networksrdquo Statistics in Medicinevol 23 pp 825ndash842 2004
[22] P J G Lisboa H Wong P Harris and R Swindell ldquoA Bayesianneural network approach for modeling censored data withan application to prognosis after surgery for breast cancerrdquoArtificial Intelligence in Medicine vol 28 no 1 pp 1ndash25 2003
[23] C M Bishop Neural Networks for Pattern Recognition TheClarendon Press Oxford University Press New York NY USA1995
[24] B D Ripley Pattern Recognition and Neural Networks Cam-bridge University Press Cambridge UK 1996
[25] N Fallah G U Hong KMohammad et al ldquoNonlinear Poissonregression using neural networks a simulation studyrdquo NeuralComputing and Applications vol 18 no 8 pp 939ndash943 2009
[26] A Xiang P Lapuerta A Ryutov et al ldquoComparison of theperformance of neural network methods and Cox regressionfor censored survival datardquo Computational Statistics amp DataAnalysis vol 34 no 2 pp 243ndash257 2000
[27] F E Harrell R M Califf D B Pryor et al ldquoEvaluating theyield of medical testsrdquo The Journal of the American MedicalAssociation vol 247 pp 2543ndash2546 1982
[28] F E Harrell K L Lee R M Califf et al ldquoRegression modelingstrategies for improved prognostic predictionrdquo Statistics inMedicine vol 3 pp 143ndash152 1984
where the penalty term is $p_\lambda(\beta, \alpha_1, \alpha_2, \ldots, \alpha_H) = \lambda \left( \sum \beta_i^2 + \sum \alpha_{ij}^2 \right)$. In the likelihood schema, which is often used in shrinkage methods, an adaptation of (7) is [8, 21]

$$L = -\log \text{likelihood} + \lambda \left( \sum \beta_i^2 + \sum \alpha_{ij}^2 \right). \tag{8}$$
The penalty weight $\lambda$ regulates the balance between over- and underfitting. A suitable value of $\lambda$ lies between 0.001 and 0.1 and is chosen by cross-validation [1, 8]. In this paper, we use (8) to obtain the parameter estimates. For an outcome (the response variable) with two classes, $\delta \in \{0, 1\}$, where $p_i$ is the probability of the event for the $i$th patient, the error function is the cross-entropy error function

$$E = -\sum_i \left[ \delta_i \log(p_i) + (1 - \delta_i) \log(1 - p_i) \right]. \tag{9}$$
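As a small numerical illustration of the cross-entropy error in (9), the following Python sketch evaluates $E$ for a handful of made-up event indicators and predicted probabilities. The function name `cross_entropy`, the clipping constant `eps`, and the example values are assumptions introduced here for illustration only; the original study was carried out in R and does not list its code.

```python
import math


def cross_entropy(delta, p, eps=1e-12):
    """Cross-entropy error of (9): E = -sum[d*log(p) + (1-d)*log(1-p)]."""
    total = 0.0
    for d, p_i in zip(delta, p):
        # Clip the probability away from 0 and 1 so that log() stays finite.
        p_i = min(max(p_i, eps), 1.0 - eps)
        total += d * math.log(p_i) + (1 - d) * math.log(1.0 - p_i)
    return -total


# Hypothetical event indicators and two sets of predicted probabilities.
delta = [1, 0, 1, 0]
p_good = [0.9, 0.1, 0.8, 0.2]   # predictions close to the observed classes
p_bad = [0.5, 0.5, 0.5, 0.5]    # uninformative predictions

e_good = cross_entropy(delta, p_good)
e_bad = cross_entropy(delta, p_bad)
print(e_good, e_bad)
```

Predictions that lie closer to the observed indicators give a smaller error than the uninformative prediction of 0.5 for every patient, which is the behavior that minimizing (9) exploits.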
An ANN can be modeled as a generalized linear model with nonlinear predictors [8–11]. Biganzoli et al. [8] introduced a method called the partial logistic ANN, and Lisboa et al. [22] developed it further to fit smooth estimates of the discrete time hazard. It is similar to an MLP [23] with an additional covariate, namely, time, as an input and is given by
hidden nodes, respectively; $b_h$ and $b$ denote the bias terms in the hidden and output layers, respectively. After the network weights $w$ are estimated, a single output node estimates the conditional failure probabilities from its connections with the hidden units, and the survivorship is calculated from the estimated discrete time hazard by multiplying the conditional survival probabilities over the time intervals. The $-\log(\text{likelihood})$ statistic can then be obtained as [8]

$$E = -\sum_{p=1}^{\text{no. of patients}} \sum_{k=1}^{t_1} \left[ \delta_{pk} \log\left( h_p(x_p, t_k) \right) + (1 - \delta_{pk}) \log\left( 1 - h_p(x_p, t_k) \right) \right], \tag{11}$$

where $\delta$ is the censoring indicator function.
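The discrete time likelihood in (11) and the survivorship calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each patient is represented by the list of predicted hazards for the intervals in which the patient was at risk, and $\delta_{pk}$ is taken to be 1 only in the last interval of a patient who experienced the event, following the usual partial logistic formulation of [8]. The function names and the hazard values are hypothetical.

```python
import math


def neg_log_likelihood(hazards, events):
    """Negative log-likelihood of (11) for discrete-time hazards.

    hazards[p] holds the predicted hazards h_p(x_p, t_k) for the intervals
    in which patient p was at risk; events[p] is True if the patient had
    the event in the last of those intervals and False if censored.
    """
    e = 0.0
    for h_p, event in zip(hazards, events):
        last = len(h_p) - 1
        for k, h in enumerate(h_p):
            d = 1 if (event and k == last) else 0  # assumed form of delta_pk
            e -= d * math.log(h) + (1 - d) * math.log(1.0 - h)
    return e


def survival_curve(h_p):
    """Survivorship S(t_k): running product of (1 - hazard) over intervals."""
    s, curve = 1.0, []
    for h in h_p:
        s *= 1.0 - h
        curve.append(s)
    return curve


# Hypothetical hazards: patient 1 fails in interval 3, patient 2 is censored
# after interval 2.
hazards = [[0.1, 0.2, 0.3], [0.1, 0.2]]
events = [True, False]

print(neg_log_likelihood(hazards, events))
print(survival_curve(hazards[0]))
```

The second function makes the phrase "multiplying the conditionals" concrete: the probability of surviving beyond interval $k$ is the product of the conditional survival probabilities $1 - h$ of all intervals up to $k$.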
2.3. Model Fitting. The ultimate goal of the learning process is to minimize the error of the network. In the training step, to fit the model with a fixed number of hidden nodes, we use the penalized likelihood

$$E^{*} = E + \lambda \left( \sum \beta_i^2 + \sum \alpha_{ij}^2 \right). \tag{12}$$
By using this, we improve the convergence of the optimization and also control the overfitting problem [1, 8, 9, 16, 21].
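The weight decay penalty in (12) amounts to adding a multiple of the sum of squared weights to the error. A minimal Python sketch is given below; the function name `penalized_error`, the layout of the weights (a flat list for the $\beta_i$ and a nested list for the $\alpha_{ij}$), and all numerical values are assumptions made for illustration and are not taken from the study.

```python
def penalized_error(error, beta, alpha, lam):
    """Penalized error of (12): E* = E + lam * (sum beta_i^2 + sum alpha_ij^2)."""
    sum_beta = sum(b * b for b in beta)
    sum_alpha = sum(a * a for row in alpha for a in row)
    return error + lam * (sum_beta + sum_alpha)


# Hypothetical unpenalized error and network weights.
error = 10.0
beta = [1.0, -2.0]
alpha = [[0.5, 0.5], [1.0, 0.0]]

e_star = penalized_error(error, beta, alpha, lam=0.01)
print(e_star)
```

Because the penalty grows with the squared size of the weights, a larger $\lambda$ pulls the fitted weights toward zero, which is how the criterion limits overfitting.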
To identify the number of hidden nodes and then select the model, the Bayesian Information Criterion (BIC) and the Network Information Criterion (NIC) [8, 23, 24], which is a generalization of the Akaike Information Criterion (AIC), are calculated:

$$\begin{aligned} \text{BIC} &= -2 \times \log \text{likelihood} + \log(N) \times P, \\ \text{NIC} &= 2E^{*} + 2P, \end{aligned} \tag{13}$$
where $P$ is the number of estimated parameters and $N$ is the number of observations in the training set. The best model is the one with the smallest value of these criteria. In addition, to assess prediction accuracy in the validation (testing) group, we calculated the classification accuracy and the mean square error (MSE).
The best model is the one with the smallest value of MSE. The models considered had 2, 3, 4, 5, 10, 15, and 20 hidden nodes. The weight decay was set to 0.012, which was chosen on the basis of an empirical study [25].
Finally, to compare the Cox and ANN predictions, classification accuracy and concordance indexes were calculated. All simulations and comparisons were performed in R 2.14.1.
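The selection of the number of hidden nodes by the criteria in (13) can be illustrated as follows. The sketch is in Python rather than R, and several details are assumptions: the parameter count is that of a standard single-hidden-layer network with bias terms and one output node, the number of inputs (two covariates plus time) is chosen for the example, and the training log-likelihood values are invented.

```python
import math


def n_params(n_inputs, n_hidden):
    """Weights and biases of a one-hidden-layer network with a single output."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1)


def bic(log_lik, n_obs, p):
    """BIC of (13): -2 * log-likelihood + log(N) * P."""
    return -2.0 * log_lik + math.log(n_obs) * p


def nic(e_star, p):
    """NIC of (13): 2 * E* + 2 * P, with E* the penalized error."""
    return 2.0 * e_star + 2.0 * p


# Hypothetical training log-likelihoods for three candidate networks.
log_liks = {2: -410.0, 3: -402.0, 5: -399.5}
n_obs, n_inputs = 700, 3

scores = {h: bic(ll, n_obs, n_params(n_inputs, h)) for h, ll in log_liks.items()}
best_h = min(scores, key=scores.get)
print(scores, best_h)
```

With these made-up values the gain in likelihood of the larger networks does not repay their additional parameters, so BIC favors the smallest network.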
3. Simulation
To compare the accuracy of the predictions of the ANN and Cox regression, four different simulation schemes based on Monte Carlo simulation were used. In each scheme, the hazard at any time $t$ was taken to have the exponential form [26], namely, $\lambda$ (Table 1). For each scheme, 1000 independent random observations were generated, and the survival times were then generated on the basis of the relationship between the exponential parameter and the independent variables. Afterward, the survival times were transformed into right-censored data. In this context, if the generated time $t_i$ is greater than the quantile of the exponential distribution with parameter $\lambda$, it is considered censored. This process was repeated 100 times. To assess the accuracy of the predictions, each sample was randomly divided into two parts. The first part, the training group, consisted of 700 observations, and the remaining 300 observations were allocated to the second group, that is, the testing group. Furthermore, in all simulations the average rates of censoring were set to 20%, 30%, 40%, 50%, 60%, 70%, and 80%. In addition, the models were considered with the main effects and without or with interaction terms, as the simple and complex models, respectively.
In simulations 1 and 2, two covariates were used, which were generated randomly from binomial and standard normal distributions. The models of these simulations contained no interaction term and one interaction term, respectively. In simulation 3, three covariates were used, which were generated randomly from binomial and standard normal distributions. The model of this simulation had two interaction terms. In simulation 4, four covariates were used, which were generated randomly from binomial and standard normal distributions. The models of this simulation are complex and consist of two-, three-, and four-interaction terms (Table 1).
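One replication of a scheme of this kind, with two covariates and one interaction term, might be generated as in the Python sketch below. It is an interpretation rather than the authors' R code: the coefficients in `coefs` are hypothetical stand-ins for the values of Table 1, the binomial covariate is taken as Bernoulli with probability 0.5, and censoring is applied at the quantile of the exponential distribution that is exceeded with probability equal to the target censoring rate.

```python
import math
import random


def simulate(n=1000, n_train=700, censor_rate=0.4, coefs=(0.5, -0.5, 0.3), seed=1):
    """Simulate right-censored exponential survival data and split it.

    Returns (train, test); each record is (x1, x2, observed_time, event),
    with event = 1 for an observed failure and 0 for a censored time.
    """
    rng = random.Random(seed)
    b1, b2, b12 = coefs  # hypothetical main effects and interaction
    data = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0   # binomial covariate (assumed p = 0.5)
        x2 = rng.gauss(0.0, 1.0)              # standard normal covariate
        lam = math.exp(b1 * x1 + b2 * x2 + b12 * x1 * x2)
        t = rng.expovariate(lam)              # exponential survival time
        # Quantile exceeded with probability censor_rate: P(T > cutoff) = censor_rate.
        cutoff = -math.log(censor_rate) / lam
        event = 1 if t <= cutoff else 0
        data.append((x1, x2, min(t, cutoff), event))
    rng.shuffle(data)
    return data[:n_train], data[n_train:]


train, test = simulate()
censored = sum(1 for rec in train + test if rec[3] == 0) / 1000
print(len(train), len(test), censored)
```

Repeating the call with 100 different seeds and with `censor_rate` set to each value from 0.2 to 0.8 would reproduce the overall layout of the study, 700 training and 300 testing cases per replication, though not its exact data.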
Table 3: Results of concordance indexes of simulation study in testing subset (300 cases with 100 replications).

Model∗   Censoring (%)   ANN              Cox Reg.
I        20              0.821 ± 0.028    0.820 ± 0.031
I        30              0.819 ± 0.027    0.820 ± 0.030
I        40              0.824 ± 0.028    0.823 ± 0.030
I        50              0.823 ± 0.027    0.823 ± 0.029
I        60              0.823 ± 0.027    0.822 ± 0.030
I        70              0.825 ± 0.027    0.822 ± 0.032
I        80              0.826 ± 0.030    0.823 ± 0.031
II       20              0.818 ± 0.029    0.817 ± 0.031
II       30              0.817 ± 0.034    0.815 ± 0.046
II       40              0.818 ± 0.032    0.818 ± 0.031
II       50              0.816 ± 0.035    0.814 ± 0.032
II       60              0.816 ± 0.031    0.813 ± 0.032
II       70              0.814 ± 0.033    0.811 ± 0.031
II       80              0.812 ± 0.034    0.808 ± 0.036
III      20              0.808 ± 0.030    0.798 ± 0.035
III      30              0.799 ± 0.033    0.790 ± 0.033
III      40              0.812 ± 0.043    0.795 ± 0.038
III      50              0.805 ± 0.028    0.795 ± 0.030
III      60              0.804 ± 0.028    0.790 ± 0.033
III      70              0.805 ± 0.028    0.791 ± 0.034
III      80              0.803 ± 0.028    0.790 ± 0.032
IV       20              0.764 ± 0.033    0.759 ± 0.036
IV       30              0.773 ± 0.029    0.762 ± 0.033
IV       40              0.777 ± 0.030    0.759 ± 0.035
IV       50              0.779 ± 0.035    0.764 ± 0.034
IV       60              0.812 ± 0.039    0.790 ± 0.036
IV       70              0.764 ± 0.036    0.744 ± 0.035
IV       80              0.766 ± 0.035    0.741 ± 0.037

∗Four models with different rates of censoring.
The model selection was based on BIC for the learning set, with the SSE criterion for the testing subset used as verification. The results in Table 2 show that the simple model performs well with fewer hidden nodes, whereas the complex model performs better with more hidden nodes. The MSE values confirm these results (Table 2).
In the next step, to compare the ANN and Cox regression predictions, concordance indexes were calculated from the classification accuracy table in the testing subset. The concordance index has been reported as a generalization of the area under the receiver operating characteristic curve for censored data [27, 28]. This index is the proportion of cases that are classified correctly in the noncensored (event) and censored groups, and it takes values from 0 to 1, which indicate the accuracy of the models. The concordance indexes of the ANN and Cox regression models are reported in Table 3. The results of the simulation study showed that, in the simpler models, there was no difference between the predictions of the Cox regression and NN models, but the NN predictions were better than the Cox regression predictions in the complex models with high rates of censoring.
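For readers who wish to see how a concordance index for censored data can be computed, the following Python sketch implements the pairwise form of Harrell's index [27, 28]. It should be read as an illustration of that standard definition and not as the procedure based on the classification accuracy table used in this study; the function name and the example data are hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and a
    shorter time than subject j. The pair is concordant when the subject
    who fails first also has the higher predicted risk; ties in predicted
    risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Hypothetical observed times, event indicators (0 = censored), and risks.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.5, 0.1]

c_perfect = concordance_index(times, events, risk)
c_reversed = concordance_index(times, events, risk[::-1])
c_constant = concordance_index(times, events, [0.5] * 4)
print(c_perfect, c_reversed, c_constant)
```

A value of 0.5 corresponds to predictions that carry no information about the ordering of the survival times, which gives a reference point for reading the values near 0.8 in Table 3.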
4. Conclusion
In this paper, we presented two approaches for modeling survival data with different degrees of censoring: Cox regression and neural network models. A Monte Carlo simulation study was performed to compare the predictive accuracy of the Cox and neural network models on simulated data sets.
In the simulation study, four different models were considered. The rate of censoring in each of these models ranged from 20% up to 80%. These models were considered with the main effects and also with the interaction terms. Then the predictive ability of these models was evaluated. In simple models and with fewer censored cases, there was little difference between the ANN and Cox regression predictions. It seems that for simpler models the level of censoring has no effect on the predictions, whereas the predictions in more complex models depend on the level of censoring. The results showed that the NN model provided better predictions for the more complex models, while for the simpler models there was no difference in the results. This result is consistent with the finding of the study by Xiang et al. [26]. Therefore, the NN model is proposed in two cases: (1) the occurrence of high censoring (i.e., a censoring rate of 60% and higher) and/or (2) complex models (i.e., with many covariates and any interaction terms). This result can be used in practical problems, which often involve many variables and many censored cases. For that reason, in these two cases the ANN strategy can be used as an alternative to the traditional Cox model. Finally, there are some flexible alternative methods, such as piecewise exponential and grouped time models, which can be used for survival data and whose ability can then be compared with that of the ANN model.
Acknowledgment
The authors wish to express their special thanks to the referees for their valuable comments.
References
[1] E. T. Lee and J. W. Wang, Statistical Methods for Survival Data Analysis, Wiley Series in Probability and Statistics, Wiley-Interscience, Hoboken, NJ, USA, 3rd edition, 2003.
[2] V. Lagani and I. Tsamardinos, “Structure-based variable selection for survival data,” Bioinformatics, vol. 26, no. 15, pp. 1887–1894, 2010.
[3] M. H. Kutner, C. J. Nachtsheim, and J. Neter, Applied Linear Regression Models, McGraw-Hill/Irwin, New York, NY, USA, 4th edition, 2004.
[4] W. G. Baxt and J. Skora, “Prospective validation of artificial neural networks trained to identify acute myocardial infarction,” Lancet, vol. 347, pp. 12–15, 1996.
[5] B. A. Mobley, E. Schecheer, and W. E. Moore, “Prediction of coronary artery stenosis by artificial networks,” Artificial Intelligence in Medicine, vol. 18, pp. 187–203, 2000.
[6] D. West and V. West, “Model selection for medical diagnostic decision support system: breast cancer detection case,” Artificial Intelligence in Medicine, vol. 20, pp. 183–204, 2000.
[7] F. Ambrogi, N. Lama, P. Boracchi, and E. Biganzoli, “Selection of artificial neural network models for survival analysis with genetic algorithms,” Computational Statistics & Data Analysis, vol. 52, no. 1, pp. 30–42, 2007.
[8] E. Biganzoli, P. Boracchi, L. Mariani, and E. Marubini, “Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach,” Statistics in Medicine, vol. 17, pp. 1169–1186, 1998.
[9] E. Biganzoli, P. Boracchi, and E. Marubini, “A general framework for neural network models on censored survival data,” Neural Networks, vol. 15, pp. 209–218, 2002.
[10] E. M. Biganzoli and P. Boracchi, “The Partial Logistic Artificial Neural Network (PLANN): a tool for the flexible modelling of censored survival data,” in Proceedings of the European Conference on Emergent Aspects in Clinical Data Analysis (EACDA ’05), 2005.
[11] E. M. Biganzoli, P. Boracchi, F. Ambrogi, and E. Marubini, “Artificial neural network for the joint modeling of discrete cause-specific hazards,” Artificial Intelligence in Medicine, vol. 37, pp. 119–130, 2006.
[12] R. Bittern, A. Cuschieri, S. D. Dolgobrodov, et al., “An artificial neural network for analysing the survival of patients with colorectal cancer,” in Proceedings of the European Symposium on Artificial Neural Networks (ESANN ’05), Bruges, Belgium, April 2005.
[13] K. U. Chen and C. J. Christian, “Using back-propagation neural network to forecast the production values of the machinery industry in Taiwan,” Journal of American Academy of Business, Cambridge, vol. 9, no. 1, pp. 183–190, 2006.
[14] C. L. Chia, W. Nick Street, and H. W. William, “Application of artificial neural network-based survival analysis on two breast cancer datasets,” in Proceedings of the American Medical Informatics Association Annual Symposium (AMIA ’07), pp. 130–134, Chicago, Ill, USA, November 2007.
[15] A. Eleuteri, R. Tagliaferri, and L. Milano, “A novel neural network-based survival analysis model,” Neural Networks, vol. 16, pp. 855–864, 2003.
[16] B. D. Ripley and R. M. Ripley, “Neural networks as statistical methods in survival analysis,” in Clinical Applications of Artificial Neural Networks, pp. 237–255, Cambridge University Press, Cambridge, UK, 2001.
[17] B. Warner and M. Manavendra, “Understanding neural networks as statistical tools,” Amstat, vol. 50, no. 4, pp. 284–293, 1996.
[18] J. P. Klein and M. L. Moeschberger, Survival Analysis: Techniques for Censored and Truncated Data, Springer, New York, NY, USA, 2nd edition, 2003.
[19] J. W. Kay and D. M. Titterington, Statistics and Neural Networks, Oxford University Press, Oxford, UK, 1999.
[20] E. P. Goss and G. S. Vozikis, “Improving health care organizational management through neural network learning,” Health Care Management Science, vol. 5, pp. 221–227, 2002.
[21] R. M. Ripley, A. L. Harris, and L. Tarassenko, “Non-linear survival analysis using neural networks,” Statistics in Medicine, vol. 23, pp. 825–842, 2004.
[22] P. J. G. Lisboa, H. Wong, P. Harris, and R. Swindell, “A Bayesian neural network approach for modeling censored data with an application to prognosis after surgery for breast cancer,” Artificial Intelligence in Medicine, vol. 28, no. 1, pp. 1–25, 2003.
[23] C. M. Bishop, Neural Networks for Pattern Recognition, The Clarendon Press, Oxford University Press, New York, NY, USA, 1995.
[24] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 1996.
[25] N. Fallah, G. U. Hong, K. Mohammad, et al., “Nonlinear Poisson regression using neural networks: a simulation study,” Neural Computing and Applications, vol. 18, no. 8, pp. 939–943, 2009.
[26] A. Xiang, P. Lapuerta, A. Ryutov, et al., “Comparison of the performance of neural network methods and Cox regression for censored survival data,” Computational Statistics & Data Analysis, vol. 34, no. 2, pp. 243–257, 2000.
[27] F. E. Harrell, R. M. Califf, D. B. Pryor, et al., “Evaluating the yield of medical tests,” The Journal of the American Medical Association, vol. 247, pp. 2543–2546, 1982.
[28] F. E. Harrell, K. L. Lee, R. M. Califf, et al., “Regression modeling strategies for improved prognostic prediction,” Statistics in Medicine, vol. 3, pp. 143–152, 1984.
Table 3 Results of concordance indexes of simulation study intesting subset (300 cases with 100 replications)
Modellowast ANN Cox RegI
20 0821 plusmn 0028 0820 plusmn 0031
30 0819 plusmn 0027 0820 plusmn 0030
40 0824 plusmn 0028 0823 plusmn 0030
50 0823 plusmn 0027 0823 plusmn 0029
60 0823 plusmn 0027 0822 plusmn 0030
70 0825 plusmn 0027 0822 plusmn 0032
80 0826 plusmn 0030 0823 plusmn 0031
II20 0818 plusmn 0029 0817 plusmn 0031
30 0817 plusmn 0034 0815 plusmn 0046
40 0818 plusmn 0032 0818 plusmn 0031
50 0816 plusmn 0035 0814 plusmn 0032
60 0816 plusmn 0031 0813 plusmn 0032
70 0814 plusmn 0033 0811 plusmn 0031
80 0812 plusmn 0034 0808 plusmn 0036
III20 0808 plusmn 0030 0798 plusmn 0035
30 0799 plusmn 0033 0790 plusmn 0033
40 0812 plusmn 0043 0795 plusmn 0038
50 0805 plusmn 0028 0795 plusmn 0030
60 0804 plusmn 0028 0790 plusmn 0033
70 0805 plusmn 0028 0791 plusmn 0034
80 0803 plusmn 0028 0790 plusmn 0032
IV20 0764 plusmn 0033 0759 plusmn 0036
30 0773 plusmn 0029 0762 plusmn 0033
40 0777 plusmn 0030 0759 plusmn 0035
50 0779 plusmn 0035 0764 plusmn 0034
60 0812 plusmn 0039 0790 plusmn 0036
70 0764 plusmn 0036 0744 plusmn 0035
80 0766 plusmn 0035 0741 plusmn 0037lowast
Four models with different rates of censoring
The model selection is based on BIC for learning set andSSE criterion for the testing subset data as a verification Theresults in Table 2 show that the simple model performs withless hidden node but complex model performs better withmore hidden nodes The MSE values confirm these results(Table 2)
In the next step to compare of ANN and Cox regressionpredictions concordance indexes were calculated from clas-sification accuracy table in testing subset Concordance indexwas reported as a generalization of the area under receiveroperating characteristic curve for censored data [27 28]Thisindexmeans that the proportion of the cases that are classifiedcorrectly in noncensored (event) and censored groups and 0to 1 values indicated as the ability of the models accuracyTheconcordance index of ANN and Cox regression models was
reported in Table 3The results of simulation study in simplermodel showed that there was not any different betweenthe predictions of Cox regression and NN models But NNpredictions were better than Cox regression predictions incomplex model with high rates of censoring
4 Conclusion
In this paper we presented two approaches for modeling ofsurvival data with different degrees of censoring Cox regres-sion and neural network models A Monte-Carlo simulationstudy was performed to compare predictive accuracy of Coxand neural network models in simulation data sets
In the simulation study four different models wereconsideredThe rate of censorship in each of thesemodelswasconsidered from 20 up 80These models were consideredwith the main effects and also with the interaction termsThen the ability of these models in prediction was evaluatedAs was seen in simple models and with less censored casesthere was little difference in ANN and Cox regressionmodelspredictions It seems that for simpler models the levels ofcensorship have no effect on predictions but the predictionsin more complex models depend on the levels of censorshipThe results showed that the NN model for more complexmodels was provided better predictions But for simplermodels predictions there was not any different in resultsThis result was consistent with the finding fromXiangrsquos study[26] Therefore NN model is proposed in two cases of (1)occurrence of high censorship (ie censoring rate of 60 andhigher) andor (2) in the complex models (ie with manycovariates and any interaction terms) This is a very goodresult and can be used in practical issues which often arefaced onwithmany numbers of variables and alsomany casesof censorship For that reason in these two cases the ANNstrategy can be used as an alternate of traditional Cox modelFinally it is mentioned that there are some flexible alternativemethods such as piecewise exponential and grouped timemodels which can be used for survival data and then its abilitycompared with ANNmodel
Acknowledgment
The authors wish to express their special thanks to refereesfor their valuable comments
References
[1] E T Lee and J W Wang Statistical Methods for SurvivalData Analysis Wiley Series in Probability and Statistics Wiley-Interscience Hoboken NJ USA 3rd edition 2003
[2] V Lagani and I Tsamardinos ldquoStructure-based variable selec-tion for survival datardquo Bioinformatics vol 26 no 15 pp 1887ndash1894 2010
[3] M H Kutner C J Nachtsheim and J Neter Applied LinearRegression Models McGraw-HillIrwin New York NY USA4th edition 2004
[4] WG Baxt and J Skora ldquoProspective validation of artificial neu-ral networks trained to identify acute myocardial infarctionrdquoLancet vol 347 pp 12ndash15 1996
Journal of Probability and Statistics 7
[5] B A Mobley E Schecheer and W E Moore ldquoPredictionof coronary artery stenosis by artificial networksrdquo ArtificialIntelligence in Medicine vol 18 pp 187ndash203 2000
[6] D West and V West ldquoModel selection for medical diagnosticdecision support system breast cancer detection caserdquoArtificialIntelligence in Medicine vol 20 pp 183ndash204 2000
[7] F Ambrogi N Lama P Boracchi and E Biganzoli ldquoSelectionof artificial neural network models for survival analysis withgenetic algorithmsrdquo Computational Statistics amp Data Analysisvol 52 no 1 pp 30ndash42 2007
[8] E Biganzoli P Boracchi LMariani andEMarubini ldquoFeed for-ward neural networks for the analysis of censored survival dataa partial logistic regression approachrdquo Statistics inMedicine vol17 pp 1169ndash1186 1998
[9] E Biganzoli P Boracchi and E Marubini ldquoA general frame-work for neural network models on censored survival datardquoNeural Networks vol 15 pp 209ndash218 2002
[10] E M Bignazoli and P Borrachi ldquoThe Partial Logistic ArtificialNeural Network (PLANN) a tool for the flexible modelling ofcensored survival datardquo in Proceedings of the European Confer-ence on EmergentAspects inClinical DataAnalysis (EACDA rsquo05)2005
[11] E M Bignazoli P Borrachi F Amborgini and E MarubinildquoArtificial neural network for the joint modeling of discretecause-specific hazardsrdquo Artificial Intelligence in Medicine vol37 pp 119ndash130 2006
[12] R Bittern A Cuschieri S D Dolgobrodov et al ldquoAn artificialneural network for analysing the survival of patients withcolorectal cancerrdquo in Proceedings of the European Symposium onArtificial Neural Networks (ESANN rsquo05) Bruges Belgium April2005
Table 3: Results of concordance indexes of the simulation study in the testing subset (300 cases with 100 replications).

Model*   Censoring (%)   ANN             Cox Reg
I        20              0.821 ± 0.028   0.820 ± 0.031
I        30              0.819 ± 0.027   0.820 ± 0.030
I        40              0.824 ± 0.028   0.823 ± 0.030
I        50              0.823 ± 0.027   0.823 ± 0.029
I        60              0.823 ± 0.027   0.822 ± 0.030
I        70              0.825 ± 0.027   0.822 ± 0.032
I        80              0.826 ± 0.030   0.823 ± 0.031
II       20              0.818 ± 0.029   0.817 ± 0.031
II       30              0.817 ± 0.034   0.815 ± 0.046
II       40              0.818 ± 0.032   0.818 ± 0.031
II       50              0.816 ± 0.035   0.814 ± 0.032
II       60              0.816 ± 0.031   0.813 ± 0.032
II       70              0.814 ± 0.033   0.811 ± 0.031
II       80              0.812 ± 0.034   0.808 ± 0.036
III      20              0.808 ± 0.030   0.798 ± 0.035
III      30              0.799 ± 0.033   0.790 ± 0.033
III      40              0.812 ± 0.043   0.795 ± 0.038
III      50              0.805 ± 0.028   0.795 ± 0.030
III      60              0.804 ± 0.028   0.790 ± 0.033
III      70              0.805 ± 0.028   0.791 ± 0.034
III      80              0.803 ± 0.028   0.790 ± 0.032
IV       20              0.764 ± 0.033   0.759 ± 0.036
IV       30              0.773 ± 0.029   0.762 ± 0.033
IV       40              0.777 ± 0.030   0.759 ± 0.035
IV       50              0.779 ± 0.035   0.764 ± 0.034
IV       60              0.812 ± 0.039   0.790 ± 0.036
IV       70              0.764 ± 0.036   0.744 ± 0.035
IV       80              0.766 ± 0.035   0.741 ± 0.037

*Four models with different rates of censoring.
Model selection was based on the BIC for the learning set, with the SSE criterion on the testing subset used as verification. The results in Table 2 show that the simple model performs better with fewer hidden nodes, whereas the complex model performs better with more hidden nodes. The MSE values confirm these results (Table 2).
In the next step, to compare the ANN and Cox regression predictions, concordance indexes were calculated from the classification accuracy table in the testing subset. The concordance index has been reported as a generalization of the area under the receiver operating characteristic curve for censored data [27, 28]. This index is the proportion of cases that are classified correctly in the noncensored (event) and censored groups; it takes values from 0 to 1, which indicate the accuracy of the model. The concordance indexes of the ANN and Cox regression models are reported in Table 3. The results of the simulation study showed that, in the simpler models, there was no difference between the predictions of the Cox regression and NN models, but NN predictions were better than Cox regression predictions in the complex models with high rates of censoring.
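For readers unfamiliar with the index, the following is a minimal sketch of the pairwise definition of Harrell's concordance index for right-censored data [27, 28]. It is an illustration only: the study computes its index from a classification accuracy table, which may differ in detail, and the function name `concordance_index` and the toy data are assumptions, not taken from the paper.

```python
def concordance_index(times, events, risks):
    """Pairwise (Harrell-type) concordance index for right-censored data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the case was censored
    risks  : predicted risk scores (higher = shorter expected survival)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable only if case i had an observed event
            # strictly before the follow-up time of case j.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # model ranks the pair correctly
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs: index is undefined")
    return concordant / usable


if __name__ == "__main__":
    # Hypothetical toy data: four cases, the second one censored.
    times = [2, 4, 6, 8]
    events = [1, 0, 1, 1]
    print(concordance_index(times, events, [0.9, 0.4, 0.6, 0.1]))  # perfect ranking
    print(concordance_index(times, events, [0.9, 0.4, 0.1, 0.6]))  # one pair misordered
```

A value of 0.5 corresponds to random ranking and 1 to perfect ranking, which is why the values around 0.8 in Table 3 indicate good but imperfect discrimination for both models.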
4. Conclusion
In this paper, we presented two approaches for modeling survival data with different degrees of censoring: Cox regression and neural network models. A Monte Carlo simulation study was performed to compare the predictive accuracy of the Cox and neural network models on simulated data sets.
In the simulation study, four different models were considered. The rate of censoring in each of these models ranged from 20% to 80%. The models included main effects and also interaction terms. The predictive ability of the models was then evaluated. In simple models with fewer censored cases, there was little difference between the ANN and Cox regression predictions. It seems that for simpler models the level of censoring has no effect on predictions, whereas predictions in more complex models depend on the level of censoring. The results showed that the NN model provided better predictions for the more complex models, but for the simpler models there was no difference in the results. This result is consistent with the finding from Xiang's study [26]. Therefore, the NN model is proposed in two cases: (1) high censoring (i.e., a censoring rate of 60% and higher) and/or (2) complex models (i.e., with many covariates and interaction terms). This result is useful in practice, where data sets often have many variables and many censored cases; in these two cases, the ANN strategy can be used as an alternative to the traditional Cox model. Finally, there are some flexible alternative methods, such as piecewise exponential and grouped time models, which can be used for survival data and whose performance could be compared with that of the ANN model.
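The pattern summarized above can be checked directly from the mean concordance indexes in Table 3. The short sketch below is only an arithmetic illustration: the values are transcribed from Table 3, while the names `TABLE3`, `gaps`, and `mean_gap` are hypothetical helpers introduced here and are not part of the study's R code.

```python
# Mean concordance indexes transcribed from Table 3,
# ordered by censoring rate: 20, 30, 40, 50, 60, 70, 80 (%).
TABLE3 = {
    "I":   {"ann": [0.821, 0.819, 0.824, 0.823, 0.823, 0.825, 0.826],
            "cox": [0.820, 0.820, 0.823, 0.823, 0.822, 0.822, 0.823]},
    "II":  {"ann": [0.818, 0.817, 0.818, 0.816, 0.816, 0.814, 0.812],
            "cox": [0.817, 0.815, 0.818, 0.814, 0.813, 0.811, 0.808]},
    "III": {"ann": [0.808, 0.799, 0.812, 0.805, 0.804, 0.805, 0.803],
            "cox": [0.798, 0.790, 0.795, 0.795, 0.790, 0.791, 0.790]},
    "IV":  {"ann": [0.764, 0.773, 0.777, 0.779, 0.812, 0.764, 0.766],
            "cox": [0.759, 0.762, 0.759, 0.764, 0.790, 0.744, 0.741]},
}


def gaps(model):
    """ANN minus Cox concordance index at each censoring rate."""
    rows = TABLE3[model]
    return [round(a - c, 3) for a, c in zip(rows["ann"], rows["cox"])]


def mean_gap(model):
    """Average ANN-minus-Cox gap over the seven censoring rates."""
    g = gaps(model)
    return sum(g) / len(g)


if __name__ == "__main__":
    for model in TABLE3:
        print(model, gaps(model), round(mean_gap(model), 4))
```

With these values, the average gap in favor of the ANN grows from about 0.001 for model I to about 0.017 for model IV, and within model IV it rises from 0.005 at 20% censoring to 0.025 at 80% censoring.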
Acknowledgment
The authors wish to express their special thanks to the referees for their valuable comments.
References
[1] E. T. Lee and J. W. Wang, Statistical Methods for Survival Data Analysis, Wiley Series in Probability and Statistics, Wiley-Interscience, Hoboken, NJ, USA, 3rd edition, 2003.
[2] V. Lagani and I. Tsamardinos, "Structure-based variable selection for survival data," Bioinformatics, vol. 26, no. 15, pp. 1887–1894, 2010.
[3] M. H. Kutner, C. J. Nachtsheim, and J. Neter, Applied Linear Regression Models, McGraw-Hill/Irwin, New York, NY, USA, 4th edition, 2004.
[4] W. G. Baxt and J. Skora, "Prospective validation of artificial neural networks trained to identify acute myocardial infarction," Lancet, vol. 347, pp. 12–15, 1996.
[5] B. A. Mobley, E. Schecheer, and W. E. Moore, "Prediction of coronary artery stenosis by artificial networks," Artificial Intelligence in Medicine, vol. 18, pp. 187–203, 2000.
[6] D. West and V. West, "Model selection for medical diagnostic decision support system: breast cancer detection case," Artificial Intelligence in Medicine, vol. 20, pp. 183–204, 2000.
[7] F. Ambrogi, N. Lama, P. Boracchi, and E. Biganzoli, "Selection of artificial neural network models for survival analysis with genetic algorithms," Computational Statistics & Data Analysis, vol. 52, no. 1, pp. 30–42, 2007.
[8] E. Biganzoli, P. Boracchi, L. Mariani, and E. Marubini, "Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach," Statistics in Medicine, vol. 17, pp. 1169–1186, 1998.
[9] E. Biganzoli, P. Boracchi, and E. Marubini, "A general framework for neural network models on censored survival data," Neural Networks, vol. 15, pp. 209–218, 2002.
[10] E. M. Biganzoli and P. Boracchi, "The Partial Logistic Artificial Neural Network (PLANN): a tool for the flexible modelling of censored survival data," in Proceedings of the European Conference on Emergent Aspects in Clinical Data Analysis (EACDA '05), 2005.
[11] E. M. Biganzoli, P. Boracchi, F. Ambrogi, and E. Marubini, "Artificial neural network for the joint modeling of discrete cause-specific hazards," Artificial Intelligence in Medicine, vol. 37, pp. 119–130, 2006.
[12] R. Bittern, A. Cuschieri, S. D. Dolgobrodov, et al., "An artificial neural network for analysing the survival of patients with colorectal cancer," in Proceedings of the European Symposium on Artificial Neural Networks (ESANN '05), Bruges, Belgium, April 2005.
[13] K. U. Chen and C. J. Christian, "Using back-propagation neural network to forecast the production values of the machinery industry in Taiwan," Journal of American Academy of Business, Cambridge, vol. 9, no. 1, pp. 183–190, 2006.
[14] C. L. Chia, W. Nick Street, and H. W. William, "Application of artificial neural network-based survival analysis on two breast cancer datasets," in Proceedings of the American Medical Informatics Association Annual Symposium (AMIA '07), pp. 130–134, Chicago, Ill, USA, November 2007.
[15] A. Eleuteri, R. Tagliaferri, and L. Milano, "A novel neural network-based survival analysis model," Neural Networks, vol. 16, pp. 855–864, 2003.
[16] B. D. Ripley and R. M. Ripley, "Neural networks as statistical methods in survival analysis," in Clinical Applications of Artificial Neural Networks, pp. 237–255, Cambridge University Press, Cambridge, UK, 2001.
[17] B. Warner and M. Manavendra, "Understanding neural networks as statistical tools," Amstat, vol. 50, no. 4, pp. 284–293, 1996.
[18] J. P. Klein and M. L. Moeschberger, Survival Analysis: Techniques for Censored and Truncated Data, Springer, New York, NY, USA, 2nd edition, 2003.
[19] J. W. Kay and D. M. Titterington, Statistics and Neural Networks, Oxford University Press, Oxford, UK, 1999.
[20] E. P. Goss and G. S. Vozikis, "Improving health care organizational management through neural network learning," Health Care Management Science, vol. 5, pp. 221–227, 2002.
[21] R. M. Ripley, A. L. Harris, and L. Tarassenko, "Non-linear survival analysis using neural networks," Statistics in Medicine, vol. 23, pp. 825–842, 2004.
[22] P. J. G. Lisboa, H. Wong, P. Harris, and R. Swindell, "A Bayesian neural network approach for modeling censored data with an application to prognosis after surgery for breast cancer," Artificial Intelligence in Medicine, vol. 28, no. 1, pp. 1–25, 2003.
[23] C. M. Bishop, Neural Networks for Pattern Recognition, The Clarendon Press, Oxford University Press, New York, NY, USA, 1995.
[24] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 1996.
[25] N. Fallah, G. U. Hong, K. Mohammad, et al., "Nonlinear Poisson regression using neural networks: a simulation study," Neural Computing and Applications, vol. 18, no. 8, pp. 939–943, 2009.
[26] A. Xiang, P. Lapuerta, A. Ryutov, et al., "Comparison of the performance of neural network methods and Cox regression for censored survival data," Computational Statistics & Data Analysis, vol. 34, no. 2, pp. 243–257, 2000.
[27] F. E. Harrell, R. M. Califf, D. B. Pryor, et al., "Evaluating the yield of medical tests," The Journal of the American Medical Association, vol. 247, pp. 2543–2546, 1982.
[28] F. E. Harrell, K. L. Lee, R. M. Califf, et al., "Regression modeling strategies for improved prognostic prediction," Statistics in Medicine, vol. 3, pp. 143–152, 1984.