Probabilistic comparison of survival analysis models using simulation and cancer data
Statistical research team
Department of Mathematics and Statistics, University of South Florida
Part 1
Objective
Using Monte Carlo simulation, we evaluate uncensored survival data with the most popular methods, Kaplan-Meier (KM) and the Cox PH model, and compare them with kernel density (KD) and parametric models.
We then validate these three methods using real cancer data. We are interested in whether one method is better than the others and, based on that, we give some recommendations.
Parametric Analysis
We take the underlying failure distribution to be a two-parameter gamma with shape $\alpha = 3.0$ and rate $\beta = 0.5$ (scale 2). We choose n = 300 because at that sample size the MLE has already converged to the true value with a small standard error.
Survival function is
True S(t):
$$S(t; 3, 2) = 1 - \frac{\gamma(3,\ t/2)}{\Gamma(3)}$$
Fitted S(t):
$$\hat S(t; \hat\alpha, \hat\beta) = 1 - \frac{\gamma(3.0448746,\ t/1.941165)}{\Gamma(3.0448746)}$$
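Hazard function
Support: $t \ge 0$, $\alpha, \beta > 0$
True h(t):
$$h(t) = \frac{t^{2} e^{-t/2}}{2^{3}\,[\Gamma(3) - \gamma(3,\ t/2)]}$$
Fitted h(t):
$$\hat h(t) = \frac{t^{2.0448746}\, e^{-t/1.941165}}{1.941165^{3.0448746}\,[\Gamma(3.0448746) - \gamma(3.0448746,\ t/1.941165)]}$$
$\Gamma(x)$ is the gamma function and $\gamma(a, x)$ is the (lower) incomplete gamma function.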
Graphical form n=300
The survival curve of the true state of nature and the parametric gamma curve with MLEs $\hat\alpha = 3.0448746$ and $\hat\beta = 0.5151546$ are close.
[Figure: true and fitted survival plots, S(t) against time]
Probability residual analysis
We take the difference between the true curve and the fitted parametric curve as our probability residual. The mean of the probability residual is -0.006250685, with standard deviation 0.003213453 and standard error 0.0001855288.
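As a quick check of the reported figures, the standard error is consistent with the standard deviation divided by $\sqrt{n}$ with $n = 300$. The source does not state this formula, so reading it this way is an inference:

$$\mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{n}} = \frac{0.003213453}{\sqrt{300}} \approx 0.0001855$$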
Kaplan-Meier (KM) method
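Survival function of Kaplan-Meier is
$$\hat S(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}$$
Cumulative hazard function
$$\hat H(t) = -\ln\left(\prod_{t_i \le t} \frac{n_i - d_i}{n_i}\right)$$
where $n_i$ is the number at risk just before $t_i$ and $d_i$ is the number of events at $t_i$.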
For n=300 the true survival plot and KM survival plot
[Figure: true (red) and fitted KM (blue) survival plots, S(t) against time]
Probability residual analysis
We take the difference between the true curve and the KM curve as our probability residual. The mean of the probability residual is -0.007361148, with standard deviation 0.01458173 and standard error 0.0008418766. The parametric fit has a much smaller standard deviation and standard error than KM, so it is clearly the better of the two.
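The KM comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a gamma law with shape 3 and scale 2 (for which S(t) has a closed form), takes the residual as true minus KM at each event time, and uses an arbitrary seed, so its numbers will differ from those reported above.

```python
import math
import random
import statistics


def true_survival(t):
    """S(t) for a gamma law with shape 3 and scale 2 (closed form for integer shape)."""
    u = t / 2.0
    return math.exp(-u) * (1.0 + u + u * u / 2.0)


def kaplan_meier(times):
    """Return [(t_i, S(t_i))] for uncensored data; S is multiplied by (n_i - d_i) / n_i at each event time."""
    ordered = sorted(times)
    n = len(ordered)
    curve, surv, i = [], 1.0, 0
    while i < n:
        t = ordered[i]
        j = i
        while j < n and ordered[j] == t:  # count tied events d_i
            j += 1
        d = j - i
        at_risk = n - i  # n_i: subjects still at risk just before t
        surv *= (at_risk - d) / at_risk
        curve.append((t, surv))
        i = j
    return curve


def residual_summary(residuals):
    """Mean, sample standard deviation, and standard error (SD / sqrt(n))."""
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)
    se = sd / math.sqrt(len(residuals))
    return mean, sd, se


random.seed(0)  # arbitrary seed, chosen only for reproducibility
sample = [random.gammavariate(3.0, 2.0) for _ in range(300)]
km = kaplan_meier(sample)
residuals = [true_survival(t) - s for t, s in km]
mean, sd, se = residual_summary(residuals)
print(f"mean={mean:.5f} sd={sd:.5f} se={se:.6f}")
```

Because the data are uncensored, the KM curve here coincides with the empirical survival function; the same function would need a censoring indicator to handle censored data.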
Kernel Density Analysis
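Probability density function is
$$\hat f_h(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\left(\frac{x - x_i}{h}\right)$$
where K is a kernel and h is the optimal bandwidth.
Survival function
$$\hat S_h(x) = 1 - \sum_{X \le x} \frac{1}{Nh}\sum_{i=1}^{N} K\left(\frac{X - x_i}{h}\right)$$
Hazard function
$$\hat h_h(x) = \frac{\dfrac{1}{Nh}\sum_{i=1}^{N} K\left(\dfrac{x - x_i}{h}\right)}{1 - \sum_{X \le x} \dfrac{1}{Nh}\sum_{i=1}^{N} K\left(\dfrac{X - x_i}{h}\right)}$$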
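Detail of S(t) and h(t)
Here is the detail of S(t) and h(t), with the kernel $K(u) = \frac{3}{4}(1 - u^2)$ and the bandwidth
$$h = 1.06\,\min\left(SD(x),\ \frac{IQR(x)}{1.34}\right) n^{-1/5}$$
substituted into the general forms:
$$\hat S_h(x) = 1 - \sum_{X \le x} \frac{1}{Nh}\sum_{i=1}^{N} \frac{3}{4}\left(1 - \left(\frac{X - x_i}{h}\right)^{2}\right)$$
$$\hat h_h(x) = \frac{\dfrac{1}{Nh}\sum_{i=1}^{N} \dfrac{3}{4}\left(1 - \left(\dfrac{x - x_i}{h}\right)^{2}\right)}{1 - \sum_{X \le x} \dfrac{1}{Nh}\sum_{i=1}^{N} \dfrac{3}{4}\left(1 - \left(\dfrac{X - x_i}{h}\right)^{2}\right)}$$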
Probability residual analysis
We take the difference between the true curve and the KD curve as our probability residual. The mean of the probability residual is -0.007816693, with standard deviation 0.007590435 and standard error 0.000438234.
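A kernel-based survival and hazard estimate along these lines might look as follows. This is a sketch under stated assumptions: the kernel is $K(u) = \frac{3}{4}(1 - u^2)$ on $[-1, 1]$ (the Epanechnikov kernel), the bandwidth is $1.06\,\min(SD, IQR/1.34)\,n^{-1/5}$, and the survival function is obtained by integrating the kernel exactly rather than by summing the density over a grid. The function names and the seed are illustrative, not taken from the source.

```python
import math
import random
import statistics


def bandwidth(data):
    """h = 1.06 * min(SD, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(statistics.stdev(data), (q3 - q1) / 1.34)
    return 1.06 * spread * n ** (-0.2)


def kernel_pdf(u):
    """K(u) = 0.75 * (1 - u^2) on (-1, 1), zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_cdf(u):
    """Exact integral of K from -1 to u."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def kd_density(x, data, h):
    return sum(kernel_pdf((x - xi) / h) for xi in data) / (len(data) * h)


def kd_survival(x, data, h):
    return 1.0 - sum(kernel_cdf((x - xi) / h) for xi in data) / len(data)


def kd_hazard(x, data, h):
    s = kd_survival(x, data, h)
    return kd_density(x, data, h) / s if s > 0.0 else float("inf")


random.seed(1)  # arbitrary seed
data = [random.gammavariate(3.0, 2.0) for _ in range(300)]
h = bandwidth(data)
for t in (2.0, 5.0, 10.0):
    print(t, round(kd_survival(t, data, h), 4), round(kd_hazard(t, data, h), 4))
```

Integrating the kernel in closed form avoids choosing a grid for the outer sum; with a fine grid the two approaches should give similar curves.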
Method          Mean      SD      SE       Rank of SE
Parametric      -0.00625  0.0032  0.00018  #1
Kaplan-Meier    -0.0074   0.0145  0.00084  #3
Kernel density  -0.0078   0.0075  0.00043  #2
Partial Conclusion
From the above table we can clearly see that the parametric method is much better than the other two. This agrees with what we expected.
For difficult data, where we cannot proceed parametrically, we are better off using the KD method instead of the popular KM method.
The reason is that KM and KD are nearly tied on the mean of the probability residual, but KD's standard deviation and standard error are about 50% of KM's. Therefore KD should be the first choice among nonparametric methods in this situation (when the true distribution is a two-parameter gamma, and for this random data set).
Model validation with real breast cancer data
We already know that the parametric method will be best when it is applicable. The next question is which of the KD, KM and Cox PH survival models is better on real data. We want to check whether the real data support our previous analysis.
With attributable variables (covariates) available we can apply the Cox PH model; in the previous analysis there were no covariates, so the Cox PH model could not be used there.
We use data on 641 breast cancer patients [14], of whom 48 are uncensored. Using the real data, we compare all three analyses.
Goodness of fit test
There are four response variables: survival time and three types of relapse time. Here we choose survival time as the response.
We need a goodness-of-fit test to identify the distribution of the data, and we use the one-sample Kolmogorov-Smirnov test for the gamma distribution.
One-sample Kolmogorov-Smirnov test: D = 0.0638, p-value = 0.9898
The p-value is very high, so we fail to reject the hypothesis that the data follow a gamma distribution; the gamma model is consistent with the data.
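The statistic D reported by the one-sample Kolmogorov-Smirnov test is the largest gap between the empirical distribution function and the hypothesized CDF. The sketch below computes it for simulated data. The gamma parameters (shape 3, scale 2), the sample size of 48 and the seed are placeholders, because the source does not give the parameters fitted to the cancer data.

```python
import math
import random


def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF of data and the hypothesized cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def gamma3_cdf(t, scale=2.0):
    """CDF of a gamma law with shape 3 (closed form for integer shape)."""
    u = t / scale
    return 1.0 - math.exp(-u) * (1.0 + u + u * u / 2.0)


random.seed(2)  # arbitrary seed
observations = [random.gammavariate(3.0, 2.0) for _ in range(48)]
d_stat = ks_statistic(observations, gamma3_cdf)
print(f"D = {d_stat:.4f}")
```

A small D (and hence a large p-value) means only that the data give no evidence against the hypothesized distribution.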
Probability residual analysis (Kaplan-Meier)
We take the difference between the parametric curve and the KM curve as our probability residual. The mean of the probability residual is 0.00949, with standard deviation 0.14014 and standard error 0.02023.
Probability residual analysis (kernel density)
We take the difference between the parametric curve and the KD curve as our probability residual. The mean of the probability residual is 0.005529787, with standard deviation 0.02352216 and standard error 0.003395132.
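Cox PH model
Analytical form
Hazard function
$$h_i(t) = h_0(t)\exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
Survival function
$$S_i(t) = \exp\left(-\int h_0(t)\exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})\,dt\right)$$
We use Cox PH with interactions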
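In the Cox PH model a patient's hazard is the baseline hazard $h_0(t)$ scaled by $\exp(\beta_1 x_{i1} + \dots + \beta_k x_{ik})$, and the survival function is $\exp(-H_0(t)\exp(\cdot))$ with $H_0$ the cumulative baseline hazard. The sketch below evaluates both for two hypothetical patients. The constant baseline hazard, the coefficients and the covariate values are made up for illustration and do not come from the study.

```python
import math


def linear_predictor(beta, x):
    """beta_1 * x_1 + ... + beta_k * x_k."""
    return sum(b * v for b, v in zip(beta, x))


def cox_hazard(t, x, beta, baseline_hazard):
    """h_i(t) = h_0(t) * exp(x . beta)."""
    return baseline_hazard(t) * math.exp(linear_predictor(beta, x))


def cox_survival(t, x, beta, baseline_cumulative_hazard):
    """S_i(t) = exp(-H_0(t) * exp(x . beta)), where H_0 is the integral of h_0 from 0 to t."""
    return math.exp(-baseline_cumulative_hazard(t) * math.exp(linear_predictor(beta, x)))


# Hypothetical inputs: constant baseline hazard 0.05, two covariates.
h0 = lambda t: 0.05
H0 = lambda t: 0.05 * t
beta = [0.4, -0.2]
patient_a = [1.0, 0.0]
patient_b = [0.0, 0.0]

hazard_ratio = cox_hazard(3.0, patient_a, beta, h0) / cox_hazard(3.0, patient_b, beta, h0)
print(f"hazard ratio a vs b: {hazard_ratio:.4f}")
print(f"S_a(3) = {cox_survival(3.0, patient_a, beta, H0):.4f}")
print(f"S_b(3) = {cox_survival(3.0, patient_b, beta, H0):.4f}")
```

The hazard ratio between two patients does not depend on t, which is the proportional hazards assumption; an interaction term is handled by adding the product of two covariates as an extra entry in x.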
Probability residual analysis: all three methods
Method          Mean     SD       SE       Rank of SE
Cox PH          0.0095   0.14     0.02     #2
Kaplan-Meier    0.00949  0.14014  0.02023  #3
Kernel density  0.0055   0.0235   0.0034   #1
STATISTICAL MODEL VALIDATION FOR CENSORED DATA
Quite often we deal with censored data due to limited and difficult experimental conditions. In the present study we are interested in investigating how KD analysis performs under a censored data situation. The problem is that we will never know the true state of nature under the censored circumstances and the only information we are certain of is that by the time it is censored the patient is still alive.
How to conduct a goodness of fit test for censored data is still an open problem. Edel A. Pena, [9], discussing the subject matter stated that we can only reject some probability distributions but still can not have the unique best distribution to probabilistically characterize the censored data. In this study we will perform KM, KD, Cox PH and parametric survival analysis for censored data and evaluate their response.
KM Survival Analysis
The survival probability estimate of the censored data using the KM model with the parametric survival model is given by Figure 4.1, below. In comparing the KM survival curve with the parametric model using the probability residuals, we have found that the mean of the probability residual is 0.000495, sample variance is 4.596*10^(-5), sample standard deviation is 0.0068 and standard error is 0.0024.
[Figure 4.1: parametric and KM survival plots; x-axis: time (age); y-axis: survival probability (%)]
Cox PH Model
To select the best possible Cox PH model for censored data, we require that all terms be significant and choose the model with the minimum AIC. Through statistical testing we have found that six first order terms and two interactions significantly contribute to the response variable. These attributable variables are tx, pathsize, nodediss, age, hrlevel, stnum, tx:age and nodediss:hrlevel. For the subject data and these attributable variables, we plot the Cox PH survival curve together with the parametric survival curve in Figure 4.2, below. In comparing the Cox PH curve with the estimated parametric survival curve, we found that the mean of the probability residual is 0.028, sample variance is 0.000347, sample standard deviation is 0.0186 and standard error is 0.0066.
[Figure 4.2: parametric and Cox PH survival plots; x-axis: time (age); y-axis: survival probability (%)]
KD survival analysis
Despite the difficulties in working with censored data, we proceeded to use the nonparametric KD procedure to estimate the survival curve together with the fitted parametric survival curve. The results are shown by Figure 4.3, which are different than what we have found using KM and Cox PH models in terms of the smoothness of the curve.
In comparing the KD survival curve with the fitted parametric model, we have found the mean of the probability residual is 0.00883, sample variance is 2.942*10^(-5), sample standard deviation is 0.0054 and standard error is 0.00191.
Table 4.1, below, summarizes the response of the three survival analysis models, KM, Cox PH and KD, in comparison with the parametric model using the two parameter Weibull probability density function to characterize the failures. Thus, if we assume that we can proceed to statistically analyze the censored data, all three survival models performed well, but the edge goes to the KD model in terms of the smaller sample variance and standard error.
[Figure 4.3: parametric and KD survival plots; x-axis: time (age); y-axis: survival probability (%)]
Table 4.1: Residual analysis
Methods                              Mean      SD       SE        Rank of model
Cox PH vs fitted parametric          0.028     0.0186   0.00658   3
Kaplan-Meier vs fitted parametric    0.000495  0.00678  0.002397  2
Kernel density vs fitted parametric  0.00883   0.0054   0.00191   1
CONCLUSIONS
The present study consists of three parts comparing the effectiveness of three survival analysis models, namely KM, Cox PH and KD.
First, using Monte Carlo simulation, we compared the subject models with parametric survival models and found that the proposed KD survival model gives results as good as, if not better than, KM.
The second part uses actual uncensored breast cancer data. A similar evaluation supports that the proposed KD model gives better estimates than the popular KM model and the Cox PH model with interactions.
Third, we performed the same analysis with actual censored breast cancer data. Although such an analysis is difficult to justify for censored data, we analyzed the data under these circumstances, and the results are similar to those from the Monte Carlo simulation and from the uncensored data.
Part 2
IDENTIFY ATTRIBUTABLE VARIABLES AND INTERACTIONS IN BREAST CANCER
Statistical research team
Department of Mathematics and Statistics, University of South Florida
The object of the present study is to develop a statistical model for breast cancer tumor size prediction for United States patients based on real uncensored data. We accomplish the objective by developing a high quality statistical model that identifies the significant attributable variables and interactions. We rank these contributing entities according to their percentage contribution to breast cancer tumor growth. This proposed statistical model can also be used to conduct surface response analysis to identify the necessary restrictions on the significant attributable variables and their interactions to minimize the size of the breast tumor. One can also use the proposed model to generate various scenarios of the tumor size as a function of different values of the subjective entities.
INTRODUCTION
The proposed model that we are developing includes individual variables, interactions, and higher order variables if applicable. In developing the statistical model, the response variable is the tumor size at diagnosis for breast cancer patients. We have identified 26 possible attributable variables for breast cancer, denoted, X1, X2,..., X26. For example, X1 stands for patient ID and X2 stands for the patient’s age at diagnosis. The list of attributable variables is in Table 1.1, below. In this study, we would like to find the relation between the tumor size and all other attributable variables. We cannot use survival time to predict the tumor size since death time happens after the tumor is detected. Therefore, we exclude the variable survival time(x25) and the censoring indicator function vss (x26) in the first part of study. Thus, we have only 24 variables left to construct our statistical model.
In the present analysis, we used real data from the Surveillance Epidemiology and End Results (SEER) Program. SEER collects and compiles information on incidence, survival, and prevalence from specific geographic areas representing about 26 percent of the U.S. population plus cancer mortality for the entire U.S. [18].
The proposed statistical model is useful in predicting the tumor size given data for the attributable variables. It is statistically evaluated using R square, R square adjusted, the PRESS statistic and several types of residual analyses. Finally, its usefulness is illustrated by utilizing different combinations of the attributable variables.
In addition, the attributable variables are ranked according to their contributions to accurately estimate a patient’s tumor size.
HISTORICAL REVIEW
Survival analysis is used more and more in many areas. Many researchers have contributed to this subject. C. A. McGilchrist, [3], [4] discussed the regression with frailty in survival analysis. D. R. Cox, [7], [8] introduced the Cox proportional hazards (PH) model for survival data. E. L. Kaplan and P. Meier, [10] constructed Kaplan Meier empirical type of survival model. E. A. Gehan developed the generalized Wilcoxon test and this test is more powerful than the Cox proportional hazard’s test when the proportional hazard assumption is violated early on. K. Liu and C. P. Tsokos, [12], [13], [14] utilized kernel density methods in reliability analysis. N. Mantel and W. Haenszel, [16] proposed the Mantel Haenszel test for survival analysis. P. Qiu and C. P. Tsokos studied extensively the accelerated life testing model. Y. Xu and C. P. Tsokos, [20] probabilistically discussed and evaluated several most commonly used survival analysis models.
Some classical historical research papers can be found in [1], [6], [11] and [17]. Other important and recent references for the readers who will have an interest in survival analysis may be found in [5], and [15].
Table 1.1: List of attributable variables
Name  Full name of variables              Short form
X1    Patient ID                          id
X2    Age at diagnosis                    age
X3    Year of Birth                       birthy
X4    Birth Place                         birthp
X5    Sequence Number Central             snc
X6    Month of diagnosis                  month
X7    Year of diagnosis                   year
X8    Primary Site                        ps
X9    Laterality                          la
X10   Histologic Type ICDO3               ht
X11   Behavior Code ICDO3                 bc
X12   Type of Reporting Source            trs
X13   RXSumm SurgPrimSite                 rxps
X14   RXSumm Radiation                    rxr
X15   RXSumm RadtoCNS                     rxcns
X16   Age Recode Year olds                ager
X17   Site Recode                         sr
X18   CS Schema                           css
X19   AJCC stage 3rd edition              ajcc
X20   First malignant primary indicator   findi
X21   State-county recode                 scr
X22   Race                                race
X23   Cause of Death to SEER site recode  cod
X24   Sex                                 sex
X25   Survival time recode                survtime
X26   Vital Status recode                 vss
We proceed to develop a statistical model taking into consideration the twenty-four attributable variables listed in the table above. The statistical model expresses tumor size as a function of (x1, x2, ..., x24). Note that some of the variables' values are obtained after the tumor size is recorded. In our analysis all the patients in the database have breast cancer. We use the value of the tumor size once the patient has gone through a diagnostic process. Thus, the general form of the proposed model with all possible attributable variables and interactions is
$$TS = \beta_0 + \alpha_1 A_1 + \alpha_2 A_2 + \dots + \alpha_i A_i + \gamma_1 B_1 + \gamma_2 B_2 + \dots + \gamma_j B_j$$
where the $A$ terms denote the individual attributable variables and the $B$ terms their interactions.
One of the basic underlying assumptions in formulating an estimate of the above statistical model is that the response variable should be Gaussian distributed. Unfortunately, in the present form that is not the case. This fact is clearly demonstrated by the QQ plot shown by Figure below.
Furthermore, the Shapiro-Wilk normality test with the necessary calculation of the test statistic W = 0.7437 and p-value = 3.787e-15 is additional evidence that the tumor size does not follow normal probability distribution. We proceed in utilizing the Box-Cox transformation to the tumor size to determine if such a filter will modify the given data to follow the normal distribution so that we can proceed to formulate the proposed statistical model. Applying the Box-Cox transformation results in the statistical information presented in Table 3.1. One tumor size data’s value is zero and Box-Cox transformation can only apply to a positive data set. Therefore, we use .00000000000001 to replace this zero value so we can perform Box-Cox transformation.
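For reference, the Box-Cox transform with power $\lambda$ maps a positive value $y$ to $(y^{\lambda} - 1)/\lambda$, or to $\ln y$ when $\lambda = 0$. The sketch below applies this textbook form with the estimated power 0.2659 reported in this study. The tumor sizes are hypothetical, and the software used by the authors may apply a plain power $y^{\lambda}$ instead, so treat this as an illustration only.

```python
import math


def box_cox(y, lam):
    """Textbook Box-Cox transform; defined only for y > 0."""
    if y <= 0.0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0.0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical tumor sizes in mm; a zero is replaced by a tiny positive value,
# as described in the text, so that the transform is defined.
sizes = [0.0, 5.0, 12.0, 30.0, 120.0]
positive_sizes = [s if s > 0.0 else 1e-14 for s in sizes]
transformed = [box_cox(s, 0.2659) for s in positive_sizes]
print([round(v, 3) for v in transformed])
```

Replacing a zero by a very small positive number produces an extreme transformed value relative to the rest of the data, which is worth keeping in mind when reading the fitted model.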
Box-Cox Transformation for Normality for the Original Data
Est.Power  Std.Err.  Wald(Power=0)  Wald(Power=1)
0.2659     0.0339    7.8445         -21.6563
Box-Cox Transformation for Normality from Transformed Data
Est.Power  Std.Err.  Wald(Power=0)  Wald(Power=1)
1          0.1275    7.8444         -2e-04
During our statistical analysis in the estimation process, we found only four of the twenty-four attributable variables were significant contributors. We found only three higher order interactions to significantly contribute. The significantly contributing and interaction variables are RXR(X14), COD(X23), RXPS(X13) and AJCC(X19). However, SNC(X5), HT(X10), themselves individually do not significantly contribute to the response variables but when they interact with other variables they do significantly contribute to the response variable. Therefore, we still keep them in our final model. There are thirty-one missing values in the variable AJCC. We use the mean of the rest of the data value in the variable AJCC to replace the NA value in order to perform prediction of the model. Thus, the results of estimation of equation before are given by equation below as follows
19145
1-
14105
4-
191423
6-
19
3-
1413
-3
10
-4
5
-2.2659
10.191
10.91124. 1093.4102.72
3.3-10.28310.142102.09-7.2)ˆln(
XXX
XXXXXXX
XXXXST
List of Attributable Variables
No  Individual variables  Name of individual variables
1   X5                    Sequence Number Central
2   X10                   Histologic Type ICDO3
3   X13                   RXSumm SurgPrimSite
4   X14                   RXSumm Radiation
5   X19                   AJCC stage 3rd edition
6   X23                   Cause of Death to SEER site recode
Interactions
7   X14:X19               RXR ∩ AJCC
8   X5:X10:X14            SNC ∩ HT ∩ RXR
9   X5:X14:X19            SNC ∩ RXR ∩ AJCC
For our final model the R squared is 0.889 and the adjusted R squared is 0.881. Both values are high (close to 90%) and very close to each other. This shows that the model's R squared is not inflated by the number of parameters, but rather reflects the good quality of the proposed model in predicting tumor size given values of the identified attributable variables [2]. Secondly, the PRESS statistic supports the fact that the proposed model is of high quality. Table 3.3 lists the best three models based on the PRESS statistic out of a total of thirty-six models. From that table, the best model is number 36, which is our final model.
Furthermore, R square and R square adjusted are calculated for those 36 models which are of interest but the proposed model still gives the best possible estimates of the tumor size for breast cancer in SEER’s data.
PRESS Statistics for Best Three Models
Model number PRESS value Rank of the model
31 96.73797 3
33 98.4167 2
36 104.8218 1
Rank of Variable According to Contributions
Rank Variables
1 X5:X14:X19
2 X14:X19
3 X19
4 X5:X10:X14
5 X5
6 X14
7 X13
8 X10
9 X23
Residual Analysis
No Residual Values
146 1.3264266
147 1.0579828
148 0.9756659
149 1.7362950
150 0.9643773
151 1.2427113
152 1.3705402
153 1.3640997
154 1.6072370
155 1.9573079
We first randomly divided the data into two datasets of the same size. We used one of the datasets to construct the model and then used the resulting model to predict the values in the other dataset. Then we switched the two datasets and repeated the procedure. The mean of all residuals was 1.0652916.
Next, we divided the dataset into six small datasets, used five of them to construct the model, and validated the model on the sixth. We repeated the same procedure for each of the six small datasets. The mean of all residuals was 0.1318486.
Finally, we divided the dataset into 155 datasets of one observation each, used 154 of them to construct the model, and validated the model on the one left out. We repeated the procedure 155 times. Table 4.2 shows the last ten of the one hundred fifty-five residuals.
The mean of the residuals was 0.634, the variance of the residuals was 42.89, the standard deviation of the residuals was 6.55 and the standard error of the residuals was 0.53.
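The three validation schemes above are 2-fold, 6-fold and leave-one-out cross validation. The sketch below runs all three on synthetic data with a one-predictor least-squares line standing in for the full model. The data, the seed and the function names are illustrative assumptions, so the printed numbers are unrelated to those reported above. The sum of squared leave-one-out residuals is the PRESS statistic.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def cross_validated_residuals(xs, ys, k, seed=0):
    """Split the data into k folds; predict each fold from a fit on the other k - 1."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    residuals = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        residuals.extend(ys[i] - (a + b * xs[i]) for i in held_out)
    return residuals


rng = random.Random(42)  # arbitrary seed
xs = [float(i) for i in range(155)]
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

for k in (2, 6, 155):  # 155 folds of size one is leave-one-out
    res = cross_validated_residuals(xs, ys, k)
    press = sum(r * r for r in res)
    print(k, round(statistics.fmean(res), 4), round(press, 2))
```

With a multi-variable model the same loop applies; only `fit_line` and the prediction step change.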
Residual Analysis for Cross Validation
No Residual Values
146 0.13328523
147 0.08479963
148 0.07219656
149 0.36801567
150 0.07578824
151 0.11712640
152 0.14230504
153 0.14117093
154 0.19570259
155 0.29023850
USEFULNESS OF THE PROPOSED STATISTICAL MODEL
We can conclude from our extensive statistical analysis that there are only four significant attributable variables for the tumor size in breast cancer, namely RXR(X14), COD(X23), RXPS(X13) and AJCC(X19). SNC(X5) and HT(X10) do not significantly contribute to the response variable individually; however, when they interact with other variables, they do significantly contribute to it. Furthermore, we also tested two thousand three hundred possible interactions of the attributable variables and found three interactions that significantly contribute to the tumor size.
This model is useful for a number of reasons.
1. It can be used to identify the significant attributable variables.
2. It identifies the significant interactions of these attributable variables.
3. The most significant contributions to the tumor size growth are ranked.
4. One can also use the proposed model to generate various scenarios of the tumor size as a function of different values of the subjective entities.
5. A confidence interval for the tumor size can be constructed with parametric analysis. By obtaining the % confidence limits for the response, we can describe how confident we are that our estimate is close to the actual tumor size.
6. The model as shown in equation (3.3) can be used to perform surface response analysis to place the restrictions on the significant attributable variables and interactions to minimize the breast cancer tumor size. We can also put restrictions on the variables to minimize the response of the tumor size by nonlinear control with % confidence limits.
CONCLUSIONS & DISCUSSION
In the present study, we performed parametric analysis to estimate tumor size for breast cancer patients. The initial measurement of tumor size was collected from the SEER database. Those data do not follow normal probability distribution. Using the standard Box-Cox transformation, the SEER tumor size data became approximately normally distributed. We developed a “nonlinear” statistical model (nonlinear in terms of the power and logarithm of the response variable). Through the process of developing the statistical model, we found only four variables, namely, rxr(X14), cod(X23), rxps(X13) and ajcc(X19) and three interactions that significantly contribute to the tumor size. The proposed statistical model was evaluated using the R-square, R-square adjusted, PRESS statistics and three cross validation methods, all of which support the high quality of the developed statistical model. This model can be used to obtain a good estimate of tumor size knowing the four significantly attributable variables and three interaction terms.
Statistical modeling of breast cancer using differential equations
Statistical research team
Department of Mathematics and Statistics, University of South Florida
Part 3
The object of the present study is to use the attributable variables and interactions that have been identified as contributing to the breast tumor (cancer) to develop a differential equation that characterizes the behavior of the tumor as a function of time. The solution of such a differential equation, once plotted, identifies the rate of change of tumor size as time (age) increases. Once we have the differential equations and their solutions, we can validate the quality of the proposed differential equations.
How do we determine the partition of the age?
We want the differential equations to maximize the quality of the model. Since the tumor growth rate varies with age, we partition the data by age. We truncate the data to ages 40 through 85: data are missing for patients aged 33 to 37 and 39, and for patients older than 85 the cause of death can be more complicated, with natural death becoming more likely. We want the differential equations to connect across all intervals, so two adjacent intervals share the same connection point.
Breast Cancer Tree Diagram
Original 1000 patient data: minimum age is 32 and maximum age is 101; minimum size of tumor is 0 mm and maximum size of tumor is 200 mm
814 patients' data with available information
Age groups:
1. First age group is patients from age 40 to 58
2. Second age group is patients from age 58 to 70
3. Third age group is patients from age 70 to 73
4. Fourth age group is patients from age 73 to 85
Group summaries shown in the diagram:
Total 159 patients with minimum size of tumor 0 mm and maximum size of tumor 120 mm
Total 81 patients with minimum size of tumor 1 mm and maximum size of tumor 140 mm
Total 226 patients with minimum size of tumor 1 mm and maximum size of tumor 200 mm
Total 397 patients with minimum size of tumor 0 mm and maximum size of tumor 140 mm
Let x stand for age in years and TS for tumor size; the instantaneous rate of change is d(TS)/d(x).
The differential equation is
6452432
2456
10685.210773.7345.91097.5
1014.210076.410224.3TSTS
xxxx
xx
,
From this differential equation we obtain the relationship between the size of the tumor and the derivative of the tumor size.
The above differential equation is integrable, and its solution is as follows:
Multiple R-squared: 0.773, Adjusted R-squared: 0.6595
534
32345
10611.13967.
968.381091.110662.410542.4d(x)
d(TS)
xx
xxx
The differential equation is
From this differential equation we obtain the relationship between the size of the tumor and the derivative of the tumor size.
The above differential equation is integrable, and its solution is as follows:
Multiple R-squared: 0.8515, Adjusted R-squared: 0.7773
4232
2245
10908.21039.7
10031.710968.2106887.4TSTS
xx
xx
,
3234116.518.22104512.110113.3
d(x)
d(TS)xxx
The differential equation is
The above differential equation is integrable, and its solution is as follows:
Multiple R-squared: 0.9998, Adjusted R-squared: 0.9994
221816.857.2510231.9TSTS xx
,
x3632.857.25d(x)
d(TS)
The differential equation is
The above differential equation is integrable, and its solution is as follows:
Multiple R-squared: 0.9805, Adjusted R-squared: 0.9665
53432
2457
10474.47506.110737.2
1014.210347.8103.1TSTS
xxx
xx
,
42
32245
10237.2
092.710424.810446.410792.8d(x)
d(TS)
x
xxx
Part 4
POWER LAW PROCESS IN CANCER ANALYSIS
Statistical research team
Department of Mathematics and Statistics
University of South Florida
ABSTRACT
The object of the present study is to propose the power law process, also known as a non-homogeneous Poisson process and identical to the Weibull process, for analyzing and modeling different types of cancer, especially breast cancer.
The key objective is to study the change of the tumor growth as a function of age. The intensity function within the power law process gives us the rate of change of the tumor growth as a function of time.
In addition, the key parameter within the intensity function can give us a preliminary indication of the behavior of the tumor subject to a given treatment.
POWER LAW PROCESS ANALYSIS
The power law process (PLP) is a non-homogeneous Poisson process (NHPP), also known as the Weibull process (WP) [4]. PLP has been used in many applications. PLP is a special Poisson process, and a Poisson process is a counting process.
A counting process is a stochastic process N(t) that possesses the following properties:
1. N(t) ≥ 0.
2. N(t) is an integer.
3. If s ≤ t, then N(s) ≤ N(t). If s < t, then N(t) - N(s) is the number of events that occurred during the interval (s, t] [5].
For a Poisson process, the number of events in an interval (a, b] satisfies
$$P[N(b) - N(a) = k] = \frac{e^{-\lambda_{a,b}}\,\lambda_{a,b}^{k}}{k!}, \quad k = 0, 1, \ldots$$
where $\lambda_{a,b}$ is the expected number of events in (a, b].
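NHPP has the intensity function:
$$v(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta - 1}, \quad \alpha > 0,\ \beta > 0,\ \text{for } t > 0$$
v(t) has been used very successfully in reliability analysis; its behavior is governed by beta, v(t) = f(beta).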
If the parameter beta is greater than one, the tumor size increases, which means the survival rate decreases. If beta is less than one, the tumor size decreases, which means the survival time increases. If beta equals one, the tumor size is constant and the NHPP becomes a homogeneous Poisson process (HPP).
The unbiased estimator of beta is provided by Bain and Engelhardt (1991) [8]:
$$\hat\beta_U = \frac{n-1}{n}\,\hat\beta_{MLE} = \frac{n-1}{\sum_{i=1}^{n} \log\left(\dfrac{t_n}{t_i}\right)}$$
1) If beta is greater than one, the tumor size increases, which means the survival rate decreases. [Plot: v(t) against t]
2) If beta is less than one, the tumor size decreases, which means the survival time increases. [Plot: v(t) against t]
3) If beta equals one, the tumor size is constant and the NHPP becomes a homogeneous Poisson process (HPP). [Plot: v(t) against t]
Gamma is an indicator function. If gamma = 1 the system is failure truncated: the system is restricted by a predetermined number of failures, and we stop the testing when we reach that number. If gamma = 0 the system is time truncated: the system is restricted by a final time, and we stop the testing when we reach that time. [8]
The scale parameter is estimated by
$$\hat\alpha = \frac{t_n}{n^{1/\hat\beta}}$$
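The PLP estimators can be written out directly from event times. The sketch below assumes the forms $\hat\beta_{MLE} = n / \sum_{i=1}^{n} \log(t_n / t_i)$, $\hat\beta_U = \frac{n-1}{n}\hat\beta_{MLE}$ and $\hat\alpha = t_n / n^{1/\hat\beta}$, which is how the garbled formulas in this transcript are read here. The event times are hypothetical, and the exact divisor conventions differ between the failure-truncated and time-truncated cases, so check them against the cited reference before relying on the values.

```python
import math


def plp_estimates(times):
    """Return (beta_mle, beta_unbiased, alpha_hat) from event times."""
    t = sorted(times)
    n = len(t)
    t_n = t[-1]
    log_sum = sum(math.log(t_n / ti) for ti in t)  # the i = n term is log(1) = 0
    beta_mle = n / log_sum
    beta_unbiased = (n - 1) / n * beta_mle
    alpha_hat = t_n / n ** (1.0 / beta_mle)
    return beta_mle, beta_unbiased, alpha_hat


def intensity(t, beta, alpha):
    """v(t) = (beta / alpha) * (t / alpha)^(beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)


# Hypothetical event times (for example, months to an event), not from the SEER data.
event_times = [5.0, 12.0, 20.0, 31.0, 45.0, 60.0, 80.0]
beta_mle, beta_u, alpha_hat = plp_estimates(event_times)
print(f"beta_mle={beta_mle:.4f} beta_unbiased={beta_u:.4f} alpha={alpha_hat:.4f}")
print(f"v(50) = {intensity(50.0, beta_u, alpha_hat):.5f}")
```

With beta equal to one the intensity reduces to the constant 1/alpha, which is the homogeneous Poisson case described above.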
Original SEER data of 578134 patients with breast cancer
  We use a simple random sample (SRS) to select 50000 patients from the original data set
  50000 patients randomly chosen from the SEER data
    By sex: 49715 female patients; 285 male patients
    By race: 42782 white patients; 4034 black patients; 2985 Asian (other) patients; 199 unknown (unspecified) patients
      White patients (42782): 29640 ductal; 530 medullary; 6340 lobular; 6272 NA
      Black patients (4034): 2820 ductal; 108 medullary; 462 lobular; 644 NA
      Asian (other) patients (2985): 2314 ductal; 34 medullary; 299 lobular; 338 NA
By stage: 6813 Stage 1 patients; 5054 Stage 2 patients
  Stage 1 patients (6813): 4893 patients alive; 1920 patients dead
    Patients dead (1920): 426 because of breast cancer; 1494 because of other reasons
      Dead because of breast cancer (426): 250 without radiation treatment; 164 with beam radiation treatment; 12 with other treatment
Table: Beta estimation for PLP (Stage 1)
Parameter                     Estimate
Beta 1 (with treatment)       0.9209635
Beta 2 (without treatment)    1.062705
Alpha 1 (with treatment)      42.27743
Alpha 2 (without treatment)   78.293
Intensity function 1 (with treatment):
$$v_1(t) = 0.022\left(\frac{t}{42.28}\right)^{-0.079}$$
Intensity function 2 (without treatment):
$$v_2(t) = 0.0136\left(\frac{t}{78.293}\right)^{0.063}$$
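The constants in the two intensity functions follow from the tabulated estimates, assuming the intensity has the form $v(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta - 1}$ with alpha as the scale parameter:

$$\frac{\beta_1}{\alpha_1} = \frac{0.9209635}{42.27743} \approx 0.0218, \qquad \beta_1 - 1 \approx -0.079$$

$$\frac{\beta_2}{\alpha_2} = \frac{1.062705}{78.293} \approx 0.0136, \qquad \beta_2 - 1 \approx 0.063$$

The negative exponent with treatment gives a decreasing intensity, while the positive exponent without treatment gives an increasing one.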
[Figure: intensity function without treatment; x-axis: years; y-axis: intensity]
CONCLUSIONS & DISCUSSION
We constructed PLPs for stage 1 breast cancer patients. The estimated beta is less than one with treatment (0.921) and greater than one without treatment (1.063). This is an indication that the treatment works for those patients.
Future work
We will continue the study with more data, so that eventually we can construct a PLP for each stage and each tumor size available for all treatments and compare the results. Then we can suggest, for patients with a particular tumor size, which treatment gives them the maximum survival expectation. We will fix grade and behavior code with the same sex, stage and tumor size.
We will extend this from the uncensored case to the censored case.
We can also apply PLP in Bayesian survival analysis to improve our results and give better suggestions.
REFERENCES
A. W. Fyles, D. R. McCready, L. A. Manchul, M. E. Trudeau, P. Merante, M. Pintile, L. M. Weir, and I. A. Olivotto, Tamoxifen with or without breast irradiation in women 50 years of age or older with early breast cancer, New England Journal of Medicine, 351 (2004) 963-970.
B. Abraham and J. Ledolter, Introduction to regression modeling, 2006.
C. A. McGilchrist and C. W. Aisbett, Regression with Frailty in Survival Analysis, Biometrics, 47(2):461-466, 1991.
C. A. McGilchrist, REML Estimation for Survival Models with Frailty, Biometrics, 49(1):221-225, 1993.
D. Collett, Modeling survival data in medical research (Chapman & Hall/CRC), 2003.
D. P. Harrington and T. R. Fleming, A Class of Rank Test Procedures for Censored Survival Data, Biometrika, 69(3):553-566, 1982.
D. R. Cox, Regression models and life-tables (with discussion), Journal of the Royal Statistical Society Series B, 34:187-220, 1972.
D. R. Cox and D. Oakes, Analysis of survival data (London: Chapman & Hall), 1984.
E. L. Kaplan and P. Meier, Nonparametric estimation from incomplete observations, Journal of the American Statistical Association, 53:457-481, 1958.
J. P. Klein, Semiparametric Estimation of Random Effects Using the Cox Model Based on the EM Algorithm, Biometrics, 48(3):795-806, 1992.
K. Liu and C. P. Tsokos, Nonparametric Density Estimation for the Sum of Two Independent Random Variables, Journal of Stochastic Analysis, 2000.
K. Liu and C. P. Tsokos, Nonparametric Reliability Modeling for Parallel Systems, Journal of Stochastic Analysis, 1999.
K. Liu and C. P. Tsokos, Optimal Bandwidth Selection for a Nonparametric Estimate of the Cumulative Distribution Function, International Journal of Applied Mathematics, Vol. 10, No. 1, pp. 33-49, 2002.
N. A. Ibrahim, A. Kudus, I. Daud, and M. R. Abu Bakar, Decision tree for competing risks survival probability in breast cancer study, International Journal of Biomedical Sciences, Volume 3, Number 1, 2008.
N. Mantel and W. Haenszel, Statistical aspects of the analysis of data from retrospective studies of disease, Journal of the National Cancer Institute, 22(4), 1959.
P. Qiu and C. P. Tsokos, Accelerated Life-Testing Model Building with Box-Cox Transformation, Sankhya, Vol. 62, Series A, Pt. 2, pp. 223-235, 2000.
U.S. National Institutes of Health, http://seer.cancer.gov
Wikipedia, http://en.wikipedia.org/wiki/Breast_cancer
Y. Xu and C. P. Tsokos, Probabilistic Comparison of Survival Analysis Models using Simulation and Cancer Data, Communications in Applied Analysis, 2009, accepted.