1
Luigi Salmaso
Associate Professor of Statistics, University of Padova
Research Group for the Bladder Cancer multicentric study: PF. Bassi, C. Brombin, L. Corain, M. Racioppi, L. Salmaso
ROBUST CLINICAL PREDICTION
INTERNATIONAL SYMPOSIUM OF UROLOGY, FUT-UROLOGY 2008
2
Topics
• Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY
• Case study: INVASIVE BLADDER CANCER
• Application and results of several statistical methods to the case study
• Robust clinical prediction using the NonParametric Combination of Dependent Permutation Tests (NPC Test)
• Conclusions and practical suggestions
3
Necessary steps for ‘optimal’ statistical predictions
• Study design
• Collecting data using a Web-based Database
• Study protocol
• Individual predictions based, e.g., on nomograms or other techniques
4
Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY
• The availability of an electronic database can improve the quality and completeness of collected data, reducing, in particular, the number of missing data and the risk of imputation errors.
• Accuracy in defining the nature (observational, randomized, …) and the endpoints of the study can lead to a better choice of the sample size and of the subsequent statistical analysis to perform.
5
ELECTRONIC DATABASE: An example
[Screenshots: WEB-based Database; Variables' coding]
6
STATISTICAL ANALYSIS: standard methods and recent advances
• Survival Analysis
[Figure: Survival Function with censored observations; x-axis: Months (0 to 120); y-axis: Cum Survival (0.0 to 1.0)]
• Univariate Test (Student t test, Wilcoxon)
[Figure: % of patients by Tumour (Phase III) category (0-1, 2-3, 4-5, 6-7, 8-9, >=10), NED vs DOD+AWD; Student's t: p = 0.000; Wilcoxon: p = 0.000]
• Multivariate Methods (Logistic regression, …)
• Classification complex methods (Neural Networks, Artificial Intelligence, …)
• NonParametric Combination of Dependent Permutation Tests (NPC Test)
7
Case study: INVASIVE BLADDER CANCER
Total sample size: 1,003 subjects
469 subjects including DOD (Dead of Disease) and AWD (Alive with Disease, i.e. “statistically” dead) patients
534 subjects including NED (No Evidence of Disease) patients
Patients lost to follow-up and DOC (Dead of Other Causes) patients were excluded
Aim of the study: Detecting variables (factors) that best predict the outcome (DEAD or ALIVE) after a BLADDER CANCER DIAGNOSIS
Italian multicentric observational study (from Jan 2001 to Dec 2006)
Reference: prof. PF. Bassi (Univ. Cattolica, Rome)
8
• TNM-Classification of Bladder Cancer has been used, according to Wittekind & Sobin (2002), thus the original variables were transformed into ordinal variables. 30 endpoints were considered as relevant for the statistical analysis.
Case study: INVASIVE BLADDER CANCER
Timeline: First symptom, Diagnosis, Surgery
• I Phase (First symptom): patient state of health at the first medical visit
• II Phase (Diagnosis): patient condition after bladder cancer diagnosis
• III Phase (Surgery): patient state after surgery (histopathological variables were examined)
• In particular, the interest is in evaluating the importance of endpoints, collected at three phases of the study, in predicting the outcome.
9
Results of Kaplan-Meier (survival analysis)
(artificial example)
[Figure: Survival Function with censored observations; x-axis: Months (0 to 120); y-axis: Cum Survival (0.0 to 1.0)]
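The survival curve on this slide is a Kaplan-Meier (product-limit) estimate. As a minimal sketch of how such a curve is computed, assuming follow-up times in months and an event indicator (1 = event observed, 0 = censored), the estimator can be written as below. The function name and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  -- follow-up time of each subject (e.g. months)
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # subjects leaving the risk set at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # product-limit step: multiply by the share surviving time t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]     # events and censored subjects both leave
    return curve


# Hypothetical toy data, for illustration only
toy_times = [6, 12, 12, 20, 30, 45, 60, 60, 90, 120]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
km_curve = kaplan_meier(toy_times, toy_events)
for month, s in km_curve:
    print(month, round(s, 3))
```

Censored subjects do not lower the curve directly; they only shrink the risk set used at later event times.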
10
Results of univariate tests
[Figures: bar charts of % of patients in the NED and DOD+AWD groups, by category of each variable]
• Grading (Phase III), categories 0, 1, 2, 3: Student's t: p = 0.000; Wilcoxon: p = 0.000
• Disease restarting (Phase III), categories 0, 1: Student's t: p = 0.000; Wilcoxon: p = 0.000
• Tumour (Phase II), categories 0-1, 2-3, 4-5, 6-7, 8-9, >=10: Student's t: p = 0.000; Wilcoxon: p = 0.000
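The two univariate tests reported here, Student's t and Wilcoxon, compare the NED and DOD+AWD groups one variable at a time. A minimal sketch of both statistics follows, assuming a pooled-variance t statistic and a rank-sum test with midranks and a tie-corrected normal approximation (ties are frequent with ordinal scores). The function names and the toy grading scores are hypothetical; the study's software may compute its p-values differently.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, variance


def midranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def student_t(x, y):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def rank_sum_test(x, y):
    """Rank sum W of x, its z score, and a two-sided normal-approximation p."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:n1])
    # tie correction to the variance of W
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (w - n1 * (n + 1) / 2) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical grading scores (0-3), for illustration only
ned_grading = [0, 1, 1, 1, 2, 2, 2, 3]
dod_grading = [1, 2, 2, 3, 3, 3, 3, 3]
t_stat = student_t(ned_grading, dod_grading)
w_stat, z_stat, p_wilcoxon = rank_sum_test(ned_grading, dod_grading)
print(round(t_stat, 3), w_stat, round(z_stat, 3), round(p_wilcoxon, 4))
```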
11
• The logistic regression model has been applied to the same dataset but very poor results were obtained (only two significant predictors: Stage TNM at I and II Phase)
• The main problems for application:
– the inability of logistic regression to handle missing values (missing data are present in 522 subjects out of 1,003 individuals);
– the high number of coefficients to be estimated, so that the iterative estimation algorithm does not converge (after 1000 iterations). Note that when convergence is not achieved for the parameter estimates, results may be unreliable.
III Phase
Variable                                     n     %
Carcinoma In Situ (CIS)                      44    4%
Grading                                      140   14%
Regional lymph nodes                         7     1%
Metastases                                   65    6%
Histology                                    82    8%
Trigone infiltration                         100   10%
Corpus invasion                              145   14%
Urethral involvement                         110   11%
Vascular invasion                            144   14%
Lymphonodal invasion                         117   12%
Prostatic Invasion                           187   19%
Adenocarcinoma of the Prostate               131   13%
Highway TCC (Transitional Cell Carcinoma)    87    9%
Disease restarting                           102   10%
Chemotherapy before surgery                  50    5%
Chemotherapy after surgery                   1     0%
Therapy restarting                           87    9%
14
PERMUTATION APPROACH FOR HYPOTHESIS TESTING
The multivariate permutation approach for hypothesis testing by NonParametric Combination (NPC) offers the following advantages:
• No need to specify the dependence structure among variables
• Exact solutions
• Powerful tests
• Treatment of missing values (missing completely at random, MCAR, or not completely at random, not-MAR)
It also deals with:
- Stratification
- Multivariate categorical variables
It handles:
- Mixed variables
- Multivariate restricted alternatives
• NPC Test implements methods and algorithms presented in several international papers by Prof. L. Salmaso and Prof. F. Pesarin. L. Salmaso leads an internationally recognised research group in theoretical and applied nonparametric statistics.
• NPC TEST is an innovative statistical method (and software) that provides researchers with powerful solutions in the field of hypothesis testing.
Robust statistical prediction using NPC Test
15
Robust statistical prediction using NPC Test
FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
• NPC TEST allows us to perform hypothesis testing in the case of:
  - Two and C samples with dependent or independent variables
  - Two and C samples with repeated measures
  - Stratified analysis
• NPC TEST also provides:
  - Powerful test statistics for the treatment of missing values
  - One- or two-tailed tests
• Data (including mixed variables):
  - categorical
  - ordered categorical
  - numeric or continuous
  - binary
16
Robust statistical prediction using NPC Test
FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
Elementary partial test statistics include:
• t statistic
• ANOVA
• Difference of means
• Test statistics for missing values
• Anderson-Darling
• Cramér-von Mises
• Chi-square
• Modified Chi-square
• Likelihood Ratio
Combining functions for intermediate tests include:
• Fisher
• Liptak
• Tippett
• Direct
An innovation of NPC TEST with respect to existing methods is that it can perform any combination of tests: starting from an appropriate set of elementary tests, the NPC methodology leads to an overall multivariate or multistrata global test.
NPC TEST supports the standard functions of statistical software (data import, data manipulation) and produces a report that can be easily integrated and customized with a text editor.
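To make the combination step concrete, here is a minimal two-sample sketch of nonparametric combination. It assumes the absolute difference of means as the partial test statistic for every variable and a Monte Carlo sample of permutations; it is not the NPC TEST 2.0 implementation, and it covers neither missing values nor the Direct combining function. All function names and the toy data are hypothetical. The key point is that each permutation relabels whole subjects, so the same relabelling is applied to every variable and the dependence among variables is preserved without being modelled.

```python
import math
import random
from bisect import bisect_left
from statistics import NormalDist, mean


def fisher(ps):
    """Fisher combining function: large when partial p-values are small."""
    return -2.0 * sum(math.log(p) for p in ps)


def liptak(ps):
    """Liptak combining function, based on standard normal quantiles."""
    return sum(NormalDist().inv_cdf(1.0 - p) for p in ps)


def tippett(ps):
    """Tippett combining function, driven by the smallest partial p-value."""
    return max(1.0 - p for p in ps)


def abs_mean_diff(column, labels):
    a = [v for v, g in zip(column, labels) if g == 1]
    b = [v for v, g in zip(column, labels) if g == 0]
    return abs(mean(a) - mean(b))


def npc_two_sample(columns, labels, combine=fisher, n_perm=500, seed=0):
    """Return (partial p-values, global p-value) for a two-sample problem."""
    rng = random.Random(seed)
    observed = [abs_mean_diff(c, labels) for c in columns]
    perm_stats = []
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # one relabelling shared by all variables
        perm_stats.append([abs_mean_diff(c, shuffled) for c in columns])
    sorted_cols = [sorted(row[j] for row in perm_stats)
                   for j in range(len(columns))]

    def to_p(stats):
        # share of permutation statistics >= the given one, kept inside (0, 1)
        return [(0.5 + n_perm - bisect_left(sorted_cols[j], t)) / (n_perm + 1)
                for j, t in enumerate(stats)]

    partial_p = to_p(observed)
    t_obs = combine(partial_p)
    t_perm = [combine(to_p(row)) for row in perm_stats]
    global_p = (1 + sum(t >= t_obs for t in t_perm)) / (n_perm + 1)
    return partial_p, global_p


# Hypothetical toy data: 10 subjects per group, 3 variables
labels = [1] * 10 + [0] * 10
col_a = list(range(10, 20)) + list(range(0, 10))            # clear shift
col_b = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
col_c = [2, 3, 3, 2, 3, 1, 2, 3, 3, 2, 1, 0, 2, 1, 1, 0, 2, 1, 0, 1]
partial_p, global_p = npc_two_sample([col_a, col_b, col_c], labels)
print([round(p, 4) for p in partial_p], round(global_p, 4))
```

Passing `combine=liptak` or `combine=tippett` changes only how the partial p-values are merged; the permutations and partial tests stay the same.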
17
Robust statistical prediction using NPC Test
18
• After obtaining p-values for the variables with NPC methods, we also controlled the familywise error rate (FWE)
• The need for multiplicity control arises when any problem is structured into two or more experimental hypotheses (Finos and Salmaso, 2006)
• In order to have an inference on all the hypotheses defining the multivariate problem, it is necessary to control the probability of erroneously rejecting at least one univariate (elementary) hypothesis; this is called multivariate type I error or familywise error rate (FWE) (Marcus et al., 1976)
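As an illustration of FWE control, the sketch below computes Holm step-down adjusted p-values. This is a swapped-in, simpler technique: Holm's procedure is the shortcut of closed testing (Marcus et al., 1976) obtained when every intersection hypothesis is tested with a Bonferroni test, whereas the analysis described here uses NPC-based closed testing. The function name and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the k-th smallest p-value is multiplied by (m - k + 1), capped at 1
        candidate = min(1.0, (m - rank) * p_values[i])
        # enforce monotonicity: an adjusted p-value never drops below
        # the adjusted value of a smaller raw p-value
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for four elementary hypotheses
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = holm_adjust(raw_p)
rejected = [p <= 0.05 for p in adj_p]
print([round(p, 4) for p in adj_p], rejected)
```

Rejecting the hypotheses whose adjusted p-value is at most 0.05 keeps the probability of at least one false rejection at or below 5%.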
Robust statistical prediction using NPC Test
19
Robust statistical prediction using NPC Test
CLOSED TESTING GRAPHICAL REPRESENTATION
20
Results of NPC Test
p-value
Phase      Variables (explanation)                                 univariate (partial test)   1st combination
I Phase    Previous superficial TCC (Transitional Cell Carcinoma)  n.s.
           Focality                                                n.s.
           Stage TNM                                               n.s.
           Grading                                                 n.s.
III Phase  Adenocarcinoma of the Prostate                          n.s.
           Highway TCC (Transitional Cell Carcinoma)               n.s.                        n.s.
           Disease restarting                                      0,0004
           Chemotherapy before surgery                             n.s.
           Chemotherapy after surgery                              0,0004
           Therapy restarting                                      0,0004                      0,0002
22
p-value
Phase      1st combination      2nd combination (global test)
I Phase n.s.
II Phase 0,0007
0,0006
0,0005
n.s III Phase
0,0002
0,0013
Results of NPC Test
23
• The NPC method can make a significant contribution to biomedical studies with several endpoints
• The advantages of NPC Test stem from its flexibility in handling any type of variable
• We recommend this methodology whenever the normality assumption is hard to justify, in the presence of missing values, and when the number of variables is higher than the number of subjects
Conclusions and practical suggestions
24
REFERENCES
Bassi P.F., Pagano F. (2007). Invasive Bladder Cancer. Springer.
Corain L., Salmaso L. (2007). A critical review and a comparative study on conditional permutation tests for two-way ANOVA. Communications in Statistics – Simulation and Computation, 36, 791-805.
Finos L., Salmaso L. (2006). Weighted methods controlling the multiplicity when the number of variables is much higher than the number of observations. Journal of Nonparametric Statistics, 18, 245-261.
Finos L., Salmaso L. (2006). FDR- and FWE-controlling methods using data-driven weights. Journal of Statistical Planning and Inference, 137, 3859-3870.
Finos L., Salmaso L., Solari A. (2007). Conditional inference under simultaneous stochastic ordering constraints. Journal of Statistical Planning and Inference, 137, 2633-2641.
Marcus R., Peritz E., Gabriel K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.
Marozzi M., Salmaso L. (2006). Multivariate Bi-Aspect Testing for Two-Sample Location Problem. Communications in Statistics – Theory and Methods, 35, 477-488.
Salmaso L., Solari A. (2005). Multiple aspect testing for case-control designs. Metrika, 62, 331-340.
Wittekind C., Sobin L.H. (2002). TNM Classification of Malignant Tumours. UICC, International Union Against Cancer (6th ed.). Wiley-Liss, New York.
http://www.gest.unipd.it/~salmaso/NPC_TEST.htm
25
• We applied a neural network model (Multilayer Perceptron) to the same dataset
• By applying k-fold cross-validation, we obtained a correct classification rate of 75.3% for DOD+AWD and of 60.5% for NED. Using the subset of variables identified by univariate analysis, we obtained very similar performance (74.5% and 62.4%)
• Main problems of neural networks are:
– Neural networks work as black boxes, hence it is not possible to convert the neuronal structure into a known model structure
– All input fields ‘must’ be numeric (in the study we do not have numerical but ordinal categorical variables)
– Neural networks can suffer from a problem called interference (i.e. to