
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information (Abeer Alzubaidi, Georgina Cosma, David Brown and Graham Pockley)

Jan 12, 2017


Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information

Abeer Alzubaidi

(PhD researcher)

School of Science and Technology

Nottingham Trent University

Content

• What is Breast Cancer?

• Breast Cancer Diagnosis

• Statistical Methods

• Predictive Modelling

• Evolutionary Computation

• The Hybrid Genetic Approach

• Breast Cancer Dataset

• The Results

• Current Work & Conclusion

What is Breast Cancer?

• Breast cancer begins in the breast tissue and may start in a duct or lobe of the breast. When the "controls" in the breast cells stop working properly, the cells divide continually and form a lump or tumor.

Breast Cancer Statistics

• Breast cancer is the most common cancer in women in both developed and developing countries.

• Worldwide, an estimated 14.1 million new cancer cases and 8.2 million cancer deaths (all cancer types) occurred in 2012.

Article Source: Model Comparison for Breast Cancer Prognosis Based on Clinical Data

Breast Cancer Diagnosis

• Successful early detection leads to:

– Better treatment for patients.

– Better clinical decision making.

Statistical Methods

• Statistical methods are the most popular approaches used in clinical practice for cancer diagnosis and prognosis.

• Challenges for statistical methods:

o Data diversity

o High-dimensional data

o Uncertainty and imprecision

o Relevance and redundancy

Predictive Modelling

• Predictive modelling in medicine involves deriving a mathematical model that predicts an outcome for future patients.

• Our goal is to classify tumors into two types for breast cancer diagnosis, i.e. malignant or benign.

Prediction Model Challenges

Generating a good model, i.e. one that is:

• Accurate

• Stable

• General

Evolutionary Computation

• Evolutionary Algorithms are suitable for constructing good predictive models.

The Hybrid Genetic Approach

• The proposed method combines a Genetic Algorithm (GA) with Mutual Information (MI) to identify cancer predictors.

• The genetic algorithm iterates through combinations of features. The best set of features (i.e. predictors) is then selected statistically and passed to the machine learning (ML) classifier.

• Prediction is based on the knowledge that the model acquired during the learning process.
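The slides do not give the chromosome encoding, the genetic operators, or the exact fitness function, so the following Python sketch is only one plausible reading of the approach. It assumes a binary mask per feature, truncation selection with elitism, one-point crossover, bit-flip mutation, and a hypothetical fitness equal to the mean mutual information between each selected feature and the class. The toy data and all function names are illustrative and are not taken from the study.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def fitness(mask, columns, labels):
    """Hypothetical fitness: mean MI(feature; class) over the selected features."""
    chosen = [col for bit, col in zip(mask, columns) if bit]
    if not chosen:
        return 0.0
    return sum(mutual_information(col, labels) for col in chosen) / len(chosen)


def select_features(columns, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search binary feature masks with a small GA (needs at least two features)."""
    rng = random.Random(seed)
    n = len(columns)

    def score(mask):
        return fitness(mask, columns, labels)

    # Start from the full feature set plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best mask always survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=score)


# Toy data (illustrative only): one column per feature, one label per case.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
columns = [
    [1, 2, 9, 8, 1, 9, 2, 8],      # separates the two classes
    [5, 5, 5, 5, 5, 5, 5, 5],      # constant, carries no information
    [1, 1, 10, 10, 1, 10, 1, 10],  # separates the two classes
]
best_mask = select_features(columns, labels)
print(best_mask, fitness(best_mask, columns, labels))
```

Because the best mask of each generation is carried over unchanged, the returned mask never scores worse than the full feature set under this fitness. A fitness based purely on relevance ignores redundancy between features; a redundancy penalty would be a natural extension but is not shown here.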

Breast Cancer Dataset

• This study used the Wisconsin Breast Cancer dataset.

• The dataset was provided by the University of Wisconsin Hospitals, Madison.

• The dataset contains records collected from 699 patients.

• According to the class distribution, 458 cases (65.5%) came from patients with a benign tumor and 241 cases (34.5%) came from patients with a malignant tumor.

Attribute Information for the Breast Cancer Dataset

No.   Feature name                   Range
1     Clump thickness                1-10
2     Uniformity of cell size        1-10
3     Uniformity of cell shape       1-10
4     Marginal adhesion              1-10
5     Single epithelial cell size    1-10
6     Bare nuclei                    1-10
7     Bland chromatin                1-10
8     Normal nucleoli                1-10
9     Mitoses                        1-10
10    Diagnosis                      0 for benign, 1 for malignant

Article Source: Multisurface method of pattern separation for medical diagnosis applied to breast cytology

Leave-One-Out Cross Validation (LOOCV)

• The breast cancer dataset contained 699 patient cases.

• Evaluation used cross validation with a total of 699 iterations. In each iteration, 698 (699 - 1) patient cases were used for training and the remaining case was used for testing. This is the most widely accepted approach in the clinical literature.

• Eventually, every patient case passes through the testing process.

• The performance of the algorithm is measured by its predictive accuracy on the test cases (i.e. patient records not seen during training).
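The leave-one-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the 1-nearest-neighbour predictor, the function names, and the six-point toy dataset are assumptions chosen only to keep the example self-contained.

```python
def nearest_neighbour_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    closest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[closest]


def loocv_accuracy(X, y, predict):
    """Hold out each case once, train on the remaining n - 1, and test on the held-out case."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)


# Toy data (illustrative only): two well-separated groups of three cases each.
X = [[1, 1], [2, 1], [1, 2], [9, 9], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(X, y, nearest_neighbour_predict)
print(accuracy)
```

With 699 cases the loop would run 699 times, each time training on 698 cases, which matches the iteration count given on the slide.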


Experimental Results Using the SVM Classifier

Evaluation measures by SVM kernel function

Measure         RBF       Linear    Quadratic   MLP

5 Features
Correct Rate    0.9820    0.9822    0.9844      0.9795
AUC             0.9605    0.9659    0.9669      0.9508
ORP FPR         0.0332    0.0332    0.0290      0.0373
ORP TPR         0.9541    0.9651    0.9629      0.9389

6 Features
Correct Rate    0.9778    0.9823    0.9844      0.9683
AUC             0.9607    0.9681    0.9669      0.9382
ORP FPR         0.0415    0.0332    0.0290      0.0581
ORP TPR         0.9629    0.9694    0.9629      0.9345

7 Features
Correct Rate    0.9822    0.9845    0.9800      0.9909
AUC             0.9648    0.9702    0.9617      0.9688
ORP FPR         0.0332    0.0290    0.0373      0.0166
ORP TPR         0.9629    0.9694    0.9607      0.9541
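The slides report the measures without defining them. As a hedged illustration, the Python sketch below computes the standard definitions of correct rate (accuracy), true positive rate (TPR), false positive rate (FPR), and AUC. It assumes that malignant is the positive class and that "ORP" refers to a chosen operating point on the ROC curve; neither is stated in the source. The labels and scores are made-up toy values, not study data.

```python
def rates(y_true, y_pred, positive=1):
    """Correct rate, TPR and FPR from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "correct_rate": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),  # share of positive cases that were detected
        "fpr": fp / (fp + tn),  # share of negative cases that were flagged
    }


def auc(y_true, scores, positive=1):
    """AUC as the probability that a positive case outscores a negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative only): 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.2, 0.4, 0.1, 0.6]
y_pred = [int(s >= 0.5) for s in scores]  # one operating point: threshold at 0.5
result = rates(y_true, y_pred)
area = auc(y_true, scores)
print(result, area)
```

Moving the threshold trades TPR against FPR, which is why the tables report both rates at a single operating point alongside the threshold-free AUC.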

Experimental Results Using the k-NN Classifier

Evaluation measures by k-NN distance measure

Measure         Correlation   Minkowski   Euclidean   Seuclidean

7 Features
Correct Rate    0.9605        0.9887      0.9887      0.9865
AUC             0.9156        0.9678      0.9678      0.9679
ORP FPR         0.9017        0.0207      0.0207      0.0249
ORP TPR         0.9017        0.9563      0.9563      0.9607

8 Features
Correct Rate    0.9624        0.9843      0.9843      0.9844
AUC             0.9133        0.9658      0.9658      0.9680
ORP FPR         0.8930        0.0290      0.0290      0.0290
ORP TPR         0.8930        0.9607      0.9607      0.9651

9 Features
Correct Rate    0.9559        0.9910      0.9910      0.9888
AUC             0.9104        0.9731      0.9731      0.9733
ORP FPR         0.8996        0.0166      0.0166      0.0207
ORP TPR         0.8996        0.9629      0.9629      0.9672
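For readers unfamiliar with the four distance measures, the Python sketch below gives their usual definitions. It assumes that "Seuclidean" means the standardized Euclidean distance (each coordinate difference divided by that feature's standard deviation) and that correlation distance is one minus the Pearson correlation of the two vectors. The Minkowski and Euclidean columns in the table are identical, which is what one would expect if the Minkowski exponent was 2; the slides do not state the exponent, so this is an assumption.

```python
import math


def minkowski(a, b, p=2):
    """Minkowski distance of order p; p = 2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardized_euclidean(a, b, stds):
    """Euclidean distance after scaling each coordinate by its standard deviation."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, stds)))


def correlation_distance(a, b):
    """One minus the Pearson correlation between the two vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return 1 - num / den


# Toy vectors (illustrative only).
a, b, c = [1, 2, 3], [4, 6, 3], [2, 4, 6]
print(minkowski(a, b, p=2), euclidean(a, b))
print(standardized_euclidean(a, b, stds=[1, 1, 1]))
print(correlation_distance(a, c))
```

Correlation distance ignores the scale and offset of the vectors (here `c` is `a` doubled), so it measures similarity of profile rather than closeness of the grades themselves, unlike the other three measures.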

Current Work

• Developed a hybrid approach to detecting breast cancer based on a Genetic Algorithm and Mutual Information.

• Experiments evaluated the performance of the proposed approach with two machine learning classifiers, k-NN and SVM, tuned using different distance measures and kernel functions, respectively.

• The results showed that the proposed hybrid approach is highly accurate for predicting breast cancer, with correct rates of up to 0.9910 (k-NN) and 0.9909 (SVM).


Conclusion

Team

• Director of Studies:

– Dr Georgina Cosma

• Supervisory Team:

– Professor Graham Pockley

– Professor David Brown

Acknowledgements

• Support: Funding received from the Ministry of Higher Education and Scientific Research in Iraq.

Thank you. Any questions?