AUTOMATIC MITOSIS DETECTION IN BREAST HISTOPATHOLOGY IMAGES
USING KNN CLASSIFIER
G. Usha (1), K. Narasimman (2), T. Shanmuganathan (3), M. Thalaimalaichamy (4)
(1) Department of ECE, SRC, SASTRA Deemed University, Kumbakonam, Tamilnadu, India
(2) Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur, Tamilnadu, India
(3) Department of ECE, Hindustan Institute of Technology and Science, Chennai, Tamilnadu, India
(4) Department of ECE, SRC, SASTRA Deemed University, Kumbakonam, Tamilnadu, India
Abstract: Mitosis is very hard to detect. Mitotic count is an important factor in the grading
of breast cancer. Mitosis is a process in which the nucleus of the cell undergoes various
transformations. In addition, different image areas are characterized by different tissue types,
which exhibit highly variable appearance. Pixel classifiers are used to solve many detection
problems, but those problems are characterized by the relatively obvious appearance of the objects
to be detected. A KNN classifier is utilized to detect mitotic candidates from the contour-segmented
nuclei regions. The technique uses a stain normalization process to reduce the complexity of
segmenting exact nuclei boundaries in large clinical images. The algorithm provides improved
performance, with an average F-score of 99.09% for the mitosis data set.
Keywords: H & E stained images, Reinhard stain normalisation, K-means clustering, KNN
1. Introduction
Mitotic count is one of the most important prognostic factors in breast cancer grading, as it
is the key element for the assessment of the tumour. Usually, mitotic nuclei appear as
hyperchromatic objects without a clear nuclear membrane in H & E stained breast histopathology
images [1]. Fig. 1 displays five main evolution phases of mitosis, namely interphase, prophase,
metaphase, anaphase and telophase. The shape of the nucleus differs between the stages.
Where a dividing nucleus appears as two regions, they should be counted as a single mitosis since
they are not separate cells. Due to the large variety of shapes, the low frequency and the size of
the mitotic cells, the detection process is time-consuming and extremely difficult. In addition,
irregular illumination, non-uniform stain variation and the presence of lymphocyte nuclei make the
detection process more difficult [2].
International Journal of Pure and Applied Mathematics, Volume 119 No. 18 2018, 2795-2805, ISSN: 1314-3395 (on-line version), url: http://www.acadpubl.eu/hub/, Special Issue
In this paper the pre-processing is done using the Reinhard stain normalisation technique.
The input images are H & E stained images. In H & E staining, haematoxylin stains cell nuclei
blue and eosin stains proteins red, pink or orange. In segmentation, the K-means
clustering algorithm is used to separate the region of interest from the background. It partitions
the given set into a certain number of clusters. K is the number of selected centroids and hence
the number of clusters. The selected clustered image is then classified by means of a KNN classifier.
2. Literature Survey
The detection of mitosis in H & E stained slides of breast cancer is a tedious process
because mitoses are small and show a large variety of shapes. Mitosis can easily be confused
with other artefacts present in the image [3]. The Krill Herd algorithm has been proposed for solving
optimization tasks. It is based on a simulation of the herding behaviour of krill individuals.
The minimum distance of each individual krill from the highest density of the herd is the objective
function for the krill movement [4]. The number of cells undergoing mitosis plays a vital role
in the classification system. Because manual counting is difficult, a computer-assisted system
can produce more precise results, which leads to high accuracy [5]. Image analysis using a
multi-threshold concept is applied in the detection process to optimise the result [6].
The multi-threshold concept is applied in the segmentation of biomedical images so that no
cell is left behind.
3. Methods
Classification of the input image involves five steps, namely image
acquisition, pre-processing, segmentation, feature extraction and performance analysis. The acquired
input image is pre-processed using stain normalisation, then segmented using K-means clustering,
and finally classified using a KNN classifier.
4. Preprocessing
Image pre-processing is the process of enhancing the image. It consists of three major steps,
namely filtering noise in the input image, edge detection to separate the required object from the
unwanted background, and binary image conversion (converting the pixel
values of the image into zeros and ones). The technique used for pre-processing is Reinhard stain
normalization (Fig. 2). Haematoxylin is a dark blue or violet stain that is basic in nature and
binds to basophilic substances such as DNA/RNA, which are acidic in nature. Eosin is a pink or
red stain that is acidic in nature and binds to acidophilic substances such as positively charged
amino-acid side chains (for example arginine); it colours cytoplasm red and RBC cherry red [9].
Haemalum is a complex formed from aluminium and haematein. It stains cell nuclei blue;
counterstaining with an aqueous or alcoholic solution of eosin then colours eosinophilic structures
such as proteins in shades of red, pink and orange. The staining of nuclei by haemalum results from
a chemical reaction between the dye and cellular components [7].
Fig. 1. Samples of mitotic cells in five mitotic phases.
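The statistics-matching idea behind Reinhard normalization can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes the pixels have already been converted to a decorrelated colour space (the Reinhard method normally operates in lαβ space, and that conversion is omitted here), and the function names `match_channel` and `reinhard_like_normalise` are hypothetical.

```python
from statistics import mean, pstdev


def match_channel(source, target_mean, target_std):
    """Shift and scale one colour channel so its mean and std match the target."""
    src_mean = mean(source)
    src_std = pstdev(source)
    if src_std == 0:
        # A flat channel has no contrast to rescale; move it to the target mean.
        return [target_mean for _ in source]
    scale = target_std / src_std
    return [(v - src_mean) * scale + target_mean for v in source]


def reinhard_like_normalise(pixels, target_stats):
    """Normalise a list of 3-channel pixels.

    pixels       : list of (c1, c2, c3) tuples, assumed to be in a decorrelated
                   colour space already (assumption, conversion not shown).
    target_stats : [(mean, std), (mean, std), (mean, std)] measured on a
                   reference image chosen by the user.
    """
    channels = list(zip(*pixels))  # split into three per-channel sequences
    matched = [
        match_channel(list(ch), m, s) for ch, (m, s) in zip(channels, target_stats)
    ]
    return list(zip(*matched))  # re-interleave into pixels
```

After this step every channel of the source image has the same mean and standard deviation as the reference image, which is what reduces slide-to-slide stain variation before segmentation.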
5. Segmentation
The accuracy of the mitotic count depends on the pre-processing, segmentation and
classification procedures. Cell nuclei and other cell structures can be differentiated using the
stain normalization technique. Segmentation follows; in the existing system the Krill Herd
Algorithm (KHA) was used for this step. Usually, in the early stages of breast cancer or any
other cancer, the cell membrane vanishes. The background and the nuclei therefore cannot be
differentiated easily, because no valid threshold (a value that separates background from nucleus)
can be found. In KHA, a coloured image of the infected tissue is first converted to a binary
image, that is, an image of 0's and 1's (black and white). This is done because processing a
coloured image is complex and time-consuming. In this binary image the selected pixels are set to 1
and all other pixels are set to zero. The result is a mask image, which specifies the centroids
of the nuclei region.
Three thresholds are then selected to differentiate nuclei from cytoplasm, background
stroma and vacuoles. The bi-level image is a mask that provides an initial outline for segmenting
nuclei with exact boundaries using LACM. LACM, the Localized Active Contour Model, is a
framework in computer vision for delineating an object contour in a noisy image. It is
used in applications such as segmentation and shape recognition. It is an energy-minimizing
curve (a spline) that is pulled towards object contours while resisting deformation. This
describes the existing system. The proposed system uses the K-means algorithm instead of KHA and
LACM, because in LACM the energy minimization means that minute features are not considered over
the entire contour, and KHA is not efficient on large data collections and its performance speed is
very low.
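The binary-mask step described above can be illustrated with a small sketch. It assumes a grayscale image stored as a 2-D list in which nuclei are darker than the background; the threshold value and the helper names `to_binary_mask` and `mask_centroid` are hypothetical and are not taken from the KHA method itself.

```python
def to_binary_mask(gray, threshold):
    """Set pixels darker than `threshold` to 1 and all other pixels to 0.

    gray: 2-D list of intensities (assumption: nuclei are the dark objects).
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def mask_centroid(mask):
    """Return the (row, column) centroid of the pixels set to 1, or None if empty."""
    coords = [
        (r, c) for r, row in enumerate(mask) for c, bit in enumerate(row) if bit
    ]
    if not coords:
        return None
    rows, cols = zip(*coords)
    return sum(rows) / len(coords), sum(cols) / len(coords)
```

A single global threshold is used here only for illustration; as noted above, finding one valid threshold is exactly what is difficult on real slides, which is why the existing system relies on several thresholds.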
K-means is a clustering algorithm. It is used to segment
the selected area from the background. Pre-processing is carried out before this segmentation to
improve the quality of the image. K-means partitions the given set into a certain number of
clusters (groups of similar items). To see how the clusters arise, take as an example an
image of 100*100 pixels and select a part of it that has 10*10 pixels. These 10*10
pixels are 100 data points. In K-means, K is the number of randomly selected
centroids and hence the number of clusters. Take K as 2. There are then 2
randomly selected centroids, i.e., two centroids are selected randomly on the 100*100 pixel
image. Let them be C1 and C2. There are now 100 data points and 2 centroids. The distance from each
data point to C1 and to C2 is calculated, and each data point is assigned to the centroid that is
closer to it. Assume that 60 of the pixels are assigned to C1 and the remaining 40 to
C2.
The next step is to calculate the mean, i.e., the average of the 60 pixels and the average of the
40 pixels. These averages become the new centroids. The same process is then repeated by
calculating the distance of the data points from these new centroids, and so on. After three to
four iterations the process can be stopped, because any further centroids will be very close to the
previously obtained centroids.
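The assignment and update steps described above can be sketched in a few lines. This is an illustrative sketch only: the function name `kmeans`, the fixed random seed and the default of four iterations are assumptions made for the example, and data points are represented as tuples of feature values (for instance pixel intensities).

```python
import math
import random


def kmeans(points, k, iterations=4, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K randomly selected starting centroids

    def nearest(point):
        # Index of the centroid with the smallest Euclidean distance to `point`.
        return min(range(k), key=lambda c: math.dist(point, centroids[c]))

    for _ in range(iterations):
        # Assignment step: each data point joins its nearest centroid.
        labels = [nearest(p) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))

    # Final assignment so the returned labels match the returned centroids.
    labels = [nearest(p) for p in points]
    return centroids, labels
```

For example, `kmeans([(10.0,), (12.0,), (11.0,), (200.0,), (205.0,), (198.0,)], k=2)` separates the three dark values from the three bright ones within the default four iterations.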
In segmentation, 2 clusters are extracted from the normalized image. In our project the
value of K was taken as 4, giving 4 randomly selected centroids and hence four
clustered images. Such images, each grouping similar pixels, can be obtained using the
repetition matrix (a built-in function) in MATLAB. This is how segmentation is done using the
K-means algorithm.
5.1 Advantages of K-means Algorithm
With the K-means algorithm, high performance speed is achieved by means of the repetition
matrix. It is efficient on large data collections. Accurate boundaries can be identified
using K-means clustering.
6. Nuclei Classification
The classification phase consists of three stages:
• Feature computation
• Feature selection
• Decision fusion of individual classifiers using the KNN classifier framework
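The KNN decision rule itself can be sketched as below. This shows only a plain majority-vote K-nearest-neighbour classifier on precomputed feature vectors; it does not reproduce the feature selection or decision fusion stages, and the name `knn_predict`, the default k = 3 and the class labels are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Rank the training samples by Euclidean distance to the query vector.
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the k closest samples and return the most frequent one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Each candidate nucleus would be represented by its computed feature vector and labelled according to the classes of its nearest labelled neighbours.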
7. Feature Computation
Cells that undergo mitosis exhibit variations in texture, shape and size at different
stages. Fig. 3(a) shows an example of an input image and Fig. 3(b) displays a zoomed version of a
selected segmented region. Useful features of the cells, such as intensity-based, shape-based and
texture-based features, are extracted from the segmented nucleus patch shown in
Fig. 3(c). The intensity-based features include Median (M), Variance (V), Kurtosis (K) and
Skewness (S). Area (A), Perimeter (P) and Solidity (SL) are the shape-based
features considered, along with thirteen Haralick texture features [3,7].
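The four intensity-based features can be computed from the pixel values of a nucleus patch as sketched below. The exact definitions are not stated in the text, so this sketch assumes population moments and non-excess kurtosis (a normal distribution gives 3); the function name `intensity_features` is hypothetical.

```python
from statistics import mean, median, pvariance


def intensity_features(pixels):
    """Return median, variance, skewness and kurtosis of a flat list of intensities."""
    n = len(pixels)
    mu = mean(pixels)
    var = pvariance(pixels, mu)  # population variance (assumption)
    if var == 0:
        # A constant patch has no spread, so the higher moments are set to 0.
        skew = kurt = 0.0
    else:
        std = var ** 0.5
        skew = sum((p - mu) ** 3 for p in pixels) / (n * std ** 3)
        kurt = sum((p - mu) ** 4 for p in pixels) / (n * var ** 2)
    return {
        "median": median(pixels),
        "variance": var,
        "skewness": skew,
        "kurtosis": kurt,
    }
```

These values would form the intensity part of the feature vector for each segmented nucleus, alongside the shape and texture features.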
The Gray Level Co-occurrence Matrix (GLCM) describes how often pairs of pixels with
specific values occur in an image. The GLCM can be estimated along
any direction. Here adjacency is taken in the horizontal (0°), vertical (90°), 45° and 135°
directions, and the texture features are computed along these four directions. By averaging over
all four directions, thirteen texture features are computed that include Autocorrelation, Contrast (C),
Correlation (CR), Sum of Squares (SoS), Inverse Difference Moment (IDM), Sum Average (SA),