Keywords: Oral submucous fibrosis; Zernike moments; Parabola fitting; Color deconvolution; Fuzzy divergence; Unsupervised feature selection; Gradient vector flow; Pattern classification
from normal oral mucosa (NOM) in respect to morphological and
textural properties of the basal cell
these cases transform into OC. Through progression of this
pathosis, OC develops in the epithelial region of the oral mucosa.
The precancerous status is judged on the basis of light microscopic
histopathological features of oral epithelial dysplasia (OED)
and/or cellular atypia, which have different grades according
to involvement of the epithelial region (Paul et al., 2005).
sue biopsies for a long time, relying on their personal
experience in giving decisions about the health status of the
examined biopsy. This includes distinguishing normal from abnormal
(i.e., cancerous) tissue, benign versus malignant tumors, and
identifying the level of tumor malignancy. Nevertheless, variability
in the reported diagnosis may still occur (Duncan & Ayache,
2000), which could be due to, but is not limited to: the heterogeneous
nature of the diseases; ambiguity caused by nuclei overlapping;
noise arising from the staining process of the tissue samples;
intra-observer variability, i.e., pathologists are not able to
give the same reading of the same image on more than one occasion;
and inter-observer
Corresponding author. Address: School of Medical Science and
Technology, Indian Institute of Technology Kharagpur, West Bengal
721 302, India. Tel.: +91 3222 283570; fax: +91 3222 28881.
Expert Systems with Applications 39 (2012) 1062–1077
E-mail address: [email protected] (C. Chakraborty).

1. Introduction
Oral cancer (OC) is the sixth most common cancer in the world. It
accounts for approximately 4% of all cancers and 2% of all
cancer deaths world-wide. In India it is the commonest malignant
neoplasm, accounting for 20–30% of all cancers (Banoczy,
1982; Burkhardt, 1985; Daftary et al., 1993). A higher incidence of
OC is observed on the Indian subcontinent, mainly due to the late
diagnosis of potentially precancerous lesions. Oral submucous
fibrosis (OSF) is an insidious, chronic, progressive, precancerous
condition with a high degree of malignant potentiality. A large
number of
There is no established quantitative technique by which
histopathologically significant features of the diseased tissue, like
(i) thickness of different histological layers, (ii) density,
distribution, and alignment of tissue components, and (iii) cell
population density, distribution, and their different morphological
attributes, could be analyzed. Actually, a precancerous state
generally depicts mixed features of normalcy as well as pro- or
pre-malignancy. With the disease progression, the histological
scenario alters slowly in different combinations, characterizing
the specific pathological state of progression toward malignancy (Paul
et al., 2005).
Pathologists have been using microscopic images to study tis-

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.07.107

nuclei. Practically, basal cells
constitute the proliferative compartment (called basal layer) of
the epithelium. In the context of histopathological evaluation,
the morphometry and texture of basal nuclei are assumed to vary
during malignant transformation according to onco-pathologists. In
order to automate the pathological understanding, the basal layer is
initially extracted from histopathological images of NOM (n = 341)
and OSF (n = 429) samples using fuzzy divergence, morphological
operations and parabola fitting followed by median filter-based noise
reduction. Next, the nuclei are segmented from the layer using color
deconvolution, marker-controlled watershed transform and gradient
vector flow (GVF) active contour method. Eighteen morphological, 4
gray-level co-occurrence matrix (GLCM) based texture features and
1 intensity feature are quantized from five types of basal nuclei
characteristics. Afterwards, an unsupervised feature selection method
is used to evaluate significant features, and hence 18 are obtained as
most discriminative out of 23. Finally, supervised and unsupervised
classifiers are trained and tested with 18 features for the
classification between normal and OSF samples. Experimental results
are obtained and compared. It is observed that linear kernel based
support vector machine (SVM) leads to 99.66% accuracy in comparison
with Bayesian classifier (96.56%) and Gaussian mixture model
(90.37%).
© 2011 Elsevier Ltd. All rights reserved.

Hybrid segmentation, characterization and classification of basal cell nuclei from histopathological images of normal oral mucosa and oral submucous fibrosis

M. Muthu Rama Krishnan a, Chandan Chakraborty a,*, Ranjan Rashmi Paul b, Ajoy K. Ray c

a School of Medical Science and Technology, IIT Kharagpur, India
b Department of Oral and Maxillofacial Pathology, Guru Nanak Institute of Dental Sciences and Research, Kolkata, India
c Department of Electronics and Electrical Communication Engineering, IIT Kharagpur, India

journal homepage: www.elsevier.com/locate/eswa

Article info / Abstract

This work presents a quantitative microscopic approach for discriminating oral submucous fibrosis (OSF)
aims to avoid unnecessary biopsies and assist pathologists in
the process of cancer diagnosis (Gilles et al., 2008;
Grootscholten et al., 2008; Shuttleworth, Todman, Norrish, & Bennett,
2005). Thus, quantitative evaluation of the histopathological
features is not only important for accurate diagnostics, but it is
also vital for assessing the relative involvement of the different
tissue components in the pathology of the disease.
Generally, basal cells form the proliferative
compartment (Shabana, Gel-Labban, & Lee, 1987) of the epithelium
from which cells migrate, differentiate as they progress and
are eventually desquamated at the surface. The keratinocytes of the
basal cell layer of the oral stratified squamous epithelium represent
the progenitor cells (Satheesh, Paul, & Hammond, 2007) that are
responsible for the production of other cells making up the various
layers of the epithelium. Changes in the basal cells may have
serious implications on future cell behavior, including malignant
transformation. The measurement of their size and shape in OSF may
be an important prognostic marker, as studies have shown that there
is an increase in the size and shape of both the cell and the
nucleus during OSF.
Automatic grading of pathological images has been investigated in
various fields during the past few years, including brain tumor
astrocytomas (ASTs) (Glotsos, 2003; McKeown & Ramsey,
1996; Scarpelli, Bartels, Montironi, Galluzzi, & Thompson, 1994;
Schad, Schmitt, Oberwittler, & Lorenz, 1987), prostate carcinoma
(Farjam, Soltanian-Zadeh, Zorro, & Jafari-Khouzani, 2005;
Jafari-Khouzani & Soltanian-Zadeh, 2003; Smith, Zajicek, Werman,
Pizov, & Sherman, 1999; Tabesh et al., 2007), renal cell
carcinoma (RCC) (Fuhrman, Lasky, & Limas, 1982; Hand &
Broders, 1931; Kim, Choi, Cha, & Choi, 2005; Lohse, Blute,
Zincke, Weaver, & Chenille, 2002; Novara, Martignoni, Artibani,
& Ficarra, 2007), and hepatocellular carcinoma (Huang & Lai,
2010). However, an automated system for screening OSF biopsy images
has not been exhaustively reported in the literature, although some works
have been reported (Muthu Rama Krishnan et al., 2009; Muthu Rama
Krishnan, Shah et al., 2010) on other layers of the oral mucosa. Since
the grading system for a specific type of cancer cannot be applied to
other types of cancers, it is necessary to exploit appropriate
segmentation, feature extraction, and classification methods for
different types of cancers. This is particularly true for oral
cancer because OSF biopsy images always suffer from the problems of
impurities, undesirable elements, and uneven exposure. In OSF
screening, the characteristics of basal cell nuclei are the key to
estimating the degree of oral malignancy. However, the areas of
nuclei, cytoplasm, and cells are difficult to identify and
measure. In this paper, we propose a novel method to segment the
basal cell nuclei. Twenty-three features are extracted from OSF
biopsy images according to five types of characteristics commonly
adopted by pathologists.
We use a set of supervised (Bayesian and SVM) and
unsupervised (k-means, fuzzy c-means and Gaussian mixture model
(GMM)) classifiers to test the effectiveness of classification for OSF
biopsy images. In this study, we find that not all 23 features are
equally important or necessary to distinguish normal and OSF
images. Therefore, we implement an unsupervised feature selection for
optimal feature subset selection so that the best performance in
classifying OSF images can be achieved.
variability, i.e., increase in classification variation among
pathologists. Therefore, over the past three decades, quantitative
techniques have been developed for computer-aided diagnosis, which

2. Materials and methods

2.1. Histology

Twelve study subjects clinically diagnosed with OSF have been
subjected to incisional biopsy with their informed consent at the
Department of Oral and Maxillofacial Pathology, Guru Nanak Institute
of Dental Sciences and Research, Kolkata, India. Normal study
samples are collected from the buccal mucosa of 10 healthy volunteers
without any oral habits or any other known systemic diseases, with
prior written consent. The study subjects are of similar age
(21–40 years) and food habits. This study is duly approved by the
ethics review committee of the institute. All the biopsy samples are
processed for histopathological examination, and paraffin-embedded
tissue sections of 5 µm thickness are prepared and then stained with
haematoxylin and eosin (H&E).
2.2. Image acquisition
Images of the basal layer for normal oral mucosa and OSF are
optically grabbed by a Zeiss Observer.Z1 microscope from H&E stained
histological sections under a 100× objective (N.A. 1.4) at the School of
Medical Science & Technology, at a resolution of 0.24 µm and a
pixel size of 0.06 µm. The image database for this analysis consists of
1194 cells, which are extracted from 341 normal and 429 OSF with
dysplasia images. The grabbed images are digitized at 1388 × 1040
pixels and stored in a computer.
2.3. Image processing
The histopathological image of oral mucosa grabbed by the Carl Zeiss
microscope contains randomly distributed white and black pixels (noise).
To remove this noise, a median filter is used (Gonzalez & Woods, 2002).
The block diagram of the proposed methodology for quantitative
evaluation of basal cell nuclei is shown in Fig. 1.
2.4. Basal cell nuclei segmentation
The novel approach for basal cell nuclei analysis mainly consists
of three stages: (1) basal nuclei extraction, (2) feature extraction
and (3) classification. Basal nuclei extraction is a four-step
mechanism, i.e., (a) extraction of the lower boundary of the
epithelium, (b) parabola fitting to segment the basal layer,
(c) segmentation of basal cells using marker-controlled watershed and
(d) extraction of nuclei using gradient vector flow. Anatomically, the
basal layer is the first layer in the epithelium, and the first
mechanism of basal nuclei extraction exploits this by extracting the
epithelio-mesenchymal (EM) junction. The first step in this mechanism
consists of morphological operations on the H&E stained image to
diminish the local maxima (cell boundaries and variation in collagen
fibers) present in the epithelium and connective tissue, followed by
enhancing the edges by anisotropic diffusion and generating a binary
image using fuzzy divergence (Chaira & Ray, 2003, 2009) based
thresholding.
2.4.1. Edge enhancement using anisotropic diffusion

Anisotropic diffusion (Grieg, Kubler, Kikinis, & Jolesz, 1992) is a
technique aiming at removing or smoothing the homogeneous
part of an image while keeping the significant parts of the image, like edges,
lines and other details that are important for the image interpretation.
The main idea of this approach is to embed the original image
in a family of derived images obtained by convolving the original
image with a Gaussian kernel having variance t, which is a scale-space
parameter. Larger values of t correspond to coarser resolution and
lower values correspond to fine resolution. This one-parameter family
of derived images can be represented as a solution of the heat
conduction or diffusion equation. Mathematically,
the anisotropic diffusion is defined as
$$\frac{\partial I}{\partial t} = \operatorname{div}\big(c(x,y,t)\,\nabla I\big) = \nabla c \cdot \nabla I + c(x,y,t)\,\Delta I, \tag{1}$$

where $\Delta$ denotes the Laplacian operator, $\nabla$ denotes the gradient and
$c(x, y, t)$ is the diffusion coefficient, which controls the rate of
diffusion. As solutions to the diffusion Eq. (1), Perona and Malik
(1990) proposed two functions for the diffusion coefficient:

$$c(\|\nabla I\|) = e^{-\left(\|\nabla I\|/K\right)^{2}}, \tag{2}$$

$$c(\|\nabla I\|) = \frac{1}{1 + \left(\|\nabla I\|/K\right)^{2}}. \tag{3}$$
The constant K controls the sensitivity to edges. The first function
privileges high-contrast edges over low-contrast edges, and the second
one privileges wide regions over smaller ones. An 8-nearest-neighbor
discretization of the Laplacian operator is used. After diffusion, the
image edges are sharpened (Fig. 2(d)), as can be inferred from the
histograms before and after diffusion. Further, thresholding is done
using fuzzy divergence to segment out the surface epithelium.
2.4.2. Fuzzy divergence based threshold selection

Fan and Xie (1999) proposed fuzzy divergence from fuzzy
exponential entropy by using a single row vector. Here the divergence
concept of Fan and Xie is extended to an image, represented by a
matrix. In an image of size $M \times M$ with $L$ distinct gray levels having
probabilities $(p_0, p_1, p_2, \ldots, p_{L-1})$, the exponential entropy is
defined as $H = \sum_{i=0}^{L-1} p_i\, e^{1-p_i}$.

Fig. 1. Block diagram of the proposed methodology.

The fuzzy entropy for an image $A$ of size $M \times M$ is defined as

$$H(A) = \frac{1}{n(\sqrt{e}-1)} \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} \Big[\mu_A(f_{ij})\, e^{1-\mu_A(f_{ij})} + \big(1-\mu_A(f_{ij})\big)\, e^{\mu_A(f_{ij})} - 1\Big]. \tag{4}$$

Here $n = M^2$ and $i, j = 0, 1, 2, 3, \ldots, (M-1)$. $\mu_A(f_{ij})$ is the membership
value of the pixel in the image and $f_{ij}$ is the $(i, j)$th pixel of the image
$A$. For two images $A$ and $B$, at the $(i, j)$th pixel of the image, the
information of discrimination between $\mu_A(a_{ij})$ and $\mu_B(b_{ij})$ of images $A$ and
$B$ is given by (Chaira & Ray, 2003, 2009)
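$$e^{\mu_A(a_{ij})} / e^{\mu_B(b_{ij})} = e^{\mu_A(a_{ij}) - \mu_B(b_{ij})}, \tag{5}$$

where $\mu_A(a_{ij})$ and $\mu_B(b_{ij})$ are the membership values of the $(i, j)$th
pixels in images $A$ and $B$, respectively, and $i, j = 0, 1, 2, \ldots, M-1$. The
discrimination of image $A$ against image $B$ may be given as

$$D_1(A,B) = \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} \Big[1 - \big(1-\mu_A(a_{ij})\big)\, e^{\mu_A(a_{ij})-\mu_B(b_{ij})} - \mu_A(a_{ij})\, e^{\mu_B(b_{ij})-\mu_A(a_{ij})}\Big]. \tag{6}$$

Likewise, the discrimination of $B$ against $A$ is

$$D_2(B,A) = \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} \Big[1 - \big(1-\mu_B(b_{ij})\big)\, e^{\mu_B(b_{ij})-\mu_A(a_{ij})} - \mu_B(b_{ij})\, e^{\mu_A(a_{ij})-\mu_B(b_{ij})}\Big]. \tag{7}$$

So, the total fuzzy divergence between images $A$ and $B$ is obtained from
Eqs. (6) and (7):

$$D(A,B) = D_1(A,B) + D_2(B,A), \tag{8}$$

$$D(A,B) = \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} \Big[2 - \big(1-\mu_A(a_{ij})+\mu_B(b_{ij})\big)\, e^{\mu_A(a_{ij})-\mu_B(b_{ij})} - \big(1-\mu_B(b_{ij})+\mu_A(a_{ij})\big)\, e^{\mu_B(b_{ij})-\mu_A(a_{ij})}\Big]. \tag{9}$$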
In the method, image A is an original image and image B is an
ideally segmented image. An ideally segmented image is defined as an
image which is perfectly thresholded, so that each pixel belongs
exactly either to the object or to the background region. In such a
situation, the membership value of each pixel of the ideally segmented
image, whether it belongs to the object or to the background region,
should be equal to one. Hence Eq. (9) becomes

$$D(A,B) = \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} \Big[2 - \big(2-\mu_A(a_{ij})\big)\, e^{\mu_A(a_{ij})-1} - \mu_A(a_{ij})\, e^{1-\mu_A(a_{ij})}\Big]. \tag{10}$$

In this way, the divergence value of each pixel is calculated for the
whole image and the corresponding gray level is noted. The
gray value corresponding to the minimum divergence (Fig. 3(b)) is
chosen as the initial threshold for segmenting the object (epithelium)
and background (rest of the image) regions. In fact, the minimum
divergence value indicates the maximum belongingness of each object
pixel to the object region (epithelium) and of each background
pixel to the background region (connective tissue). Morphological
operations are performed on Fig. 3(c) to diminish the local maxima
(cell boundaries and variation in collagen fibers) present in the
epithelium and connective tissue; the output is shown in Fig. 3(d). The
area having the maximum number of white pixels is extracted using
connected component labeling (Fig. 3(e)).
Next, the edges and boundaries are extracted from this binary
image using the Canny edge detector (Fig. 3(f)). The longest edge present
in this image is the epithelio-mesenchymal (EM) junction, which is
extracted by connected component labeling to locate the lower
boundary of the basal layer (Fig. 3(f)). The abrupt variation in this
edge is lessened by filtering it with a band-pass filter. The shape
and orientation of this lower boundary at 100× magnification can
be approximated by a parabola (Fig. 3(g)). Parabola fitting is
performed by linear regression, as the parabola equation is a linear
model (Rust, 2001).
2.4.3. Parabola fitting

The generalized equation for a parabola is $Y = aX^2 + bX + C$. If the
straight line model is inadequate for a given data set, a polynomial
of degree 2, i.e., a parabola, may be one of the good choices, as
higher-order polynomials are unstable. A polynomial equation is a
linear model (it is linear in its coefficients), so the generalized
model can be used to obtain a linear regression (Muthu Rama Krishnan,
Pal et al., 2010). The model for the $(n + 1)$th order or $n$th degree
polynomial is

$$y_t = \sum_{i=1}^{n+1} a_i X_t^{\,i-1}. \tag{11}$$

Fig. 2. (a) Normal colour image; (b) gray scale image of (a); (c) histogram of image (b); (d) diffused image of (b) using anisotropic diffusion; and (e) histogram of image (d).

In matrix form,

$$\begin{bmatrix} y_{t_1} \\ y_{t_2} \\ y_{t_3} \\ \vdots \\ y_{t_m} \end{bmatrix} = \begin{bmatrix} 1 & X_{t_1} & X_{t_1}^2 & \cdots & X_{t_1}^n \\ 1 & X_{t_2} & X_{t_2}^2 & \cdots & X_{t_2}^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{t_m} & X_{t_m}^2 & \cdots & X_{t_m}^n \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{n+1} \end{bmatrix}$$

or, in shorter form, $Y = Xa$, where $X$ is an $m \times (n + 1)$-dimensional matrix
($n \ll m$) and $a$ is an $(n + 1)$ column vector.

Let us write the objective function for the least square estimation as

$$L(a) = \sum_{i=1}^{m} (Y - Xa)_i^2 = (Y - Xa)^T (Y - Xa), \tag{12}$$

which we can expand to give

$$L(a) = Y^T Y - 2a^T X^T Y + a^T X^T X a. \tag{13}$$
Geometrically, the objective function defines an $(n + 1)$-dimensional,
quadratic hypersurface, sometimes called the response surface, whose
level curves correspond to concentric $n$-dimensional
ellipsoids in the $a$-space. It has a unique global minimum that we
can find by differentiating $L(a)$ with respect to $a$ and equating the
result to the zero vector,

$$\frac{\partial L}{\partial a} = -2X^T Y + 2X^T X a = 0.$$

Thus, the minimizing $a$ must satisfy the $n \times n$ system of linear
equations

$$X^T X a = X^T Y, \tag{14}$$

which are often called the normal equations. Because the columns
of $X$ are linearly independent, the matrix product on the left side is
nonsingular, so the unique solution is

$$a = (X^T X)^{-1} X^T Y. \tag{15}$$

If relatively small perturbations in the data produce relatively large
perturbations in the solution, we can get a more numerically stable
algorithm by computing an orthogonal factorization of the form

$$X = Q \begin{bmatrix} R \\ 0 \end{bmatrix},$$

where $Q$ is an $m \times m$ orthogonal matrix ($Q^T Q = I = QQ^T$), $R$ is an $n \times n$
upper triangular matrix, and $0$ is an $(m - n) \times n$ matrix of zeroes.
By substituting this factorization into Eq. (14), we can easily verify
that $a$ satisfies the $n \times n$ upper triangular system

$$Ra = Q_1^T Y, \tag{16}$$

where $Q_1$ is the $m \times n$ matrix formed by the first $n$ columns of $Q$
(Rust, 2001).

Fig. 3. (a) Normal gray scale image; (b) plots of gray level against fuzzy divergence, for selection of the threshold value; (c) thresholded image of (a); (d) morphological operation to remove small objects within the epithelium; (e) larger white area extracted using connected component labeling; and (f) lower boundary extraction from (e).

Assuming the model fitted to the data is correct, the residuals
approximate the random errors. The residual is defined as $r_i = Y_i - X_i a$, for
$i = t_1, t_2, \ldots, t_m$. Therefore, if the residuals appear to
behave randomly, it suggests that the model fits the data well. The
parabola is fitted over the lower boundary (EM junction).

The next step is to generate n parabolas parallel to the fitted
parabola for the lower boundary of the basal layer, such that the image
generated from these parallel parabolas overlays the basal layer completely
(Fig. 4(a) and (b)). The effective distance between two parallel
parabolas at the distal end and at the center is not the same. This property of the
parallel parabolas generates an image mask (Fig. 4(b)), and this mask is
superimposed on Fig. 3(a), which gives the basal layer (Fig. 4(c)).
2.4.4. Basal cell segmentation using color deconvolution

Moreover, epithelial cell borders cannot be isolated accurately
in H&E stain; they can be estimated statistically using a space
partition procedure. Initially, the haematoxylin plane, which has high
contrast between nuclei and cytoplasm, is extracted using color
deconvolution (Ruifrok & Johnston, 2001).
2.4.4.1. Color deconvolution. According to the Lambert–Beer law, the
detected intensity of light transmitted through the specimen
and the amount (A) of stain with absorption factor c are related by

$$I = I_0\, e^{-Ac}, \tag{17}$$

where $I_0$ is the intensity of light entering the specimen and $I$ is the
intensity of light detected after passing the specimen. This suggests that
the gray-values of each RGB channel depend on the concentration of
stain in a non-linear way. Hence it is difficult to separate out each
stain by intensity. However, the optical density (OD) can be used
to separate the stains, and it is defined as

$$OD = -\log_{10}\!\left(\frac{I}{I_0}\right) = Ac. \tag{18}$$

Hence OD is proportional to the absorption factor c for a given amount of
stain. This helps us to separate the contribution of each stain from a
multi-stained specimen. Each pure stain will be characterized by a
specific optical density for the light in each of the three RGB
channels, which can be represented by a 3 × 1 OD vector describing the
stain in the OD-converted RGB color space. The length of the vector
will be proportional to the amount of stain, while the relative values
of the vector describe the actual OD for the detection channels
(Ruifrok & Johnston, 2001).
In the case of three channels, the color system can be described
as a 3 × 3 matrix (the OD matrix, with elements $O_{ij}$), with every row
representing a specific stain, and every column representing the
optical density as detected by the red, green and blue channel for
each stain. Stain-specific values for the OD in each of the three
channels can be easily determined by measuring relative absorption for
red, green and blue on slides stained with a single stain.

Fig. 4. (a) Parabola fitting using lower boundary; (b) mask of the fitted parabola; and (c) extracted basal layer using the mask.

If we can find the orthonormal transformation of this matrix, it is
easy to separate each stain contribution. The transformation has to be
normalized to achieve correct balancing of the absorption factor for
each separate stain. If matrix $M$ is the normalized matrix of the OD
matrix, then it is defined as

$$M = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix}, \qquad m_{ij} = \frac{O_{ij}}{\sqrt{\sum_{k=1}^{3} O_{ik}^2}},$$

where $O_{ij}$ is the element of the OD matrix.

If $C$ is the 3 × 1 vector for amounts of the three stains at a
particular pixel, then the vector of OD levels detected at that pixel is
$y = MC$. From the above it is clear that $C = M^{-1} y$. This means that
multiplication of the OD image with the inverse of the OD matrix,
which we define as the color-deconvolution matrix $D = M^{-1}$, results
in an orthonormal representation of the stains forming the image:

$$C = Dy. \tag{19}$$

Fig. 5. (a) Extracted basal layer; (b) contrast enhanced nuclei using color deconvolution; (c) thresholded image of (b) using fuzzy divergence; (d) after performing morphological operations on image (c); (e) watershed output over image (d); (f) segmented boundaries of basal cells are superimposed on the extracted basal layer.

The enhanced nuclei (Fig. 5(b)), after morphological operations
(Fig. 5(d)), work as markers in the watershed algorithm to segment the
epithelium into different compartments. These compartments effectively
show the segmentation of the epithelium into basal cells (Fig. 5(f)).
Here, all partitions do not exactly contain the basal cells as some
of them have the suprabasal cells or a clump of basal cells. The
following approach is adopted to classify each partition (a so-called
pseudo cell) as a basal cell or a non-basal cell.

The first step is to find the neighbors of all pseudo cells, followed by
evaluation of each pseudo cell area; if the area is not within the threshold,
then the pseudo cell should be merged or ignored, depending upon whether it is
part of a cell or of the background, respectively, and it is named a "to be
merged cell". Further, the shape parameters compactness and variance
are evaluated for the "to be merged cell" and its respective neighbors. These
features are fuzzy in nature and are evaluated by a trapezoid
membership function. Then, the "to be merged cell" is merged with the
neighbor having the highest membership value. Moreover, the elimination of
the suprabasal layer is carried out by extracting the lowest cell from the image,
as the basal layer is the first layer in the epithelium.

Fig. 6. (a) Basal cell image; (b) gradient image; (c) normalized GVF field; (d) deformation of the contour; and (e) final contour obtained using GVF.

Fig. 7. (a–c) Basal cell nuclei contours tracked using GVF based snakes. (d–f) Segmented nucleus.

2.5. Basal cell nuclei tracking using GVF snakes

The watershed segmentation gives the initial boundary around
nuclei, which also contains the background epithelial region. To
segment the exact boundaries of objects, we use an
energy-minimizing contour, called a snake (Xu & Prince, 1997), which is
guided by external constraint forces and influenced by image
forces that pull it towards the edges. The snake provides a powerful
interactive tool for image segmentation. We use the contour
obtained from the previous segmentation result as the initial contour,
and then move this contour close to the more accurate nuclei
contour under the influence of internal forces depending on the
intrinsic properties of the curve and external forces derived from the
image edge data.

To obtain a good segmentation result, Gaussian blur is applied on
the image, which restrains the noise in the cell image. The edge
gradient of the image is computed using the Sobel operator. The
flexibility parameter α and the rigidity parameter β are also
analysed by testing. Tests on different cases show that the
snake model cannot reach a good convergence result if α is less than
1. Hence α is taken to be 1.2. Furthermore, parameter β does not
work in any of the cases. At the same time, the iteration times of the GVF snake
are analysed too. The segmentation result becomes stable if the
iteration times for gradient computation is bigger than 80 in these
cases.

Fig. 6(a–e) shows an example of a cell being tracked by the GVF
snake. The red lines indicate the moving contour at different points
of time. The segmented nuclei are shown in Fig. 7(d–f).

3. Feature extraction

The criteria of OSF with dysplasia grading are usually based on
the following four types of characteristics: nuclear changes
(variation in size and shape), polymorphism (nuclei of the basal layer are
elongated and perpendicular to the basement membrane), nuclear
irregularity, and hyperchromasia (excessive pigmentation in hemoglobin
content of basal cell nuclei). The above four types of characteristics
are provided by experienced onco-pathologists (Paul et al.,
2005) and are usually used for OSF with dysplasia grading. In addition,
to facilitate computer processing and image analysis, the
onco-pathologists also suggest nuclear texture as the fifth type of
characteristics. Then, 23 features based on these five types of
characteristics are extracted from oral histopathological images for
classification.

The following features are evaluated for each nucleus: (a) area, (b)
perimeter, (c) eccentricity, (d) area equivalent diameter, (e)
perimeter equivalent diameter, (f) convex area, (g) Zernike moments, and
(h) Fourier descriptors, etc. Counting the number of pixels present
in the binary image of the nucleus gives the (f1) area, whereas the (f2)
perimeter of the nucleus is obtained by counting the number
of boundary pixels present in the nucleus. (f3) Form factor is
proportional to the area of each nucleus divided by the square of the
perimeter
(http://www.dentistry.bham.ac.uk/landinig/software/software.html).
It is mathematically defined as

$$\text{Form factor} = \frac{4\pi \times \text{area}}{\text{perimeter}^2}. \tag{20}$$
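
The first three features can be computed from a binary nucleus mask as sketched below, following the pixel-counting definitions given above. Treating a boundary pixel as an object pixel with at least one 4-connected background neighbor is an assumption of this example (the text does not state the connectivity), and the function name `basic_shape_features` is hypothetical.

```python
import numpy as np

def basic_shape_features(mask):
    """Return area (f1), perimeter (f2) and form factor (f3, Eq. (20))."""
    m = np.asarray(mask, dtype=bool)
    area = int(m.sum())                          # f1: number of object pixels
    p = np.pad(m, 1, constant_values=False)
    # Interior pixels have all four 4-connected neighbours inside the object.
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    perimeter = int((m & ~interior).sum())       # f2: number of boundary pixels
    form_factor = (4.0 * np.pi * area / perimeter ** 2
                   if perimeter else float("nan"))
    return {"area": area, "perimeter": perimeter, "form_factor": form_factor}
```

Because the perimeter is a pixel count rather than a geometric length, the form factor of small compact shapes can exceed 1 with this definition; the value is still useful for comparing nuclei measured in the same way.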
(f4) Area equivalent diameter (AED), mathematically defined as

$$\text{Area equivalent diameter} = \sqrt{\frac{4 \times \text{area}}{\pi}}. \tag{21}$$

(f5) Perimeter equivalent diameter (PED), mathematically defined as

$$\text{Perimeter equivalent diameter} = \frac{\text{perimeter}}{\pi}. \tag{22}$$

(f6) Eccentricity is calculated by the following equation:

$$\text{Eccentricity} = \frac{\sqrt{a^2 - b^2}}{a}, \tag{23}$$

where a and b indicate the major and minor axes, which are obtained
by elliptical approximation. Each nucleus has been approximated
by a minimum bounding rectangle. The ellipse approximation has
been done by first fitting the nucleus by a minimum bounding
rectangle. The algorithm for the minimum bounding rectangle is given in
Chaudari and Samal (2007).

(f7) Aspect ratio. It is mathematically defined as

$$\text{Aspect ratio} = \frac{\text{Feret}}{\text{Breadth}}, \tag{24}$$

where Feret is the largest axis length of the minimum bounding rectangle
and Breadth is the largest axis perpendicular to the Feret (not necessarily
colinear).

(f8) Zernike moment. The Zernike polynomials were first proposed in
1934 by Zernike. Their moment formulation appears to be one of the most
popular, outperforming the alternatives in terms of noise resilience,
information redundancy and reconstruction capability. Complex Zernike
moments are constructed using a set of complex polynomials
which form a complete orthogonal basis set defined on the unit
disc. They are expressed as the two-dimensional Zernike moment
(Khotanzad & Hong, 1990):

$$A_{mn} = \frac{m+1}{\pi} \int_x \int_y f(x,y)\, V_{mn}(x,y)\, dx\, dy, \tag{25}$$

where $m = 0, 1, 2, \ldots$ defines the order of the moment and $f(x, y)$ is the
function being described. Here $n$ is an integer depicting the
angular dependence, or rotation, subject to the following conditions:

$$m - |n| = \text{even}, \qquad |n| \le m. \tag{26}$$

Now, its expression in polar coordinates is

$$V_{mn}(r, \theta) = R_{mn}(r) \exp(jn\theta). \tag{27}$$

Here $R_{mn}$ is the orthogonal radial polynomial, defined as

$$R_{mn}(r) = \sum_{s=0}^{(m-|n|)/2} (-1)^s F(m, n, s, r), \tag{28}$$

and

$$F(m, n, s, r) = \frac{(m-s)!}{s!\left(\frac{m+|n|}{2}-s\right)!\left(\frac{m-|n|}{2}-s\right)!}\; r^{m-2s}. \tag{29}$$

For a discrete case such as an image, if $p(x, y)$ is the current pixel, the
expression of the Zernike moment becomes

$$A_{mn} = \frac{m+1}{\pi} \sum_x \sum_y p(x,y)\, V_{mn}(x,y). \tag{30}$$

To calculate the Zernike moment, the image is first mapped to the
unit disc using polar coordinates, where the centre of the image is
the origin of the unit disc. Those pixels falling outside the unit disc
are not used in the calculation. The coordinates are then described
by the length of the vector from the origin to the coordinate point, r,
and the angle from the x-axis to the vector r. Zernike moments are
rotation invariants but not invariants to scaling and translation.
Scaling and translation invariance can be achieved by transforming
the pixel coordinates using the following rule before applying the Zernike
moment:

$$h(x, y) = f\!\left(\frac{x}{a} + \bar{x},\ \frac{y}{a} + \bar{y}\right), \quad \text{where } a = \sqrt{\frac{\beta}{m_{00}}}, \tag{31}$$

and $\bar{x} = m_{10}/m_{00}$, $\bar{y} = m_{01}/m_{00}$.
Here, $m_{01}$, $m_{00}$, $m_{10}$ are the regular moments

$$m_{pq} = \sum_x \sum_y x^p y^q f(x, y). \tag{32}$$

Translation invariance is achieved by moving the origin to the
image object center, causing $m_{01} = m_{10} = 0$. Following this, scale
invariance is produced by altering each object so that its area (or pixel
count for a binary image) is $m_{00} = \beta$, where $\beta$ is a predetermined
value (Khotanzad & Hong, 1990).

(f9) Fourier descriptors. In any image, if $(x_i, y_i)$, where $i = 1, 2, \ldots, K$,
represent the edge points of an object, the Fourier descriptors of that edge
can be obtained by the following approach. Each point can be treated
as a complex number (Gonzalez & Woods, 2002) so that

$$s(k) = x_i + j y_i. \tag{33}$$

Now the DFT of $s(k)$ is

$$a(u) = \sum_{k=0}^{K-1} s(k)\, e^{-j 2\pi u k / K}. \tag{34}$$

If we consider that the length of the DFT of any sequence is the same as
that of the original sequence, the total number of the descriptors varies
as the length of the edge changes. Here the AC power of the Fourier
descriptor is computed as follows:

$$P_{AC} = \sum_{u \neq 0,\, v \neq 0} \left[F_R^2(u, v) + F_I^2(u, v)\right], \tag{35}$$

where $F_R(u, v)$ and $F_I(u, v)$ are the real and imaginary parts of the Fourier
transform of the image, respectively, and $u$ and $v$ are the frequencies
along the x and y axes of the image, respectively. Fourier descriptors
are not invariant to scaling and translation. Scaling and translation
invariance can be achieved using Eqs. (31) and (32).

(f10) Rectangularity of the nuclei region, mathematically defined as

$$R = \frac{A}{W \times H}. \tag{36}$$

A stands for the area, W stands for the width, and H stands for the
height. R describes the deviation degree of the nuclei to the
rectangle.

(f11) Convex area: Area of the convex hull (area of the smallest
convex set of pixels containing the entire nuclear object).

(f12) Solidity: Nuclear area divided by the area of the convex hull.

(f13) Roundness: Nuclear area divided by the area of a circle with
diameter equal to the length of the major axis.

(f14) Concavity:

$$\text{Concavity} = \text{Convex Area} - \text{Nuclear Area}. \tag{37}$$

(f15) Orientation: Angle (in degrees) between the x-axis and the
major axis of the ellipse that has the same second-moments as
the region.

(f16) Area irregularity: The nucleus is rotated so that its major
axis becomes horizontal and is then enclosed by a minimum
bounding rectangle (MBR). There is at least one intersecting
line, and a horizontal line. The area irregularity is given as

$$\text{Area Irregularity} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{4} \sum_{j=1}^{n} \max_{k=1 \ldots 4,\; k \neq j} \Big|\, \|S_{ij}\| - \|S_{ik}\| \,\Big| \right). \tag{38}$$

some structuring element, and bottom-hat transform is the
difference between the closing and the input image. Top-hat
transform returns an image containing elements that are
smaller than the structuring element and brighter than their
surroundings. Bottom-hat transform returns an image containing
elements that are smaller than the structuring element and
darker than their surroundings.

$$\text{Spot areas ratio} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\|N_i\|} \big( \|B(N_i)\| + \|D(N_i)\| \big), \tag{40}$$

where $B(N_i)$ is the overall size of all bright spots in nucleus $N_i$;
$D(N_i)$ is the overall size of all dark spots in nucleus $N_i$; and $N_i$ is
the ith nucleus in the image, $1 \le i \le n$.

(f17) Contour irregularity: The contour of the nucleus can be
represented by a sequence of k equally spaced sample boundary
points $\{p_0, p_1, p_2, \ldots, p_{j-1}, p_j, \ldots, p_{k-1}\}$ with $p_k = p_0$ and
$p_{-1} = p_{k-1}$. Let $p_j(w)$ be the boundary point with a distance of
w pixels from the current point $p_j$. The curvature at point $p_j$ is
defined as:
dij tan1yj yjwxj xjw tan
1 yj1 yj1wxj1 xj1w ; d
i1 dik1
Therefore, contour irregularity is dened as
Contour Irregularity 1k
Xk1j0
jdij dij1j !
; 39
(f18) Spot areas ratio: Pigmentation is an important
characteris-tic appearing in a malignant tumor. In our system, the
brightand dark spots can be detected by top-hat and
bottom-hattransforms, respectively, on nuclei using a disk shape
structur-ing element of radius 5 (Huang & Lai, 2010). Top-hat
transformis the difference between an input image and its opening
bypoint between a nucleus and each side of its MBR as shown inFig.
8. If there are two or more intersecting points at one side,the
middle one is selected as the representative intersectingpoint
(Huang & Lai, 2010). Then, a nucleus is partitioned intofour
parts as follows. If an intersecting point is on a vertical sideof
theMBR, a horizontal cutting line will go through this point. Ifan
intersecting point is on a horizontal side of theMBR, a
verticalcutting line will go through this point. Consequently, four
possi-bly overlapping areas S1, S2, S3, and S4 will be formed with
eacharea surrounded by a segment of nucleuss boundary, a
vertical
P1
P2
P3
P4
P1
P2
P3
P4
(a) (b)Fig. 8. Area irregularity (a) Round nucleus. (b)
Irregular nucleus.
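As an illustration of the Fourier descriptors (f9), the following minimal sketch computes the DFT of a complex boundary sequence with NumPy. It is not the authors' implementation: the function names, the synthetic circular contour, and the one-dimensional reading of the AC power (all coefficients except u = 0) are assumptions made for this example.

```python
import numpy as np


def fourier_descriptors(xs, ys):
    """Return a(u), the DFT of s(k) = x_k + j*y_k (Eqs. (33) and (34))."""
    s = np.asarray(xs, dtype=float) + 1j * np.asarray(ys, dtype=float)
    # np.fft.fft computes sum_k s(k) * exp(-2j*pi*u*k/K), the convention of Eq. (34).
    return np.fft.fft(s)


def ac_power(descriptors):
    """Sum of squared real and imaginary parts over all non-DC coefficients.

    Assumption: a one-dimensional reading of Eq. (35), with u = 0 excluded.
    """
    ac = np.asarray(descriptors)[1:]
    return float(np.sum(ac.real ** 2 + ac.imag ** 2))


# Hypothetical contour: K points on a circle of radius 3 centred at (10, 5).
K = 64
theta = 2 * np.pi * np.arange(K) / K
a = fourier_descriptors(10 + 3 * np.cos(theta), 5 + 3 * np.sin(theta))
print(abs(a[1]), ac_power(a))
```

For a circle sampled this way, the centre only contributes to the DC term a(0), so the AC power does not change when the contour is translated. Scale still affects it, which is why the text normalizes scale separately.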
Texture features: Haralick's texture features (Haralick, Shanmugan, & Dinstein, 1973) are calculated using the gray-level co-occurrence matrix. This matrix is square with dimension $N_g$, where $N_g$ is the number of gray levels in the image. Element $[i, j]$ of the matrix is generated by counting the number of times a pixel with value $i$ is adjacent to a pixel with value $j$ and then dividing the entire matrix by the total number of such comparisons made. Each entry is therefore considered to be the probability that a pixel with value $i$ will be found adjacent to a pixel of value $j$. Four statistics, namely (f19) contrast, (f20) correlation, (f21) homogeneity and (f22) energy, are calculated from the co-occurrence matrices computed using the offsets (1, 0); (−1, 0); (0, 1); (0, −1). Thus
$$ \text{Contrast} = \sum_{i,j} |i - j|^2\, p(i, j), \tag{41} $$
$$ \text{Correlation} = \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j)\, p(i, j)}{\sigma_i \sigma_j}, \tag{42} $$
$$ \text{Homogeneity} = \sum_{i,j} \frac{p(i, j)}{1 + |i - j|}, \tag{43} $$
$$ \text{Energy} = \sum_{i,j} p(i, j)^2. \tag{44} $$

(f23) Hyperchromatism: Hyperchromatism represents excessive pigmentation in hemoglobin content of basal cell nuclei (Huang & Lai, 2010). It is an important characteristic appearing in a malignant tumor. In the case of severe dysplasia, chromatin abnormality results in increased staining capacity of nuclei. Thus, the nucleus in severe dysplasia usually appears darker than a normal nucleus. To quantify hyperchromatism, the mean intensity of nuclei (MNI) is calculated as follows:
$$ \text{Mean intensity of nuclei} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{\|N_i\|} \sum_{\forall (x, y) \in N_i} N_i(x, y) \right), \tag{45} $$
where $n$ is the total number of nuclei and $N_i$ is the $i$th nucleus in the image, $1 \le i \le n$.

4. Unsupervised feature selection

All extracted features are checked for possibly highly correlated features. This process assists in removing any bias towards certain features which might afterwards affect the classification procedure. To measure the similarity between two random variables based on linear dependency, Mitra, Murthy, and Pal (2002) proposed a measure called the maximal information compression index. Let $\Sigma$ be the covariance matrix of random variables $x$ and $y$. Define the maximal information compression index as $\lambda_2(x, y)$ = smallest eigenvalue of $\Sigma$, i.e.,
$$ 2\lambda_2(x, y) = \operatorname{var}(x) + \operatorname{var}(y) - \sqrt{\big(\operatorname{var}(x) + \operatorname{var}(y)\big)^2 - 4 \operatorname{var}(x) \operatorname{var}(y) \big(1 - \rho(x, y)^2\big)}. \tag{46} $$
The value of $\lambda_2$ is zero when the features are linearly dependent and increases as the amount of dependency decreases. It may be noted that the measure $\lambda_2$ is nothing but the eigenvalue for the direction normal to the principal component direction of the feature pair $(x, y)$. It is shown that maximum information compression is achieved if multivariate data is projected along its principal component direction. The corresponding loss of information in reconstruction of the pattern (in terms of second order statistics) is equal to the eigenvalue along the direction normal to the principal component. Hence, $\lambda_2$ is the amount of reconstruction error committed if the data is projected to a reduced dimension in the best possible way. Therefore, it is a measure of the minimum amount of information loss or the maximum amount of information compression. The feature selection involves two steps, namely, partitioning the original feature set into a number of homogeneous subsets (clusters) and selecting a representative feature from each such cluster.
Partitioning of the features is done based on the k-NN principle using the maximal information compression index. In doing so, we first compute the k nearest features of each feature. Among them, the feature having the most compact subset (as determined by its distance to the farthest neighbor) is selected, and its k neighboring features are discarded. This process is repeated for the remaining features until all of them are either selected or discarded.

While determining the k nearest-neighbors of features, we assign a constant error threshold ($\varepsilon$) which is set equal to the distance of the kth nearest-neighbor of the feature selected in the first iteration. In subsequent iterations, we check whether the $\lambda_2$ value corresponding to the subset of a feature is greater than $\varepsilon$ or not. If yes, then we decrease the value of k.
5. k-Fold cross validation
In k-fold cross validation, the data set is divided into k subsets. Each time, one of the k subsets is used as the test set and the other k − 1 subsets are put together to form a training set. The advantage of this method is that it matters less how the data gets divided. Every data point gets to be in a test set exactly once, and gets to be in a training set k − 1 times. The variance of the resulting estimate is reduced as k is increased. The disadvantage of this method is that the training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation. A variant of this method is to randomly divide the data into a test and training set k different times. The advantage of doing this is that we can independently choose how large each test set is and how many trials we average over (http://www.cs.cmu.edu/~schneide/tut5/node42.html).
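The partitioning can be sketched in a few lines. This is a minimal, non-stratified sketch under assumed names (`k_fold_indices`, `seed`); the paper reports a stratified split, which would additionally balance the class proportions in each fold. The sample count of 1194 is the number of nuclei reported in the results.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    rng = np.random.default_rng(seed)
    # Shuffle once, then cut into k nearly equal parts.
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


# 1194 samples in 10 folds: every sample is tested exactly once.
splits = list(k_fold_indices(1194, 10))
print([len(test) for _, test in splits])
```

Because 1194 is not divisible by 10, the folds cannot all have the same size: four test sets hold 120 samples and six hold 119.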
6. Basal cell nuclei classification

The performance of our automatic basal cell nuclei classification system in this study is evaluated by two supervised and three unsupervised classifiers: the Bayesian classifier, the support vector machine (SVM) classifier, k-means, fuzzy c-means and GMM clustering.
6.1. Bayesian classification

Bayesian classification is based on probability theory, and the fundamental approach to the problem of classification is Bayes decision theory (Duda, Hart, & Stork, 2007). The principle of the decision is to choose the most probable or the lowest risk (expected cost) option. The feature vector $x = [x_1, x_2, \ldots, x_d]$ is assumed to be generated by a d dimensional Gaussian process having ensemble mean $\mu$ and covariance matrix $\Sigma$. Such a process is represented using the probability density function given by
$$ p(x_i \mid \lambda_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) \right\}. \tag{47} $$
The posterior probability of such a process is computed by Bayes rule,
$$ P(\lambda_k \mid x_n) = \frac{\alpha_k\, p(x_n \mid \lambda_k)}{\sum_{j=1}^{c} \alpha_j\, p(x_n \mid \lambda_j)}, \tag{48} $$
where $c$ is the number of classes present in the data and $\alpha_j$ is the $j$th class prior probability (> 0). Here we have $c = 2$, viz., normal and OSF without dysplasia. In order to make a Bayesian decision, the following classification rule is adopted:

If $P(\lambda = 1 \mid x_n) > P(\lambda = 2 \mid x_n)$ then $x_n \in$ Normal class, else $x_n \in$ OSF class.
6.2. Support vector machine classification

The support vector machine classifier (El-Naqa, Yang, Wernick, Galatsanos, & Nishikawa, 2002; Vapnik, 1998) is based on the idea of margin maximization, and it can be found by solving the following optimization problem:
$$ \min\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i^2 \quad \text{s.t.}\quad y_i (w^T x_i + b) \ge 1 - \xi_i,\; i = 1, \ldots, l;\quad \xi_i \ge 0. \tag{49} $$
The decision function for linear SVMs is given as $f(x) = w^T x + b$. In this formulation, we have the training data set $\{x_i, y_i\}$, $i = 1, \ldots, l$, where $x_i \in \mathbb{R}^n$ are the training data points or the tissue sample vectors, $y_i$ are the class labels, $l$ is the number of samples and $n$ is the number of features in each sample. By solving the optimization problem (49), i.e., by finding the parameters $w$ and $b$ for a given training set, we are effectively designing a decision hyperplane over an n dimensional input space that produces the maximal margin in the space. Generally, the optimization problem (49) is solved by changing it into the dual problem below:
$$ \max\; L_d(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j x_i^T x_j \tag{50} $$
$$ \text{subject to}\quad 0 \le \alpha_i \le C,\; i = 1, \ldots, l; \qquad \sum_{i=1}^{l} \alpha_i y_i = 0. \tag{51} $$
In this setting, one needs to maximize the dual objective function $L_d(\alpha)$ with respect to the dual variables $\alpha_i$ only, subject to the box constraints $0 \le \alpha_i \le C$ and the equality constraint in (51). The optimization problem can be solved by various established techniques for solving general quadratic programming problems with inequality constraints.
6.3. k-means clustering
The k-means clustering algorithm initially assumes k centroids (in our case k = 2). Based on the initial centroids, it assigns a cluster label to each pattern (consisting of 18 features) based on the minimum Euclidean distance (MacQueen, 1967). Based on these labels, the centroids are re-estimated as the average of all the patterns belonging to that class at that iteration. The convergence criterion is that the total mean squared error (MSE) should be below a threshold, and the iterations are continued until this is the case. The k-means clustering minimizes the following objective function:
$$ J = \sum_{k=1}^{K} \sum_{i=1}^{N} \|x_i - c_k\|^2, \tag{52} $$
where $x_i$ is the $i$th pattern and $c_k$ is the $k$th centroid.
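The assignment and re-estimation steps can be sketched as follows. This is an illustrative sketch with hypothetical names and parameters; as a stated substitution, it stops when the centroids move less than a tolerance instead of thresholding the total MSE as described in the text, and it initializes the centroids from randomly chosen patterns.

```python
import numpy as np


def k_means(X, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct patterns drawn at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: mean of the patterns in each cluster (empty clusters keep their centroid).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new - centroids)
        centroids = new
        if shift < tol:
            break
    return labels, centroids


def objective(X, labels, centroids):
    """Sum of squared distances of each pattern to its assigned centroid."""
    return float(np.sum((X - centroids[labels]) ** 2))
```

The `objective` helper evaluates the within-cluster form of Eq. (52), in which each pattern contributes only the distance to its own centroid.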
6.4. Fuzzy c-means clustering
The fuzzy c-means clustering algorithm (Bezdek, 1981) optimizes the following objective function:
$$ J = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ji}^{m} \|x_i - V_j\|^2, \tag{53} $$
where $u_{ji}$ is the fuzzy membership, with $m$ as the weighting exponent, with which pattern $x_i$ associates with cluster $j$ having centroid $V_j$. The fuzzy membership has the property that
$$ \sum_{j=1}^{c} u_{ji} = 1 \quad \forall i. \tag{54} $$
The algorithm works in almost the same manner as the k-means algorithm. The update equations for the cluster center and the fuzzy membership are as follows:
$$ V_j^{new} = \frac{\sum_{i=1}^{N} u_{ji}^{m} x_i}{\sum_{i=1}^{N} u_{ji}^{m}}, \tag{55} $$
$$ u_{ji}^{new} = \frac{\left( \dfrac{1}{\|x_i - V_j^{new}\|} \right)^{\frac{2}{m-1}}}{\sum_{l=1}^{c} \left( \dfrac{1}{\|x_i - V_l^{new}\|} \right)^{\frac{2}{m-1}}}. \tag{56} $$
The iterations are stopped when $\|U^{new} - U\|_F < \varepsilon$, where $\varepsilon$ is a predefined small real number and $U = \{u_{ji};\ 1 \le j \le c,\ 1 \le i \le N\}$.

For different weighting exponents, it is possible to get different clustering accuracies.
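The update equations (55) and (56) translate almost directly into array operations. The sketch below is illustrative only: the function name, the random initialization of the membership matrix, and the small floor on distances (to avoid division by zero when a pattern coincides with a centroid) are assumptions, not details taken from the paper.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=200, seed=0):
    """Return (U, V): memberships of shape (c, N) and centroids of shape (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each pattern sum to 1, Eq. (54)
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)  # Eq. (55)
        dist = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # assumed guard against zero distance
        inv = (1.0 / dist) ** (2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)  # Eq. (56)
        converged = np.linalg.norm(U_new - U) < eps  # Frobenius norm stopping rule
        U = U_new
        if converged:
            break
    return U, V
```

A hard clustering can be read off with `U.argmax(axis=0)`; changing `m` changes how soft the memberships are, which is one way the weighting exponent affects accuracy.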
6.5. Gaussian mixture model based clustering
Here we have a binary class problem of classification of normal and OSF with dysplasia cases. The GMM (Bilmes, 1998) assumes that the features are drawn from a normal distribution. We have two mixing components corresponding to the normal and OSF classes respectively. Therefore we have two class conditional densities, $p(x_n \mid \omega_k)$, $1 \le k \le 2$ and $1 \le n \le N$, where $k$ indexes the classes and $N$ is the total number of observations or patterns, with corresponding class prior probabilities $p(\omega_k)$, $1 \le k \le 2$. Each of the two mixing components has a mean vector and a covariance matrix. Since we have applied an orthogonal transformation in a compactly supported basis, the data will be highly uncorrelated and the off diagonal elements in the covariance matrix are all approximately zero. The probability density function of such a model is given by
$$ p(x_n \mid \omega_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x_n - \bar{x}_k)^T \Sigma_k^{-1} (x_n - \bar{x}_k) \right\}, \tag{57} $$
where
$$ \bar{x}_k = \frac{1}{|\Omega_k|} \sum_{x_n \in \omega_k} x_n, \tag{58} $$
and
$$ \Sigma_k = \frac{1}{|\Omega_k|} \sum_{x_n \in \omega_k} (x_n - \bar{x}_k)(x_n - \bar{x}_k)^T = \operatorname{diag}(\sigma_i^2),\quad 1 \le i \le d. \tag{59} $$
The corresponding posterior probabilities are given by Bayes rule as follows:
$$ P(\omega_k \mid x_i) = \frac{p(\omega_k)\, p(x_i \mid \omega_k)}{\sum_{k=1}^{2} p(\omega_k)\, p(x_i \mid \omega_k)}. \tag{60} $$
Since our data contains missing observations, or does not represent the whole of the sample space, the mean vectors and the covariance matrices computed in this way are not the correct ones. Therefore the means and variances are recomputed using the Expectation Maximization (EM) algorithm and the maximum likelihood estimation method. The re-estimation formulae are the following:
$$ \hat{\mu}_j = \frac{\sum_{i=1}^{N} x_i\, P(\omega_j \mid x_i)}{\sum_{i=1}^{N} P(\omega_j \mid x_i)}, \tag{61} $$
$$ \hat{\sigma}_j^2 = \frac{\sum_{i=1}^{N} (x_i - \hat{\mu}_j)^2\, P(\omega_j \mid x_i)}{\sum_{i=1}^{N} P(\omega_j \mid x_i)}, \tag{62} $$
$$ \hat{p}(\omega_j) = \frac{1}{N} \sum_{i=1}^{N} P(\omega_j \mid x_i). \tag{63} $$
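The re-estimation formulae (61)–(63), together with the posterior of Eq. (60), can be sketched as a single EM iteration for a two-component mixture with diagonal covariances. This is an illustrative sketch, not the authors' code: the function names and the initial model (means at the per-feature minimum and maximum, a shared overall variance) are assumptions made for the example.

```python
import numpy as np


def em_step(X, mu, var, prior):
    """One EM iteration. X: (N, d); mu, var: (K, d); prior: (K,)."""
    # E step: log class-conditional densities with diagonal covariance,
    # then posteriors by Bayes rule (Eq. (60)), computed in log space.
    diff = X[:, None, :] - mu[None, :, :]
    log_p = -0.5 * (np.log(2 * np.pi * var)[None, :, :] + diff ** 2 / var[None, :, :]).sum(axis=2)
    log_joint = log_p + np.log(prior)[None, :]
    top = log_joint.max(axis=1, keepdims=True)
    log_mix = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
    post = np.exp(log_joint - log_mix)

    # M step: re-estimate means, variances and priors (Eqs. (61)-(63)).
    nk = post.sum(axis=0)
    mu_new = (post.T @ X) / nk[:, None]
    diff_new = X[:, None, :] - mu_new[None, :, :]
    var_new = (post[:, :, None] * diff_new ** 2).sum(axis=0) / nk[:, None]
    prior_new = nk / len(X)
    # The log-likelihood returned is that of the parameters passed in.
    return mu_new, var_new, prior_new, float(log_mix.sum())


def fit_gmm(X, n_iter=50, tol=1e-6):
    """Run EM for two components until the log-likelihood stops increasing."""
    # Hypothetical initial model; the priors of 0.5 follow the text.
    mu = np.vstack([X.min(axis=0), X.max(axis=0)])
    var = np.tile(X.var(axis=0), (2, 1))
    prior = np.array([0.5, 0.5])
    history = []
    for _ in range(n_iter):
        mu, var, prior, ll = em_step(X, mu, var, prior)
        history.append(ll)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return mu, var, prior, history
```

The `history` list holds the log-likelihood at each iteration, so plotting it gives the kind of convergence curve used to monitor training.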
The initial prior probability is taken to be 0.5 for each of the classes. An initial model is assumed from the data. The EM algorithm used has two core steps: the E step and the M step. During the E step, the class conditional density is computed according to Eq. (57), and from it the posterior density is computed according to Eq. (60). During the M step, the class model is re-estimated according to Eqs. (61)–(63). The process is continued until the new estimate does not change much from the previous estimate and the model stabilizes. The EM based GMM is then said to have converged. The logarithm of the class conditional density, called the log-likelihood, is computed for each iteration, and it stops increasing at convergence.

The GMM algorithm is an optimization problem which maximizes the following objective function:
$$ J = \prod_{n} \sum_{k} p(\omega_k)\, p(x_n \mid \omega_k). \tag{64} $$
The converged centroids are such that the product, over all the observations, of the total class conditional densities weighted with the respective prior probabilities is maximized. The EM algorithm determines each new estimate so that it approaches the optimum of the objective function, which is what makes the algorithm converge. GMM is an iterative algorithm which can be performed in O(ndkT) floating point operations, where n is the number of patterns, d is the total number of features in a pattern, k is the total number of classes present in the data, and T is the number of iterations required for convergence of the algorithm.

6.6. Performance analysis

In practice, each of the classifiers is required to be evaluated in order to compare their sensitivity and specificity along with overall accuracy. In view of this, the following confusion matrix (see Table 1) is usually designed based on the trade-off between actual and classifier generated outputs.

Table 1
A 2 × 2 confusion matrix for performance evaluation.

Classifier output    Patients with OSF (as confirmed on biopsy)
                     Negative (absent)    Positive (present)
Negative             TN                   FN
Positive             FP                   TP

where

TP: True Positive: A patient predicted with OSF when the subject actually has OSF.
TN: True Negative: A patient predicted healthy when the subject actually is healthy.
FP: False Positive: A patient predicted with OSF when the subject actually is healthy.
FN: False Negative: A patient predicted healthy when the subject actually has OSF.

Sensitivity: It is a measure of accuracy of diagnosis of malignant (true) cases of OSF. Mathematically, it is defined as
$$ \text{Sensitivity} = \frac{TP}{TP + FN}\,\%. \tag{65} $$
Specificity: It is a measure of accuracy of diagnosis of benign (false) cases of OSF. Mathematically, it is defined as
$$ \text{Specificity} = \frac{TN}{FP + TN}\,\%. \tag{66} $$
Overall accuracy: The overall accuracy of a test is the measure of true findings (true-positive + true-negative results) divided by all test results. This is also termed the efficiency of the test.
$$ \text{Overall accuracy} = \frac{TP + TN}{TP + FP + FN + TN}\,\%. \tag{67} $$

7. Results and discussion

The basal cell nuclei boundaries are overlaid on the extracted basal layer (shown in Fig. 9) of the H&E image shown in Fig. 5(a). Fig. 7(a–c) shows some of the extracted cells after performing fuzzy classification for identifying cells of NOM and OSF respectively. The segmented nuclei of the cells, obtained using GVF, are shown in Fig. 7(d–f). Fig. 10(a) and (c) shows the segmented normal and dysplastic basal cells. Fig. 10(b) and (d) shows the respective segmented nuclei; the normal nucleus takes up much less stain than the dysplastic nucleus. This is due to hyperchromatism.

The features are extracted from the segmented basal cell nuclei (Fig. 7(d–f)). Here we have 771 nuclei for normal and 423 nuclei for OSF with dysplasia.
The features of normal and OSF are summarized as mean and standard deviation (Table 2). The results of the unsupervised feature selection suggest that 18 of the 23 features are significant in discriminating the normal and OSF groups; the exceptions are eccentricity, solidity, rectangularity, orientation and contour irregularity. An advantage of using the unsupervised feature selection for inspecting feature separability is that the algorithm is generic in nature and has the capability of multiscale representation of the data sets. Fig. 11 shows the plot of feature index against feature weights of the unsupervised feature selection between the normal and OSF without dysplasia groups. Feature weights are basically the distance of k-NN for each feature. Moreover, the plot indicates the significance of each feature in discriminating the two groups.

Further, the numeric values of most of the features increase steadily from normal to OSF with dysplasia. The mean nucleus area of the dysplastic cells is roughly 1.7 times that of the normal cells (13.79 versus 7.93; Table 2). The increase in nucleus area in this study may be a reflection of the increase in DNA synthesis. The changes occurring in the basal cell nuclei might indicate an increased metabolic activity prior to the invasion of the underlying connective tissues. The nucleus in severe dysplasia usually appears darker than the normal nucleus (Fig. 10(b) and (d)), which can be inferred from the results. In the normal case the intensity value is 24.98, which indicates that less stain is taken up by the nucleus, but in OSF with dysplasia the intensity value is 18.69, which indicates that more stain is taken up by the nucleus. This is due to hyperchromatism, i.e., excessive pigmentation in hemoglobin content of basal cell nuclei. It is an important characteristic appearing in a malignant tumor. In the case of severe dysplasia, chromatin abnormality results in increased staining capacity of nuclei.

Fig. 9. Segmented boundaries of basal cells are superimposed on the extracted basal layer.
Fig. 10. (a) Segmented normal basal cell; (b) less intense normal basal cell nucleus; (c) segmented dysplastic basal cell; (d) high intense dysplastic basal cell nucleus.

Table 2
Features extracted from nucleus of normal and OSF basal cells.

Sl. no  Nucleus features                  Normal                    OSF with dysplasia
                                          μ           σ             μ           σ
1       Area                              7.93        1.75          13.79       2.07*
2       Perimeter                         9.31        1.15          12.50       1.13*
3       Eccentricity                      0.89        0.12          0.88        0.14
4       Fourier descriptors               1.01e+013   7.38e+012     8.21e+013   5.17e+013*
5       Zernike moments (m = 1; n = 3)    2.38        0.32          2.44        0.42*
6       Area equivalent diameter          3.16        0.36          4.18        0.31*
7       Perimeter equivalent diameter     2.52        0.56          4.39        0.66*
8       Form factor                       1.14        0.08          1.11        0.09*
9       Convex area                       8.22        1.82          14.34       2.19*
10      Solidity                          0.96        0.01          0.96        0.02
11      Roundness                         0.68        0.12          0.70        0.12*
12      Concavity                         0.29        0.13          0.55        0.26*
13      Orientation                       1.73        60.21         1.17        60.09
14      Aspect ratio                      10.25       2.26          17.82       2.69*
15      Rectangularity                    0.77        0.01          0.77        0.01
16      Area irregularity                 1.45        0.69          2.42        1.23*
17      Contour irregularity              0.25        0.07          0.25        0.07
18      Spot areas ratio                  1.59        0.07          1.56        0.07*
19      Contrast                          0.09        0.03          0.10        0.03*
20      Correlation                       0.98        0.01          0.97        0.01*
21      Homogeneity                       0.64        0.07          0.59        0.06*
22      Energy                            0.91        0.03          0.94        0.02*
23      Mean nuclei intensity             18.69       5.71          24.98       7.03*

* Significant based on feature weights.

Fig. 11. Plot of feature indexes vs. feature weights showing the significance of features.

Fig. 12(a) shows the box plot for one of the features, the area of the nucleus. The median of the feature is almost the same as the mean, which rules out outliers as the source of the large difference between the two classes. Fig. 12(b) shows the density plot of perimeter for normal and OSF with dysplasia cases, which shows the distinct discrimination between the two groups. The 3D scatter plot in Fig. 12(c) shows that the features Zernike moments, Fourier descriptors and area equivalent diameter are quite separable from a discrimination point of view. From this we can infer that a simple linear classifier can achieve high accuracy.

Fig. 12. (a) Box plot for area; (b) density plot of perimeter for normal and OSF with dysplasia; (c) 3D plot of features for normal and OSF with dysplasia cases.

We have evaluated the performance of the OSF screening system using 341 normal and 429 OSF with dysplasia biopsy images of size 1388 × 1040 pixels obtained from more than 20 patients. To establish the ground truth, biopsy images are commonly graded by a group of experienced pathologists. Before feature extraction, nuclei segmentation must be performed. Fig. 6 shows examples of successful nuclei segmentation.

To evaluate the performance of our screening system, we used 1194 nuclei images, of which 771 are normal nuclei and 423 are OSF with dysplasia nuclei images. In our study we have used k-fold cross validation for training/testing data partitioning. The advantage of doing this is that we can independently choose how large each test set is and how many trials we average over (Schneider 1997). In our study the cases (normal: 771, OSF with dysplasia: 423) are divided into 10 folds; the folds are not all of the same size, as shown in Table 3.

Table 3
Stratified 10-fold cross validation of the given data set.

Fold      Size of training set    Size of testing set
Fold#1    1075                    119
Fold#2    1074                    120
Fold#3    1074                    120
Fold#4    1074                    120
Fold#5    1074                    120
Fold#6    1075                    119
Fold#7    1075                    119
Fold#8    1075                    119
Fold#9    1075                    119
Fold#10   1075                    119

Here we have employed two supervised classifiers, viz., Bayesian and SVM, and three unsupervised classifiers, viz., k-means, FCM and GMM, to evaluate the screening system using 18 features. The best overall performance (99.66%) is obtained with 10-fold cross validation using the SVM classifier. The corresponding sensitivity (99.74%) and specificity (99.53%) are also sufficiently high. The results of the supervised classifiers are listed in Table 4. With the Bayesian classifier we have obtained 96.56% overall performance. The corresponding sensitivity is 96.43% and specificity is 96.62%. Fig. 13(a–c) shows the sensitivity, specificity and accuracy plots over the 10 folds. With SVM, both sensitivity and specificity are consistently more than 99% in all 10 folds. With the Bayesian classifier there is a drastic reduction in sensitivity, specificity and accuracy in the 7th fold, while all other folds are above 90%.

Table 4
Performance measure for supervised classifiers.

Classifier    Average sensitivity (%)    Average specificity (%)    Average accuracy (%)
Bayesian      96.43                      96.62                      96.56
SVM           99.74                      99.53                      99.66

Fig. 13. (a) Sensitivity plot for SVM and Bayesian classifiers over 10-fold; (b) specificity plot for SVM and Bayesian classifiers over 10-fold; (c) accuracy plot for SVM and Bayesian classifiers over 10-fold.

The classification accuracy is listed in Table 5 for all three unsupervised classifiers, i.e., k-means, FCM and GMM; among them GMM performs best. The best overall performance (90.37%) is obtained using the GMM classifier. The corresponding sensitivity (89.62%) and specificity (91.73%) are also sufficiently high. The GMM is trained to classify the data, and the log likelihood converges while the model parameters are estimated. The log likelihood plot is given in Fig. 14. It converges in seven iterations and becomes stable.

Table 5
Performance measure for unsupervised classifiers.

Classifier    Sensitivity (%)    Specificity (%)    Accuracy (%)
k-Means       84.44              83.22              84.00
FCM           90.14              88.18              89.45
GMM           89.62              91.73              90.37

Fig. 14. Log-likelihood values of GMM classifier during training over iterations.

From the above results (Tables 4 and 5), we conclude that the SVM obtains very promising results in classifying the possible OSF patients. We believe that the proposed system can be very helpful to onco-pathologists for their final decision on their patients. By using such an efficient tool, they can make very accurate decisions.

8. Conclusion

Accurate screening of OSF biopsy images is important to prognosis and treatment planning. Visual grading by humans is time-consuming, subjective, and inconsistent, while computerized analysis of OSF biopsy images is a very complex task requiring many appropriate image processing steps and expert domain knowledge for correct screening.

In this paper, we propose an automated system for screening OSF biopsy images. In image preprocessing, a median filtering method is proposed to remove noise. Initially, the basal layer is extracted from histopathological images using various steps, viz., fuzzy divergence based thresholding, subsequent morphological operations to find the lower boundary of the basal layer, and parabola fitting. Further, nuclei are extracted from these cells using color deconvolution, marker-controlled watershed transform and the GVF active contour method. Such a hybrid approach is robust in terms of removing noise and preserving shapes of nuclei in OSF biopsy images. In feature extraction, 23 features are extracted from segmented biopsy images according to five types of OSF characteristics: nuclear changes (variation in size and shape), polymorphism (nuclei of the basal layer are elongated and perpendicular to the basement membrane), nuclear irregularity, hyperchromasia (excessive pigmentation in hemoglobin content of basal cell nuclei) and nuclear texture. These features comprise both local and global characteristics so that normal and OSF with dysplasia can be distinguished effectively. In classification, an unsupervised feature selection method is used to select an optimal feature subset (18 features) from the 23 features for the supervised and unsupervised classifiers.
The major contribution of this study is the development of an efficient and effective automated screening system for OSF biopsy images using several methods for image preprocessing, segmentation, feature extraction and image classification. The system is effective because experimental results show that an accuracy of 99.66% can be achieved on average on a set of 341 normal and 429 OSF with dysplasia images obtained from more than 20 patients. A compact set of 18 features, whose quantitative measurements are particularly useful for screening, is defined in this paper. The best accuracy is 99.66% using the SVM classifier and 90.37% using the GMM classifier, because the feature subset is carefully selected. We believe that the proposed system can be very helpful to onco-pathologists for their final decision on their patients.

Acknowledgement

The authors would like to thank Dr. M. Pal, GNDSIR, Kolkata, India, and Dr. J. Chatterjee, SMST, IIT Kharagpur, India for their clinical support and valuable advice. The authors are very grateful to Mr. Pratik Shah, SMST, IIT Kharagpur, India for assistance during the implementation of the parabola fitting and colour deconvolution algorithms.
References
Banoczy, J. (1982). Oral leucoplakia (p. 231). Akademiai Kiado: Budapest.
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Bilmes, J. A. (1998). A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report, UC Berkeley.
Burkhardt, A. (1985). Advanced methods in the evaluation of premalignant lesions and carcinoma of the oral mucosa. Journal of Oral Pathology, 14, 751–758.
Chaira, T., & Ray, A. K. (2003). Segmentation using fuzzy divergence. Pattern Recognition Letters, 24(12), 1837–1844.
Chaira, T., & Ray, A. K. (2009). Fuzzy image processing and applications with MATLAB. New York: CRC Press, pp. 80–81.
Chaudari, D., & Samal, A. (2007). A simple method for fitting of bounding rectangle to closed regions. Pattern Recognition, 40, 1981–1989.
Daftary, D. K., Murti, P. R., Bhonsale, R. B., Gupta, P. C., Mehta, F. S., & Pindborg, J. J. (1993). Oral precancerous lesions and conditions of tropical interest. In: Prabhu, S. R., Wilson, D. F., Daftary, D. K., Johnson, N. W., (Eds.), Oral diseases in the tropics (pp. 402–424). Oxford: Oxford University Press.
Duda, R., Hart, P., & Stork, D. (2007). Pattern classification (2nd ed.). India: Wiley.
Duncan, J. S., & Ayache, N. (2000). Medical image analysis: Progress over two decades and the challenges ahead. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 85–106.
El-Naqa, I., Yang, Y., Wernick, M. N., Galatsanos, N. P., & Nishikawa, M. R. (2002). A support vector machine approach for detection of microcalcifications. IEEE Transactions on Medical Imaging, 21, 1552–1563.
Fan, J., & Xie, W. (1999). Distance measure and induced fuzzy entropy. Fuzzy Sets Systems, 104, 305–314.
Farjam, R., Soltanian-Zadeh, H., Zoroofi, R. A., & Jafari-Khouzani, K. (2005). Tree-structured grading of pathological images of prostate. Proceedings of SPIE: Medical Imaging, 5747, 840–851.
Fuhrman, S. A., Lasky, L. C., & Limas, C. (1982). Prognostic significance of morphologic parameters in renal cell carcinoma. American Journal of Surgical Pathology, 6, 655–663.
Gilles, F. H., Tavare, C. J., Becker, L. E., Burger, P. C., Yates, A. J., Pollack, I. F., et al. (2008). Pathologist interobserver variability of histologic features in childhood brain tumors: Results from the CCG-945 study. Pediatric and Developmental Pathology, 11, 108–117.
Glotsos, D. (2003). A hierarchical decision tree classification scheme for brain tumour astrocytoma grading using support vector machines. In Proceedings of third international symposium on image and signal processing analysis (Vol. 2, pp. 1034–1038).
Gonzalez, R. C., & Woods, R. E. (2002). Digital image processing (2nd ed.). New York: Prentice Hall, pp. 655–659.
Grieg, G., Kubler, O., Kikinis, R., & Jolesz, F. A. (1992). Nonlinear anisotropic filtering of MRI data. IEEE Transactions on Medical Imaging, 11(2), 221–232.
Grootscholten, C., Bajema, I. M., Florquin, S., Steenbergen, E. J., Peutz-Kootstra, C. J., Goldschmeding, R., et al. (2008). Interobserver agreement of scoring of histopathological characteristics and classification of lupus nephritis. Nephrology Dialysis Transplantation, 23, 223–230.
Hand, J. R., & Broders, A. (1931). Carcinoma of the kidney: The degree of malignancy in relation to factors bearing on prognosis. Journal of Urology, 28, 199–216.
Haralick, R. M., Shanmugan, K., & Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3, 610–621.
http://www.cs.cmu.edu/~schneide/tut5/node42.html (last accessed March 2010).
http://www.dentistry.bham.ac.uk/landinig/software/software.html (last accessed March 2010).
Huang, P. W., & Lai, Y. H. (2010). Effective segmentation and classification for HCC biopsy images. Pattern Recognition, 43(4), 1550–1563.
Jafari-Khouzani, K., & Soltanian-Zadeh, H. (2003). Multiwavelet grading of pathological images of prostate. IEEE Transactions on Biomedical Engineering, 50, 697–704.
Khotanzad, A., & Hong, Y. H. (1990). Invariant image recognition by Zernike moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5), 489–497.
Kim, T. Y., Choi, H. J., Cha, S. J., & Choi, H. K. (2005). Study on texture analysis of renal cell carcinoma nuclei based on the Fuhrman grading system. In Proceedings of seventh international workshop on enterprise networking and computing in healthcare industry (pp. 384–387).
Lohse, C. M., Blute, M. L., Zincke, H., Weaver, A. L., & Chenille, J. C. (2002). Comparison of standardized and non-standardized nuclear grade of renal cell carcinoma to predict outcome among 2042 patients. American Journal of Surgical Pathology, 118, 877–886.
MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281–297). Berkeley: University of California Press.
McKeown, M. J., & Ramsey, D. A. (1996). Classification of astrocytomas and malignant astrocytomas by principal component analysis and a neural net. Journal of Neuropathology and Experimental Neurology, 55, 1238–1245.
Mitra, P., Murthy, C. A., & Pal, S. K. (2002). Unsupervised feature selection using feature similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4), 301–312.
Muthu Rama Krishnan, M., Pal, M., Bomminayuni, S. K., Chakraborty, C., Paul, R. R., Chatterjee, J., et al. (2009). Automated classification of cells in sub-epithelial connective tissue of oral sub-mucous fibrosis: An SVM based approach. Computers in Biology and Medicine, 39(12), 1096–1104.
Muthu Rama Krishnan, M., Shah, P., Pal, M., Chakraborty, C., Paul, R. R., Chatterjee, J., et al. (2010). Structural markers for normal oral mucosa and oral sub-mucous fibrosis. Micron, 41(4), 312–320.
Muthu Rama Krishnan, M., Pal, M., Paul, R. R., Chakraborty, C., Chatterjee, J., & Ray, A. K. (2010). Computer vision approach to morphometric feature analysis of basal cell nuclei for evaluating malignant potentiality of oral submucous fibrosis. Journal of Medical Systems. doi:10.1007/s10916-010-9634-5 [Epub ahead of print].
Novara, G., Martignoni, G., Artibani, W., & Ficarra, V. (2007). Grading systems in renal cell carcinoma. Journal of Urology, 177, 430–436.
Paul, R. R., Mukherjee, A., Dutta, P. K., Banerjee, S., Pal, M., Chatterjee, J., et al. (2005). A novel wavelet neural network based pathological stage detection technique for an oral precancerous condition. Journal of Clinical Pathology, 58, 932–938.
Perona, P., & Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), 629–639.
Ruifrok, A. C., & Johnston, D. A. (2001). Quantification of histochemical staining by color deconvolution. Analytical Quantitative Cytology and Histology, 291–299.
Rust, B. W. (2001). Fitting nature's basic functions part I: Polynomials and linear least squares. Computing in Science and Engineering, 84–89.
Satheesh, M., Paul, M., & Hammond, S. P. (2007). Modeling epithelial cell behavior and organization. IEEE Transactions on NanoBioscience, 6(1), 77–85.
Scarpelli, M., Bartels, P. H., Montironi, R., Galluzzi, C. M., & Thompson, D. (1994). Morphometrically assisted grading of astrocytomas. Analytical Quantitative Cytology and Histology, 16, 351–356.
Schad, L. R., Schmitt, H. P., Oberwittler, C., & Lorenz, W. J. (1987). Numerical grading of astrocytomas. Medical Informatics, 12, 11–22.
Shabana, A. H., Gel-Labban, N., & Lee, K. W. (1987). Morphometric analysis of basal cell layer in oral premalignant white lesions and squamous cell carcinoma. Journal of Clinical Pathology, 40(4), 454–458.
Shuttleworth, J., Todman, A., Norrish, M., & Bennett, M. (2005). Learning histopathological microscopy. Pattern Recognition and Image Analysis, Pt 2, Proceedings, 3687, 764–772.
Smith, Y., Zajicek, G., Werman, M., Pizov, G., & Sherman, Y. (1999). Similarity measurement method for the classification of architecturally differentiated images. Computers and Biomedical Research, 32, 1–12.
Tabesh, A., Teverovskiy, A. M., Pang, H. Y., Kumar, V. P., Verbel, D., Kotsianti, A., et al. (2007). Multifeature prostate cancer diagnosis and Gleason grading of histological images. IEEE Transactions on Medical Imaging, 26, 1366–1378.
Vapnik, V. (1998). Statistical learning theory (2nd ed.). New York: Wiley.
Xu, C., & Prince, J. L. (1997). Gradient vector flow: A new external force for snakes. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) (pp. 66–71). Los Alamitos: Comp. Soc. Press.