European Journal of Radiology 54 (2005) 80–89

Significance analysis of qualitative mammographic features, using linear classifiers, neural networks and support vector machines

Michael Mavroforakis a,*, Harris Georgiou a, Nikos Dimitropoulos b, Dionisis Cavouras c, Sergios Theodoridis a

a Informatics and Telecommunications Department, University of Athens, TYPA buildings, University Campus, 15771 Athens, Greece
b Medical Imaging Department, EUROMEDICA Medical Center, 2 Mesogeion Avenue, Athens, Greece
c Medical Image and Signal Processing Laboratory, Department of Medical Instruments Technology, STEF, Technological Educational Institution of Athens, Ag. Spyridonos Street, Egaleo, 12210 Athens, Greece

* Corresponding author. Present address: 43 Knossou Str., Glyfada 16561, Athens, Greece. Tel.: +30 210 9648663. E-mail address: [email protected] (M. Mavroforakis).

Received 22 November 2004; received in revised form 17 December 2004; accepted 20 December 2004

0720-048X/$ – see front matter © 2005 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.ejrad.2004.12.015

Abstract

Advances in modern technologies and computers have enabled digital image processing to become a vital tool in conventional clinical practice, including mammography. However, the core problem of the clinical evaluation of mammographic tumors remains a highly demanding cognitive task. In order for these automated diagnostic systems to perform at levels of sensitivity and specificity similar to those of human experts, it is essential that a robust framework on problem-specific design parameters is formulated. This study is focused on identifying a robust set of clinical features that can be used as the base for designing the input of any computer-aided diagnosis system for automatic mammographic tumor evaluation. A thorough list of clinical features was constructed and the diagnostic value of each feature was verified against current clinical practices by an expert physician. These features were directly or indirectly related to the overall morphological properties of the mammographic tumor or the texture of the fine-scale tissue structures as they appear in the digitized image, while others contained external clinical data of utmost importance, like the patient's age. The entire feature set was used as an annotation list for describing the clinical properties of mammographic tumor cases in a quantitative way, such that subsequent objective analyses were possible. For the purposes of this study, a mammographic image database was created, with complete clinical evaluation descriptions and positive histological verification for each case. All tumors contained in the database were characterized according to the identified set of clinical features and the resulting dataset was used as input for discrimination and diagnostic value analysis for each one of these features. Specifically, several standard methodologies of statistical significance analysis were employed to create feature rankings according to their discriminating power. Moreover, three different classification models, namely linear classifiers, neural networks and support vector machines, were employed to investigate the true efficiency of each one of them, as well as the overall complexity of the diagnostic task of mammographic tumor characterization. Both the statistical and the classification results have proven the explicit correlation of all the selected features with the final diagnosis, qualifying them as an adequate input base for any type of similar automated diagnosis system. The underlying complexity of the diagnostic task has justified the high value of sophisticated pattern recognition architectures.

© 2005 Elsevier Ireland Ltd. All rights reserved.

Keywords: Mammography; Tumor characterization; Automated diagnosis; SVM classifiers

1. Introduction

Breast cancer is the most common cancer type and the second most common cause of death among women in the civilized world. Screening mammography, for detecting early breast cancer in asymptomatic women, increases the likelihood for cure and long-term survival. However, in cases of indeterminate mammographic findings, breast biopsy may be required. Avoiding unnecessary biopsies is important due to the discomfort, cost and probable breast scars inflicted upon the patients, which may cause diagnostic difficulty in future mammographic examinations.

The diagnostic and clinical evaluation of mammographic images constitutes a difficult and complex cognitive task, which requires advanced levels of expertise and knowledge by the trained physicians. Mammographic screening, for the identification of abnormalities and the pathological characterization of breast tissue, is a visual task that combines several aspects and X-ray findings, presented in various areas of the mammographic image, as well as external data available through each patient's clinical history. Specific clinical findings, such as the morphological properties and fine-scale structural information of the underlying tissue, are the key factors in characterizing the severity of every mammographic tumor, i.e., its benign or malignant nature [1]. Modern computer technology can be used to implement automatic image processing and analysis of various aspects of these findings, thus supporting effectively the expert's evaluation as a valuable suggestive tool. However, the exact task of tissue characterization and classification of a tumor, as probable benign or probable malignant, is extremely complex and includes advanced inference mechanisms [2,3]. Consequently, computer-aided diagnosis (CAD) systems focus on specific aspects of the diagnostic process, such as the identification and analysis of microcalcifications or the detection of irregular tissue structures, each suggestive of specific abnormalities [4]. Therefore, it is essential that these morphological and textural properties are defined in detail as a specific list of qualitative features that can be formulated into a robust set of corresponding quantitative measurements.

This study focuses on three core issues: (a) to investigate efficient mammographic features, already used in current clinical practice for the pathological evaluation of mammographic tumors, (b) to assess their diagnostic value with objective statistical and classification methods, and (c) to formulate a robust quantitative model for using them as the input for any automated image analysis method. Specifically, a complete and coherent set of clinical features was constructed, by exploiting significant pathological factors, related to mammographic abnormalities and directly or indirectly suggestive of probable malignant cases of breast tumors. The information content of these features is related to the mammographic image itself, namely the morphological and textural properties of the tumor's area, or external data obtained by the patient's clinical history. Subsequently, this set of qualitative descriptive estimations, supplied by the expert physician's subjective evaluation, was quantified and translated into a robust dataset. This dataset was used in statistical and classification analysis schemes, employing a wide range of discrimination evaluations, ranging from standard significance tests, to advanced pattern recognition architectures. The results obtained during this analysis can be used for objective comparative studies, as well as to produce ranking lists of clinical features in accordance to their true diagnostic value. These features and their relative discriminative power constitute the input specifications and guidelines, which are essential for any type of automated tumor diagnosis system that is based on morphological, textural or descriptive datasets.

2. Materials and methods

The current study was based on four distinct issues: (1) create a thorough list of abnormal findings regarding diagnostic evaluation of a mammogram, especially related to image textural and morphological features of the underlying tissue. From this list, the most prominent and content-rich features were to be selected, according to their suitability for automatic extraction through image processing algorithms. (2) Create a specialized mammographic image database, containing clearly identifiable and histologically verified cases of benign or malignant tumors. All cases were evaluated and annotated in relation to the previously defined list of important clinical features. (3) Analyze the newly constructed set of mammographic images in relation to the feature list, focusing especially on investigating the importance, comprehensiveness and consistency of each one of these features when correlated with the verified final diagnosis. (4) Investigate the performance of individual features, as well as subsets of combined features, when used as real training datasets for various classifier architectures, i.e., linear classifiers, neural networks (NN) and support vector machines (SVM).

2.1. Mammographic features list

The first phase of the study included an extensive research through various aspects of identifying and evaluating numerous radiologic findings in mammographic images, related to benign and malignant abnormalities. The investigation was conducted by enumerating and documenting all the morphological and textural tissue characteristics, which are recognized and evaluated by the experts when they conduct a clinical diagnosis [5–7]. Furthermore, an additional list of other important features, like patient's age and clinical background, were also included in this list. Some of the features, like the presence of suspicious masses or microcalcifications, are normally related directly to abnormalities, while others, like the exact location and size of the mass, are usually evaluated as intermediate suggestive indications of benignancy or malignancy [2,3].

The complete list of the 31 features, along with direct indications of benign and malignant biopsy results, is summarized in Table 1. The features were grouped in categories according to the general type of abnormality they refer to. The "CPU" column refers to the capability of relating the corresponding features to image processing algorithms, which can automatically extract specific content-related information. Advanced algorithms for automated mammographic lesion detection have been proposed, however their level of sensitivity and specificity, as well as their fine-scale accordance to the corresponding expert's detailed description of tumor boundaries, is still under investigation [4,8].


Table 1
Clinical findings and features normally implicated in mammogram evaluation

Features list | Morphological data | Textural data | Other | CPU | Doctor

Tumor
  Intramammary node                                                      √ √ √
  Size (general view)                                                    √ √ √
  Inclusion of fat (%)                                                   √ √ √
  Degree of irregularity                                                 √ √
  Type of irregularity                                                   √ √ √ √
  Stellate border                                                        √ √ √
  Indistinct border                                                      √ √ √ √
  Density (hypo/iso/hyper)                                               √ √ √
  Homogeneity                                                            √ √ √
  Location                                                               √ √ √
  Diameter                                                               √ √ √
  Boundary shape (type)                                                  √ √ √

Microcalcifications
  Size of cluster (general view)                                         √ √ √
  Number of elements                                                     √ √ √
  Shape of cluster                                                       √ √ √
  Variability of size of elements                                        √ √ √
  Irregular shape of elements                                            √ √ √
  Linear or branching elements                                           √ √ √

Secondary signs
  Architectural distortion                                               √ √
  Asymmetric density                                                     √ √ √
  Skin thickening or retraction                                          √ √
  Regional calcifications                                                √ √ √

Previous history
  Availability                                                           √ √
  Comparability                                                          √ √
  Existence of abnormality in previous study                             √ √

Correlation with clinical findings
  Availability                                                           √ √
  Correlation: location of clinical findings with radiographic study     √ √
  Correlation: size/extent of clinical findings with radiographic study  √ √
  Level of suspicion due to clinical findings                            √ √

Other data
  Age                                                                    √ √
  Benign/malignant (histological)                                        √ √

It is obvious that some of the above features, although very important, are not directly related to the mammographic image by itself and, thus, they have to be provided as external annotation data for each case by the physician [5,6]. Furthermore, not all of them are related to the clinical characterization of tumors, which is the main concern of the current study. Subsequently, a robust, content-rich subset of features was constructed, using selected features that are highly related to tumor benignancy or malignancy and, at the same time, refer to textural and morphological characteristics of the tumor, i.e., to objective image properties. These feature selections were also based on the general requirement that the features can be automatically extracted and processed. In this case, the features that are extracted from the image can be linked directly or indirectly to morphological or textural properties of the tissue inside and around the tumor area, as it appears on the image itself [9–12]. Qualitative or descriptive features were scaled in numerical ranges or percentages, in order to acquire quantitative data values. Both the final feature list selections, as well as the exact quantification scales, were defined in cooperation with an expert physician in order to ensure complete and detailed clinical results.

The final set of nine clinical features was the base for the annotation list, which was used to describe and document the expert's clinical evaluation for each mammographic image in the database. Specifically, (1) the presence of tumors, (2) the presence of microcalcifications, (3) the tumor density, (4) the percentage of fat within the tumor, (5) the tumor boundary vagueness, (6) the tumor homogeneity, (7) the tumor morphological shape type, (8) the patient's age, as well as (9) the final histologic diagnosis, were included. As the patient's age remains a feature of high clinical importance, it was also included in the final annotation list as the only "external" data item, although it cannot be inferred directly from the mammographic image itself [5,6]. Finally, the morphological shape type refers to the classification of the tumor's shape in one of four predefined shape categories, related to the tumor's boundary roughness and stellate or lobulated outline. These four categories, illustrated in Fig. 1, are defined as round, lobulated, micro-lobulated and stellate, and their ranking is directly related to their pathology, from benign to malignant, respectively [13–16].

Fig. 1. Morphological shape type representations of mammographic tumors.

These quantified properties are essentially explicit information related to specific types of malignant mammogram abnormalities, including architectural distortion, clusters of microcalcifications, lobulated or stellate masses, as well as skin oedema [7]. Thus, the initial annotation list constitutes a complete dataset that provides significant diagnostic data, which has been used for further statistical and clustering analysis. The final annotation list, containing the selected features and quantification scales, is presented in Table 2.

2.2. Mammographic image database

The second phase of the study included the creation of a thorough mammographic image database, especially designed to focus on cases of tumor presence with histological verification as benign or malignant by an expert physician. The requirement for patient's clinical history and positive histological verification of the benignancy or malignancy of each case was assessed as one of extreme importance for the quality and validity of the subsequent results. Thus, a new, special-purpose image set was assembled, using cases of mammographic tumors with complete radiologic evaluation and histologic diagnosis. The initial set contained several hundreds standard mammograms and it was used as a base for the final selection of tumor cases with positive clinical verification by surgical biopsy and histologic examination. The selected subset was constructed in accordance to the general requirement for complete and unbiased statistical distributions over all the radiologic findings investigated in the study.

The selected mammogram films were digitized at a typical resolution of 63 μm (400 dpi) with 8-bit graylevel depth, in order to retain fine scale textural and structural tissue characteristics. Furthermore, some additional post-processing was applied uniformly over all the selected images, using optimized unsharp filtering for image enhancement with minimal spectrum alteration. The resulting images were evaluated and verified by the expert as acceptable in terms of image quality and resolution. The final set of 130 images of histologically confirmed lesions (46 benign and 84 malignant) was used as the base for all the subsequent analysis presented in this study, with no reduction in spatial resolution or graylevel depth.

Subsequently, every mammographic image in the database was evaluated by two expert radiologists and all the important findings were recorded separately for each case, using the annotation list that was created during the previous phase of the study. As the tumor's shape is one of the most important morphological properties for clinical characterization, it was essential that shape type information was provided by the expert and registered into the annotation list. In addition, for further morphological and textural analysis capabilities at some later stage, it was crucial that every tumor was clearly described and registered by defining its boundary outline. In order to obtain tumor boundaries of high quality and detail, a manual segmentation was applied. Specifically, each tumor was manually described by the radiologists using a high-resolution digitizer device and stored as an embedded boundary descriptor via alpha channel data. These boundary descriptions were used for further independent work on the definition of mass inclusion masks and boundary zones for textural features extraction at these areas of interest.

Table 2
Final annotation list and quantification details

Qualitative feature             Range
Patient's age                   True age
Mass existence                  Yes/no
Microcalcifications existence   Yes/no
Fat percentage                  0, ..., 100
Boundary sharpness              0, ..., 100
Mass density                    L (hypo)/M (iso)/H (hyper)
Mass homogeneity                1, ..., 10
Mass shape type                 1 (round)/2 (lobulated)/3 (micro-lobulated)/4 (stellate)
Histologic diagnosis            B (benign)/M (malignant)
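To make the quantification scales of Table 2 concrete, the following minimal sketch encodes one annotated case as a numeric feature vector. The class name `Annotation`, the field names, the numeric codes for density (L/M/H as -1/0/1) and the rescaling of the two 0 to 100 scales to fractions are illustrative assumptions, not the authors' actual data format.

```python
from dataclasses import dataclass
from typing import List

# Assumed numeric codes: the source lists only the categorical scales.
DENSITY_CODES = {"L": -1.0, "M": 0.0, "H": 1.0}
DIAGNOSIS_CODES = {"B": 0.0, "M": 1.0}


@dataclass
class Annotation:
    """One annotated case, following the scales listed in Table 2."""
    age: int                    # true age in years
    mass: bool                  # mass existence (yes/no)
    microcalcifications: bool   # microcalcifications existence (yes/no)
    fat_percentage: float       # 0..100
    boundary_sharpness: float   # 0..100
    density: str                # "L" (hypo), "M" (iso), "H" (hyper)
    homogeneity: int            # 1..10
    shape_type: int             # 1 round, 2 lobulated, 3 micro-lobulated, 4 stellate
    diagnosis: str              # "B" (benign) or "M" (malignant)

    def validate(self) -> None:
        """Reject values that fall outside the Table 2 scales."""
        if not 0 <= self.fat_percentage <= 100:
            raise ValueError("fat_percentage must be in 0..100")
        if not 0 <= self.boundary_sharpness <= 100:
            raise ValueError("boundary_sharpness must be in 0..100")
        if self.density not in DENSITY_CODES:
            raise ValueError("density must be L, M or H")
        if not 1 <= self.homogeneity <= 10:
            raise ValueError("homogeneity must be in 1..10")
        if self.shape_type not in (1, 2, 3, 4):
            raise ValueError("shape_type must be 1..4")
        if self.diagnosis not in DIAGNOSIS_CODES:
            raise ValueError("diagnosis must be B or M")

    def to_vector(self) -> List[float]:
        """Return the eight input features as floats (diagnosis excluded)."""
        self.validate()
        return [
            float(self.age),
            float(self.mass),
            float(self.microcalcifications),
            self.fat_percentage / 100,      # assumed rescaling to 0..1
            self.boundary_sharpness / 100,  # assumed rescaling to 0..1
            DENSITY_CODES[self.density],
            float(self.homogeneity),
            float(self.shape_type),
        ]

    def label(self) -> float:
        """Return the class label: 0.0 benign, 1.0 malignant."""
        self.validate()
        return DIAGNOSIS_CODES[self.diagnosis]


# Hypothetical case, for illustration only.
case = Annotation(52, True, False, 10, 80, "H", 7, 2, "B")
vector = case.to_vector()
```

Keeping the diagnosis out of the feature vector and exposing it through a separate `label()` method avoids accidentally feeding the target to a classifier.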

2.3. Statistical analysis

Statistical analysis was conducted on the data obtained through the annotation list in three groups: benign cases, malignant cases and all cases. For each of these groups, normal distribution approximation parameters, i.e., mean value and variance, were calculated and the results were investigated under statistical significance analysis and projected aliasing errors [17]. Specifically, for mean values calculated separately for each case grouping, significance ranges were estimated according to the current group size and variance. Subsequently, feature distributions for each group were approximated by normal distributions and the statistical error was calculated according to the optimal limiting value separating two different classes, i.e., benign and malignant cases, using each individual feature. In other words, for every individual feature contained in the annotation list, an optimal decision threshold was calculated and the corresponding classification errors for the benign and malignant cases were used as an indicative measurement of statistical aliasing. As an example, the use of tumor boundary sharpness, as the sole discrimination feature for bimodal normal distribution modeling, is illustrated in Fig. 2.

Fig. 2. Likelihood probability distributions of tumor boundary sharpness versus diagnosis and the corresponding bimodal normal distribution. Interference between the two normal distribution curves constitutes the statistical error probability due to aliasing between the two classes. The true aliasing error was calculated by applying the specific decision threshold for boundary sharpness indicator in the current set of 130 mammographic images.
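The threshold computation described above can be sketched as follows. This is a minimal illustration under the assumption that each class is a normal density weighted by a prior proportional to its group size (46 benign, 84 malignant); the function names are hypothetical. The input values are the boundary sharpness means and standard deviations reported in Table 5. With these assumptions the computed threshold comes out close to the class boundary listed for boundary sharpness in Table 6, which suggests, but does not prove, that a comparable rule was used.

```python
import math
from statistics import NormalDist


def bayes_threshold(m1, s1, p1, m2, s2, p2):
    """Point between the two means where p1*N(m1, s1) equals p2*N(m2, s2)."""
    # Equating the two weighted log-densities gives a*x^2 + b*x + c = 0.
    a = 1.0 / (2 * s1 ** 2) - 1.0 / (2 * s2 ** 2)
    b = m2 / s2 ** 2 - m1 / s1 ** 2
    c = (m1 ** 2 / (2 * s1 ** 2) - m2 ** 2 / (2 * s2 ** 2)
         + math.log((s1 * p2) / (s2 * p1)))
    if abs(a) < 1e-12:                 # equal variances: the equation is linear
        return -c / b
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("the weighted densities do not intersect")
    root = math.sqrt(disc)
    candidates = [(-b + root) / (2 * a), (-b - root) / (2 * a)]
    low, high = sorted((m1, m2))
    between = [x for x in candidates if low <= x <= high]
    if not between:
        raise ValueError("no intersection between the class means")
    return between[0]


def aliasing_error(m1, s1, p1, m2, s2, p2, t):
    """Prior-weighted probability mass that falls on the wrong side of t."""
    lower, upper = sorted([(m1, s1, p1), (m2, s2, p2)])
    err_lower = 1.0 - NormalDist(lower[0], lower[1]).cdf(t)  # lower class above t
    err_upper = NormalDist(upper[0], upper[1]).cdf(t)        # upper class below t
    return lower[2] * err_lower + upper[2] * err_upper


# Boundary sharpness statistics from Table 5; priors are an assumption.
benign = (0.808, 0.205, 46 / 130)
malignant = (0.255, 0.264, 84 / 130)
threshold = bayes_threshold(*benign, *malignant)
error = aliasing_error(*benign, *malignant, threshold)
print(f"threshold={threshold:.3f}, modeled error={error:.3f}")
```

The modeled error is a property of the fitted normal curves, whereas the error counts in Table 6 were obtained by applying the threshold to the actual 130 cases, so the two figures need not coincide.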

The features were also processed through univariate significance analysis, specifically the T-test [18]. Multivariate significance analysis was also applied, using the multivariate analysis of variance (MANOVA) method [19]. In both cases, every feature was investigated separately under statistical dependence hypotheses in relation to the diagnosis and the results formulated a quantitative ranking, regarding the correlation between each feature and the diagnosis.
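As an illustration of a univariate test on one feature, the sketch below computes a two-sample t statistic from summary statistics alone. It assumes Welch's unequal-variance form; the source does not state which variant of the T-test was applied, and the function name `welch_t` is hypothetical. The inputs are the patient's age statistics reported in Table 5.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of the first group mean
    v2 = sd2 ** 2 / n2   # squared standard error of the second group mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof


# Patient's age: malignant (n=84) versus benign (n=46), values from Table 5.
t_age, dof_age = welch_t(57.631, 9.079, 84, 47.457, 8.939, 46)
print(f"t={t_age:.3f}, dof={dof_age:.1f}")
```

A large absolute t value with this many degrees of freedom corresponds to a very small p-value, which is consistent in spirit with the small T-test values that Table 6 lists for patient's age.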

2.4. Classification analysis

In order to assess the discriminative power of each one of the qualitative clinical measurements, several classification schemes were applied against the verified diagnosis for each case. Pattern recognition techniques include various types of decision-theoretic approaches for data analysis and classification, and have been proven extremely valuable to real problems of high complexity such as the task of mammographic diagnosis. In this study, several of the standard linear and non-linear classifiers were used in order to evaluate both the true performance of these features, as well as the overall complexity of the problem itself [18].

Both linear and non-linear classification architectures were employed for every dataset configuration. Specifically, optimization of the best feature set was investigated by applying exhaustive search through all the combinations of features, in order to identify the ones that yield maximum discrimination capability and optimal performance. Furthermore, the performance of each classifier was evaluated through extensive use of k-fold cross validation techniques, specifically leave-one-out and leave-k-out methods [20].

For linear classification testing, three standard models were considered. Linear discrimination analysis (LDA) was applied in the form of classifier, using iterative subsets of the initial training set and employing leave-one-out classification for every individual pattern in the set [21]. A minimum distance classifier (MDC) with Mahalanobis distance function was employed in combination with least-squares data transformation for better statistical compactness, yielding the least-squares minimum distance classifier (LSMD) that was used in this study [18].

Additionally, a typical nearest neighbor classifier with variable neighborhood size (K-NN) was employed, using the neighborhood size K as an optimization parameter [18].

From the various types of typical non-linear classifiers, two representative neural architectures were considered. A multi-layered perceptron (MLP) neural network model was used, using the back-propagation algorithm for training, employing topology optimization and various choices for the neuron activation functions, specifically softmax, hyperbolic tangent and hard limiter [22]. Similarly to the MLP, a radial basis function (RBF) neural network architecture was also employed as a kernel-based alternative, using Gaussian activation functions and optimized topology [23]. For neural network classifiers, no feature reduction was necessary, as the neurons of the (trained) input layer could be examined in order to discard features that correspond to input weights with zero or near-zero values [24]. In other words, the architecture of the neural networks favors the automatic ranking of the inputs during the training phase, in a way that the final classifier can be examined in order to identify significant and non-significant features.

For more advanced investigation of the feature set, typical support vector machine (SVM) models were applied in relation to the final diagnosis. Specifically, the C-SVC model was used in combination with standard RBF kernel functions, optimizing the penalty factor (C) and the Gaussian spread parameter (σ) during training [25]. SVM classifiers employed limited feature set optimizations, using iterative runs of enlarging inclusions of several features, available on the feature ranking lists created by MANOVA significance analysis. Due to the statistical importance of the shape type feature, all classifications were considered both with and without the inclusion of this specific feature.
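The combination of exhaustive feature-subset search and leave-one-out validation described in this section can be sketched in a few lines. The classifier used here is a plain nearest-mean rule with Euclidean distance, chosen only to keep the example short; it is a stand-in, not the LDA, LSMD, K-NN, neural or SVM models of the study. The data are synthetic and all function names are hypothetical.

```python
import itertools
import random


def nearest_mean_predict(train_X, train_y, x):
    """Assign x to the class whose training mean is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy using only the feature columns listed in cols."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[c] for c in cols] for j, row in enumerate(X) if j != i]
        train_y = [lab for j, lab in enumerate(y) if j != i]
        held_out = [X[i][c] for c in cols]
        hits += nearest_mean_predict(train_X, train_y, held_out) == y[i]
    return hits / len(X)


def best_subset(X, y):
    """Try every non-empty feature subset; keep the first one with top accuracy."""
    n_features = len(X[0])
    best_acc, best_cols = 0.0, ()
    for k in range(1, n_features + 1):
        for cols in itertools.combinations(range(n_features), k):
            acc = loo_accuracy(X, y, cols)
            if acc > best_acc:
                best_acc, best_cols = acc, cols
    return best_acc, best_cols


# Synthetic data: feature 0 separates the classes, features 1 and 2 are noise.
rng = random.Random(0)
X, y = [], []
for label, centre in (("B", 0.0), ("M", 3.0)):
    for _ in range(20):
        X.append([rng.gauss(centre, 0.3), rng.gauss(0.0, 5.0), rng.gauss(0.0, 5.0)])
        y.append(label)

acc, cols = best_subset(X, y)
print(acc, cols)
```

Exhaustive search visits 2^n - 1 subsets, which is manageable for the handful of features in the annotation list but grows quickly; this is one plausible reason for the more limited, ranking-driven feature inclusion reported for the SVM runs.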

3

medt icals them hol-o ma-l rary,

Table 3Distribution of the four morphological shape types against diagnosis

Round Lobulated Micro-lobulated Stellate

Benign 25 18 2 183% 95% 5% 3%

Malignant 5 1 41 3717% 5% 95% 97%

Percentages are calculated per column.

Table 4Distribution of the “1 + 2” and “3 + 4” grouped morphological shape typesagainst diagnosis

Round + lobulated Micro-lobulated + stellate

Benign 43 388% 4%

Malignant 6 7812% 96%

Percentages are calculated per column.

3. Results

Preliminary analysis on the initial dataset has confirmed the strong statistical correlation between morphological shape type and verified diagnosis of breast tumors in mammograms. Specifically, the first two types of morphology, round and lobulated tumors, exhibited 17 and 5% of malignancy, respectively, within the same class. On the contrary, micro-lobulated and stellate types exhibited 95 and 97% of malignancy, respectively, as illustrated in Table 3. When combining the round and lobulated cases, the overall percentage of malignancy was 12%, while for combined micro-lobulated and stellate cases, the overall percentage of malignancy was 96%, as illustrated in Table 4. This high statistical dependency of specific morphological features of each tumor with its verified pathology confirms the clinical value of its shape when conducting a pathological evaluation of a mammogram. It should be noted that if the shape type feature were to be used as the sole input for predicting the final diagnosis, an accuracy rate just over 93% could be achieved.

3.1. Statistical significance analysis

Global statistics of the dataset were calculated for every individual feature in relation to the final diagnosis. The properties and differences in the resulting bimodal normal distributions, for benign and malignant cases, revealed the discriminating power of each individual feature, as well as the corresponding significance ranges for the mean values. The mean value and standard deviation of each feature were used for constructing a bimodal normal distribution statistical model, while the corresponding aliasing error between the two kernels was used to evaluate the statistical separability of benign and malignant cases, according to this feature. Fig. 2 presents a graphical display of using boundary sharpness for this type of statistical modeling. Table 5 summarizes the complete results of these tests, while Table 6 summarizes all types of statistical discrimination modeling, including T-test, F-test and bimodal normal distribution modeling. Although some tests were not applicable to specific features due to statistical limitations (e.g., zero variance or non-separable Gaussian distributions), early conclusions on the discrimination power of each individual feature could already be drawn from these early tests.

Table 5
Statistics of benign and malignant cases

                               Mean     Standard deviation   Skewness   Kurtosis   Mean confidence range

Benign (cases: 46)
Patient’s age                  47.457   8.939                0.119      1.623      ±2.583
Microcalcifications presence   0.783    0.417                −1.417     0.006      ±0.121
Fat% inclusion                 0.126    0.270                2.208      3.539      ±0.078
Boundary sharpness             0.808    0.205                −2.925     8.514      ±0.059
Tumor density                  0.326    0.701                −0.555     −0.781     ±0.203
Tumor homogeneity              7.109    1.464                −0.951     0.979      ±0.423
Tumor shape type               1.543    0.690                1.324      2.215      ±0.199

Malignant (cases: 84)
Patient’s age                  57.631   9.079                0.098      −0.188     ±1.942
Microcalcifications presence   0.810    0.395                −1.605     0.590      ±0.084
Fat% inclusion                 0.000    0.000                –          –          –
Boundary sharpness             0.255    0.264                1.017      −0.032     ±0.056
Tumor density                  0.798    0.485                −2.420     5.260      ±0.104
Tumor homogeneity              5.381    1.605                0.266      −0.328     ±0.343
Tumor shape type               3.310    0.776                −1.398     2.394      ±0.166

Cells indicating “–” mean that the specific parameter could not be calculated due to zero variance. All confidence ranges for mean values were calculated for significance level (alpha) 0.95.

Table 6
Test statistics of benign versus malignant cases

Statistics (benign vs. malignant)   Classes boundary   T-test value   F-test value   Statistical errors   Error probability (%)
Patient’s age                       47.791             1.704e−08      0.926          29                   22.31
Microcalcifications presence        N/S                0.721          0.660          –                    –
Fat% inclusion                      N/S                0.003          –              –                    –
Boundary sharpness                  0.601              1.041e−24      0.067          15                   11.54
Tumor density                       N/S                0.127e−03      3.801e−03      –                    –
Tumor homogeneity                   7.006              1.181e−08      0.505          34                   26.15
Tumor shape type                    2.226              3.890e−24      0.392          9                    6.923

For F-test values, cells indicating “–” mean that the specific parameter could not be calculated due to zero variance. Class boundary values indicating “N/S” mean that the clustering of the specific feature could not be qualified as bimodal normal distribution model, i.e. it could not produce any linear discrimination on the base classes.
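The bimodal normal distribution model of this section can be illustrated with a short sketch that derives a class boundary, a theoretical aliasing error and a mean confidence range from the per-class means and standard deviations of Table 5. The text does not spell out the exact procedure, so several choices here are assumptions: the class priors are taken as the case proportions (46/130 and 84/130), the boundary is the point between the two means where the prior-weighted densities are equal, and the confidence range uses the normal approximation with z = 1.96. All function names are illustrative.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def class_boundary(m1, s1, p1, m2, s2, p2):
    """Point between the means where p1*N(m1, s1) equals p2*N(m2, s2), or None."""
    # Equating the log-densities gives a quadratic a*x^2 + b*x + c = 0.
    a = 1.0 / (2.0 * s2 ** 2) - 1.0 / (2.0 * s1 ** 2)
    b = m1 / s1 ** 2 - m2 / s2 ** 2
    c = (m2 ** 2 / (2.0 * s2 ** 2) - m1 ** 2 / (2.0 * s1 ** 2)
         + math.log((p1 * s2) / (p2 * s1)))
    if abs(a) < 1e-12:  # equal variances: the equation is linear
        roots = [-c / b] if b != 0 else []
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return None
        root = math.sqrt(disc)
        roots = [(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)]
    lo, hi = min(m1, m2), max(m1, m2)
    inside = [r for r in roots if lo < r < hi]
    return inside[0] if inside else None


def aliasing_error(m1, s1, p1, m2, s2, p2, boundary):
    """Prior-weighted mass of each kernel lying on the wrong side of the boundary."""
    if m1 < m2:
        err1 = 1.0 - normal_cdf(boundary, m1, s1)
        err2 = normal_cdf(boundary, m2, s2)
    else:
        err1 = normal_cdf(boundary, m1, s1)
        err2 = 1.0 - normal_cdf(boundary, m2, s2)
    return p1 * err1 + p2 * err2


def mean_confidence_half_width(sd, n, z=1.96):
    """Half-width of the confidence range for a mean (normal approximation)."""
    return z * sd / math.sqrt(n)


# Boundary sharpness, values from Table 5 (benign first, malignant second).
p_benign, p_malignant = 46 / 130, 84 / 130
boundary = class_boundary(0.808, 0.205, p_benign, 0.255, 0.264, p_malignant)
error = aliasing_error(0.808, 0.205, p_benign, 0.255, 0.264, p_malignant, boundary)
print(f"boundary = {boundary:.3f}, theoretical aliasing error = {error:.3f}")
print(f"age range (benign) = ±{mean_confidence_half_width(8.939, 46):.3f}")
```

Under these assumptions the computed boundary for boundary sharpness falls close to the 0.601 listed in Table 6. The error computed here is a theoretical overlap of the two fitted kernels and need not coincide with the error counts of Table 6.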

Furthermore, in order to produce feature rankings that take into account statistical dependencies between the individual features, MANOVA was applied, investigating the discriminating power of each feature against the final diagnosis, as well as its independence from all the other features.

Table 7 summarizes the feature rankings for all statistical significance analysis methods applied in this study. The results obtained by bimodal normal distribution modeling, Univariate and MANOVA were generally similar and consistent, producing feature rankings with only small differences in the exact ordering with regard to the importance of each feature.

Table 7
Feature ranking lists produced by T-test, bimodal normal distribution errors and MANOVA evaluations against diagnosis

T-test ranking         Bimodal distribution error ranking   MANOVA ranking
Boundary sharpness     Mass shape type                      Mass shape type
Mass shape type        Boundary sharpness                   Boundary sharpness
Patient’s age          Patient’s age                        Patient’s age
Mass homogeneity       Mass homogeneity                     Fat percentage
Mass density           Mass density                         Microcalcifications?
Fat percentage         Fat percentage                       Mass homogeneity
Microcalcifications?   Microcalcifications?                 Mass density

For non-linearly separable bimodal normal distribution cases, the overall shape and aliasing of the underlying distributions are considered instead of true misclassification errors.

3.2. Classification results

Classification results were used as guidelines for evaluating the performance of individual features, as well as identifying optimal feature combinations. Classification accuracy rates were thoroughly investigated for all classifier models and comparative results were obtained against the final diagnosis.

3.2.1. Individual features evaluation

Datasets of single feature inclusions were constructed for conducting discriminating power analysis against diagnosis, using a typical LSMD classifier. The complete results for individual feature classification configurations are summarized in Table 8. Similarly to the results already obtained by statistical significance analysis, the morphological shape type of the tumor proved to be the most correlated feature with regard to final diagnosis. When patient’s age and shape type features were excluded, optimal feature combinations, which were selected by the classifier, included tumor’s boundary sharpness, fat inclusion percentage and tumor homogeneity, yielding a maximum accuracy rate of 86.9%. The introduction of patient’s age into the set of input features also produced optimal configurations, achieving 89.2% accuracy. Classification results analysis showed that for feature combinations without shape type inclusion, the accuracy ranged from 87.7 to 91.5%, while for feature combinations that included shape type information, the accuracy ranged from 91.5 to 93.1%, essentially verifying the explicit discriminating value of this specific feature.

Table 8
True discrimination efficiency of individual features through LSMD classification

Qualitative feature    LSMD (success% − diagnosis) (%)
Mass shape type        93.1
Boundary sharpness     86.1
Fat percentage         74.6
Mass density           73.1
Mass homogeneity       73.1
Patient’s age          68.5
Microcalcifications?   60.8

3.2.2. Comparative classifier performance

For a more realistic performance analysis for optimized feature combinations, a wide range of linear and non-linear classifiers were used. Linear classifiers included exhaustive search through all feature subsets for identifying optimal feature combinations, while NN and SVM classifiers used full feature sets or optimal feature subsets, already available through the linear classifiers. Table 9 summarizes the highest accuracy rates achieved by each classifier, with and without the inclusion of shape type information.

Table 9
Success rates of all classifiers against diagnosis prediction, with and without shape type input

Classifier model   Accuracy without shape type information (%)   Accuracy including shape type information (%)

Target: tumor diagnosis
LDA                87.69                                         93.85
K-NN               91.54                                         93.08
LSMD               89.23                                         93.08
MLP                91.54                                         91.54
RBF                90.77                                         91.54
C-SVC/RBF          93.85                                         94.62
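The exhaustive search through all feature subsets mentioned for the linear classifiers can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a nearest-class-mean rule stands in for the linear classifier, accuracy is estimated by leave-one-out, and the toy data and all names are hypothetical.

```python
from itertools import combinations


def nearest_mean_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose mean vector is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(data, labels, subset):
    """Leave-one-out accuracy using only the feature indices in `subset`."""
    projected = [[row[i] for i in subset] for row in data]
    hits = 0
    for i, (sample, truth) in enumerate(zip(projected, labels)):
        train_x = projected[:i] + projected[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += nearest_mean_predict(train_x, train_y, sample) == truth
    return hits / len(data)


def best_feature_subset(data, labels):
    """Score every non-empty feature subset; return (accuracy, subset) of the best."""
    n_features = len(data[0])
    best = (0.0, ())
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            acc = leave_one_out_accuracy(data, labels, subset)
            if acc > best[0]:  # ties keep the smaller, earlier subset
                best = (acc, subset)
    return best


# Toy data (hypothetical, not from the study): feature 0 separates the classes,
# feature 1 is uninformative.
DATA = [[0.9, 0.2], [0.8, 0.7], [0.7, 0.4], [0.2, 0.3], [0.3, 0.8], [0.1, 0.5]]
LABELS = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

accuracy, subset = best_feature_subset(DATA, LABELS)
print(accuracy, subset)
```

With seven features there are only 2^7 − 1 = 127 non-empty subsets, so an exhaustive search of this kind is cheap for a dataset of this size.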

The LDA classifier exhibited an accuracy rate of 87.69% when using an optimally selected feature set of patient’s age and tumor’s boundary sharpness. When the shape type feature was included in the input, the optimal feature set consisted of the tumor’s boundary sharpness and shape type features, and the accuracy rate rose to 93.85%. It should be noted that this particular accuracy rate was marginally higher than the statistical dependency of tumor’s shape type versus its final diagnosis, namely 93.08%, thus demonstrating the importance of combining several features to construct optimal feature subsets.

Similarly to the LDA, the optimized K-NN classifier achieved an accuracy rate of 91.54% when using all the available features except shape type and 93.08% when including shape type information. In fact, the optimal feature subset in the second case was constituted only by the shape type feature itself, essentially implementing the statistical classifier of grouping round and lobulated cases as probable benign, and micro-lobulated and stellate cases as probable malignant. There was no clear indication regarding the overall optimal value for the neighborhood size, although most configurations of high accuracy employed sizes of three to eight neighboring samples.

For the LSMD classifier, success rates were similar to the K-NN. Specifically, an accuracy of 89.23% was achieved when using all the available features except shape type information and 93.08% when using the shape type feature as well. As in the case of K-NN, the shape type feature dominated the optimal feature subset and the resulting classification scheme is analogous to the suggestive statistical grouping of round and lobulated cases as probable benign, and micro-lobulated and stellate cases as probable malignant.

Both the MLP classifier and the RBF neural classifiers employed full feature sets and optimized size for the hidden layer. The MLP classifier yielded an accuracy of 91.54% when no shape type information was available, using one or four hidden units and linear activation function. The inclusion of the shape type feature did not affect the overall accuracy rate, although it resulted in many more configurations achieving this maximum efficiency. Similarly, the RBF classifier achieved an accuracy of 90.77% without shape type feature inclusion (using eight hidden units) and 91.54% with shape type feature inclusion (using five hidden units).

The SVM classifier, employed in this study as a representative candidate of this family, was the C-SVC model (penalty-driven SVM classifier) with radial basis kernel function (RBF). For feature lists with shape type information excluded, the accuracy rates achieved by the SVM classifier were 93.85% even when using only the first four features from the ranking list, namely tumor’s boundary sharpness, patient’s age, percentage of fat inclusion and tumor homogeneity. For feature lists including the shape type property, the accuracy rates achieved by the SVM classifier were 94.62%, conclusively higher than the statistical correlation between shape type and diagnosis. As expected, the shape type feature was included in all optimal feature subsets achieving this level of performance. Although no clear conclusions can be drawn regarding the exact choices on the values of SVM parameters C and σ, analysis of the various SVM configurations has shown that the values of the penalty factor C, namely between 1 and 10, were inversely proportional to the corresponding values of function spread parameter σ, namely from 0.1 down to 0.01.
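To make the role of the spread parameter σ concrete, the sketch below evaluates a Gaussian radial basis kernel for the σ values quoted above. The kernel formula is not given in the text, so the common parameterization exp(−‖x − y‖² / (2σ²)) is assumed here; the function name and the sample points are illustrative only.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * sigma^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two hypothetical feature vectors a fixed distance of 0.1 apart.
a, b = [0.0, 0.0], [0.1, 0.0]
for sigma in (0.1, 0.05, 0.01):
    print(f"sigma={sigma}: K(a, b) = {rbf_kernel(a, b, sigma):.3g}")
```

As σ shrinks, the kernel value between two fixed points drops towards zero, so each training sample influences a narrower neighborhood of the feature space.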

4. Discussion

Results from statistical significance analysis reveal the high correlation between most of the qualitative features in the annotation list and the final diagnosis. Morphological shape type of the tumor’s outline exhibits the highest dependency in relation to the diagnosis, making this feature adequate on its own to provide discrimination capability for correctly classifying benign and malignant cases with success rates up to 93%. Several other features, such as tumor boundary sharpness, tumor homogeneity, as well as patient’s age, have proven to be important clinical aspects of the expert’s evaluation. All the features contained in the annotation list exhibited some degree of correlation to the diagnosis, thus qualifying them as plausible for automatic diagnosis systems, although in most cases optimal combinations of several features have to be used, instead of single features.

The discriminating value of each individual feature was confirmed by several statistical significance properties, including T-test, F-test, MANOVA, bimodal normal distribution error estimation and mean value significance range, as well as real classification runs using a typical LSMD classification scheme. Although the feature ranking lists, created using the results of these analysis methods, differ slightly on the exact ordering of the features, some basic conclusions could be drawn with regard to the overall quality of each individual feature. Specifically, the morphological shape type of the tumor, namely its classification in one of the round, lobulated, micro-lobulated or stellate categories, has been established as the most important feature. Round and lobulated cases have been proven highly correlated to benignancy, while micro-lobulated and stellate cases have shown high correlation to malignancy, as Tables 3 and 4 show. For combined features configurations, optimized sets containing the shape type feature exhibited 3 to 4% higher success rates than the ones without it, when used in real classification schemes of both linear and non-linear models. Subsequently, tumor boundary sharpness or fuzziness was clearly suggestive of benign or malignant tumors respectively, as Table 8 shows. Another important feature was the overall density of the tissue that constitutes the tumor, as dense tissue samples of abnormal physiology are typically related to malignant growth rate of the cells in those areas [2,7]. Tumor’s homogeneity and percentage of fat inclusion were also important when combined together or with some other feature of high discriminative quality, although none of them could provide high success rates when used individually. Patient’s age has been used as a good representative of features provided as “external” annotative information by the physician, as it is not directly related to the informative content of the mammographic image itself, but rather to clinical data from patient’s history. However, it proved to be a very important aspect of the overall clinical evaluation of each case. Finally, the indicative feature of microcalcifications’ presence was included in several optimal feature combinations, although it showed minimal discriminative value when used individually to predict benignancy or malignancy.

All features, except patient’s age and tumor’s shape type, refer directly or indirectly to textural properties of the tumor area as it appears in the mammographic image. However, the shape type was evaluated as the most important feature contained in the annotation list. This means that, in order to have a fully automated diagnosis system that is based on objective feature measurements of various textural properties, several of these features have to be optimally combined. In any case, both morphological and textural features can be formulated into a well-defined set of extraction functions, capable of implementing objective estimators of various morphological and textural properties of each tumor, as it appears in the mammographic image. This would be very helpful to the estimation of the correct radiologic diagnosis as an adjunct tool to the physician and, in the future, as a complementary system combined with a CAD.

With regard to best classifier performance, the higher efficiency of non-linear architectures over linear equivalents was confirmed in almost all cases. Regarding the prediction of the final diagnosis, the performance of all classifiers was evaluated separately when including or excluding the shape type, which was the most dominant feature in the set. For feature sets without any shape type information inclusion, optimized K-NN classifier achieved the best results over LDA and LSMD alternatives. Similar feature sets containing the shape type feature produced results with no significant preference towards any of these three classifiers. These results were closely matched or exceeded by several MLP and RBF NN configurations, especially in the case of excluding shape type information. Although the best accuracy rates in some cases were produced by linear, instead of non-linear, classifiers, it should be noted that NN classifiers used only complete feature sets or feature combinations already calculated as optimal for linear equivalents. Both MLP and RBF models required a larger number of hidden units when shape type information was excluded, while the inclusion of the specific feature essentially simplified the discrimination process and thus resulted in topologies with fewer hidden units. The overall performance of MLP architectures was marginally higher than RBF equivalents, employing much smaller hidden layers and greater degrees of generalization.

The SVM classification schemes yielded overall maximum accuracy rates, both when the shape type feature was excluded or included in the input vector, higher than the corresponding maximum rates of any other linear or NN alternative investigated in this study. Thus, a representative application of advanced SVM models, compared to several linear and NN classification schemes, is suggestive of their superiority in classification problems that exhibit high degree of non-linearity in the training datasets.

5. Conclusion

The problem of identifying image features characterizing the overall morphology and fine-scaled structural properties of the tissue in mammographic tumors, as well as external clinical data, was investigated using objective statistical analysis and pattern recognition approaches. Therefore, although the initial descriptive data were qualitative in nature, the translation into quantitative values and their thorough processing via advanced pattern analysis algorithms, produced objective evaluations and discriminating power estimations of their true efficiency.

All the selected features have shown some degree of dependency to the final diagnosis, while some of them, such as morphological shape type, provided discriminating levels high enough to be used even individually for tumor classification schemes. Optimal feature sets, employed in advanced non-linear classification architectures, like SVM classifiers, provided accuracy rates up to almost 95%, thus proving their efficiency and making such systems plausible for clinical application. All features investigated in this study, except patient’s age, are related to morphological and textural properties on the mammographic image itself, therefore a completely automated diagnosis system, using the same content-rich descriptive features, is feasible.

Statistical and classification analysis results have shown that, although the selected feature sets were in fact content-rich with regard to their diagnostic value, the diagnostic process itself remains a complex and demanding task. The high degree of non-linearity employed in the discrimination of the input data with regard to diagnosis prediction suggests that automatic diagnosis systems should implement powerful pattern recognition models of non-linear and highly adaptive architecture. Future work should be focused on designing specialized image processing algorithms for efficient automatic extraction of morphological and textural features, combined with robust implementations of advanced classification architectures, such as SVMs. Automated diagnosis of breast mammographic abnormalities, combined with CAD systems, which indicate suspicious lesions in mammograms, will be a very powerful tool in the hands of the mammographic departments and the reporting physicians, especially the less experienced ones.

References

[1] Bocchi L, Coppini G, et al. Tissue characterization from X-ray images. Med Eng Phys 1997;19(4):336–42.

[2] Eagan RL. Breast imaging: diagnosis and morphology of breast diseases. Philadelphia: Saunders; 1988.

[3] Robbins SL, Angell M, Kumar V. Basic pathology. Philadelphia: Saunders; 1981.

[4] Christoyianni I, Dermatas E, Kokkinakis G. Fast detection of masses in computer-aided mammography. IEEE Sig Proc Mag 2000:54–64.

[5] Orsi CD, Getty DJ, Swets JA, et al. Reading and decision aids for improved accuracy and standardization of mammographic diagnosis. Radiology 1992;184:619–22.

[6] Wu Y, Giger ML, Doi K, et al. Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer. Radiology 1993;187:81–7.

[7] Ackerman LV, Mucciardi AN, et al. Classification of benign and malignant breast tumors on the basis of 36 radiographic properties. Cancer 1973;31:342–52.

[8] Huai Li, Wang Yue, Ray Liu KJ, et al. Computerized radiographic mass detection. Part I. Lesion site selection by morphological enhancement and contextual segmentation. IEEE Trans Med Im 2001;20(4):289–301.

[9] Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Sys Man Cyb 1973;SMC-3(3):610–21.

[10] Haralick RM. Statistical and structural approaches to texture. Proc IEEE 1979;67(5):786–804.

[11] Galloway M. Texture analysis using gray level run lengths. Comp Graph Im Proc 1975;4:172–9.

[12] Mavroforakis M, Georgiou H, Cavouras D, Dimitropoulos N, Theodoridis S. Mammographic mass classification using textural features and descriptive diagnostic data. In: Proceedings of the 14th International Conference on Digital Signal Processing (DSP-2002). 2002.

[13] Bruce LM, Adhami RR. Classifying mammographic mass shapes using the wavelet transform modulus-maxima method. IEEE Trans Med Im 1999;18(12):1170–7.

[14] Georgiou H, Cavouras D, Dimitropoulos N, Theodoridis S. Mammographic mass shape characterization using neural networks. In: Proceedings of the Second European Symposium on Biomedical Engineering and Medical Physics. 2000.

[15] Kilday J, Palmieri F, Fox MD. Classifying mammographic lesions using computerized image analysis. IEEE Trans Med Im 1993;12(4):664–9.

[16] Georgiou H, Mavroforakis M, Cavouras D, Dimitropoulos N, Theodoridis S. Multiscaled mammographic mass shape analysis and classification using neural networks. In: Proceedings of the 14th International Conference on Digital Signal Processing (DSP-2002). 2002.

[17] Cheeseman P, Stutz J. Bayesian Classification (AutoClass): Theory and Results. NASA Ames Research Center: Artificial Intelligence Research Branch; 1995.

[18] Theodoridis S, Koutroumbas K. Pattern recognition. 2nd ed. USA: Academic Press; 1999.

[19] Leeden R, Vrijburg K, Leeuw J. A review of two different approaches for the analysis of growth data using longitudinal mixed linear models: comparing hierarchical linear regression (ML3, HLM) and repeated measures designs with structured covariance matrices (BMDP5V). The Statistical Software Newsletter. SSNinCSDA 1996;20:583–605.

[20] Stone M. Cross-validatory choice and assessment of statistical predictions. JR Stat Soc 1974;B36(1):111–47.

[21] Devroye L, Gyorfi L, Lugosi G. A probabilistic theory of pattern recognition. New York: Springer-Verlag Inc.; 1996.

[22] Haykin S. Neural networks: a comprehensive foundation. 2nd ed. New Jersey: Prentice Hall; 1999.

[23] Poggio T, Girosi F. Networks for approximation and learning. Proc IEEE 1990;78:1481–97.

[24] Reed R. Pruning algorithms: a survey. IEEE Trans Neural Networks 1998;4(5):740–7.

[25] Christianini N, Shawe-Taylor J. An introduction to support vector machines. Cambridge, UK: Cambridge University Press; 2000.