Medical Image Analysis Using Texture Analysis
Neil MacEwen
Supervisor: Dr. W. Nailon 19th April 2004
DECLARATION
I declare that this report is entirely the result of my own work under the
supervision of Dr. W. Nailon.
Neil MacEwen
19th April 2004
CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF FIGURES
LIST OF TABLES
1. INTRODUCTION
2. PATTERN RECOGNITION
   2.1 Feature generation
3. REVIEW OF PREVIOUS PROJECT
   3.1 Texture Analysis
      3.1.1 First-order algorithm
      3.1.2 Second-order algorithms
   3.2 Algorithm implementation
4. CLASSIFICATION
   4.1 A simple one-dimensional classification example
   4.2 Classifier design
   4.3 Classifiers
      4.3.1 Statistical Approach
      4.3.2 Decision Functions
      4.3.3 Distance functions and clustering
      4.3.4 Fuzzy logic classifier
      4.3.5 Artificial Neural Net (ANN) classifier
5. FEATURE REDUCTION
6. SEGMENTATION
7. CLASSIFIER DESIGN AND CODING
   7.1 Data Set
   7.2 Feature vector generation
   7.3 Classifiers
      7.3.1 Euclidean distance classifier
      7.3.2 Mahalanobis classifier
      7.3.3 Neural Network classifier
   7.4 Classifier testing
      7.4.1 Initial performance
      7.4.2 Normalisation
      7.4.3 Classifier performance for a 2-dimensional data set
      7.4.4 Neural Network “spread” variation
      7.4.5 Classifier performance at different sizes
      7.4.6 Neural Network “spread” variation at different image sizes
8. SEGMENTATION OF A TEXTURE IMAGE
9. FEATURE SUBSET SELECTION
10. CONCLUSION & FURTHER WORK
REFERENCES
APPENDICES
   Appendix A – Texture features
   Appendix B – MATLAB code
ABSTRACT
The main aim of this project was to validate the use of a set of 5 texture analysis
algorithms for the identification of textures in images.
The project built on a previous final year project that used the 5 algorithms to generate a
set of texture features describing an image. This report outlines the work done in
validating these algorithms. A set of classifiers were designed and tested, and results
showed that the algorithms can be used to successfully differentiate between different
texture types. The segmentation of a mixed texture image was undertaken to validate the
classifier performance results.
Feature reduction was investigated in order to reduce the number of features needed to
successfully classify a texture image.
ACKNOWLEDGEMENTS
I would like to thank Bill for introducing me to this interesting topic, and for his support
and guidance throughout the year.
TABLE OF FIGURES

Figure 2.1: A basic scheme for pattern recognition
Figure 3.1: Two textural images with their histograms and some first-order statistics
Figure 3.2: An example of a greyscale MRI image and the graphical representation of its textural features. See appendices A and B for feature details and code respectively.
Figure 4.1: Two-dimensional feature space containing two classes. An incoming pattern is assigned to the class corresponding to the region it falls in.
Figure 4.2: The characters to be identified, 1 & 0, placed on a grid.
Figure 4.3: A histogram of feature values. Anything falling to the right-hand side of the boundary is classified as a 0, anything to the left as a 1.
Figure 4.4: (a) Linear decision function (b) Nonlinear decision function (both shown in red)
Figure 4.5: Three linearly separable classes in R2, the decision boundary for a class Ci is given by di(x)
Figure 4.6: Three pairwise separable classes in R2, two decision boundaries are needed to select each class.
Figure 7.1: The four textures used for classifier development.
Figure 7.2: Initial classifier performance results (% test vectors correctly identified)
Figure 7.3: Graphical illustration of the normalisation process
Figure 7.4: Classifier performance after normalisation (% test vectors correctly identified)
Figure 7.5: Fisher’s iris data
Figure 7.6: Graphical representation of Fisher Data classification
Figure 7.7: Neural Net classifier performance for different spreads for 32 x 32 pixel images
Figure 7.8: Average classifier performance over all four classes at different sizes
Figure 7.9: Neural Net classifier performance at varying values of spread
Figure 8.1: Left: combination of the four Brodatz source textures. Right: outline of combination image, showing classes
Figure 8.2: Segmented Brodatz texture combination image
Figure 9.1: Classifier performances using subsets selected using the Bhattacharyya distance
Figure 9.2: Classifier performances using subsets selected using forward and backward selection procedures
Figure 9.3: Classifier performances for each texture algorithm

LIST OF TABLES

Table 7.1: Number of smaller images extracted from source images at each resolution
Table 7.2: Classification of Fisher’s Iris Data before and after normalisation
Table 9.1: Feature subset sizes created by using only individual texture algorithms
1. INTRODUCTION
Visual assessment of digital image information is often aided by computer-based
image analysis to remove any subjective bias. In a medical context for example, it can be
very difficult to distinguish between different clinical features present in a medical image,
such as between grey and white matter in an MRI scan. One type of computer-based
analysis that can be used is texture analysis, which can be used to identify different
clinical features using their texture properties.
This project is an extension to a previous final year project [1] that aimed to create
a robust image-viewing platform and investigate the use of advanced image analysis
strategies for assisting clinical diagnosis. Five texture analysis algorithms were used to
generate a set of 38 textural features describing an image; this project extends that work
by analysing a gold-standard data set to validate the use of these algorithms for the
classification of different texture types.
2. PATTERN RECOGNITION
This project fits into the overall subject area known as pattern recognition.
Pattern recognition is the process of discriminating between (classifying) certain
observations. For example, given a group of a thousand people, we may want to discriminate
between four types of human [2]: (a) tall and thin (b) tall and fat (c) short and thin (d)
short and fat. A classification process is therefore carried out on certain features
belonging to these persons, to put them into the correct class. A good choice of features
in this case could be for example (height, weight).
In any pattern recognition problem, features must be generated from observations.
This process is called feature generation. A subset of the selected features may then be
chosen for various reasons (see section 5); the selection of this subset is called feature
reduction. These features are then used as the input for a classifier, which will assign the
original object into a corresponding class. In the previous example, the classes are the
four types of human, the observations are all the observed qualities of each human (which
are almost limitless, such as age, employment etc), and the extracted features are height
and weight, drawn from the observations.
A basic scheme for pattern recognition is given in figure 2.1.
[Diagram: x → Feature Generation → y → Feature Reduction → y′ → Classifier → wi]

Figure 2.1: A basic scheme for pattern recognition

Where x – observation vector
y – feature vector
y′ – reduced feature vector
wi – selected class
In the specific case of this project, the observation vector is a digital image (i.e.
the pixel values), the feature vector is a vector of textural features, and the various classes
are texture types.
2.1 Feature generation
Feature generation is the process of selecting useful features from some
observation vector that describes the original object. In the specific case of this project
features must be extracted from medical images. There are many ways to extract features
from images [3]; in this project the image is described using textural features extracted by
texture analysis.
3. REVIEW OF PREVIOUS PROJECT
The previous project [1] involved the textural analysis of medical
images in order to extract textural features to aid clinical diagnosis. Various algorithms
were explored which produced a total of 38 features (see appendix A). A Graphical User
Interface (GUI) was created in MATLAB [10] to allow simple graphical and textual
viewing of the feature values, although no analysis was undertaken.
3.1 Texture Analysis
Texture is an important characteristic that can be used to identify or describe an
image [4]. In a digital image, texture describes the relationship between the intensities of
neighbouring pixels (not necessarily adjacent). Texture can be examined in two ways,
structurally and statistically. The statistical approach was used [1]. One first-order
algorithm was used, producing nine features, and four second-order algorithms producing
the remaining twenty-nine.
3.1.1 First-order algorithm
The first-order algorithm simply studies the first-order probability distribution of
the pixel intensity values. The nine features calculated are detailed in appendix A. An
example of some first-order statistics is shown in figure 3.1.
Figure 3.1: Two textural images with their histograms and some first-order statistics [1]
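The first-order idea can be sketched directly from the grey-level histogram. Below is a minimal Python illustration (the project itself used MATLAB and C; the three statistics computed here are a representative subset, not the exact nine first-order features):

```python
import math
from collections import Counter

def first_order_features(pixels):
    """Compute a few first-order texture statistics from pixel values.

    Works directly on the grey-level histogram (the first-order probability
    distribution), ignoring any spatial relationship between pixels.
    """
    n = len(pixels)
    hist = Counter(pixels)                       # grey level -> count
    p = {g: c / n for g, c in hist.items()}      # normalised histogram
    mean = sum(g * pg for g, pg in p.items())
    var = sum((g - mean) ** 2 * pg for g, pg in p.items())
    # Shannon entropy of the grey-level distribution (in bits)
    entropy = -sum(pg * math.log2(pg) for pg in p.values())
    return {"mean": mean, "variance": var, "entropy": entropy}

# A flat 8-pixel "image", uniform over four grey levels
features = first_order_features([0, 0, 1, 1, 2, 2, 3, 3])
```

Because only the histogram is used, two images with identical grey-level distributions but very different spatial structure yield identical first-order features, which is why the second-order algorithms below are also needed.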
3.1.2 Second-order algorithms
Four algorithms were based on second-order statistics. In each case an
intermediate matrix describing the digital image was created, from which the features
were then calculated. The four techniques were:
1. The Neighbourhood Grey Tone Difference Matrix (NGTDM)
- This approach is based on the characteristics of the Human Visual
System (HVS). The HVS tends to measure some basic properties of
visual data such as size, colour, shape and orientation, and then
classify the textures in terms of properties such as coarseness, contrast,
roughness, directionality etc [1]. The NGTDM technique returns 5
features as detailed in appendix A.
2. The Grey Level Run Length Matrix (GLRLM)
- This approach calculates the number and length of runs of different
grey level values in the image. The calculation is performed over
various directions. The GLRLM also returns five features as detailed
in appendix A.
The next two algorithms are based on the co-occurrence matrix. The co-occurrence
matrix represents the joint probability distribution of pairs of grey level intensities. [1]
3. The Spatial Grey Level Dependence Matrix (SGLDM)
- This approach considers the probability of finding various different
pairs of pixel values over certain distances and directions. The
SGLDM returns 14 features as detailed in appendix A.
4. The Grey Level Difference Method (GLDM)
- This approach examines the differences between pixel values at fixed
separations. The GLDM returns 5 features as detailed in appendix A.
An example of some textural statistics is shown in figure 3.2.
Figure 3.2: An example of a greyscale MRI image and the graphical representation of its textural features. See appendices A and B for feature details and code respectively.
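The co-occurrence construction underlying the SGLDM can be sketched as follows (a Python illustration, not the project's MATLAB/C implementation; the single `contrast` feature stands in for the 14 SGLDM features):

```python
def cooccurrence(image, dy, dx, levels):
    """Build a normalised grey-level co-occurrence matrix for one offset.

    Counts how often a pixel of level j occurs at offset (dy, dx) from a
    pixel of level i, estimating the joint probability of grey-level pairs.
    """
    m = [[0.0] * levels for _ in range(levels)]
    total = 0
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                m[image[y][x]][image[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in m]

def contrast(m):
    """Contrast-style feature: large when co-occurring levels differ a lot."""
    return sum((i - j) ** 2 * m[i][j]
               for i in range(len(m)) for j in range(len(m)))

img = [[0, 0, 1],
       [0, 0, 1],
       [2, 2, 2]]
glcm = cooccurrence(img, 0, 1, levels=3)  # horizontal neighbour pairs
```

In practice the matrix is computed over several distances and directions, and the GLDM can be viewed similarly, tabulating the differences `image[y][x] - image[y2][x2]` at fixed separations instead of the pairs themselves.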
3.2 Algorithm implementation
The algorithms were implemented using MATLAB. For both the intermediate
matrix and texture feature calculations, computation was found to be very slow. To
optimise computation time, the calculations were performed in the C language, which is
more suitable for these types of algorithms [1]. A GUI viewer was also created, which
allows the user to view the input image and its numerical feature values. Regions of
interest (ROI) can also be selected, allowing the user to compare textural features for up
to four different regions.
4. CLASSIFICATION
To validate the texture algorithms textures were identified using the previously
described texture features; a classification process was therefore undertaken.
Classification is the process of categorising an object using certain features describing the
object. The features create a feature space in which all objects will lie and the aim of a
classifier is to identify the regions of the feature space taken up by each class. Thus when
a new feature vector is applied to the classifier it will be assigned to the corresponding
class according to the region in which it falls, as shown in figure 4.1.
Figure 4.1: Two-dimensional feature space containing two classes. An incoming pattern is assigned to the class corresponding to the region it falls in.
4.1 A simple one-dimensional classification example
A simple example of classification is character recognition [3]. In a two-
class case the classifier must differentiate between a 1 and a 0. The objects (the
numbers) are shown in figure 4.2. They have been placed on a grid so that an observation
vector can be found for each character.
Figure 4.2: The characters to be identified, 1 & 0, placed on a grid.
The observation vectors x1 and x0 are formed by taking the area of each grid region
covered by the character as the corresponding vector element. As the 0's generally cover
more area than the 1's, a feature that will differentiate between the two characters could
intuitively be chosen to be the total area covered by each character. The components of
each observation vector are therefore summed to give the scalar feature y for each
character, with the 0 yielding the larger value.
A classifier can be designed by plotting a histogram of all the features obtained from a
“training” set, as shown in figure 4.3. The histogram for each class (1 and 0) is plotted on
the same axis, and a boundary can then be applied visually; a character is classified
according to which side of the boundary its feature value falls on.
Figure 4.3: A histogram of feature values. Anything falling to the right-hand side of the boundary is classified as a 0, anything to the left as a 1.
This classifier is obviously not ideal, and the error regions can be seen visually as
the portions of the 1 histogram to the right of the boundary, and the portions of the 0
histogram to the left of the boundary.
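The histogram-boundary classifier described above can be sketched as follows (Python, with illustrative feature values; placing the boundary midway between the class means is one simple stand-in for choosing it "visually" from the histograms):

```python
def train_threshold(class_a, class_b):
    """Pick a 1-D decision boundary midway between the two class means.

    This mimics placing a boundary between the two histograms; it is only
    sensible when the classes are roughly separable in one dimension.
    """
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    return (mean_a + mean_b) / 2

def classify(y, boundary, low_label="1", high_label="0"):
    """Feature values below the boundary -> '1', above it -> '0'."""
    return low_label if y < boundary else high_label

# Illustrative total-area features for a small training set
ones  = [3.5, 3.8, 4.1, 3.9]
zeros = [8.9, 9.1, 9.4, 9.0]
boundary = train_threshold(ones, zeros)
```

Any training sample whose feature value falls on the wrong side of the boundary corresponds to the overlap (error) regions of the two histograms.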
4.2 Classifier design
There are various considerations that must be taken into account when
contemplating the use of a classifier [5]. A given pattern x has to be assigned to one of C
classes w1, w2,…., wc based on its feature values (x1, x2,…., xN). There is therefore an N-
dimensional feature space. The features have a density function conditioned on the
pattern class, thus a pattern vector x belonging to class wi is viewed as an observation
drawn from the class-conditional density p(x | wi).
The amount of information known about the class-conditional densities must be
considered. A parametric classifier assumes knowledge of the class-conditional densities.
However, even if the densities are not known a common approach is to estimate them
using a training set of patterns, as was highlighted in the previous simple example. That
is, a selection of feature vectors belonging to a single class are taken and an estimate is
made of the conditional density belonging to that class. There are also non-parametric
techniques that assume no prior knowledge of the classes; these are used when no
labelled training samples are available, so a classifier cannot be constructed from
samples of known class. It may not even be known how many classes
there should be. In these cases cluster analysis is used to organise the training sets into
groups, or clusters, each corresponding to a class.
4.3 Classifiers
4.3.1 Statistical Approach
To use the statistical approach the class-distribution densities must be known or
estimated. The statistical approach is useful if there is an overlapping of class regions in
the feature space. A statistical classifier examines the risk involved with every
classification and attempts to measure the probability of misclassification. A well-known
statistical classifier is the Bayes classifier, which is based on Bayes' formula from probability
theory and minimises the total expected risk; the classifier is thus an optimum classifier.
The Bayes classifier calculates the posterior probability of a pattern being in each class,
and assigns it to the class that gives the largest probability. A simple Bayesian decision
rule in a two-class case could be the following [3]:
If

l(x) ≡ p(x | w1) / p(x | w2) > Pr(w2) / Pr(w1)    choose w1

l(x) < Pr(w2) / Pr(w1)    choose w2

where l(x) is called the likelihood ratio and p(x | wi) is the conditional density
function for class i evaluated at x.
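Assuming 1-D Gaussian class-conditional densities (an assumption made only for this sketch; the likelihood-ratio rule itself is general), the two-class Bayes decision can be written as:

```python
import math

def gaussian_pdf(x, mean, var):
    """Class-conditional density p(x|w) under a 1-D Gaussian assumption."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_decide(x, params1, params2, prior1, prior2):
    """Choose w1 if the likelihood ratio exceeds Pr(w2)/Pr(w1), else w2."""
    l = gaussian_pdf(x, *params1) / gaussian_pdf(x, *params2)
    return "w1" if l > prior2 / prior1 else "w2"

# With equal priors the rule reduces to picking the larger likelihood.
decision = bayes_decide(0.9, (0.0, 1.0), (3.0, 1.0), 0.5, 0.5)
```

Unequal priors shift the effective boundary towards the less likely class, which is exactly the risk-weighting behaviour that makes the Bayes classifier optimal.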
4.3.2 Decision Functions
When the number of classes is known and the training patterns produce
geometrically separated classes, decision functions can be used to classify an incoming
pattern. For a two-class example, where two classes C1 and C2 exist in Rn and a
hyperplane d(x) = 0 separates their patterns, the decision function d(x) can be used as
a linear classifier:

d(x) > 0  ⇒  x ∈ C1
d(x) < 0  ⇒  x ∈ C2

The hyperplane d(x) = 0 is called the decision boundary. In some cases where the
classes cannot be separated by linear decision functions a nonlinear classifier can be
created using generalised decision functions, or the feature space can be transformed into
a much higher dimension where linear decision boundaries can be used. Examples of
linear and nonlinear decision functions are shown in figure 4.4.
Figure 4.4: (a) Linear decision function (b) Nonlinear decision function (both shown in red)
In general, however, there are m pattern classes {C1, C2, …, Cm} in Rn. If some
surface d(x) = 0, x ∈ Rn, separates some class Ci from the remaining Cj, j ≠ i, i.e.

d(x) > 0,  x ∈ Ci
d(x) < 0,  x ∈ Cj,  j ≠ i

then d(x) is a decision function of Ci. This concept is illustrated in figure 4.5, which
gives an example of absolutely separable classes. Classes can also be pairwise separable,
which means that there is a possible linear decision boundary between each pair of
classes, as illustrated in figure 4.6.
Figure 4.5: Three linearly separable classes in R2, the decision boundary for a class Ci is given by di(x)
Figure 4.6: Three pairwise separable classes in R2, two decision boundaries are needed to select each class.
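A linear decision function is cheap to evaluate: d(x) = w·x + b, with the sign of d(x) selecting the class. A minimal sketch with a hand-picked, purely illustrative boundary:

```python
def d(x, w, b):
    """Linear decision function d(x) = w.x + b; d(x) = 0 is the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x, w, b):
    """d(x) > 0 -> C1, d(x) < 0 -> C2 (d(x) = 0 lies on the boundary)."""
    return "C1" if d(x, w, b) > 0 else "C2"

# Hand-picked boundary x1 + x2 - 5 = 0 separating two illustrative clusters
w, b = (1.0, 1.0), -5.0
```

In practice w and b would be fitted from the training patterns; for pairwise-separable classes (figure 4.6) one such function is needed per pair of classes.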
4.3.3 Distance functions and clustering
If the training patterns form clusters, a distance function and clustering approach
can be used. This entails classifying an incoming pattern according to its proximity to
patterns of existing classes. Two ways of deciding to what class the incoming pattern
belongs are minimum-distance and nearest-neighbour classification [2]. Minimum
distance classification represents each class by prototype vectors, for example the class
mean. The simplest case is when the patterns of each class are very close to each other,
and each class can therefore be represented by a single prototype. If there are m pattern
classes in Rn, {C1,…,Cm}, represented by the prototype vectors y1,…,ym, then the
distances between an incoming pattern x and the prototype vectors can be defined as [2]:
Di = ||x − yi|| = ((x − yi)^T (x − yi))^(1/2),   1 ≤ i ≤ m

x will be classified as Cj for which Dj is minimum, i.e.

Dj = min i ||x − yi||,   1 ≤ i ≤ m
This calculates the minimum Euclidean distance to the class prototypes. The
Euclidean distance however does not take into account any correlation between features,
and thus an improvement is to use the Mahalanobis distance that takes into account the
class covariance matrices. This classifier is also occasionally known as the Gaussian
classifier. Given a class mean and covariance matrix ui and Wi respectively, the distance
is defined as [6]:
Di = (x − ui)^T Wi^−1 (x − ui) + ln|Wi|

The decision rule is therefore:

x ∈ CL when DL = min{Di}
When using the minimum-distance classifier a major problem is defining the class
prototypes. This is especially a problem if classes are split into several clusters. A
measure of similarity between patterns must be used, so that “similar” patterns can be
grouped together to form clusters. Clustering algorithms usually aim to optimise some
performance index, such as the sum of the distances between each pattern and its
corresponding cluster centre. Several clustering algorithms have been developed [2],
such as the c-Means Iterative algorithm (CMI), which iteratively updates each cluster
centre by replacing it with the mean of its samples, or the ISODATA algorithm, which is
another, more complex, iterative method.
Another classifier is the Nearest Neighbour classifier which classifies x to the
class corresponding to its nearest neighbour in the set of sample patterns. The Nearest
Neighbour classifier can also be extended to take into account the k nearest neighbours
[2].
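The minimum-distance classifier with a single mean prototype per class can be sketched as follows (a Python illustration; the project's Euclidean distance classifier of section 7.3.1 was implemented in MATLAB, and the class names and vectors here are invented for the example):

```python
import math

def class_prototypes(training):
    """Represent each class by a single prototype: the class mean vector."""
    protos = {}
    for label, vectors in training.items():
        n = len(vectors)
        protos[label] = [sum(v[i] for v in vectors) / n
                         for i in range(len(vectors[0]))]
    return protos

def min_distance_classify(x, protos):
    """Assign x to the class whose prototype is nearest (Euclidean Di)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return min(protos, key=lambda label: dist(x, protos[label]))

training = {
    "grass": [[1.0, 1.2], [0.8, 1.0], [1.2, 0.8]],
    "brick": [[5.0, 5.2], [4.8, 5.0], [5.2, 4.8]],
}
protos = class_prototypes(training)
```

Swapping the Euclidean distance for the Mahalanobis distance only changes `dist`, which would then weight each axis by the inverse class covariance rather than treating all features equally.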
4.3.4 Fuzzy logic classifier
Classification can be carried out using a fuzzy logic approach. This would result
in each incoming pattern being classified to every class with varying degrees of certainty.
This approach would not be useful for this project, as definite pattern identification must
be achieved, i.e. an incoming pattern must be classified into one single class.
4.3.5 Artificial Neural Net (ANN) classifier
In order to use the ANN classifier it must be assumed that a set of training
patterns and their correct classifications are available a priori. ANNs are based on the
functionality of the human brain. The human brain is made up of many neurons
connected together by synapses. ANNs are based on the same idea; they consist of
neurons connected by “weights”. Each neuron performs an operation on its input signal
to produce its output, and the weights simply multiply the signal by a fixed value. A
form of ANN that can be used for classification is the Probabilistic Neural Network
(PNN), which can be easily implemented using the MATLAB Neural Net toolbox [10].
The PNN consists of two layers, the first calculates the distance from an input vector to
the training vectors before the second determines the probabilities of the input being in
each class. Finally the input is classified according to the maximum of these
probabilities.
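The two-layer PNN computation can be sketched as follows (an illustration of the idea only, not the MATLAB toolbox implementation; `spread` here plays the same role as the spread parameter varied in section 7.4.4, and the training data are invented):

```python
import math

def pnn_classify(x, training, spread=1.0):
    """Probabilistic Neural Network sketch.

    Layer 1: evaluate a Gaussian kernel centred on each training vector at x.
    Layer 2: sum the kernel responses per class and pick the largest.
    """
    scores = {}
    for label, vectors in training.items():
        s = 0.0
        for v in vectors:
            sq = sum((xi - vi) ** 2 for xi, vi in zip(x, v))
            s += math.exp(-sq / (2 * spread ** 2))
        scores[label] = s / len(vectors)   # per-class 'probability' score
    return max(scores, key=scores.get)

training = {
    "A": [[0.0, 0.0], [0.2, 0.1]],
    "B": [[3.0, 3.0], [2.9, 3.1]],
}
```

A small spread makes the classifier behave like nearest-neighbour; a large spread smooths the class densities together, which is why performance depends on spread as examined later.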
5. FEATURE REDUCTION
An optional step in the simple pattern recognition system is feature reduction.
Feature reduction is the process of reducing the dimensionality of the feature space. The
aim is to create a different set of features to either improve classifier performance or
reduce feature computation expense, which may be unnecessarily high due to irrelevant
features.
Feature reduction is also known as feature extraction or feature selection.
Feature extraction is where a smaller number of new features are created from linear
combinations of the original features. The obvious drawback is that this means that the
same number of measurements must be taken in the first place. Feature selection is the
choice of a smaller subset of the original features. This can be extremely
computationally expensive, as a high-dimensional problem can have a huge number of
possible subsets.
methods of feature selection which, although perhaps not finding the “best” subset, will
find a reasonable subset [3]. All the methods require some form of quantifying the “best”
features, usually a measure closely related to the error rate of the resulting classifier if the
actual classifier performance cannot be evaluated.
Stepwise forward: this method first finds the single feature that maximises the
measure of “best”. Then another feature is selected which, coupled with the original,
again maximises the measure. A third feature is then chosen and this process continues
until a certain number of features have been found.
Stepwise backward: this method starts with all the features and, at each step,
removes the feature that maximises (or least reduces) the measure.
Full stepwise: this method combines both the previous methods to form a method
with the properties of both.
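The stepwise forward procedure can be sketched with a pluggable score function (the toy additive score below is only a stand-in for a real measure of classifier quality, and the feature names are invented):

```python
def forward_select(features, score, k):
    """Stepwise forward feature selection.

    Greedily grows a subset: at each step add the single feature that,
    together with those already chosen, maximises the score function.
    score(subset) stands in for any measure of classifier performance.
    """
    chosen = []
    remaining = list(features)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda f: score(chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy score: pretend each feature has a fixed usefulness and the subset
# score is their sum (real scores would come from a trained classifier).
usefulness = {"contrast": 3.0, "entropy": 2.0, "coarseness": 1.0}
subset = forward_select(usefulness, lambda s: sum(usefulness[f] for f in s), 2)
```

Stepwise backward is the mirror image (start with every feature and repeatedly remove the least damaging one), and the full stepwise method alternates between the two.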
Another method [11] uses the Bhattacharyya distance to rank order the features in terms
of ‘relevance’ in separating the classes. The ranked features can then be used to create
subsets of chosen sizes. The Bhattacharyya distance for a single feature between two
classes a and b is shown below.
DB(a, b) = (1/4) ln[ (1/4) ( σa²/σb² + σb²/σa² + 2 ) ] + (1/4) (μa − μb)² / (σa² + σb²)

where μj and σj² are the class mean and variance for class j.
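The formula translates directly into code. A sketch that ranks hypothetical features by Bhattacharyya distance, assuming per-class means and variances are already available (the feature names and statistics below are illustrative):

```python
import math

def bhattacharyya(mu_a, var_a, mu_b, var_b):
    """1-D Bhattacharyya distance between two Gaussian classes a and b."""
    term1 = 0.25 * math.log(0.25 * (var_a / var_b + var_b / var_a + 2))
    term2 = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    return term1 + term2

# Rank hypothetical features by how well they separate two classes.
stats = {  # feature -> ((mu_a, var_a), (mu_b, var_b)); values are illustrative
    "f1": ((0.0, 1.0), (0.5, 1.0)),   # heavy class overlap
    "f2": ((0.0, 1.0), (4.0, 1.0)),   # well separated classes
}
ranked = sorted(stats, key=lambda f: bhattacharyya(*stats[f][0], *stats[f][1]),
                reverse=True)
```

The first term rewards a difference in class variances and the second a difference in class means, so a feature scores highly if it separates the classes in either sense; taking the top-ranked features then gives a subset of any chosen size.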
6. SEGMENTATION
Segmentation is the problem of separating an image into regions. The goal of
many medical image analysis applications is to separate an image into regions defined by
different clinical features. For example this may be to define clear regions in an MRI
brain scan containing white matter or grey matter. This could provide an extremely
useful tool for assisting clinical diagnosis.
In this project segmentation provides a method of validating classifier
performances. A simple, intuitive method of segmentation is as follows. The image is
split into smaller blocks; each block is analysed individually and either classified as
one of the classes or designated “unknown”. The unknown blocks are
then analysed in more detail, as they are likely to contain boundaries between classes. A
progressive “zooming-in” process is then undertaken in order to find the boundary lines
between classes. The image has therefore been split into small regions each containing a
single class, and a new image could be created showing clearly the distinctions between
the classes. The performance of this segmentation process would validate the
performance of a classifier in a real-life application.
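The first pass of the block-wise scheme described above might be sketched as follows. This is an illustrative Python outline, not the project code; `classify` is an assumed function returning a class label, or None for an “unknown” block that would then be subdivided further.

```python
def segment_blocks(image, block, classify):
    """image: 2-D list of pixel values. Splits the image into
    block x block regions and classifies each one, returning a
    grid of labels (None marks an 'unknown' block)."""
    rows, cols = len(image), len(image[0])
    labels = []
    for r in range(0, rows, block):
        row_labels = []
        for c in range(0, cols, block):
            # extract one block and classify it in isolation
            patch = [line[c:c + block] for line in image[r:r + block]]
            row_labels.append(classify(patch))
        labels.append(row_labels)
    return labels
```

Blocks labelled None are the boundary candidates on which the “zooming-in” pass would be repeated at a smaller block size.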
7. CLASSIFIER DESIGN AND CODING
7.1 Data Set
The Brodatz texture database [8] was chosen for the development of a suitable
classifier, as it provides a benchmark set of texture images. Four separate textures were
selected to provide four distinct classes, as illustrated in figure 7.1. Clockwise from the
top-left texture, they are as follows:
D19 – Woollen cloth
D55 – Straw matting
D93 – Hide of unborn calf
D92 – Pigskin
The ‘D’ identifiers refer to page numbers in Brodatz’s original publication.
Figure 7.1: The four textures used for classifier development.
The four images, each of size 640 x 640 pixels, were used as sources for each class;
i.e. they were broken up into smaller regions which were analysed individually.
7.2 Feature vector generation
Initially, various smaller images were extracted from the source images. Once the
numerous smaller images were extracted, they were divided into training and test sets, as
shown in table 7.1, so that the classifiers could be designed and tested. The number of
images extracted at each resolution is a function of (and limited by for the larger sizes)
the size of the source image.
Extracted image size (pixels)   Images extracted   Training samples   Test samples
8 x 8                           400                320                80
16 x 16                         400                320                80
32 x 32                         400                340                60
64 x 64                         100                70                 30
128 x 128                       25                 15                 10

Table 7.1: Number of smaller images extracted from source images at each resolution
The extracted texture images were then used to create feature vectors. Initially for
classifier development the study was carried out only on the intermediate-sized 32 x 32
pixel images. A feature vector containing values for each of the 38 texture features was
generated for each image, as was explained in section 3. The feature vectors were stored
in matrix form for ease of access; thus classifier development was carried out
on two matrices for each class, a 340 x 38 training matrix, and a 60 x 38 test matrix.
7.3 Classifiers
Three classifiers were chosen for examination; the simple minimum Euclidean
distance classifier, the Gaussian Mahalanobis distance classifier, and the probabilistic
neural network classifier contained within the MATLAB Neural Network Toolbox.
Neural Network classifiers and the Gaussian classifier are commonly seen in
texture literature, and they have been shown to give good results. The Euclidean distance
classifier was chosen to provide a simple, fast alternative to the other more complex
classifiers.
7.3.1 Euclidean distance classifier
As explained in section 4.3.3, in order to implement a minimum distance classifier,
class prototypes must first be chosen to represent each class. An incoming
feature vector is then classified to the class represented by the nearest prototype, with
‘nearest’ established using the Euclidean distance. The simplest prototype that can be used is
simply the class mean. Class prototypes can also be generated using some clustering
algorithm. As a comparison to the class means, prototypes were also created using the c-
means algorithm, and a function was also written to attempt to match the c-means cluster
centres to their best-fit class. The c-means algorithm has various modifiable attributes,
and thus produced a wide variety of different results; from this point on only the best
result will be referred to. The c-means process was undertaken using the MATLAB
toolbox function kmeans.m, which allowed for various different versions of the algorithm
to be implemented. A Euclidean distance classifier was written in MATLAB.
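The classifier logic amounts to a nearest-prototype rule. A minimal sketch (in Python, standing in for the MATLAB implementation; the prototype here is assumed to be the class mean) might look like:

```python
import math

def euclidean_classify(x, prototypes):
    """Classify feature vector x to the class whose prototype is
    nearest in Euclidean distance. prototypes: dict mapping a
    class label to its prototype vector (e.g. the class mean)."""
    def dist(p):
        return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))
    return min(prototypes, key=lambda label: dist(prototypes[label]))
```

With c-means prototypes, each class simply contributes several prototype vectors instead of one, and the same rule applies over all of them.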
7.3.2 Mahalanobis classifier
The Mahalanobis classifier is another minimum distance classifier, classifying an
incoming feature vector to the nearest class, ‘nearest’ established using the Mahalanobis
distance (see section 4.3.3). The Mahalanobis distance measures the distance between a
point in space and a data set; the classifier thus needs a priori class mean and covariance
values, which are available from the training matrix. A Mahalanobis classifier
was written in MATLAB.
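A sketch of the rule (Python, illustrative; the inverse covariance matrix is assumed to be precomputed from the training matrix, with the inversion itself left to a library routine in practice):

```python
def mahalanobis_sq(x, mean, inv_cov):
    """Squared Mahalanobis distance (x - mean)^T * inv_cov * (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d[i] * sum(inv_cov[i][j] * d[j] for j in range(len(d)))
               for i in range(len(d)))

def mahalanobis_classify(x, classes):
    """classes: dict mapping a label to (mean, inverse covariance)."""
    return min(classes,
               key=lambda c: mahalanobis_sq(x, classes[c][0], classes[c][1]))
```

With an identity covariance this reduces to the squared Euclidean distance; with a real covariance it weights each direction by the class’s spread in that direction, which is why the distance is measured in units of standard deviation.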
7.3.3 Neural Network classifier
The MATLAB Neural Network toolbox was used to create a probabilistic neural
network (see section 4.3.5) using the training data, which was then used for classification
purposes. The classification was then carried out by inputting a test feature vector into
the neural net, which outputted the corresponding class. The Neural Net classifier
essentially operates as a k-nearest-neighbour classifier, examining a number of local
vectors and working out the probabilities of the test vector belonging to each class. This
is achieved by measuring the Euclidean distance from the test point to each of its
neighbours. The test vector is then allocated to the class corresponding to the highest
probability.
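The behaviour described can be approximated by a kernel-sum sketch: each training vector contributes a Gaussian kernel response to the test point, the responses are summed per class, and the class with the largest sum wins. This Python sketch mirrors the behaviour described above, not the toolbox internals; the kernel width plays the role of the “spread” parameter examined in section 7.4.4.

```python
import math

def pnn_classify(x, training, spread):
    """training: dict mapping a class label to its list of training
    vectors. Each training vector contributes a Gaussian kernel
    response; the class with the largest summed response wins."""
    def kernel(a, b):
        sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq / (2.0 * spread ** 2))
    scores = {label: sum(kernel(x, t) for t in vecs)
              for label, vecs in training.items()}
    return max(scores, key=scores.get)
```

As spread shrinks, only the single nearest training vector matters (nearest-neighbour behaviour); as it grows, many neighbours contribute, giving the k-nearest-neighbour-like behaviour noted above.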
7.4 Classifier testing
7.4.1 Initial performance
As explained earlier, initial classifier testing was carried out on the 32 x 32 pixel
set of texture images with 340 training vectors for each class used to create a priori class
means and covariance matrices, and for constructing the neural net. The test vectors were
inputted one by one into each classifier, and each classifier’s performance was measured
by the number of correct classifications it achieved. The initial results are shown in
figure 7.2, which shows the percentage of test vectors from each class that were correctly
classified, together with the average performance for each classifier. In the case of the
c-means minimum distance classifier, the results shown are the best achieved over all
algorithm set-ups (see the MATLAB documentation for further information), in order to
allow comparison with the class-means performance.

Figure 7.2: Initial classifier performance results (% test vectors correctly identified)
As can be seen in figure 7.2, the classifiers produced widely varying results. The
best performance was from the Mahalanobis classifier, which obtained an overall
classification performance of 96.25% of test vectors correctly identified. It was also seen
that the c-means algorithm provided no great advantage over simply using the class
means as prototypes. This is because the c-means algorithm is essentially trying to find
cluster centres that minimise the total distance between each cluster centre and its
members, and thus the best result that can be found is in fact the class mean. The Neural
Net classifier produced very poor results.
7.4.2 Normalisation
The feature vectors that are created come from five different texture analysis
techniques, and thus there are significant range variations between some values in a
feature vector. This is illustrated in figure 7.3, in which the range of texture feature
values for a simple texture image can be seen to range over several orders of magnitude.
Figure 7.3: Graphical illustration of the normalisation process
A normalisation process was therefore carried out on all vectors used in order to
bring them all into the same range, as also illustrated in figure 7.3. The normalisation
undertaken was zscore normalisation (or standardisation), which allows for the
comparison and combination of measures made on different scales. The zscore of a
column vector x (feature in a matrix of observations) is as follows, and is measured in
units of standard deviation;
Z = (x − µ) / σ
where µ and σ are the feature mean and standard deviation.
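The normalisation of a single feature column can be sketched as follows (Python, illustrative; the project applied the equivalent operation to each column of the feature matrices):

```python
import math

def zscore(column):
    """Standardise one feature column: subtract the mean and divide
    by the standard deviation, so values are in units of standard
    deviation about zero."""
    n = len(column)
    mu = sum(column) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in column) / n)
    return [(v - mu) / sigma for v in column]
```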
Figure 7.4 shows the classifier performance characteristics after normalisation.
Figure 7.4: Classifier performance after normalisation (% test vectors correctly identified)
The Euclidean distance classifiers underwent a significant improvement in
performance after normalisation, both increasing by around 10%. The performance of
the neural net classifier was completely transformed, returning a 98.75% success rate.
The Mahalanobis classifier however underwent a large decrease in performance, the
number of successful classifications almost halving. As the classifiers are working in a
38-dimensional space, it is very difficult to understand the different performances;
therefore an exploration of a 2-dimensional data set was carried out with the aim of better
understanding the classifiers.
7.4.3 Classifier performance for a 2-dimensional data set
The performance of each classifier was examined for a 2-dimensional data set, as
this allowed visual analysis of the data. Fisher’s Iris data set, contained within the
MATLAB environment, was used which consists of measurements on various features of
150 iris specimens, 50 of each of 3 types. The 2 features selected for analysis were sepal
length and petal width. Figure 7.6 shows a graphical representation of the data, before
and after normalisation; the black dots show the class means.
Figure 7.5: Fisher’s iris data
It can be seen that the setosa type of iris is well separated from the other two
types, and thus one would expect it to classify well. The other two iris types, versicolor
and virginica, are slightly overlapping, and thus one would expect some possible
misclassifications. In this case the normalisation process has a smaller effect, as each
feature is measured on the same scale to begin with; however, the means can be seen to be
slightly further apart.
The classifiers were again tested on the new data: every specimen was inputted to
each classifier in an attempt to find some correlation between the results given by the
classifiers and the spatial representation of the data. For the Euclidean distance classifier
the class means were used as class prototypes. The results of the classification are shown
in table 7.2. Figure 7.6 shows the results of the classification graphically. The circled
samples are those that were misclassified, the black corresponding to the Euclidean
classifier, magenta to the Mahalanobis classifier and cyan to the Neural Net classifier.
Class        Normalisation    Euclidean Distance   Mahalanobis Distance   Neural Net
Setosa       Before           100                  100                    100
             After            100                  100                    100
Versicolor   Before           76                   96                     84
             After            80                   96                     88
Virginica    Before           78                   94                     78
             After            86                   94                     80
Average      Before           84.67                96.67                  87.3
             After            86                   96.67                  89.33

Table 7.2: Classification of Fisher’s Iris Data before and after normalisation (% correctly classified)
Figure 7.6: Graphical representation of Fisher Data classification
As expected the setosa type classified very well, and the other classes contain
some misclassifications. Normalisation has no great spatial effect on the data; however, it
still affects the classifier performances. Again the Euclidean distance and the Neural Net
classifiers have improved their performance. This time however the Mahalanobis
classifier underwent no change whatsoever.
The Neural Network and Euclidean classifiers both improve performance as they
are using the Euclidean distance, which can be adversely affected by being measured
over distorted scales. The Mahalanobis distance, however, is measured in units of
standard deviation from the class mean, and therefore the zscore normalisation has no
effect on the Gaussian classifier’s performance.
7.4.4 Neural Network ‘spread’ variation
When designing the Neural Network classifier, a variable “spread” can be
defined. For initial classification testing spread was set to one; however, the value of
spread affects the performance of the classifier. When spread is close to zero, the
classifier behaves like a nearest-neighbour classifier. As spread increases, the classifier
takes into account several neighbouring vectors, and thus becomes a k-nearest-neighbour
classifier. Figure 7.7 shows the average Neural Net classifier performance for values of
spread varying from 0.1 to 2. It can be seen that the Neural Net classifier performed best
for 32 x 32 pixel images when the spread was between 0.8 and 0.9.
Figure 7.7: Neural Net classifier performance for different spreads for 32 x 32 pixel images
7.4.5 Classifier performance at different sizes
The classifier performances were tested at image sizes other than 32 x 32
pixels. The results of the classifications are shown in figure 7.8. Neural Net and
Euclidean classifier performances were evaluated under normalised conditions, while the
Mahalanobis classifier was evaluated under non-normalised conditions. It must be noted
that evaluations at 64 x 64 and 128 x 128 pixel images were carried out on reduced data
sets.
Figure 7.8: Average classifier performance over all four classes at different image sizes
All three classifier performances dropped significantly at the smallest image size,
8 x 8 pixels. This is because it is hard to extract significant texture information from such
a small region. In general, all three classifiers performed best at 32 x 32 pixels, with the
performance characteristics generally dropping off to either side. The Neural Net
classifier returned overall the best performance. The general dip in performance at the
sizes greater than 32 x 32 can be attributed to the smaller data sets, and thus reduced
training sets available for classifier construction.
7.4.6 Neural Network “spread” variation at different image sizes
The variation of the Neural Net classifier’s performance according to spread was also
examined for the various image sizes. Figure 7.9 shows the performance of the classifier
at spread values between 0.1 and 2.
Figure 7.9: Neural Net Classifier performance at varying values of spread
It can be seen that the classifier’s performance varied quite considerably with spread, and
at all sizes the performance had a peak somewhere between 0.9 and 1.1. The figure also
illustrates very well the different performances at different image sizes, with the best
overall performance again being seen at the 32 x 32 image size.
8. SEGMENTATION OF A TEXTURE IMAGE
With the classifiers having now been designed and tested, an image made up from
the four previous Brodatz textures was segmented to verify the classifier performance
results seen previously and as a precursor to attempting to segment medical images.
Figure 8.1 shows the image created by combining sections of the source image,
and the classes that each section belongs to.
Figure 8.1: Left: combination of the four Brodatz source textures.
Right: outline of combination image, showing classes.
A segmentation process was coded in MATLAB, simply selecting a region of
interest (ROI), classifying the region, and creating a new image using a different colour
for each identified class. The combination image was created using 32 x 32 pixel blocks
taken from the source image, and as such performance using an ROI of greater than 32 x
32 pixels would be expected to be very poor.
Figure 8.2 shows the segmented images created using ROIs of 8 x 8, 16 x 16 and
32 x 32 pixels.
Figure 8.2: Segmented Brodatz texture combination image
The segmentation results confirmed the classifier performance results seen
previously. Using an 8 x 8 ROI the results were again generally quite poor, with a high
occurrence of misclassifications across all classifiers. Increasing the ROI size improved
the segmentation for all classifiers, and at 32 x 32 the Mahalanobis and Neural Net
classifiers segmented the image faultlessly.
9. FEATURE SUBSET SELECTION
As explained in section 5, feature reduction can be undertaken to reduce the
number of features used by the classifier. Calculating the 38 texture features was a very
time-consuming process, and therefore it was decided to investigate feature selection as a
means to reduce the amount of computation needed to create the input vectors for the
classifiers. As explained previously, finding the optimum subset is an extremely onerous
task; thus techniques to find sub-optimal subsets were investigated. Again for
investigation purposes the 32 x 32 data set was used.
Firstly, the features were rank-ordered using the Bhattacharyya distance
(see section 5). The resulting classifier performances for normalised and non-
normalised data are shown in figure 9.1.
Figure 9.1: Classifier performances using subsets selected using the Bhattacharyya distance
The results showed that a vastly reduced subset of features can be used with no
great loss in performance, or in some cases even an increase in performance. The
original 38-feature performances can be seen at the extreme right of each graph. It is
interesting to note that the Neural Net classifier achieved good performance using only 2
features of non-normalised data, and thereafter fell off to its usual poor performance. All
3 classifiers generally showed consistent performances across a wide range of subset
sizes for non-normalised data; for normalised data, however, subset size considerably
affected the performances. The Mahalanobis classifier also achieved good
performances with normalised data, unlike previously, which suggests that there were
only a few features that were significantly affecting its performance.
Performance at reduced subsets was also examined using the stepwise procedures
to select the subsets (see section 5). Both forward and backward algorithms were
investigated using both normalised and non-normalised data and using all 3 classifiers as
performance indicators. Figure 9.2 shows the results obtained using these procedures.
Figure 9.2: Classifier performances using subsets selected using forward and backward selection procedures.
The results again showed that excellent performance could be achieved using
reduced subsets. The graphs show forward and backward selection results using each
classifier as the measure of performance. Moving from left to right across the graph the
forward algorithm adds a feature and the backward algorithm removes one. Thus at the
very left side of the graph, the forward subset contains one feature, and the backward
subset contains 37 features (one has been removed). Likewise, at the extreme right of
each plot the forward subset contains 38 features (showing original performances) and
the backward subset contains one feature (not necessarily the same feature as the forward
algorithm).
Again it was seen that normalisation has very little effect on the Mahalanobis
classifier, but for the others it resulted in a marked improvement in performance. It is
interesting to note the reverse characteristics of the forward and backward algorithms, for
example for the Euclidean classifier using normalised data the backward characteristic is
almost a mirror image of the forward characteristic.
The performance of each texture algorithm was also examined individually, producing 5
reduced subsets of varying sizes, as shown in table 9.1. These performances are shown in
figure 9.3.
Algorithm Size of feature subset
First order 9
NGTDM 5
GLDM 5
GLRLM 5
SGLDM 14
Table 9.1: Feature subset sizes created by using only individual texture algorithms
Figure 9.3: Classifier performances for each texture algorithm.
It can be seen that the GLRLM algorithm is on average the best performing for
both normalised and non-normalised data; it is also the least affected by normalisation.
These results again show that a much-reduced subset can be used for good classifier
performances.
10. CONCLUSION & FURTHER WORK
A set of classifiers was developed and used to validate the use of the five
texture algorithms for texture identification. Results showed that the algorithms generate
features that can be used to classify images by texture, and a texture combination image
was successfully segmented.
The generation of the features was found to be time-consuming; therefore feature
reduction was examined, and good classification was achieved using reduced subsets of
the original 38 features.
As the algorithms have been successfully used to differentiate between different
textures, the next step of the project is to use the algorithms to identify clinical features in
medical images. This will be investigated as a double-diploma project extension by
examining the segmentation of MRI images.
REFERENCES
1. Anas Zirari, “Design of a medical image analysis platform”, Final Year
Project 2002/2003, University of Strathclyde.
2. Menahem Friedman, Abraham Kandel, “Introduction to pattern recognition:
statistical, structural, neural and fuzzy logic approaches.” Imperial College
Press 1999.
3. Charles W. Therrien, “Decision estimation and classification: an introduction
to pattern recognition and related topics.” John Wiley & Sons, Inc, 1989.
4. Robert M. Haralick, K. Shanmugam, Its’hak Dinstein, “Textural Features for
Image Classification.” IEEE Transactions on Systems, Man and Cybernetics,
Vol.3, No. 6, Nov 1973.
5. Anil K. Jain, “Advances in statistical pattern recognition”, NATO ASI Series,
Vol F30. “Pattern Recognition Theory and Applications”, edited by P.A.
Devijver and J. Kittler, Springer-Verlag Berlin Heidelberg, 1987.
6. Xiaoou Tang, “Texture Information in Run-Length Matrices”, IEEE
Transactions On Image Processing, Vol. 7, No. 11, Nov 1998.
7. http://www.nd.com/welcome/whatisnn.htm
8. P. Brodatz. “Textures: A Photographic Album for Artists & Designers.” New
York: Dover, 1966.
9. http://www.nlm.nih.gov/research/visible/visible_human.html
10. http://www.mathworks.com
11. Abhir Bhalerao and Constantino Carlos Reyes-Aldasoro, “Volumetric Texture
Description and Discriminant Feature Selection for MRI”, ??????
APPENDICES
Appendix A – Texture features
First-order - mean f1
- variance f2
- skew f3
- kurtosis f4
- energy f5
- coarseness f6
- entropy f7
- median f8
- mode f9
NGTDM - coarseness f10
- contrast f11
- busyness f12
- complexity f13
- texture strength f14
GLDM - contrast f15
- energy f16
- entropy f17
- mean f18
- homogeneity/inverse difference moment f19
GLRLM - short run emphasis f20
- long run emphasis f21
- grey level distribution f22
- run length distribution f23
- run percentage f24
SGLDM - contrast f25
- energy f26
- homogeneity f27
- correlation f28
- entropy f29
- sum of squares variance f30
- sum average f31
- sum variance f32
- sum entropy f33
- difference variance f34
- difference entropy f35
- information measure of correlation 1 f36
- information measure of correlation 2 f37
- maximal correlation coefficient f38
Appendix B – MATLAB code