Master Thesis in Medical Physics · This thesis work explores the utility of medical image texture analysis for prostate cancer in the main context of a computer aided detection system.

Master Thesis in Medical Physics 25-05-2011

Joachim Nilsson

Medical Physics Programme – University of Gothenburg

Supervisors Tim Carter and David Atkinson, Centre for Medical Image Computing, University College London


Table of Contents

Abstract

Theory

Prostate Cancer

Magnetic Resonance Imaging

Texture Analysis

Markovian Co-Occurrence Matrix

Run-Length Matrix

Wavelet Transform

Fractal Dimensions

Computer Aided Diagnosis

Support Vector Machines

Methods

Texture Features

Markovian Co-Occurrence Matrix

Run-Length Matrix

Wavelet Decomposition

Fractal Dimension

Feature Selection

Support Vector Machine

Results

Feature Selection

Computer Aided Diagnosis

Discussion

References


Abstract

This thesis explores the utility of medical image texture analysis for prostate cancer, mainly in the context of a computer aided detection system. Image data from 37 patients who underwent MRI examination at University College London Hospital were used. A consultant radiologist drew regions of interest around suspicious tissue, and these were cross-validated against biopsy information. This ground truth was used to estimate the accuracy of a computer aided diagnosis (CAD) system based on image information derived from texture analysis. Approximately 170 texture features were extracted and investigated. Receiver operating characteristics analysis was used to narrow these down to the features carrying useful information. The remaining features formed the input feature vector of a non-linear support vector machine for the computer aided diagnosis. The predictive power of the texture analysis was assessed on its own and then compared with a more extensive CAD system that also draws data from physiological and functional images. The results show that texture analysis can provide information valuable for diagnosis, correctly classifying 70% of the regions tested. However, this information appears to be highly redundant in a more extensive context where other MRI data are included: when incorporated in a CAD system including such data, the texture features failed to increase the discriminatory power. Major obstacles stand in the way of an efficient and reliable scheme, such as the need for image registration, zonal and glandular segmentation, and problems arising from tissue edge effects. Texture analysis of prostate MRI might prove more beneficial to diagnosis if those obstacles are overcome and if diffusion and contrast enhancement images can be incorporated readily.


Theory

Prostate Cancer

Diagnosis and treatment of prostate cancer involve two major difficulties, arising from its epidemiology and its tumour characteristics. The first challenge is that the disease is very common: it appears to occur in over 70% of all men over 70 [1]. That the disease is widespread was noted as early as the 1930s, when a high rate of previously unknown prostatic neoplasia was discovered at autopsy [2]. The second challenge is that, despite prostatic cancer being almost endemic to the elderly male population in most parts of the world, the number of deaths caused by it is relatively low. Patient survival 15 years after diagnosis of the lower, and more common, grades of prostate cancer has been reported as 80% [3]. Most patients in this group will die from causes other than their cancer, even if left untreated. Healthcare services are therefore confronted with the question of how to find the higher grades of disease among a very large number of patients hosting the cancer, and how to treat them appropriately.

The prostate cancer group can be further stratified into peripheral zone (PZ) cancers, approximately 70% of all cases, and transitional zone (TZ) cancers, about 25% of all cases, with the remainder arising in the central zone (CZ, which surrounds the ejaculatory ducts and is not shown in the images), see Figure 1. Prostate cancer is commonly diagnosed through digital rectal examination of the patient, usually in combination with a blood assay of prostate-specific antigen (PSA) levels in the serum. Another common method is transrectal ultrasound (TRUS), often used to guide a biopsy procedure for histopathological examination. All of these methods suffer from systematic errors and low specificity [4-6]. The limitations of digital rectal examination include the obvious fact that the physician cannot access the whole gland, and the procedure provides little information about tumour grade. PSA tests are non-specific, and levels that could indicate cancer are often caused by benign conditions, such as benign prostatic hyperplasia (BPH).

Figure 1. Typical T2-weighted magnetic resonance image of a prostate. The enlarged section to the right highlights the transitional and peripheral zones (TZ, PZ) and the rectum. The cancerous tissue appears darker than the normal PZ tissue.

TRUS biopsies involve a large number of cores extracted through the rectum. This procedure can provide useful grading information through the Gleason scoring system [7]. Gleason scores describe the histological changes in cancerous tissue, where a high score indicates a low grade of cell differentiation. The cell structure becomes more heterogeneous with less extracellular space. This could give rise to textural changes that are visible and quantifiable on MR images, which is part of the rationale for this work. While the Gleason grade is regarded as important in diagnosis and in deciding upon treatment, its predictive power is limited [8-10]. It should also be noted that prostate cancer is not discernible on ultrasound images. The biopsies are further subject to systematic and random errors, due to limited access to the whole gland and the fact that precise targeting is prevented by a mobile gland and poor image quality. TRUS biopsies also involve a risk of infection from rectal bacteria, urinary retention and less dangerous discomforts such as haematuria. To achieve an acceptable accuracy in a biopsy procedure, a large number of cores are needed, as illustrated in Figure 2. Arguably, here lies the most important argument against a wide prostate cancer screening scheme. A huge number of healthy patients would be biopsied as a result of an elevated PSA level, and a great number of them would be diagnosed with a disease that poses little risk to their life and wellbeing.
A more systematic biopsy procedure, template transperineal prostate mapping (TPM) [11], addresses some of the issues with TRUS biopsies and provides a more comprehensive three-dimensional map in the form of pathology samples. One of the main advantages of this method is its high sensitivity. It also has the potential to provide information enabling local treatment of the lesion in the prostate, which opens the possibility of accurate focal treatment, described below. While extensive prostate biopsy schemes can provide high-quality information about the disease, they also raise the cost and can harm the patient through side-effects and discomfort. This has encouraged a more extensive use of magnetic resonance imaging (MRI) for diagnosis, biopsy targeting, treatment planning and patient follow-up in prostate cancer. MRI is generally acknowledged as a valuable tool for diagnosis [12, 13]. When MRI is added to this mesh of diagnostic methods, another fact must be considered. Tissue haemorrhage after biopsy changes the MRI signal of normal prostatic tissue, which can then appear similar to cancerous lesions, along with other restrictions on the amount of useful information available from the image material [14]. Because of this effect, physicians are advised either to wait at least eight weeks after biopsy before performing an MRI examination or to perform the MRI prior to biopsy. A patient-sparing and probably more expensive way is to perform the MRI before biopsy and potentially use the image material to guide the procedure.

Figure 2. Saturation prostate biopsy scheme.

The treatments usually employed for prostate cancer include radical prostatectomy, external beam radiotherapy (sometimes combined with brachytherapy), hormonal therapy, and more localised methods such as cryotherapy, high-intensity focussed ultrasound (HIFU) and photodynamic therapy. The main treatments, radical prostatectomy and radiotherapy, generally involve a burden of side-effects for the patient [15]. In order to decrease the adverse effects and avoid unnecessarily complex procedures, focal therapies such as HIFU and cryotherapy have seen increased interest. These techniques can treat a minor part of the prostate gland while leaving the remainder to function normally. Through this and the avoidance of radical surgery, side-effects can be drastically lowered [16, 17]. For focal therapy to be successful, diagnostic techniques with high sensitivity are needed, while a somewhat lower specificity can be tolerated. Perhaps the growing interest in MRI of prostate cancer should be viewed in this light as well, as a logical consequence of the demand for the high-quality spatial disease information needed to perform adequate focal treatment.

Magnetic Resonance Imaging

Nuclear magnetic resonance techniques utilise the behaviour of nuclear spins in atoms to extract information. Applied to medicine, these techniques provide a powerful imaging modality that can assist physicians in diagnosing disease, guiding treatment and following patients over time. An array of different techniques usually falls under the name magnetic resonance imaging (MRI) [18]. The patient is positioned in a strong magnetic field inside a scanner bore, which aligns the nuclear spin moments in the field direction. This spin ensemble is manipulated by weaker, rapidly changing magnetic field gradients to produce spatially corresponding configurations. This process is known as spatial encoding, and is the fundamental step for image construction. By varying different parameters in this process of manipulation and encoding, different sets of basic anatomical information can be retrieved. The most common are T1-weighted and T2-weighted images, named after the tissue spin relaxation properties that give rise to the contrast in those images. Other important techniques for prostate imaging are diffusion weighted imaging (DWI), dynamic contrast enhancement (DCE) and magnetic resonance spectroscopy (MRS) [19]. DWI produces weighted images where tissue with high water diffusion appears dark and tissue with restricted diffusion appears bright. A sequence of several DWI images can be combined to produce a map of the apparent diffusion coefficient (ADC) of the tissue. In prostate cancer, the lesion exhibits restricted diffusion [20, 21]. DCE imaging generates a time-resolved series of the delivery phase of an intravenous contrast agent. This is often used to explore the perfusion characteristics of different tissues, where an earlier enhancement and a quicker wash-out can be seen in prostatic carcinoma [22]. MR spectroscopy provides metabolic and functional information by indicating the levels of different metabolites in the tissue. In prostate cancer, malignant regions display an increase in choline and a decrease in citrate [23].

Texture Analysis

In a broad sense, texture analysis encompasses the use or extraction of information contained in the properties of a surface or structure [24]. This definition is perhaps too hazy to convey the methods used to analyse digital medical images using texture. A more stringent definition would be the analysis of the spatial correspondence of pixel (picture element) intensities. It is likely that this is a major source of diagnostic information for radiologists in clinical practice, alongside intensity and shape, as they visually assess an image. It is an obvious next step to explore the prospect of improving consistency and accuracy by quantifying the texture in a meaningful way. Interest in this direction has run alongside the advancement of computer science for a long time [25]. Evidence has also been produced indicating that high-order statistics can contain information not discernible by human vision [26]. However, the field of research was more broadly sparked in the 1970s with the publication of seminal work by Robert Haralick [27, 28] and others [29]. This research in texture analysis was applied to general image analysis as computers became increasingly capable of such tasks. Applied texture analysis consists of calculating textural features that carry information from the image material. Some studies have shown a potential for texture features to provide valuable diagnostic information [30-32]. This can be especially beneficial in automatic or semi-automatic processing of large quantities of patient material, such as in mammography screening [33]. A vast number of features have been developed since the advent of the research field. Some of the most common, which are used in this work, are described below.

Figure 3. A small region of an image, represented as integers and as gray-levels.


Markovian Co-Occurrence Matrix

One of these representations is the co-occurrence matrix, derived as a Markovian representation of the occurrence of gray-level pairs for a specific direction and distance. Figure 3, Figure 4 and Figure 5 illustrate how a co-occurrence matrix is composed for a small image segment. The matrix for distance d = 1 and direction 0 degrees is computed by registering all pairs of gray levels i and j that lie at distance 1 from each other in the 0 degree direction. This work only handles the symmetric co-occurrence matrix, in which each pair is counted in both directions, although a unidirectional version is sometimes employed. The co-occurrence matrix is often normalised so that the sum of all occurrences amounts to 1 (each element can therefore be interpreted as the joint probability of two values at a random point). A large number of features can be derived from the resulting matrix.

Figure 5. The d1, 0 degree co-occurrence matrix of the region in Figure 3 is constructed by calculating the occurrence of each pixel pair with distance 1 and angle 0 degrees.

Figure 4. The d1, 90 degree co-occurrence matrix for the image segment in Figure 3. Note that the resulting matrix is by definition symmetric for the bidirectional calculation.
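The construction described above can be sketched in a few lines of Python. This is only an illustration and not the MATLAB code used in this work: the function name `cooccurrence_matrix`, its parameters and the 4x4 example segment are assumptions made here, and the segment is not the one shown in Figure 3.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0, normalise=True):
    """Symmetric gray-level co-occurrence matrix for the pixel offset (dx, dy).

    image  : list of rows holding integer gray levels in 0..levels-1
    dx, dy : column and row offset; (1, 0) is distance 1 at 0 degrees,
             (0, -1) is distance 1 at 90 degrees
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # bidirectional counting makes the matrix symmetric
    if not normalise:
        return counts
    total = sum(sum(row) for row in counts)  # assumes at least one valid pair
    return [[value / total for value in row] for row in counts]


# Hypothetical 4x4 segment with gray levels 0..3 (illustration only)
segment = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcm_counts = cooccurrence_matrix(segment, levels=4, normalise=False)
glcm = cooccurrence_matrix(segment, levels=4)
```

For an irregular ROI, one would additionally skip pairs in which either pixel falls outside the region; that step is left out of the sketch.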


Figure 6. Run-length matrix for the image segment displayed in Figure 3, for the 0 degree direction. Note the clustering to the first columns of the matrix due to the inhomogeneous example.

Run-Length Matrix

The run-length matrix is another commonly used texture representation. It describes how often runs of a specific gray level and a specific length occur in a given direction [34]. If the image is simple, the run-length matrix can become very sparse with a low total element sum. The same run-length principle is exploited in several data compression techniques, for example the run-length encoding options of BMP and TIFF image files. The run-length matrix of the example segment in Figure 3 is displayed in Figure 6.
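As a rough illustration of the definition above, the following Python sketch builds the run-length matrix for the 0 degree direction. The function name `run_length_matrix` and the example segment are assumptions for this sketch, not the code or data of this work.

```python
def run_length_matrix(image, levels):
    """Run-length matrix for the 0 degree (row-wise) direction.

    Element [i][j - 1] holds the number of runs of gray level i with length j.
    """
    max_run = len(image[0])  # a run cannot be longer than one image row
    rlm = [[0] * max_run for _ in range(levels)]
    for row in image:
        run_value, run_length = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_length += 1
            else:
                rlm[run_value][run_length - 1] += 1
                run_value, run_length = value, 1
        rlm[run_value][run_length - 1] += 1  # close the last run of the row
    return rlm


# Hypothetical 4x4 segment with gray levels 0..3 (illustration only)
segment = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
rlm = run_length_matrix(segment, levels=4)
```

Each pixel belongs to exactly one run, so summing run length times run count over the whole matrix recovers the number of pixels, which is a convenient sanity check.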

Wavelet Transform

The wavelet transform has been mathematically defined and known as a concept since the start of the twentieth century [35]. It describes the mapping to a transform domain where the original image is represented by detail coefficients at multiple scales. This mapping can be done using different basis functions (wavelet functions), which are employed to extract the coefficients. The simplest basis for the transform was described by Alfréd Haar and is an orthogonal set:

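$$A = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \end{bmatrix}, \qquad D = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \end{bmatrix}$$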

A represents the approximation filter (essentially a low-pass filter) and D the detail filter (a high-pass filter), see Figure 7. A reinvigoration of the research area has generated a wide range of new wavelet functions [36, 37]. They are used for various tasks, such as speech recognition and image compression (for example JPEG 2000). To understand the decomposition algorithm, it is useful to first consider the wavelet transformation in the one-dimensional case. Take any series of integers. The array is filtered with the basis functions above and downsampled by two to extract the coefficients. The detail coefficients (the output of the D filtering) are stored, and the approximation coefficient array undergoes the same process of filtering and downsampling. This is continued until the approximation coefficient array is shorter than the filter. The result is a wavelet representation, or wavelet decomposition, of the initial data. One way to illustrate this is shown in Figure 8. The filtering process becomes only a little more complex in the two-dimensional case of image decomposition. The detail coefficients are calculated for all three filter configurations, in two steps. Depending on the order of the filters, different coefficient maps are obtained and stored for each scale. The bidirectional low-pass filter generates the seed for the calculation on the next scale. A wavelet decomposition image can be constructed from the set of coefficients, which in effect is a set of three-directional high-pass filtered originals at different scales, see Figure 9. The wavelet coefficients are used for the texture feature calculation.

Figure 8. Wavelet decomposition of a one-dimensional array of data. The coefficients A3, D1, D2 and D3 represent the complete data set.

Figure 9. Wavelet decomposition of the slice in Figure 1, using the Haar wavelet basis. The approximation coefficient image is downsampled and filtered (using the lateral (L), diagonal (D) and horizontal (H) filters) at each level (denoted by the numbers).

Figure 7. Set of one-dimensional Haar filters. The left is the high-pass filter and the right depicts the low-pass approximation filter.
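The one-dimensional decomposition can be illustrated with a short Python sketch. It assumes the orthonormal form of the Haar filters (scaling by 1/sqrt(2)) and an input whose length is a power of two; the function names and the example series are hypothetical.

```python
import math


def haar_step(signal):
    """One decomposition level: Haar filtering combined with downsampling by two."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [s * (signal[k] + signal[k + 1]) for k in pairs]  # low-pass (A)
    detail = [s * (signal[k] - signal[k + 1]) for k in pairs]  # high-pass (D)
    return approx, detail


def haar_decompose(signal):
    """Repeat haar_step on the approximation until it is shorter than the filter.

    Returns (final approximation, [D1, D2, ...]).
    """
    approx, details = list(signal), []
    while len(approx) >= 2:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


data = [4, 6, 10, 12, 8, 6, 5, 5]  # hypothetical series of eight integers
a3, (d1, d2, d3) = haar_decompose(data)
```

With eight samples the sketch yields A3, D1, D2 and D3 as in Figure 8. Because the orthonormal filters are used, the sum of squared coefficients equals the sum of squared input values.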


Fractal Dimensions

A fractal is a self-similar structure with repetition on multiple scales. It is a concept with utility in many areas, and it can be applied to image analysis in order to examine multi-scale properties. The fractal capacity dimension describes how a mathematical set fills space at different scales [38]. The algorithm used here for approximating this dimension for the image data was the box-counting algorithm. Put simply, the process calculates the number of boxes needed to cover a line or a surface for different box sizes. For this work, two techniques based on this concept are used to characterise a region in an image in terms of texture. One is the histogram fractal dimension (HFD), which operates in the histogram domain of the ROI. The other is the texture fractal dimension (TFD), operating on the surface of a three-dimensional representation of the ROI, where the third dimension is the intensity value of each pixel.

Computer Aided Diagnosis

Diagnosis of disease is a complex task. It involves extracting and weighting information to assemble an idea of the patient’s status, and applying highly sophisticated pattern seeking to explain that status from previous knowledge. This task is undertaken by physicians with the assistance of a great number of technical and biomedical techniques. The process of diagnosis can, to a lesser or greater extent, be supported by various computer tools. Most medical procedures today involve a computer at some stage, but Computer Aided Diagnosis (CAD, an abbreviation sometimes used for the ancillary field of Computer Aided Detection as well) refers more specifically to cases where a computer is involved in actually making a decision, not merely extracting or processing information [39]. In radiology, CAD is usually employed in a semi-automatic or a completely automatic manner. In the semi-automatic case, the radiologist or physician specifies the region of interest (ROI) and the computer system returns either a binary result or some probabilistic measure of the likelihood of disease. In the automatic case, the system uses the image data directly as input and either locates a region of suspicious tissue or returns some more general diagnostic output. One of the main algorithm frameworks for CAD is the support vector machine (SVM). It is the method used in this work and in the project associated with it.

Support Vector Machines

The SVM algorithm is a supervised machine learning technique introduced by Corinna Cortes and Vladimir Vapnik in 1995 [40]. The algorithm takes a vector of data (from the testing set) and classifies it with respect to a set of previously learned sample vectors (the training set). It is a supervised technique in the sense that the learning stage requires human interaction: both the inputs and the desired outputs must be provided to teach the algorithm how to classify. Once the SVM is trained, it does not require supervision in order to make a decision. The technique can be understood by imagining the SVM mapping the input vector of n features onto an n-dimensional feature space and comparing the input coordinates with a hyperplane defined in the training step, see Figure 10. The position of the input vector in relation to the hyperplane decides the classification. A stricter mathematical formulation, which is not crucial for understanding the concept, is provided below. A set of m training input vectors $\mathbf{x}_i$ have classifications $y_i$, where $y_i$ can take the value 1 or -1 (healthy or malignant, in the medical case). The SVM training procedure can be expressed as an optimisation problem:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{m}\xi_i \quad \text{subject to} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

where $\xi_i$ is the error for the training point $\mathbf{x}_i$, $\mathbf{w}$ is a vector of coefficients for the hyperplane, b is the offset of the hyperplane and C is a constant determining the emphasis on minimising the error. The hyperplane vector norm is essentially a regularisation term. Solving this problem finds the optimal hyperplane for a set of training data, in the sense of maximising the margin between the hyperplane and the data vectors of the two classes, see Figure 10. The vectors defining that maximal margin are called support vectors.

The SVM has a dual formulation which can be used for increased clarity and utility [41]:

$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,\mathbf{x}_i\cdot\mathbf{x}_j \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{m}\alpha_i y_i = 0$$

where $\alpha_i$ and $\alpha_j$ are weights attributed to the data vectors in the training set.

Figure 10. Conceptual separating plane for classification using SVM. The thick dark line is the hyperplane and the lighter ones represent the margin planes. Using the training set, the algorithm compares new points with the plane and classifies accordingly. Points on the edge of the margin are called support vectors. Note that this is for illustration and that SVM normally operates in dimensions higher than two.

Solving this problem finds the most important data points for the classification (the support vectors). The example above describes the linear kernel SVM. By replacing the dot product with a kernel function, the SVM can be based on non-linear discrimination. Generalising for the kernel representation, the final SVM classification equation becomes:

$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{m}\alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right)$$

It can easily be seen that a large $\alpha_i$ represents an important support vector $\mathbf{x}_i$. By introducing a non-linear kernel, the process can be interpreted as a mapping into a higher-dimensional space, preferably one in which it is easier to separate the two classification sets. Another interpretation is to view the kernel operation as a similarity measure between the support vectors and the input vector. It is therefore important to choose the kernel for the SVM carefully. A very common (and intuitive) kernel is the radial basis function (RBF), essentially a high-dimensional Gaussian function:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma\|\mathbf{x}_i - \mathbf{x}_j\|^2\right)$$

where $\gamma > 0$ determines the width of the Gaussian.

This provides an intuitive similarity measure, as a value of 1 indicates identical points and 0 is the limit for two points infinitely far apart. Care must be taken when implementing an SVM in order to achieve a reliable and accurate system. The parameters of the kernel should be selected either heuristically or through a robust iterative search. It is also essential to scale all features into a common range, effectively a normalisation, so that all dimensions have the same initial potential to influence the decision of the machine.
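The kernel and the classification rule can be sketched as follows. This is a minimal illustration, not the LIBSVM implementation: the function names are assumptions, and the "trained" model values (support vectors, weights, offset and gamma) are invented for the example rather than obtained from training.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: 1 for identical points, tending to 0 far apart."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, labels, alphas, b, gamma):
    """Sign of the kernel-weighted sum over the support vectors, plus the offset b."""
    total = sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if total + b >= 0 else -1


# Hypothetical model with two support vectors (values invented for illustration)
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]
b = 0.0
gamma = 0.5

near_positive = svm_decision([0.9, 0.8], support_vectors, labels, alphas, b, gamma)
near_negative = svm_decision([0.1, 0.0], support_vectors, labels, alphas, b, gamma)
```

The input is assigned to the class of the support vector it most resembles, which is the similarity interpretation of the kernel given above.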


Methods

This work consists of applying texture analysis to a set of images and extracting texture features to be used in diagnosis, mainly through an SVM scheme. Previous work at University College London has explored standard MRI data, such as DCE derived parameters, histogram information and ADC values [42]. Texture features derived in this work were prepared so that they could be appended to, and compared with, the results previously found. The programming work was done using MATLAB and all image analysis was performed using in-house software for prostate viewing developed by Dean Barratt, Tim Carter and others. A total of 37 patients were included in the initial material, which after examination was narrowed down to image material from 15 patients. From these, 14 malignant regions and 17 healthy regions were extracted (one malignant region was excluded because it was inconclusively drawn, but the healthy regions of that patient were kept in the set). The images used were T2-weighted with 3 mm slice thickness and 0.35 cm/pixel resolution, acquired on a 1.5 T scanner. The reasons for narrowing the material down were equivocal regions, an interest in peripheral zone malignancy only, insufficient image material (not all patients had all sequences performed) and signs of artefact corruption.

Texture Features

This section contains the definitions of all texture features extracted. All features can be, and were, derived from irregular ROIs, albeit in somewhat different ways. The fractal dimension features were calculated on a square centred on the centre of gravity of the irregular ROI, because of the computational complexity of doing otherwise. All other feature calculations use the irregular ROI directly. The code for this feature extraction was developed and integrated in the prostate viewing software at CMIC as a part of this work. The pixel values in the ROI were normalised to the range spanned by the region gray-level mean plus and minus three standard deviations. The code that calculates the co-occurrence matrix features and the run-length matrix features utilises a minor MATLAB function developed by Xunkai Wei [43]. In the initial probing for plausible ways to implement texture analysis, and occasionally during the development process, the complete texture analysis software package MaZda [44] was used. Regions of interest were drawn in malignant tissue by a consultant radiologist on an OsiriX workstation. The ROI data were loaded into the MATLAB viewer environment and used in the texture analysis software. A typical view is shown in Figure 11. Healthy peripheral zone tissue was delineated by the author, guided by imaging and biopsy reports. All regions were verified against template biopsy information to obtain high accuracy in the ground truth for classification.
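The normalisation of ROI pixel values to the mean plus and minus three standard deviations could look roughly like the Python sketch below. The number of output gray levels is not stated in the text, so `levels=64` is purely an assumption, as are the function name and the example values.

```python
import math


def normalise_roi(pixels, levels=64):
    """Clip ROI pixel values to mean +/- 3 SD and requantise to `levels` gray levels.

    pixels : flat list of the pixel values inside the ROI
    levels : number of output gray levels (assumed here; not given in the text)
    """
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if std == 0:
        return [0] * n  # a constant region carries no texture
    low, high = mean - 3 * std, mean + 3 * std
    normalised = []
    for p in pixels:
        clipped = min(max(p, low), high)
        level = int((clipped - low) / (high - low) * levels)
        normalised.append(min(level, levels - 1))  # keep the upper limit in range
    return normalised


roi_values = [100, 110, 120, 130, 140]  # hypothetical ROI intensities
roi_levels = normalise_roi(roi_values)
```

Clipping at three standard deviations limits the influence of isolated outlier pixels on the gray-level range, so that most levels are spent on the bulk of the intensity distribution.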


Markovian Co-Occurrence Matrix

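$p(i,j)$ describes the probability of occurrence of gray level i next to gray level j for distance d and direction $\theta$. $N_g$ is the number of gray levels represented in the ROI.

$\mu_x$, $\mu_y$ and $\sigma_x$, $\sigma_y$ describe the means and standard deviations of the marginal distributions $p_x(i) = \sum_j p(i,j)$ and $p_y(j) = \sum_i p(i,j)$.

Angular Second Moment:

$$\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j)^2$$

Contrast:

$$\sum_{n=0}^{N_g-1} n^2 \underset{|i-j|=n}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g}} p(i,j)$$

Correlation:

$$\frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} i\,j\,p(i,j) - \mu_x\mu_y}{\sigma_x\sigma_y}$$

Figure 11. Prostate viewer developed at CMIC. The loaded images are T2 (TE 92 ms, top left), ADC map (top right) and DCE wash-in phase (bottom left). Note the ROI around the suspicious tissue, which was drawn on the T2 image. There are obvious discrepancies between the shape and position of the region in the T2-weighted image and the ADC map, due to artefacts and patient movement.

Sum of Squares:

$$\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (i-\mu)^2\,p(i,j)$$

Inverse Difference Moment:

$$\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} \frac{p(i,j)}{1+(i-j)^2}$$

Sum Average:

$$\sum_{i=2}^{2N_g} i\,p_{x+y}(i)$$

$$p_{x+y}(k) = \underset{i+j=k}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g}} p(i,j)$$

Sum Variance:

$$\sum_{i=2}^{2N_g} (i-\mu_{x+y})^2\,p_{x+y}(i)$$

where $\mu_{x+y}$ is the Sum Average.

Sum Entropy:

$$-\sum_{i=2}^{2N_g} p_{x+y}(i)\log\left(p_{x+y}(i)\right)$$

Entropy:

$$-\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j)\log\left(p(i,j)\right)$$

Difference Variance:

$$\sum_{i=0}^{N_g-1} (i-\mu_{x-y})^2\,p_{x-y}(i)$$

$$p_{x-y}(k) = \underset{|i-j|=k}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g}} p(i,j)$$

where $\mu_{x-y}$ is the mean of the difference distribution.

Difference Entropy:

$$-\sum_{i=0}^{N_g-1} p_{x-y}(i)\log\left(p_{x-y}(i)\right)$$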
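A few of the co-occurrence features can be computed from a normalised matrix as in the Python sketch below. It follows the standard Haralick definitions, uses the natural logarithm for the entropy (the base is an assumption) and indexes gray levels from 0, which does not affect the features shown. The function name and the example matrix are hypothetical.

```python
import math


def haralick_subset(p):
    """Selected co-occurrence features from a normalised matrix p (list of rows)."""
    idx = range(len(p))
    asm = sum(p[i][j] ** 2 for i in idx for j in idx)
    # Summing (i - j)^2 * p(i, j) equals grouping the pairs by n = |i - j|
    contrast = sum((i - j) ** 2 * p[i][j] for i in idx for j in idx)
    idm = sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx)
    entropy = -sum(
        p[i][j] * math.log(p[i][j]) for i in idx for j in idx if p[i][j] > 0
    )
    # Marginal distributions and their moments, needed for the correlation
    px = [sum(p[i][j] for j in idx) for i in idx]
    py = [sum(p[i][j] for i in idx) for j in idx]
    mu_x = sum(i * px[i] for i in idx)
    mu_y = sum(j * py[j] for j in idx)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * px[i] for i in idx))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * py[j] for j in idx))
    if sd_x * sd_y > 0:
        correlation = (
            sum(i * j * p[i][j] for i in idx for j in idx) - mu_x * mu_y
        ) / (sd_x * sd_y)
    else:
        correlation = None  # undefined for a single gray level
    return {
        "asm": asm,
        "contrast": contrast,
        "idm": idm,
        "entropy": entropy,
        "correlation": correlation,
    }


# Hypothetical symmetric co-occurrence counts, normalised to sum to 1
counts = [[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]
total = sum(sum(row) for row in counts)
p = [[value / total for value in row] for row in counts]
features = haralick_subset(p)
```

A matrix concentrated on its diagonal (many equal neighbouring pixels) gives a low contrast and a high inverse difference moment, whereas a matrix spread far from the diagonal gives the opposite.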


Run-Length Matrix

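For the run-length matrix, $p(i,j)$ describes the occurrence of gray level i for a length j in direction $\theta$, $N_g$ is the number of gray levels in the ROI, $N_r$ is the longest run length occurring and P is the total number of pixels in the ROI. C is defined as the sum of all elements in the run-length matrix.

Short Run Emphasis:

$$\left(\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} \frac{p(i,j)}{j^2}\right) \Big/ \, C$$

Long Run Emphasis:

$$\left(\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} j^2\,p(i,j)\right) \Big/ \, C$$

Gray Level Non-Uniformity:

$$\left(\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_r} p(i,j)\right)^2\right) \Big/ \, C$$

Run Length Non-Uniformity:

$$\left(\sum_{j=1}^{N_r}\left(\sum_{i=1}^{N_g} p(i,j)\right)^2\right) \Big/ \, C$$

Run Fraction:

$$\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j) \Big/ \sum_{i=1}^{N_g}\sum_{j=1}^{N_r} j\,p(i,j) = C / P$$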
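The run-length features can be evaluated from a run-length matrix as sketched below in Python. The function name and the example matrix are assumptions; the matrix stores runs of length j in column j - 1.

```python
def run_length_features(rlm, n_pixels):
    """Run-length features from a matrix where rlm[i][j - 1] counts runs of
    gray level i with length j. n_pixels is the number of pixels in the ROI."""
    c = sum(sum(row) for row in rlm)  # total number of runs
    n_lengths = len(rlm[0])
    sre = sum(n / (j + 1) ** 2 for row in rlm for j, n in enumerate(row)) / c
    lre = sum(n * (j + 1) ** 2 for row in rlm for j, n in enumerate(row)) / c
    gln = sum(sum(row) ** 2 for row in rlm) / c
    rln = sum(sum(row[j] for row in rlm) ** 2 for j in range(n_lengths)) / c
    run_fraction = c / n_pixels
    return {"sre": sre, "lre": lre, "gln": gln, "rln": rln, "rf": run_fraction}


# Hypothetical run-length matrix of a 4x4 segment with gray levels 0..3
rlm = [
    [1, 2, 0, 0],
    [0, 2, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
rl_features = run_length_features(rlm, n_pixels=16)
```

A fine, noisy texture produces many short runs (high short run emphasis and run fraction), while a smooth region produces few long runs (high long run emphasis).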

Wavelet Decomposition

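The wavelet decomposition produces coefficient images at multiple scales. By segmenting the ROI at each scale, the following measures were calculated, where $c(x,y)$ describes the pixel in the coefficient image, the sums run over the pixels $(x,y)$ in the ROI and $N$ is the number of those pixels.

Energy:

$$\left(\sum_{(x,y)} c(x,y)^2\right) \Big/ \, N$$

Variance:

$$\frac{1}{N}\sum_{(x,y)} \left(c(x,y)-\mu\right)^2, \qquad \mu = \frac{1}{N}\sum_{(x,y)} c(x,y)$$

Kurtosis:

$$\frac{1}{N}\sum_{(x,y)} \left(\frac{c(x,y)-\mu}{\sigma}\right)^4$$

where $\sigma$ is the square root of the variance.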

Entropy:

∑ ( )

Fractal Dimension

The fractal capacity dimensions were calculated using a box-counting algorithm developed by Frédéric Moisy [45]. The algorithm was applied to a surface plot of the ROI and to a line plot of its histogram, respectively. The mathematical definitions of the features follow below.

Fractal Capacity Dimension:

$$FD = \lim_{r \to 0} \frac{\log N(r)}{\log(1/r)}$$

where $N(r)$ is the number of boxes of side r. Another way to express this is to find the exponent in the function:

$$N = N_0 R^{-FD}$$

where N is the number of boxes of side R required to cover the fractal set, $N_0$ is a constant and FD is the fractal dimension. The capacity dimension is always less than or equal to the Euclidean dimension of the set, which means that TFD ≤ 3 and HFD ≤ 2. A low capacity dimension indicates that the set is fractal in nature. A heterogeneous tissue ROI with fine detail would be distinguished by a higher fractal capacity dimension.
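The box-counting estimate can be illustrated in two dimensions with the Python sketch below. It is a simplified stand-in, not the MATLAB routine by Moisy used in this work: the function names are assumptions, the set is a binary grid, and the exponent is obtained from a least-squares fit of log N against log(1/R).

```python
import math


def box_count(grid, size):
    """Number of size-by-size boxes containing at least one set pixel."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(
                grid[r][c]
                for r in range(r0, min(r0 + size, rows))
                for c in range(c0, min(c0 + size, cols))
            ):
                count += 1
    return count


def capacity_dimension(grid, sizes):
    """Least-squares slope of log N(R) against log(1/R); grid must be non-empty."""
    xs = [math.log(1 / s) for s in sizes]
    ys = [math.log(box_count(grid, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Two reference sets on a 16x16 grid: a filled square and a single line
filled = [[1] * 16 for _ in range(16)]
line = [[1 if r == 0 else 0 for _ in range(16)] for r in range(16)]
fd_filled = capacity_dimension(filled, [1, 2, 4, 8])
fd_line = capacity_dimension(line, [1, 2, 4, 8])
```

The filled square and the line recover the Euclidean dimensions 2 and 1; an irregular set on the same grid would fall in between.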

Feature Selection

Using the definitions above, which were judged to be a reasonably encompassing set for describing the texture in the images, a total of 172 features were extracted by varying parameters. In order to narrow down the number of features, a pre-diagnostic assessment of information content was made. This process is in effect analogous to dimensionality reduction, but for clarity and reliability a more systematic approach was used: the receiver operating characteristics (ROC) of a simple linear decision operation for each individual feature. The ROC is determined by use of a shifting discriminatory level, and for each discrete data point it passes, the sensitivity and specificity are calculated. Plotting the points for each discriminatory level results in the ROC curve, see Figure 12. The area under the curve (AUC) is determined by integration and amounts to 1 for perfect discrimination and 0.5 for a theoretical random case. The list of features and their performance was assessed manually, and a set of the highest scoring features was used for the SVM step.
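The ROC construction for a single feature can be sketched in Python as follows. The function names and the example values are hypothetical, and the sketch assumes that higher feature values indicate malignancy; for a feature with the opposite direction, the AUC falls below 0.5 and 1 - AUC would be used instead.

```python
def roc_points(values, labels):
    """ROC curve points (1 - specificity, sensitivity) for one feature.

    values : feature value per region
    labels : 1 for malignant, 0 for healthy
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Shift the discriminatory level past each distinct data value in turn
    for threshold in sorted(set(values), reverse=True):
        tp = sum(1 for v, y in zip(values, labels) if v >= threshold and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal integration of the ROC curve from 0 to 1."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical feature values for three malignant and three healthy regions
values = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
auc = area_under_curve(roc_points(values, labels))
```

In this example eight of the nine malignant-healthy pairs are ranked correctly, which is what the AUC of 8/9 expresses.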

Figure 12. Receiver operating characteristics curve. The blue line represents a typical (although smoothed) curve for an extracted texture feature. The dashed line would result from completely random discrimination if both cases were equally frequent. The area under the curve is calculated by integration from 0 to 1.

Support Vector Machine

The software used for SVM was LIBSVM, an open source package developed by Chih-Chung Chang and Chih-Jen Lin [46]. The C-SVM algorithm was used in this work for training and classification. The package includes an iterative cross-validation algorithm used to determine the kernel parameters for a near-optimal classification. Depending on the number of instances and the number of features used in the SVM, the linear kernel may perform adequately. With a complete $(C, \gamma)$ search, the radial basis function (RBF) kernel can reproduce the behaviour of the linear kernel [47]. In general, however, the RBF outperforms the linear kernel, albeit at a higher computational cost. The RBF was therefore employed and the parameters derived were used in the final classification run.

Three sets of data were compared through the SVM classification, see Figure 13. The first was the full 31 samples of texture regions; the second and third were 14 regions each. The first set contained only data derived from texture and comprised more regions since only T2 data was required. The second set contained CAD features previously studied (detailed below), and the third contained the texture features and the previously used data combined.

The previously studied features that have been shown to be useful [42], and that were employed for comparison here, were $K^{\mathrm{trans}}$, $v_e$, $k_{ep}$ and [...]. The first three are derived from DCE analysis, where $K^{\mathrm{trans}}$ represents the volume transfer constant between blood plasma and the extra-cellular extra-vascular space (EES), $v_e$ is the fraction of EES per unit volume and $k_{ep}$ is the rate of exchange between EES and blood plasma (calculated as $K^{\mathrm{trans}} / v_e$) [48]. The set also included histogram features: mean, kurtosis, and the 1st, 10th, 50th, 90th and 99th percentiles.

In order for the SVM analysis to be meaningful, all data was scaled to [-1, 1] before further processing. The limits of the linear scaling were the maximum and minimum values in the training set. The testing data was scaled with the limits specific to the training data for each particular leave-one-out cycle. After the data scaling, the kernel parameters were established by a grid search of the parameter space. An array of $(C, \gamma)$ values was tested and, through cross-validation scoring, the pair most appropriate for the data set was used.

The accuracy of the feature sets was investigated by rotating the training and testing data in a leave-one-out approach. One data point (the features derived from one region) was excluded from the training process and then tested in the model constructed from the remaining data. This was repeated until all points had been tested. This method maximises the amount of information obtained about the data and its classification performance while avoiding the training bias that arises from using the same points in both training and testing.
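The scaling and leave-one-out procedure can be illustrated with the short Python sketch below. It is not the LIBSVM pipeline used in this work: the classifier is a placeholder nearest-class-mean rule standing in for the trained SVM, the kernel parameter search is omitted, and all function names are hypothetical. What it does show is that the scaling limits are taken from the training regions only and then applied to the held-out region.

```python
def fit_scaler(train_rows):
    """Per-feature (min, max) limits taken from the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]


def apply_scaler(row, limits):
    """Map each feature linearly so the training min/max land on -1 and +1.

    A held-out value outside the training range falls outside [-1, 1];
    a feature that is constant in the training set is mapped to 0.
    """
    scaled = []
    for value, (lo, hi) in zip(row, limits):
        scaled.append(0.0 if hi == lo else 2.0 * (value - lo) / (hi - lo) - 1.0)
    return scaled


def nearest_centroid_fit(rows, labels):
    """Placeholder classifier (NOT an SVM): one mean vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def nearest_centroid_predict(centroids, row):
    """Return the class whose mean vector is closest to the row."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return min(sorted(centroids), key=lambda cls: sq_dist(centroids[cls]))


def leave_one_out_accuracy(rows, labels):
    """Hold out each region once, scale and train on the rest, test the held-out one."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        limits = fit_scaler(train_rows)  # the limits never see the test region
        model = nearest_centroid_fit(
            [apply_scaler(r, limits) for r in train_rows], train_labels)
        prediction = nearest_centroid_predict(model, apply_scaler(rows[i], limits))
        correct += prediction == labels[i]
    return correct / len(rows)
```

Here `rows` is a list of feature vectors (one per region) and `labels` the corresponding ground truth. Replacing the two `nearest_centroid_*` functions with calls to an SVM library would turn the sketch into the kind of run described above.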

Figure 13. The three sets used for SVM classification.

Results

Feature Selection

The discriminatory power of the extracted texture features was examined by constructing ROC curves. Table 1 shows the features scoring an AUC above 0.65. The table contains all the features used in the next step of computer aided diagnosis.

Table 1. Texture features exhibiting AUC above 0.65.

Texture Feature                                     AUC
Wavelet Approx. Scale 1 - Energy                    0.80
Wavelet Lateral Scale 1 - Kurtosis                  0.72
Wavelet Approx. Scale 2 - Energy                    0.80
Wavelet Approx. Scale 2 - Kurtosis                  0.69
Wavelet Approx. Scale 3 - Energy                    0.79
Wavelet Diagonal Scale 3 - Energy                   0.69
Run Length 0 degrees - Long Run Emphasis            0.66
Run Length 0 degrees - Run Percentage               0.65
Run Length 90 degrees - Gray Level Non-Uniformity   0.65
COM 90 degrees, distance 1 - Correlation            0.67
COM 90 degrees, distance 3 - Correlation            0.66
COM 45 degrees, distance 3 - Correlation            0.65
COM 90 degrees, distance 10 - Correlation           0.66
COM 90 degrees, distance 10 - Homogeneity           0.69
COM 0 degrees, distance 10 - Correlation            0.65
Texture Fractal Dimension                           0.70

The prevalent features are low-pass filter wavelet coefficients, the run length measure of non-uniformity, correlation in the co-occurrence matrix and the texture fractal dimension. The wavelet features with high scores describe the average gray level of the ROIs by calculating the energy of their coefficients. The run length features scoring in the table are mainly an indicator of the degree of inhomogeneity. As can be seen in the definitions, the gray-level non-uniformity measures the occurrence of dominant gray levels in the region, giving high impact to short runs. Correlation in the co-occurrence matrix can be understood as another homogeneity measure, since it quantifies the relation between adjacent pixels in the ROI. If many pixel pairs stay within a tight range of each other, the co-occurrence matrix correlation will be high. Texture fractal dimension falls within the same area of homogeneity rating for the region analysed, since it measures the complexity of the region by projection to three dimensions and subsequent box-counting. Due to the limited size of the regions used in this work, it is questionable how representative the co-occurrence measures of distance 10 pixels are. An example of two regions and three feature values is shown in Figure 14.

Computer Aided Diagnosis

The classification power of the texture features derived in this work was assessed as accuracy, the ratio of correctly classified regions to all regions, for the three feature data sets. The first data set, using only data from the texture analysis features listed in Table 1, classified with an accuracy of 71% (22/31). Using only the information from histogram and MRI features, the accuracy was 93% (13/14). With all information combined in one vector, the accuracy reached 86% (12/14). The confusion matrices for all feature sets are shown in Figure 15. The small sample size of 14 regions in the latter two cases makes the uncertainty due to random and systematic errors in classification and model construction significant: a single region corresponds to about 7 percentage points of accuracy, which is the entire difference between the two sets. The reasonable conclusion is that the accuracies of those sets should be considered equal.
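To give a rough sense of the uncertainty attached to these accuracies, the sketch below computes a Wilson score interval for each of the three results. This calculation is an illustration added here and was not part of the analysis in this work. It treats each region as an independent trial, which is an assumption that ignores any dependence introduced by the leave-one-out procedure and by regions from the same patient.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = correct / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


# Accuracies reported for the three feature sets.
for name, correct, total in [("texture only", 22, 31),
                             ("histogram and MRI", 13, 14),
                             ("combined", 12, 14)]:
    low, high = wilson_interval(correct, total)
    print(f"{name}: {correct}/{total} = {correct / total:.2f}, "
          f"interval {low:.2f} to {high:.2f}")
```

Under these assumptions the intervals for 13/14 and 12/14 overlap over most of their width, which is consistent with treating the two accuracies as equal.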

Figure 14. Tissue example with healthy and malignant regions drawn. The three feature values are shown for illustration and are not chosen for typicality.

Figure 15. Confusion matrices for the three SVM runs. Correct predictions are marked blue. M – malignant, H – healthy, TPR – true positive ratio, FPR – false positive ratio.

Discussion

The methods of this thesis work span a limited application and could not justifiably be argued to be comprehensive for texture analysis. The approach was semi-automatic and did not attempt to evaluate a fully automatic scheme. There are, however, conclusions to be drawn by inference to the larger scope.

The first is that many of the features appear to carry information that is redundant when compared with the histogram information already utilised in the CAD scheme. The wavelet features scoring high in Table 1 have AUC values similar to those of the mean and percentile measures. This is to be expected, since they derive from the energy of the low-pass decomposition coefficients. This histogram information could easily be extracted from a region by simple computation. The highest scoring features in the previous work on prostate CAD at UCL [42] do not appear to be surpassed by any texture measure, considering the relatively low individual AUC values and the final SVM classification figures, although implementing both might have advantages in terms of robustness. This work does not attempt to investigate the amount of useful information carried by different texture features in relation to others, but it appears very probable that many can be rendered superfluous by a rigorous selection and cross-validation scheme.

An important extension to the project would be a subset cross-validation for feature selection. The current method only estimates the predictive power of each feature individually and selects a number of the highest scoring features. In order to make use of any powerful combination of two or three features, a rotating subset validation would be appropriate. It is likely that there is a bias favouring the current approach, since the same data set is used to select SVM kernel parameters, to train the classifier and to perform a testing round.
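One possible form of the subset validation suggested above is sketched below in Python. It is a hypothetical illustration, not something implemented in this work: every subset of up to three features is scored by leave-one-out accuracy, with a placeholder nearest-class-mean rule standing in for the SVM and with feature scaling left out for brevity. For the 16 features of Table 1 this amounts to 16 + 120 + 560 = 696 subsets.

```python
from itertools import combinations


def loo_accuracy(rows, labels, columns):
    """Leave-one-out accuracy using only the given feature columns.

    The classifier is a placeholder nearest-class-mean rule, not an SVM.
    """
    correct = 0
    for i in range(len(rows)):
        means = {}
        for cls in sorted(set(labels)):
            members = [rows[j] for j in range(len(rows))
                       if j != i and labels[j] == cls]
            if members:
                means[cls] = [sum(m[c] for m in members) / len(members)
                              for c in columns]
        test = [rows[i][c] for c in columns]
        predicted = min(means, key=lambda cls: sum(
            (a - b) ** 2 for a, b in zip(test, means[cls])))
        correct += predicted == labels[i]
    return correct / len(rows)


def best_subsets(rows, labels, max_size=3, keep=5):
    """Score every feature subset of size 1..max_size and return the top few."""
    n_features = len(rows[0])
    scored = []
    for size in range(1, max_size + 1):
        for columns in combinations(range(n_features), size):
            scored.append((loo_accuracy(rows, labels, columns), columns))
    # Highest accuracy first; ties go to the smaller, earlier subset.
    scored.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return scored[:keep]
```

Because the best subset is chosen by its score on the same regions, the winning accuracy is itself optimistically biased; an unbiased estimate would require running the selection inside an outer validation loop or testing on previously unused data.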
With more data, careful attention should be paid to avoiding any such bias by using separate sets for training and for the kernel parameter search, and preferably by making the final CAD accuracy assessment with previously unused data. Owing to the low number of samples used to train a high-dimensional SVM, it is also difficult to draw any general conclusions about the impact of the dimensionality, which is assumed to be beneficial to the classifier. From these conclusions it is clear that more work is required on feature selection if the amount of data cannot be increased substantially.

Since the methodology employed in this work should be viewed as generous, owing to regions strictly defined by a radiologist and conformal patient material, it evades several limitations of the general texture approach. A more automatic system would have to deal with complex issues such as reliable segmentation of the prostate and of its peripheral and transitional zones, since the appearance of malignancy differs between them. Further issues would arise from edges where the malignant tissue might not be distinguishable from extra-prostatic tissue. Artefacts would both impair the necessary segmentation and obfuscate the texture of malignancy.

It is most likely that a more encompassing scheme including all image material would perform worse than this work. One major disadvantage of the current texture analysis method is that it only uses the T2 data. In order to draw any reliable conclusions about texture analysis for the prostate on the whole, other types of MRI data should be added to the scheme. Implementing ADC and DCE data, since both have independently shown promising results in both previous work and published material, would plausibly add to the strength of the method. Diffusion images with high b-values appear to carry particular power for discrimination that warrants further investigation [21]. Any analysis attempting to incorporate data from these sources would have to either be semi-automatic, with ROIs drawn for each slice and image type, or apply some non-rigid registration technique. The ADC maps and the constituent raw b-weighted images used for their calculation are subject to susceptibility effects, and the prostate is not always delineated adequately for simple registration to work.

Another aspect to consider is the clinical perspective, not only in terms of reliability but also in how to approach the issue of displaying information to radiologists. This work produces binary decisions for each region under analysis. Several radiologists have expressed a desire for more elaborate and more reliable endpoints. One viable approach could be a pixel-by-pixel analysis where the relevant texture features are calculated and shown as a colour overlay, or other graphical representation corresponding to probability of disease, to aid the process of diagnosis. This would suffer from the edge effects discussed above, but it does not make decisions and thus leaves the clinician to utilise whatever information is appropriate for the task. Similar to the pixel-based approach, there seems to be a demand for probabilistic end-points for each region of interest.
Presently, the CAD system delivers a binary result regardless of how “certain” the data point is in relation to the constructed model.

The set of texture features used in this work comprises some of the most widely used features for different classification tasks, and it is reasonable to assume they constitute a broad enough approach to claim to have probed the diagnostic potential properly. There are evidently some textural differences between healthy and malignant prostate tissue, as seen in Figure 14, but whether it is helpful to attempt to quantify them and use them as a robust discriminatory measure is doubtful. A further exploration of the impact of altering calculation parameters, such as quantisation levels, might illuminate potential pitfalls in this work. It is unlikely that the strength of the method could be increased much by this, however.

The prospect of three-dimensional texture analysis was not assessed in this work. It could provide a better set of features, but requires more extensive programming work and, at least with the current semi-automatic approach, more input from the radiologist. The theoretical basis of texture analysis is unchanged by extension to three dimensions, as most calculations are easily extended to further dimensions.

From the CAD results, can one conclude that there is enough evidence to support any claims of additional information provided through texture analysis? Even if a 70% discriminatory power is not in itself particularly low, there are several reasons to remain hesitant. The tumour cases involved in this work are fairly obvious, even if the grade of the cancers is bordering on clinically insignificant (Gleason 4+3 typical of biopsy results). It is reasonable to assume any clinician would achieve a high accuracy of diagnosis on his/her own, and additional input by texture analysis would not improve diagnostic outcome. It must be concluded that there is no evidence for the texture features contributing any further valuable information to the CAD system as it was implemented in this work.

References

1. Sakr, W.A., et al., Age and racial distribution of prostatic intraepithelial neoplasia. Eur Urol, 1996. 30(2): p. 138-44.

2. Rich, A., On the Frequency of Occurrence of Occult Carcinoma in the Prostate. Journal of Urology, 1935. 33(3).

3. Johansson, J.E., et al., Fifteen-year survival in prostate cancer. A prospective, population-based study in Sweden. JAMA : the journal of the American Medical Association, 1997. 277(6): p. 467-71.

4. Stamey, T.A., et al., The prostate specific antigen era in the United States is over for prostate cancer: what happened in the last 20 years? The Journal of urology, 2004. 172(4 Pt 1): p. 1297-301.

5. Keetch, D.W., W.J. Catalona, and D.S. Smith, Serial prostatic biopsies in men with persistently elevated serum prostate specific antigen values. The Journal of urology, 1994. 151(6): p. 1571-4.

6. Catalona, W.J., et al., Comparison of digital rectal examination and serum prostate specific antigen in the early detection of prostate cancer: results of a multicenter clinical trial of 6,630 men. The Journal of urology, 1994. 151(5): p. 1283-90.

7. Gleason, D.F. and G.T. Mellinger, Prediction of prognosis for prostatic adenocarcinoma by combined histological grading and clinical staging. The Journal of urology, 1974. 111(1): p. 58-64.

8. Andrén O, F.K., Franzén L, Andersson SO, Johansson JE, Rubin MA, How well does the Gleason score predict prostate cancer death? A 20-year followup of a population based cohort in Sweden. The Journal of urology, 2006. 175(4): p. 1337-40.

9. Stark, J.R., et al., Gleason score and lethal prostate cancer: does 3 + 4 = 4 + 3? Journal of clinical oncology : official journal of the American Society of Clinical Oncology, 2009. 27(21): p. 3459-64.

10. Andriole, G.L., Pathology: the lottery of conventional prostate biopsy. Nature reviews. Urology, 2009. 6(4): p. 188-9.

11. Barzell, W.E. and M.R. Melamed, Appropriate patient selection in the focal treatment of prostate cancer: the role of transperineal 3-dimensional pathologic mapping of the prostate--a 4-year experience. Urology, 2007. 70(6 Suppl): p. 27-35.

12. Dickinson, L., et al., Magnetic resonance imaging for the detection, localisation, and characterisation of prostate cancer: recommendations from a European consensus meeting. European urology, 2011. 59(4): p. 477-94.

13. Kirkham, A.P., M. Emberton, and C. Allen, How good is MRI at detecting and characterising cancer within the prostate? European urology, 2006. 50(6): p. 1163-74; discussion 1175.

14. White, S., et al., Prostate cancer: effect of postbiopsy hemorrhage on interpretation of MR images. Radiology, 1995. 195(2): p. 385-90.

15. Walsh, P.C., et al., Patient-reported urinary continence and sexual function after anatomic radical prostatectomy. Urology, 2000. 55(1): p. 58-61.

16. Ahmed, H.U., et al., Will focal therapy become a standard of care for men with localized prostate cancer? Nature clinical practice. Oncology, 2007. 4(11): p. 632-42.

17. Ganzer, R. and A. Blana, Do we have enough evidence to recommend the routine use of high-intensity focussed ultrasound for the primary and salvage treatment of prostate cancer? European urology, 2010. 58(6): p. 816-8.

18. McRobbie, D.W., MRI from picture to proton. 2nd ed. 2007, Cambridge, UK ; New York: Cambridge University Press. xii, 394 p.

19. Arumainayagam, N., et al., Accuracy of multiparametric magnetic resonance imaging in detecting recurrent prostate cancer after radiotherapy. BJU international, 2010. 106(7): p. 991-7.

20. Tan, C.H., J. Wang, and V. Kundra, Diffusion weighted imaging in prostate cancer. European radiology, 2011. 21(3): p. 593-603.

21. Hambrock, T., et al., Relationship between Apparent Diffusion Coefficients at 3.0-T MR Imaging and Gleason Grade in Peripheral Zone Prostate Cancer. Radiology, 2011.

22. Villers, A.P., P Leroy, X Biserte, J, Dynamic contrast-enhanced MRI for preoperative identification of localised prostate cancer. European Urology Supplements 2007. 6: p. 525-532.

23. Schmuecking, M., et al., Dynamic MRI and CAD vs. choline MRS: where is the detection level for a lesion characterisation in prostate cancer? International journal of radiation biology, 2009. 85(9): p. 814-24.

24. Sonka, M., Image processing, analysis, and machine vision. International student ed. 2007, Mason, OH: Thomson.

25. Kaizer, H., A quantification of textures on aerial photographs, in Technical Note 121 1955, Boston University Research Laboratory: Boston.

26. Julesz, B., et al., Inability of humans to discriminate between visual textures that agree in second-order statistics-revisited. Perception, 1973. 2(4): p. 391-405.

27. Haralick, R.M., K. Shanmugam, and I. Dinstein, Textural Features for Image Classification. IEEE Transactions on Systems, Man, and Cybernetics, 1973. SMC-3(6): p. 610-621.

28. Haralick, R.M., Statistical and structural approaches to texture. Proceedings of the IEEE, 1979. 67(5): p. 786-804.

29. Bajcsy, R.L., L, Texture gradient as a depth cue. Computer Graphics and Image Processing, 1976. 5(1): p. 52-67.

30. Alic, L., et al., Heterogeneity in DCE-MRI parametric maps: a biomarker for treatment response? Physics in medicine and biology, 2011. 56(6): p. 1601-16.

31. Dongjiao, L.X., G. Xiaoying, W. Jue, Z. Jing, F., Computerized Characterization of Prostate Cancer by Fractal Analysis in MR Images. Journal of Magnetic Resonance Imaging, 2009. 30: p. 161-168.

32. Holli, K., et al., Characterization of breast cancer types by texture analysis of magnetic resonance images. Academic radiology, 2010. 17(2): p. 135-41.

33. Tourassi, G.D., Journey toward computer-aided diagnosis: role of image texture analysis. Radiology, 1999. 213(2): p. 317-20.

34. Galloway, M., Texture analysis using gray level run lengths. Computer Graphics and Image Processing, 1974. 4(2): p. 172-179.

35. Haar, A., Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 1910. 69: p. 331-371.

36. Daubechies, I., Ten lectures on wavelets. Regional conference series in applied mathematics. 1992.

37. Daubechies, I., Orthonormal Bases of Compactly Supported Wavelets. Communications on Pure and Applied Mathematics, 1988. 41: p. 909-996.

38. Weisstein, E. Capacity Dimension. 2011 [cited 2011; http://mathworld.wolfram.com/CapacityDimension.html].

39. Doi, K., Current status and future potential of computer-aided diagnosis in medical imaging. The British journal of radiology, 2005. 78 Spec No 1: p. S3-S19.

40. Cortes, C. and V. Vapnik, Support-vector networks. Machine Learning, 1995. 20: p. 273-297.

41. Cristianini, N. and J. Shawe-Taylor, An introduction to support vector machines : and other kernel-based learning methods. 2000, Cambridge ; New York: Cambridge University Press. xiii, 189 p.

42. Plantinga, B., Development of a CAD system to aid the diagnosis of prostate cancer in prostate cancer screening, 2011, Centre for Medical Image Computing, University College London: London.

43. Xunkai, W., zigzag.m, 2007, Beijing Aeronautical Technology Research Center: Beijing.

44. Szczypinski, P.M., et al., MaZda--a software package for image texture analysis. Computer methods and programs in biomedicine, 2009. 94(1): p. 66-76.

45. Moisy, F., boxcount.m, 2008, Université Paris Sud: Paris.

46. Chang, C.-C. and C.-J. Lin, LIBSVM - a library for support vector machines, 2001: Taipei. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

47. Hsu, C.-W., C.-C. Chang, and C.-J. Lin, A Practical Guide to Support Vector Classification, 2010, Department of Computer Science, National Taiwan University: Taipei.

48. Tofts, P.S., et al., Estimating kinetic parameters from dynamic contrast-enhanced T(1)-weighted MRI of a diffusable tracer: standardized quantities and symbols. Journal of magnetic resonance imaging : JMRI, 1999. 10(3): p. 223-32.
