

NeuroImage 46 (2009) 642–651

Contents lists available at ScienceDirect

NeuroImage

journal homepage: www.elsevier.com/locate/ynimg

Fully automated classification of HARDI in vivo data using a support vector machine

S. Schnell a,⁎, D. Saur b, B.W. Kreher a, J. Hennig a, H. Burkhardt c, V.G. Kiselev a

a Medical Physics, Department of Diagnostic Radiology, University Medical Center Freiburg, Hugstetter Str. 55, D-79106 Freiburg, Germany
b Freiburg Brain Imaging Center, Department of Neurology, University Medical Center Freiburg, Germany
c Chair in Pattern Recognition and Image Processing, Institute of Computer Science, Albert-Ludwigs-University Freiburg, Germany

⁎ Corresponding author. Fax: +49 761 270 3832. E-mail address: [email protected] (S. Schnell).

1053-8119/$ – see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2009.03.003

Article info

Article history:
Received 20 August 2008
Revised 3 February 2009
Accepted 1 March 2009
Available online 12 March 2009

Keywords:
Spherical harmonics
Rotation invariant
Support vector machine
Classification
HARDI
Crossing fibre bundles

Abstract

The purpose of this study is the classification of high angular resolution diffusion imaging (HARDI) in vivo data using a model-free approach. This is achieved by using a Support Vector Machine (SVM) algorithm taken from the field of supervised statistical learning. Six classes of image components are determined: grey matter, parallel neuronal fibre bundles in white matter, crossing neuronal fibre bundles in white matter, partial volume between white and grey matter, background noise and cerebrospinal fluid. The SVM requires properties derived from the data as input, the so-called feature vector, which should be rotation invariant. For our application we derive such a description from the spherical harmonic decomposition of the HARDI signal. With this information the SVM is trained in order to find the function for separating the classes. The SVM is systematically tested with simulated data and then applied to six in vivo data sets. This new approach is data-driven and enables fully automatic HARDI data segmentation without employing a T1 MPRAGE scan and subjective expert intervention. This was demonstrated on five test in vivo data sets giving robust results. The segmentation results could be used as a priori knowledge for increasing the performance of fibre tracking as well as for other clinical and diagnostic applications of diffusion weighted imaging (DWI).

© 2009 Elsevier Inc. All rights reserved.

Introduction

Diffusion weighted MRI (DWI), and in particular measurements of diffusion anisotropy, provides biologically relevant information about the tissue microstructure. A special focus of interest for research and clinical application of DWI is the investigation of the brain white matter (WM) structure. Such measurements allow the reconstruction of the neuronal fibre architecture in WM, the visualisation of fibre tracks and the examination of morphological connectivity between different cortical and sub-cortical regions. Data acquisition is typically performed using the so-called High Angular Resolution Diffusion Imaging (HARDI) approach introduced by Tuch et al. (1999). This method consists of the application of diffusion encoding (DE) gradients in a large number of non-collinear directions. With, for instance, 64 DE gradient directions the spatially non-Gaussian diffusion behaviour of water in white matter regions with heterogeneous fibre orientations can be resolved. HARDI has therefore evolved into the basis for many post-processing approaches for resolving the spatial structure of neuronal fibre bundles in WM. Specifically, it would be advantageous to distinguish between parallel (PF) and crossing (CF) fibre bundles.

The existing methods for inferring multiple fibre bundle populations from diffusion data can be classified into two groups (Behrens et al., 2007): model-dependent methods for the estimation of the underlying diffusion profile, and model-free methods based on the inherent structure of the diffusion profile itself. The generic model-based method is Diffusion Tensor Imaging (DTI) (Basser et al., 1994), which was the first method used as a basis for the reconstruction of neuronal fibres, i.e. fibre tracking. The diffusion tensor (DT) represents the apparent diffusion coefficient (ADC) and can be explained as the averaging of all water spins in a voxel when applying DE gradients in several spatial directions. From the DT, anisotropy measures, such as the fractional anisotropy (FA), can be derived. The main drawback of DTI is that it can only reveal a single fibre orientation in each voxel and fails in voxels containing complex tissue architecture with more than one significant fibre orientation. One segmentation procedure based on the DT model applies a supervised clustering procedure with a collection of DTI metrics in regions of interest for the segmentation of grey matter (GM), WM and cerebrospinal fluid (CSF) (Hasan and Narayana, 2006). In this method, the contrast of FA maps between CSF, WM and GM was used, based on the “principal diffusivity indices”. The CSF was segmented using its high diffusivity and low anisotropy properties. However, since this method is based on DTI, no further classification of the WM subclasses PF and CF was possible.

An approach that combines model-dependent and model-free methods for the differentiation of parallel and crossing fibre bundles based on HARDI and DTI was described by Kreher et al. (2005). In this approach a multi-diffusion tensor model was introduced, which contains one anisotropic and one isotropic diffusion tensor in order to model the tissue structures. In each voxel it is decided separately which of the two models is more appropriate for describing the underlying diffusion and therefore more suitable for the detection of crossing fibre bundles.

The first model-free method which used spherical harmonics for the description of the diffusion profile acquired with HARDI data was reported by Frank (2002). Spherical harmonics are functions similar to Fourier expansions, but described in spherical polar coordinates (polar angle θ and azimuth angle φ). Every function that takes as its arguments the directions θ and φ can be expanded into spherical harmonics. The signal S, as a function of direction, can be described with spherical harmonics as follows (Webster and Szego, 1930; Arfken and Weber, 1985):

$$S(\theta,\varphi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{+n} a_n^m \, Y_n^m(\theta,\varphi) \qquad (1)$$

where Y is the spherical harmonic of order n (all integer n≥0), and m the azimuthal separation constant or degree (all integer m, |m|≤n). The coefficients $a_n^m$ are expressed as:

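$$a_n^m = \int_{\varphi=0}^{2\pi} \int_{\theta=0}^{\pi} Y_n^{m*}(\theta,\varphi) \, S(\theta,\varphi) \, \sin\theta \, d\theta \, d\varphi \qquad (2)$$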

with Y⁎ being the complex conjugate of Y. The expansion of Eq. (1) can be terminated at some order n. The higher the order n, the more complex the deviation from the spherical shape (n=0) that can be described. In that case, however, the sampling theorem requires that more directions be measured (see section Theory and methods and Yeo, 2005).

In Frank's (2002) approach isotropic diffusion occurring in water or CSF is described by zero order spherical harmonics, diffusion along parallel fibres by second order spherical harmonics, and diffusion in the multiple fibre case is approximated by the fourth order. The odd orders describe asymmetric components and therefore represent imaging artefacts and noise. By using a high order versus low order ratio of the spherical harmonic coefficients, Frank presented a method for differentiation between PF and CF, which is, however, subject to limitations. The results could include misclassifications, especially for WM regions containing multiple crossings. These regions appeared like isotropic voxels similar to GM voxels. Differentiation between GM, CSF and background noise was thus not feasible with exclusive use of the spherical harmonic description. Descoteaux et al. (2006) extended the model in order to distinguish between isotropic, one-fibre and multi-fibre diffusion. This procedure is very promising, but automatic full image segmentation was not possible, since CF was still often misclassified as GM or noise. Alexander et al. (2002) described a method for the modelling and detection of non-Gaussian diffusion profiles also using spherical harmonics, but up to an order of eight, providing a sequence of models of increasing complexity. A statistical test was performed in order to find the simplest of the models which adequately described the data. This method was applied in a human experiment and seemed to classify isotropic (GM) and anisotropic Gaussian (WM) regions correctly as order zero and order two, respectively. It was found that on average five percent of profiles in voxels within the brain were classified as order four or above (anisotropic non-Gaussian), which, from our understanding of anatomy, would be too low a percentage. The method was validated by characterising its performance using synthetic data. It was not described how accurately GM was differentiated from CF.

Behrens et al. (2007) reviewed several recent model-free techniques. In the data shown (HARDI data in 60 DE directions) a third fibre bundle orientation could not be detected. The authors supposed that a detection of more than two orientations would be possible if more diffusion directions were acquired at a higher b-value. Simulations performed in Behrens et al. (2007) suggest that, in order to resolve a three fibre bundle orthogonal system robustly, data with b-values above 4000 s/mm² have to be acquired. As will be shown below in the Theory and methods section, it is necessary to acquire more than 60 DE directions in order to fulfil the sampling theorem for spherical harmonics of order four and above (Yeo, 2005). In addition, the review by Alexander (2005) showed with noisy data synthesised from isotropic test functions that most methods generate spurious angular structure. This may explain why strong angular structures are incorrectly detected even in many GM and CSF voxels.

However, for a realistic tissue description, the existing models are rather complex and often include ill-defined parameters not adequately supported by the measurement data. All presented methods, including the model-free methods, showed that full image segmentation of microstructures and of image background was not possible. A differentiation between voxels containing PF or CF, or the differentiation between GM, CF and background noise is difficult, since this information is usually derived from some measure of diffusion anisotropy. Many publications outline methods which show potential for performing this differentiation, but so far an evaluation of their methods for fibre crossings has not been reported.

Based on the state of the art described above we suggest a new data-driven analysis of multi-directional diffusion weighted MRI data, which may provide unique fingerprints for different types of tissue and image components. In addition to using a model-free approach, we employ methods developed in the field of pattern recognition. In the present case we attempt classification of six different classes: grey matter (GM), the two white matter (WM) subclasses CF and PF, partial volume (a mixture between GM and WM), cerebrospinal fluid (CSF), and background noise together with image artefacts (hereafter referred to as noise). First, the underlying diffusion profile per voxel of the HARDI data is described using the rotational invariants of the spherical harmonic decomposition. Then a Support Vector Machine (SVM), a computer algorithm for statistical learning which has already demonstrated robust performance in other applications (Nattkemper, 2004; Quddus et al., 2005), is used for classification (Cristianini and John, 2000). The SVM is trained with the labelled image features in order to find the function for separating the classes. Afterwards the SVM is systematically tested with simulated data and then applied to six in vivo data sets.

Theory and methods

The support vector machine as a classifier

The field of pattern recognition is a sub-topic of machine learning. It is based either on a priori knowledge or on statistical information extracted from the patterns, meaning that some pattern in raw data is classified by performing some action based on some property or feature of the data. Therefore, pattern recognition can be considered a form of data classification.

A complete pattern recognition system consists of a sensor that gathers the observations to be classified, a feature extraction mechanism, which computes numeric information from the observations, and a classifier, which does the actual job of classifying observations, relying on the extracted features. One classification method is supervised learning, which is a machine learning technique for creating a function from training data. The training data consist of pairs of input data (the feature vectors) and desired outputs (the labels). The output of the function can predict a class label of the input data. The task of the supervised learner is to predict the value of the function for input data after having seen a number of training examples. This means that for correct separation between the classes the learner has to be able to generalise from the presented data to unseen situations or data (Burges, 1998).

Fig. 1. SVM classification procedure using training and test data sets.

Table 1. Parameter combinations for each tissue class.

Tissue   FA          Relative weights    Angle/degree
GM       0–0.15      0/1                 0
PF       0.55–0.9    0/1 and 0.1/0.9     0–20
CF       0.2–0.5     0.2/0.8–0.5/0.5     30–90

Each range in FA is divided evenly into 8 bins. Relative weights are divided evenly into 2 bins for PF and 5 bins for CF. For PF the angle range is divided evenly into 5 bins and for CF into 13 bins.

An example of a classifier is the support vector machine (SVM), which was introduced by Cortes and Vapnik (1995) and was found to yield good results for the problem presented in this paper. The SVM requires an input vector, the feature vector, containing a number of data properties derived from the original data and describing each data class sufficiently but not redundantly. The SVM maps the input vectors non-linearly onto a high-dimensional feature space by use of a kernel function, resulting in a data set which can be separated by a linear classifier. In this feature space the SVM derives a model of the separating hyperplane, or decision function. Amongst all possible separating hyperplanes there exists one unique optimal hyperplane, which is determined by the maximum distance between the hyperplane and the nearest training data points (Schölkopf and Smola, 2002). The training data points which are closest to the hyperplane are called the support vectors. They define a margin ensuring high generalisation ability. The projection of the data from the input space to the feature space is performed by functions called kernel functions, such as polynomials or Gaussian radial basis functions (Schölkopf and Smola, 2002). During the training process the parameters of these kernels are determined. The result is the derived model for the decision function.
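As a minimal sketch of what a Gaussian radial basis function kernel computes, the following Python function evaluates K(x, y) = exp(−γ‖x − y‖²) for two feature vectors. The function name and the width parameter `gamma` are illustrative assumptions; the text does not state which kernel parameters were actually used.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors.

    gamma is a hypothetical kernel width chosen for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

Identical feature vectors give a kernel value of 1, and the value decays towards 0 as the vectors move apart, which is what lets a linear separation in feature space correspond to a non-linear boundary in the input space.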

The SVM was originally developed for a two-class problem, i.e. binary classification. Many problems, however, require multiple classes, e.g. the six image classes in the application presented here. For such cases the SVM can be adapted using one of two approaches: one-versus-one and one-versus-rest. The one-versus-rest approach divides the decision of an N-class problem into N two-class cases. The one-versus-one approach constructs an SVM for each pair of classes, resulting in N(N−1)/2 SVMs. When this is applied to a test point, each pairwise classification adds to a counter of the winning class and at the end the point is labelled with the class with the most votes. The one-versus-rest approach has the disadvantage that the performance can be compromised due to unbalanced training data sets. The one-versus-one approach, in contrast, is computationally more expensive since more SVMs have to be computed (Gualtieri and Cromp, 1999).
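The one-versus-one voting scheme can be sketched in a few lines of Python. This is an illustration only: `decide` is a hypothetical stand-in for a trained two-class SVM, and the prototype values used in the toy example are invented for demonstration and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def n_pairwise_classifiers(n_classes):
    """Number of binary classifiers in a one-versus-one scheme: N(N-1)/2."""
    return n_classes * (n_classes - 1) // 2


def one_versus_one_vote(classes, decide, x):
    """Label x with the class that wins the most pairwise decisions.

    decide(a, b, x) is a hypothetical stand-in for a trained two-class SVM;
    it must return either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    # Ties go to the class that received its first vote earliest
    # (an arbitrary choice for this sketch).
    return votes.most_common(1)[0][0]


# Toy stand-in for trained SVMs: pick the class whose prototype value is nearer.
prototypes = {"GM": 0.1, "PF": 0.7, "CF": 0.35}


def nearest_prototype(a, b, x):
    return a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b


label = one_versus_one_vote(list(prototypes), nearest_prototype, 0.3)
```

For the six classes considered here, `n_pairwise_classifiers(6)` gives 15 pairwise classifiers, compared with 6 classifiers in the one-versus-rest scheme.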

When solving a problem with a supervised learning algorithm one has to consider various steps:

1. A training data set has to be created representing the “real-world”. Here, the input variables and the labels have to be defined. This labelling has to be done either by measurement or by expert knowledge.

2. A labelled example data set (test data set) has to be determined.

3. A feature representation optimal for statistical learning has to be found. The accuracy of the learned function depends strongly on the extracted features.

4. The optimal learning algorithm has to be chosen, which can separate the data into the wanted output classes.

5. The learning algorithm is applied to the training data set and the SVM is validated using the resulting learning function with the aid of the labelled test data set. This is done in order to find and adjust the optimal learning function for the data set.

6. The resulting optimal algorithm is applied to a new data set for classification.

This procedure is illustrated in Fig. 1.

Simulated data sets

In order to evaluate the performance of the SVM for the task in question, a quantitative method of evaluation had to be found. Therefore, artificial data sets were created. Diffusion weighted images were simulated based on a two-compartment model for 81, 61 and 31 DE directions evenly distributed on a sphere, where the signal S is composed of a combination of two underlying compartments:

$$S = S_0 w_1 \exp(-b D_1) + S_0 w_2 \exp(-b D_2) \qquad (3)$$

with $S_0$ being the signal without diffusion weighting, w the weight of the compartment, b the diffusion weighting matrix and D the diffusion tensor (the DT).

The template (Eq. (3)) was used for the simulation of GM, PF and CF. Each voxel was simulated to include: (i) an FA value (resulting in a specific D), (ii) two crossing fibre bundles with varying angle between them (rotation of D) and (iii) relative weights w of the two combined signals, where 0.5 means that both bundles have the same strength. The FA and the mean diffusivity were equal for both bundles. The mean diffusivity was constant for each tissue type: 0.39×10⁻³ mm²/s in GM and 0.79×10⁻³ mm²/s in PF and CF. The remaining three parameters FA, relative weight and angle were varied independently within each class. For each class 520 voxels were simulated. In Table 1 the parameter combinations representing each tissue class are shown. Note that crossing fibre bundles are defined with an angle of ≥30 degrees.
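The two-compartment template of Eq. (3) can be sketched in Python as follows. Several choices here are assumptions made for illustration and are not stated in the text: each compartment is modelled as an axially symmetric tensor, the b-matrix is reduced to a scalar b-value applied along a unit gradient direction g (so that bD becomes b·gᵀDg), the weights satisfy w2 = 1 − w1, and b = 1000 s/mm² is borrowed from the in vivo protocol. All function names are hypothetical.

```python
import math


def axial_eigenvalues(fa, mean_diffusivity):
    """Eigenvalues (parallel, perpendicular) of an axially symmetric tensor
    with the given fractional anisotropy and mean diffusivity."""
    a = fa / math.sqrt(3.0 - 2.0 * fa * fa)
    return mean_diffusivity * (1.0 + 2.0 * a), mean_diffusivity * (1.0 - a)


def apparent_diffusivity(g, fibre, fa, mean_diffusivity):
    """g^T D g for a unit gradient direction g and a unit fibre direction."""
    lam_par, lam_perp = axial_eigenvalues(fa, mean_diffusivity)
    cos_angle = sum(gi * fi for gi, fi in zip(g, fibre))
    return lam_perp + (lam_par - lam_perp) * cos_angle ** 2


def two_compartment_signal(g, fibre1, fibre2, w1, fa, mean_diffusivity,
                           b=1000.0, s0=1.0):
    """Eq. (3) with w2 = 1 - w1 and equal FA and mean diffusivity in both
    compartments (b in s/mm^2, diffusivity in mm^2/s)."""
    d1 = apparent_diffusivity(g, fibre1, fa, mean_diffusivity)
    d2 = apparent_diffusivity(g, fibre2, fa, mean_diffusivity)
    return s0 * (w1 * math.exp(-b * d1) + (1.0 - w1) * math.exp(-b * d2))


# Two bundles crossing at 90 degrees with equal weights, probed along x.
signal_x = two_compartment_signal(
    g=(1.0, 0.0, 0.0), fibre1=(1.0, 0.0, 0.0), fibre2=(0.0, 1.0, 0.0),
    w1=0.5, fa=0.4, mean_diffusivity=0.79e-3)
```

Evaluating `two_compartment_signal` over a whole set of DE directions gives one simulated voxel; stepping FA, the crossing angle and w1 through the ranges of Table 1 would then produce the simulated classes.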

Six training data sets with different signal-to-noise ratios (SNR) were created by adding Gaussian white noise: SNR=∞, 100, 60, 30, 10, 5. The SNR was determined over all simulated tissues and directions by taking the median of the signal and dividing it by the standard deviation of the absolute value of the added noise. In the literature the SNR is often determined in the b=0 image. According to this definition the SNR0 in the corresponding b=0 image would be: ∞, 179, 104, 52, 18, 9.
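The SNR definition used above (median signal divided by the standard deviation of the absolute noise) can be written as a short Python helper. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def snr_from_noise(signal, noise):
    """Median of the signal divided by the standard deviation of |noise|."""
    abs_noise = [abs(n) for n in noise]
    # Population standard deviation is an assumption; the text does not say
    # whether the sample or the population estimator was used.
    return statistics.median(signal) / statistics.pstdev(abs_noise)
```

Because the standard deviation is taken over the absolute value of the noise rather than over the noise itself, this ratio is not directly comparable with an SNR computed from the raw noise standard deviation.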

For each noise condition we created corresponding test data sets, which were identical except that the main orientation of the fibres was randomly rotated about the z-axis and that for each noise condition newly generated noise was added.

In vivo HARDI measurements and data pre-processing

The following scanning protocol was performed with six healthy volunteers forming the basis for one training and five test data sets. We measured the five additional data sets for testing in order to avoid biases occurring when the same data are used for training and testing (Gottrup et al., 2005). The six in vivo HARDI data sets were acquired on a 3.0 T MRI scanner (Magnetom TIM Trio, Siemens Medical Systems, Erlangen, Germany) with 81 DE directions. The scanner was equipped with a high performance gradient system capable of a maximal gradient strength of 40 mT/m. Images were acquired with a standard circularly polarised radio-frequency eight channel head coil. An effective b-value of 1000 s/mm² was used for each of the 81 DE directions. Eleven additional measurements evenly distributed throughout the scan were acquired without diffusion weighting (b=0 s/mm²). A total of 92 scans with 69 slices were obtained using a diffusion sensitive spin echo EPI sequence with TR=11000 ms, TE=94 ms, voxel size=2×2×2 mm³, matrix=104×104. During reconstruction, scans were corrected for motion and distortion artefacts based on a reference measurement (Zaitsev et al., 2006).

In order to investigate the effects of varying the number of DE directions, two additional HARDI data sets per subject were acquired having the same imaging parameters except for the DE directions. One data set was acquired with 61 DE directions and eight images with b=0 s/mm², and the second was measured with 31 DE directions and four images with b=0 s/mm². We chose to reduce the number of acquired non-diffusion weighted data sets relative to the reduced number of DE directions in order to keep the ratio between zero and high b-value images the same. This ensures a similar contrast and SNR (Jones et al., 1999). All diffusion schemes used for the in vivo measurements were the same as for the simulations.

In addition, a 3D T1-weighted magnetisation prepared ultrafast gradient-echo sequence (T1-weighted MPRAGE) measurement was acquired for all six subjects with the following parameters: matrix size=256×256, 160 slices, TR=2200 ms, TI=1100 ms, TE=2.15 ms, flip angle=12°, bandwidth=200 Hz/pixel, voxel size=1×1×1 mm³. The post-processing steps for the T1-image were performed with SPM5 (Functional Imaging Laboratories, 2005). The T1-image was coregistered to the b0-image with the default SPM5 parameters except for the reslicing method, for which we used 4th degree B-spline interpolation. Afterwards, segmentation into GM, WM and CSF was performed using the default SPM5 parameter settings and template probability maps.

Fig. 2. Procedure for labelling the training in vivo data set and training the SVM in order to classify all image contents of a second in vivo DWI data set.

Application of the SVM for HARDI classification

As stated above, a prerequisite of tissue classification using the SVM is an adequate selection of image features. For this, we chose the shape of the diffusion profile in each voxel, which can be described by spherical harmonics (Frank, 2002). The properties derived from the HARDI data, which make up the feature vector, are required to be rotation invariant. The spherical harmonic coefficients a, however, change depending on the orientation of the spherical harmonic in space, meaning that spherical harmonics are covariant. In the following, we introduce an approach describing the shape of the diffusion profile per voxel by using a rotation invariant description of the spherical harmonics:

$$f_n = \sum_{m=-n}^{n} |a_n^m|^2. \qquad (4)$$

By using Eq. (2) and applying the spherical harmonics addition theorem (Webster and Szego, 1930; Arfken and Weber, 1985) one obtains:

$$f_n = \int_{v' \times v} dv' \, dv \; S(v') \cdot P_n(\langle v', v \rangle) \cdot S(v) \qquad (5)$$

$$\approx \sum_{\langle v', v \rangle} S(v') \cdot P_n(\langle v', v \rangle) \cdot S(v) \qquad (6)$$

with $f_n$ being the rotation invariant number for each order n and $P_n$ being the associated Legendre polynomial of order n. The variable S describes the diffusion weighted signal normalised with the non-diffusion weighted signal. The unit vector v represents each DE direction, and the angular brackets denote the scalar product. A sufficient description of a shape by spherical harmonics can only be achieved if enough sampling points are available. Yeo (2005) found a sufficient but not necessary condition describing the sampling theorem for spherical harmonics to be B data points:

$$B = (2n)^2. \qquad (7)$$

However, Yeo considered a function of spherical harmonic order n to be undersampled when fewer than B samples are taken. This means that for a spherical harmonic function of 4th order, 64 data points are usually necessary for representation.
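The discrete invariants of Eq. (6) and the sampling bound of Eq. (7) can be sketched in plain Python. This is a sketch under stated assumptions rather than the authors' implementation: the sum is taken over all ordered pairs of DE directions (including each direction paired with itself), any normalisation constants are omitted, $P_n$ is evaluated as the ordinary Legendre polynomial of the scalar product, and the default orders (0, 2, 4) are chosen only for illustration. All function names are hypothetical.

```python
def legendre(n, x):
    """Legendre polynomial P_n(x) via Bonnet's recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def rotation_invariant(signal, directions, n):
    """Discrete form of Eq. (6): sum over all ordered pairs of directions."""
    total = 0.0
    for s1, v1 in zip(signal, directions):
        for s2, v2 in zip(signal, directions):
            dot = sum(a * b for a, b in zip(v1, v2))
            # Clamp against rounding so the argument stays inside [-1, 1].
            dot = max(-1.0, min(1.0, dot))
            total += s1 * legendre(n, dot) * s2
    return total


def feature_vector(signal, directions, orders=(0, 2, 4)):
    """Invariants f_n for the chosen orders (the default orders are illustrative)."""
    return [rotation_invariant(signal, directions, n) for n in orders]


def min_samples(n):
    """Sampling bound of Eq. (7): B = (2n)^2."""
    return (2 * n) ** 2
```

Because only scalar products between directions enter the sum, rotating all DE directions together leaves every f_n unchanged, which is the property required of the feature vector. `min_samples(4)` reproduces the 64 data points quoted above for order four.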

The SVM C++ library used in this paper is called libSVMtl (Ronneberger, 2004) and has a number of internal algorithmic options. In our application the radial basis function kernel was the most accurate for the separation of the classes. In addition, the best classification results were achieved if scaling of the features was performed. Two different scaling options are possible: the “minmax” approach (scaling of each feature in such a way that the minimum becomes −1 and the maximum +1) and the “stddev” approach (scaling of each feature in such a way that the mean becomes 0 and the standard deviation becomes 1). In addition, the two multi-class comparison methods were explored (one-versus-one and one-versus-rest).
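The two scaling options can be illustrated with a short Python sketch that operates on the values of a single feature. These helpers are written for illustration and are not part of the libSVMtl API; mapping a constant feature to zeros and using the population standard deviation are assumptions.

```python
import statistics


def scale_minmax(values):
    """Rescale one feature so that its minimum maps to -1 and its maximum to +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: mapped to 0 (assumption)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def scale_stddev(values):
    """Rescale one feature to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return [0.0 for _ in values]  # constant feature: mapped to 0 (assumption)
    return [(v - mean) / std for v in values]
```

In either case the scaling parameters (minimum and maximum, or mean and standard deviation) would normally be determined on the training data and then reused unchanged for the test data, so that both sets are mapped in the same way.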

Classification using a supervised learning technique like the SVM requires the selection and labelling of representative voxels for a training data set. This can easily be done for the simulated data sets. However, true labelling of an in vivo data set requires knowledge of the contents of each voxel, which is impossible. In addition, investigations of other complex problems (Mouridsen et al., 2006) showed that there exists considerable operator bias when using experts for anatomical assessment. Consequently, the labelling of the first in vivo training data set was performed in two steps (Fig. 2). First, we used a simulated data set with an SNR of 10 for training. The simulated data were used as a model for the classification of the future in vivo training data set, but here only the division of WM into CF and PF was of interest, since the separation of CF from GM is not possible using the spherical harmonics shape description exclusively (see Introduction).

Second, we created WM, GM and CSF masks from the SPM5 segmentation results of the T1-image. The information gathered in these two steps was combined to obtain a labelled in vivo training data set: the CF regions were masked using the WM masks from the T1 segmentation, and the same was done with the PF voxels. The GM and CSF labels from the SPM5 segmentation were used as final labels. The combination of GM, WM and CSF resulted in a brain mask, with the inverse being the background noise including ghosting and chemical shift artefacts. This labelled data set would be used as the basis for future classification of all new human HARDI examples having the same DE scheme. The voxels selected for the training data sets were derived from regions with definite knowledge about the tissue structure.

In summary, for the image component classification task we transformed the HARDI data to yield rotation invariant features, labelled the training data sets into multiple classes by incorporating a priori knowledge derived from SPM5 and trained the SVM with these data. Then several varying algorithmic options were tested and, with the help of a corresponding test data set, the classification results were evaluated. The SVM classification of the in vivo data sets produces a fully segmented HARDI data set. This was compared with segmentation using the linear, planar, and spherical coefficients of Westin et al. (1997).

Results

The SVM may be used with various settings as mentioned above, for example: one-versus-one or one-versus-rest, together with a wide range of possible kernels etc. Due to limitations of space, we cannot show the results of every possible combination. In general, we found that the best accuracy was obtained using the radial basis function as kernel together with feature scaling. Only the results of one combination (radial basis function, plus “stddev” feature scaling, plus one-versus-rest multi-class comparison) are presented for both simulated and in vivo data.

Simulations

The dependence of the quality of classification on the number of DE directions and noise, for the SVM with the simulated data sets, is shown in Table 2. From the eigenvalues of the DT the linear (cl), planar (cp) and spherical (cs) coefficients (Westin et al., 1997) can be derived (hereafter called the Westin coefficients). The table shows the results obtained for PF, CF and GM in each specific data set after manual thresholding to achieve the best segmentation. It is evident that for both methods the higher the SNR and the higher the number of DE directions, the better the detection accuracy. A non-trivial finding is that for the SVM the accuracy remains very high even for low SNR and poor sampling. All three image components were detected with the SVM without error in sensitivity in the data sets without noise, except for a few voxels in the detection of PF. SVM was least sensitive (71%) in the detection of PF at the lowest SNR and smallest number of DE directions. SVM was least specific (87%) in the detection of CF. In contrast, the Westin coefficients thresholding shows much lower sensitivity and specificity. The lowest sensitivity was 47% in the detection of GM. The lowest specificity was 10% in the detection of CFs with an SNR=10.

Table 2
The segmentation accuracy in percentage (true positive results out of 520 possible (sensitivity), false positive results out of 1040 possible (1-specificity)) of the SVM classification and thresholding of the linear (cl), planar (cp) and spherical (cs) coefficients of the simulated data with all possible DE directions.

SVM

| SNR | Accuracy | CF DE81 | CF DE61 | CF DE31 | PF DE81 | PF DE61 | PF DE31 | GM DE81 | GM DE61 | GM DE31 |
|---|---|---|---|---|---|---|---|---|---|---|
| ∞ | sensitivity | 100 | 100 | 100 | 100 | 98.5 | 100 | 100 | 100 | 100 |
| ∞ | (1-specificity) | 0 | 0.77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 100 | sensitivity | 100 | 99.8 | 100 | 99.8 | 100 | 100 | 100 | 100 | 100 |
| 100 | (1-specificity) | 0.1 | 0 | 0 | 0 | 0.1 | 0 | 0 | 0 | 0 |
| 60 | sensitivity | 100 | 99.2 | 99.4 | 99.8 | 100 | 100 | 100 | 100 | 100 |
| 60 | (1-specificity) | 0.1 | 0 | 0 | 0 | 0.38 | 0.29 | 0 | 0 | 0 |
| 30 | sensitivity | 98.9 | 99.4 | 99.8 | 99.8 | 100 | 100 | 100 | 100 | 100 |
| 30 | (1-specificity) | 0.1 | 0 | 0 | 0 | 0.29 | 0.1 | 0 | 0 | 0 |
| 10 | sensitivity | 99.2 | 99.2 | 97.5 | 97.9 | 98.7 | 95.8 | 100 | 100 | 99.8 |
| 10 | (1-specificity) | 1.06 | 0.67 | 2.12 | 0.38 | 0.38 | 1.15 | 0 | 0 | 0.19 |
| 5 | sensitivity | 96 | 92.5 | 81.0 | 92.69 | 90.9 | 71.4 | 99.4 | 96.9 | 87.9 |
| 5 | (1-specificity) | 3.85 | 5.87 | 12.98 | 1.63 | 1.44 | 7.79 | 0.48 | 1.44 | 9.13 |

Thresholding of cp, cl, cs

| SNR | Accuracy | CF (cp) DE81 | CF (cp) DE61 | CF (cp) DE31 | PF (cl) DE81 | PF (cl) DE61 | PF (cl) DE31 | GM (cs) DE81 | GM (cs) DE61 | GM (cs) DE31 |
|---|---|---|---|---|---|---|---|---|---|---|
| ∞ | sensitivity | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| ∞ | (1-specificity) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 100 | sensitivity | 82.5 | 97.7 | 80.2 | 100 | 100 | 100 | 100 | 97.7 | 97.9 |
| 100 | (1-specificity) | 0.5 | 3.5 | 4.9 | 0 | 0 | 0 | 4.6 | 3.5 | 6.3 |
| 60 | sensitivity | 83.7 | 97.3 | 76.2 | 100 | 100 | 100 | 99.6 | 97.3 | 84.2 |
| 60 | (1-specificity) | 11 | 8.4 | 18.6 | 0 | 0 | 0 | 12.2 | 8.4 | 6.0 |
| 30 | sensitivity | 67.7 | 96.5 | 78.8 | 100 | 100 | 99.6 | 94.6 | 96.5 | 82.5 |
| 30 | (1-specificity) | 33.8 | 17.7 | 53.8 | 0 | 0 | 0.2 | 10.9 | 17.7 | 16.2 |
| 10 | sensitivity | 93.7 | 76.5 | 76.2 | 97.9 | 92.7 | 97.3 | 85.8 | 76.5 | 46.7 |
| 10 | (1-specificity) | 90.3 | 39.2 | 68.4 | 0.3 | 0.1 | 1.3 | 40.2 | 39.2 | 33.8 |
| 5 | sensitivity | 78.5 | 70.8 | 44.2 | 84.0 | 72.7 | 95.8 | 67.1 | 70.8 | 55.4 |
| 5 | (1-specificity) | 80.0 | 56.3 | 41.2 | 4.2 | 7 | 2.1 | 51.9 | 56.3 | 57.6 |

The accuracy is shown as a function of SNR.


Fig. 3. Percentage voxels detected as crossings depending on SNR and angle (81 DE directions). The ground truth is zero below 30°.

S. Schnell et al. / NeuroImage 46 (2009) 642–651

Fig. 3 demonstrates the classification accuracy for the detection of CFs for 81 DE directions. It is obvious that the lower the SNR, the lower the accuracy of classification. Even at a low SNR=10 the results are in good agreement with the ground truth.

In vivo results

The results for the segmentation of the in vivo data using the SVM are shown in the figures below. All following figures (4–8) show the same coronal slice from one test subject for the GM, WM and CSF SPM5 segmentation results, respectively. For reasons of space, only the results of one SVM option are presented in the following (using “stddev” feature scaling and one-vs-rest multi-class comparison). The “gold standard” SPM5 is shown in grey scale. Segmentation results are overlaid in transparent red. All anatomical regions specified in the following were identified with a standard atlas (Nieuwenhuys et al., 2008).

Classification of grey matter

Fig. 4 shows grey matter voxels classified by the two different methods: SVM classification and thresholding of Westin coefficient cs. The Westin coefficient thresholding (here 0.8 < cs < 0.9) shows many false positive results, especially in noise regions and at the border of the lateral ventricle and parts of the third ventricle containing CSF (yellow circle in Fig. 4b). The SVM classification yielded some false positive results in background noise; however, the majority of the classified voxels are in agreement with the SPM5 probability map. Although the thalamus is defined as GM in SPM5, the population of cells in this region is, in reality, a mixture of grey matter and white matter, interweaved with many neuronal fibres. This may explain why SVM and cs thresholding are not in agreement with SPM5 (green circle in Fig. 4a).

Fig. 4. SPM5 segmentation map of grey matter overlaid with the classification results in transparent red, here the recognised grey matter (the first test data set is shown). (a) GM classified with SVM, the thalamus is encircled in green; (b) Westin coefficients thresholding (0.8 < cs < 0.9), false positive CSF regions are encircled in yellow.

Classification of white matter

In Fig. 5 all voxels recognised to be part of WM using SVM classification or Westin coefficients thresholding (CF: 0.17 < cp < 1 and PF: 0.25 < cl < 1) are illustrated.
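The thresholding approach can be sketched in Python as follows. The normalisation of the Westin coefficients by the trace of the diffusion tensor (so that cl + cp + cs = 1) is an assumption made here; the thresholds reported in this paper were tuned on the authors' own coefficient maps and need not transfer to another normalisation. The function names are illustrative.

```python
def westin_coefficients(eigenvalues):
    """Return (cl, cp, cs) for three non-negative diffusion tensor eigenvalues.

    Assumption: trace normalisation, so that cl + cp + cs = 1.
    """
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    trace = l1 + l2 + l3
    if trace <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cl = (l1 - l2) / trace
    cp = 2.0 * (l2 - l3) / trace
    cs = 3.0 * l3 / trace
    return cl, cp, cs


def threshold_labels(eigenvalues):
    """Apply the thresholds quoted in the text; a voxel may receive several labels."""
    cl, cp, cs = westin_coefficients(eigenvalues)
    labels = set()
    if 0.25 < cl < 1:
        labels.add("PF")
    if 0.17 < cp < 1:
        labels.add("CF")
    if 0.8 < cs < 0.9:
        labels.add("GM")
    if 0.95 < cs < 1:
        labels.add("CSF")
    return labels
```

Because each coefficient is thresholded separately, a voxel can pass both the PF and the CF threshold, or none at all. For example, the hypothetical eigenvalues (1.0, 0.5, 0.2) satisfy both cl > 0.25 and cp > 0.17 under this normalisation.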

Fig. 5. White matter map overlaid with classified parallel and crossing fibres (the first test data set is shown). (a) PF classified with SVM, the pons is encircled in red, the blue arrow points to the fornix; (b) PF classification by Westin coefficients thresholding (cl > 0.25), the pons is encircled in red, the blue arrow points to the fornix; (c) CF classified with SVM, the thalamus is encircled in green; (d) CF classification by Westin coefficients thresholding (cp > 0.17), the thalamus is encircled in green.


Fig. 6. CSF map overlaid with recognised CSF (the first test data set is shown). (a) classification with SVM; (b) classification by Westin coefficients thresholding (cs = 0.95–1).


No quantitative evaluation is possible; however, we expect the majority of the white matter voxels to be classified as CFs, since we have defined two fibre bundles separated with a relatively low angle of 30° to be crossing fibre bundles. The two images on top represent the detected PF and the two images below the detected CFs. Westin coefficients thresholding has lower specificity than the SVM method, showing a greater number of false positives in the regions with background noise (Figs. 5b and d). A lower number of parallel fibres were detected with Westin coefficients in the corticospinal tract close to the pons (red circle in Figs. 5a and b). In addition, Westin thresholding could not detect PFs of the fornix (blue arrow in Figs. 5a and b), which on the contrary were recognised with the SVM. It is encouraging that voxels in the region of the pons were classified as both PF and CF, since we know that here both descending PFs and many CFs are present together. Both methods detected CFs in the thalamus (green circle in Figs. 5c and d), which is, as was mentioned above, a region which contains grey and white matter. As with GM classification, the SVM WM classification gave some false positive voxels in regions with noise, but most voxels agree with the probability map. In general, Westin coefficients and SVM classification showed good agreement in voxels with high anisotropy such as the corpus callosum or the corticospinal tract above the pons.

Classification of CSF

In Fig. 6 the results for the CSF classification are shown. As was mentioned above, finding a threshold for differentiation of GM and CSF with Westin coefficients thresholding is very difficult (GM: 0.8 < cs < 0.9, CSF: 0.95 < cs < 1). Therefore, in Fig. 6b false negative voxels occur, where in Fig. 4b they would be false positives. Note that SVM also classifies voxels outside the brain, which could be CSF or other fluids like blood. In addition, the SVM tends to overestimate the number of CSF voxels, whereas, using a threshold that separates CSF from GM well, Westin thresholding tends to underestimate the number of CSF voxels.

Classification of image noise and artefacts

In Fig. 7 the classified image noise, including background noise and image artefacts, is illustrated. Here, only the SVM classification results are shown, since it was impossible to find this class by Westin thresholding. The results are overlaid on mean diffusivity maps for better contrast. Some voxels recognised by the SVM lie in regions of high vessel pulsation or areas sensitive to image artefacts (blue circle in Fig. 7a), which is a correct classification, since we defined the noise class to include artefacts.

Fig. 7. Mean diffusivity maps overlaid with the recognised noise (background noise and image artefacts). (a) classification with SVM; and (b) combination and inversion of GM, WM and CSF SPM5 segmentation results.

Comparison of sensitivity and specificity for SVM and Westin coefficients

The SVM and the thresholding of the Westin coefficients results were compared with the “gold standard” SPM5 segmentation of the T1-image. As stated above, segmentation algorithms like SPM5 can only give probabilistic values for the tissue type. Only the probabilistic values above p=0.5 were considered for the following validation. This comparison of the SVM and Westin coefficients with SPM5 is problematic. Since standard segmentation algorithms only divide the brain into GM, WM and CSF, we had to combine the classification results in order to be able to compare with SPM5. Therefore, the voxels recognised as PF and CF were combined to represent WM. The voxels recognised as GM and partial volume were combined to represent GM, although partial volume could contain GM and WM. The background noise was compared with a mask created with the help of the segmented GM, WM and CSF from the T1-image and contained all voxels outside the brain. The CSF segmentation results were directly comparable. Table 3 gives an overview for all three data set options (81, 61 and 31 DE directions) of the validation of the SVM segmentation. We show the average value and standard deviation for all test data sets.

The SVM classified all classes with a sensitivity above 70%, except for GM. The data acquired in 81 DE directions give the best accuracy for the classification of WM. The specificities for GM and WM are above 90%, much higher than with Westin thresholding. The accuracy for the classification of noise and artefacts is above 90%, although the specificity drops with decreasing number of DE directions.

Westin coefficients thresholding shows much lower sensitivities than SVM, except for GM. Decreasing the number of DE directions makes a difference only for the data set simulated in 31 DE directions with an SNR=5, where the sensitivity drops below 45% for WM. The specificity for the detection of CSF shows slightly better results than for SVM.

Discussion and outlook

A new automated method for the separation of parallel and crossing fibre bundles in the brain white matter using HARDI data and an SVM algorithm is presented. With this method, each voxel of a data set was identified without additional anatomical scans or expert knowledge.


Table 3
Comparison of sensitivity and specificity results in percentage (SVM classification and the linear, planar and spherical coefficients thresholding) of all data sets (81, 61 and 31 DE directions) in relation to T1-image segmentation using SPM5.

| DE directions | Tissue class | SVM: sensitivity ± std | SVM: (1-specificity) ± std | Westin coefficients cl, cp, cs: sensitivity ± std | Westin coefficients cl, cp, cs: (1-specificity) ± std |
|---|---|---|---|---|---|
| 81 | WM (CF + PF) | 86.35 ± 2.03 | 2.29 ± 0.36 | 60.27 ± 2.89 | 17.37 ± 2.22 |
| 81 | GM (GM + partial) | 56.35 ± 6.02 | 2.19 ± 1.72 | 49.31 ± 10.28 | 14.60 ± 5.40 |
| 81 | CSF | 73.31 ± 10.19 | 8.46 ± 1.45 | 44.35 ± 17.16 | 5.40 ± 4.85 |
| 81 | Noise | 92.08 ± 3.42 | 16.61 ± 7.76 | – | – |
| 61 | WM (CF + PF) | 71.92 ± 2.62 | 0.85 ± 0.24 | 61.83 ± 2.02 | 18.74 ± 0.85 |
| 61 | GM (GM + partial) | 48.39 ± 14.74 | 1.42 ± 0.87 | 50.70 ± 3.40 | 13.22 ± 1.30 |
| 61 | CSF | 73.07 ± 12.70 | 8.29 ± 1.27 | 38.76 ± 27.16 | 4.09 ± 4.93 |
| 61 | Noise | 94.62 ± 2.43 | 28.96 ± 16.63 | – | – |
| 31 | WM (CF + PF) | 77.14 ± 1.67 | 1.42 ± 0.27 | 64.47 ± 3.96 | 17.52 ± 1.56 |
| 31 | GM (GM + partial) | 53.97 ± 8.09 | 1.59 ± 0.79 | 52.97 ± 7.92 | 12.37 ± 3.45 |
| 31 | CSF | 72.88 ± 13.38 | 9.61 ± 2.01 | 28.15 ± 15.87 | 2.69 ± 4.64 |
| 31 | Noise | 92.39 ± 2.73 | 55.57 ± 9.56 | – | – |

The results of the five test data sets were averaged.
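The validation against SPM5 can be illustrated with the following Python sketch. It merges the SVM classes as described in the text (PF and CF to WM, GM and partial volume to GM), thresholds the tissue probabilities at p = 0.5 and computes sensitivity and (1-specificity) per tissue class. The class names, the treatment of voxels with no probability above the threshold as noise, and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev

# Assumption: illustrative class names; the merging follows the text.
MERGE = {"PF": "WM", "CF": "WM", "GM": "GM", "PV": "GM", "CSF": "CSF", "NOISE": "NOISE"}


def reference_label(p_gm, p_wm, p_csf, threshold=0.5):
    """Turn tissue probabilities into one reference label.

    Assumption: voxels with no probability above the threshold count as noise.
    """
    candidates = {"GM": p_gm, "WM": p_wm, "CSF": p_csf}
    tissue, prob = max(candidates.items(), key=lambda item: item[1])
    return tissue if prob > threshold else "NOISE"


def sensitivity_and_fpr(predicted, reference, tissue):
    """Return (sensitivity, 1 - specificity) in percent for one tissue class."""
    merged = [MERGE[label] for label in predicted]
    tp = sum(p == tissue and r == tissue for p, r in zip(merged, reference))
    fn = sum(p != tissue and r == tissue for p, r in zip(merged, reference))
    fp = sum(p == tissue and r != tissue for p, r in zip(merged, reference))
    tn = sum(p != tissue and r != tissue for p, r in zip(merged, reference))
    sensitivity = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    fpr = 100.0 * fp / (fp + tn) if fp + tn else float("nan")
    return sensitivity, fpr


def summarise(values):
    """Mean and sample standard deviation across test data sets."""
    return mean(values), stdev(values)
```

Applying `sensitivity_and_fpr` to each test data set and passing the per-subject values to `summarise` would give entries of the form "mean ± std" as in Table 3.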

A rotation invariant data representation of the feature space was used as input for the SVM. After feature extraction the classification procedure was trained and systematically tested using simulated data sets with several noise levels. We show that even for a very low SNR of 5 the chosen SVM algorithms gave a very high sensitivity and specificity as well as robustness in the presence of noise with simulated data sets, in contrast to Westin coefficients thresholding.

Fig. 8. SPM segmentation results of GM (a–d) and WM (e, f) overlaid with SVM classification results in transparent red. The two multi-class comparison algorithms of the SVM are contrasted: on the left side the one-vs-rest approach is shown and on the right one-vs-one: (a, b) partial volume, (c, d) GM, and (e, f) CF.

The in vivo HARDI data sets for the classification of fibre crossings were obtained in a clinically acceptable time. T1-weighted MPRAGE images were used solely to identify brain regions for the training data set. Once this was done, the MPRAGE images were no longer needed for classification. If the SVM can be generalised, this would mean that MPRAGE images need not be acquired in future. The problem of creating a training data set was solved by combining SVM classification results for CF and PF from simulated data sets with T1 segmentation results for grey matter, white matter, CSF and background noise. The selection of representative voxels for the training data set may need several refinement steps. This dependence on the training data is a drawback of using a supervised learning algorithm, but once the training data set has been optimised, all subsequent steps cannot be biased by operators. This means that, for optimising the application, some effort is needed for the selection of the “best” representative training data.
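The combination of the two labelling sources can be sketched per voxel as follows. This is a simplified Python illustration, not the authors' code: the function name and the decision to leave WM voxels unlabelled when the simulated-data classifier predicts neither CF nor PF are assumptions.

```python
def training_label(simulated_svm_label, in_wm, in_gm, in_csf):
    """Combine the two labelling steps for one voxel.

    simulated_svm_label: class predicted by the SVM trained on simulated data
    in_wm, in_gm, in_csf: booleans taken from the T1-based tissue masks
    Returns a class name, or None if the voxel is not used for training.
    """
    if in_wm:
        # Only the CF/PF split is taken from the simulated-data classifier.
        return simulated_svm_label if simulated_svm_label in ("CF", "PF") else None
    if in_gm:
        return "GM"
    if in_csf:
        return "CSF"
    # Outside the brain mask: background noise and artefacts.
    return "NOISE"
```

In practice only voxels from regions with well-known tissue structure would be kept, so a further manual or rule-based selection step would follow.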

The project also attempted the classification of partial volume voxels containing grey matter and white matter. This was found to be highly dependent on the chosen multi-class comparison algorithm. An example is shown in Fig. 8. Here, the performance of the two multi-class comparison methods one-vs-one and one-vs-rest is shown for three tissue classes. The results are most different in the area of the thalamus, which is a region not clearly specifiable as GM or WM (blue circle in Figs. 8b and e). This area was classified as partial volume with the one-vs-rest approach. With the one-vs-one approach those voxels appear to be CFs. In general, the one-versus-rest approach detected areas of partial volume robustly. But when looking closer at the partial volume and CF detection with this latter method, some regions which should be crossings are detected as partial volume as well (green circle in Fig. 8b).
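How the two multi-class schemes can disagree on the same voxel is illustrated by the following Python sketch. It shows the generic decision rules (highest decision value for one-vs-rest, majority voting for one-vs-one) and is not taken from libSVMtl; the decision values in the usage example are invented for illustration.

```python
from itertools import combinations


def one_vs_rest(scores):
    """scores: {class: decision value of the 'class vs rest' SVM}. Highest value wins."""
    return max(scores, key=scores.get)


def one_vs_one(pairwise, classes):
    """pairwise: {(a, b): decision value}, a positive value means a is preferred over b.

    Each pairwise SVM casts one vote; the class with the most votes wins.
    Ties are broken by the order of `classes` (an arbitrary choice made here).
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)] > 0 else b
        votes[winner] += 1
    return max(classes, key=lambda c: votes[c])


if __name__ == "__main__":
    classes = ["CF", "GM", "PV"]  # PV: partial volume
    # Hypothetical decision values for a single ambiguous voxel.
    rest_scores = {"CF": 0.2, "GM": -0.5, "PV": 0.4}
    pair_scores = {("CF", "GM"): 0.8, ("CF", "PV"): 0.1, ("GM", "PV"): -0.3}
    print(one_vs_rest(rest_scores), one_vs_one(pair_scores, classes))
```

With these invented values the one-vs-rest rule selects partial volume while the one-vs-one vote selects CF, which mirrors the kind of disagreement observed in the thalamus.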

In addition we found that with a different SVM combination, better GM sensitivity was obtained, but at a cost of lower detection sensitivity for other classes (data not shown). In summary this means that there is a trade-off between the accuracy of partial volume, crossing fibre bundles and grey matter detection. This shows that the choice of the classification algorithm depends on the application, since it highly influences the results.

In our comparison of SVM with Westin coefficients thresholding, it was shown that our method can differentiate between the WM structures, CF and PF, and is sensitive also for the separation of CF from GM (cf. Figs. 4–7). Except for CSF, with Westin coefficients thresholding the differentiation of all classes showed lower sensitivities and specificities. Note that the optimal Westin thresholding resulted in erroneous double labelling, i.e. many voxels were detected as both PF and CF, whereas the SVM can only give one label for each voxel. There are several explanations for this difference: first, the Westin coefficients are based on the diffusion tensor model, which is correlated with spherical harmonics of order two only. And second, the SVM surveys several features in parallel in high-dimensional space, meaning class specific feature combinations are taken into account.

In addition, we compared the SVM segmentation results with the coregistered SPM segmentation of the T1-image and found good agreement. The segmentation in SPM5 was defined as “gold standard”, but as already described we had to choose a probability threshold for the validation, which introduces a subjective element to the analysis. Also, coregistration with an automated algorithm always has a risk of misregistration. We carefully inspected the registration results in order to ensure that this was not the case. There is also a discretisation effect when coregistering an image with finer resolution to a coarse image, leading to inaccuracies especially at tissue borders. In order to avoid such effects we acquired the T1 MPRAGE images in the same session and orientation as the HARDI data, but such effects can never be totally avoided. Furthermore, the results obtained from the SVM for the CSF segmentation include voxels between brain and skull, which are not shown in the SPM5 segmentation, since this typical segmentation procedure is based on a priori known probability maps. This means that the voxels defined as false positives in regions not shown in the SPM5 segmentation might be correct in reality. A proper verification of the results remains a challenge, especially for the parallel and crossing fibre bundles, which is a commonly recognised problem in diffusion MR imaging. Also, there are several partial volume combinations possible in each voxel, such as GM-CSF, WM-CSF, WM-GM, GM-vessel and WM-vessel, which were not considered here. Only the GM and WM partial volume was taken into account. The above-mentioned misregistration and erroneous segmentation emphasise the strength of our method, which does not require registration or segmentation. With our method we do not require additional anatomical scans, except for the creation of an initial training data set. However, for this one training data set it is important to determine carefully a set of training voxels chosen where the user is sure about the underlying structure.

We also investigated the dependence of the detection accuracy on the number of DE directions. There was only a small difference between results obtained with the simulated data sets of 61 or 81 DE directions, but there is a sharp decrease in accuracy when using 31 DE directions. It would be interesting to investigate the effect of using an even higher number of DE directions than 81. In the in vivo results an effect of the number of DE directions can only be found for the two WM classes. This agrees with the sampling theory, which states that for the order n=4 the number of DE directions required is ≥64, from Eq. (7).

Several assumptions were made in our simulations: first, only a one and two fibre bundle model was used as the underlying anatomy. Second, the minimum angle between fibre bundles of 30° is an arbitrary threshold for crossings and can be seen as a definition of the threshold between parallel and crossing fibre bundles. This threshold enables the simulation of a so called fibre bundle branching or fanning situation, which is the reason why so many fanning regions were detected as crossing regions.
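The 30° definition can be expressed as a small labelling rule for simulated two-bundle voxels. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and treating an angle of exactly 30° as crossing follows the wording that two bundles separated by 30° are defined as crossing.

```python
import math


def crossing_angle_deg(u, v):
    """Acute angle in degrees between two fibre directions (orientation sign is irrelevant)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = min(1.0, abs(dot) / norm)  # abs(): fibres have no polarity; min(): guards rounding
    return math.degrees(math.acos(cosine))


def simulated_label(u, v, threshold_deg=30.0):
    """Label a two-bundle voxel as crossing (CF) or parallel (PF) by its angle."""
    return "CF" if crossing_angle_deg(u, v) >= threshold_deg else "PF"
```

Under this rule a fanning configuration with an opening angle of 30° or more receives the same label as a true crossing.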

For proof of principle, our method was tested on five independent subjects and showed very similar results for the classification of CF, PF, GM and partial volume. Future work will test our algorithm on a larger number of subjects.

In the literature there has been no method reported where the main image components were recognised just by using HARDI data. Some authors pointed out the difficulty of differentiating between CF, GM, and noise. In our approach we show one possibility of solving this problem by choosing a combination of published model-free methods with a supervised learning technique. Though the results still show false positive voxels and the validation is difficult, we obtained a segmentation procedure that performed well.

Currently, our method can provide a priori knowledge for increasing the performance of fibre tracking algorithms. After initial masking, e.g. using the SVM classification results in order to restrict the tracking area to WM, one could for example divide the fibre tracking into two cases: the parallel fibre bundles could be tracked with an easy and fast deterministic tracking algorithm and the crossing and fanning regions could be tracked with an expensive, but robust and reliable method such as Gibbs tracking (Kreher et al., 2008).
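The proposed division of labour between tracking algorithms can be sketched as a simple routing step on the voxel labels. This Python example only illustrates the idea; the label names and tracker identifiers are placeholders, and no actual tracking algorithm is invoked.

```python
def choose_tracker(label):
    """Map an SVM class label to a tracking strategy; None means 'do not track here'."""
    if label == "PF":
        return "deterministic"
    if label == "CF":
        return "gibbs"
    return None  # GM, CSF, partial volume and noise are masked out


def plan_tracking(labels):
    """Group voxel indices by the tracking strategy chosen for them."""
    plan = {"deterministic": [], "gibbs": []}
    for index, label in enumerate(labels):
        tracker = choose_tracker(label)
        if tracker is not None:
            plan[tracker].append(index)
    return plan
```

The returned index lists could then serve as seed or region masks for the respective tracking algorithm.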

Another application would be automatic recognition of pathologies, for example, the prognosis for WM neuronal fibre bundles destroyed after stroke using a HARDI measurement in the acute stroke phase. One could train the SVM with these early-stage HARDI scans, which could be labelled with the help of coregistered HARDI data acquired from the same patient, but later in the chronic stroke phase. In these late HARDI scans permanent destruction of white matter fibre bundles can be determined with expert knowledge. The SVM classification of any HARDI data after acute stroke with the same imaging parameters might then be used to predict the location of permanently destroyed WM regions.

Acknowledgments

The authors thank Dr. Olaf Ronneberger, Thorsten Schmidt and Dr. Marco Reisert from the Computer Science Department of the University Freiburg for the fruitful discussions about data clustering and supervised learning. The authors gratefully acknowledge Dr. Kuan Jin Lee from the Medical Physics department of the University Medical Center Freiburg for his assistance on this project. This work was partly supported by the Bundesministerium für Bildung und Forschung [BMBF-research collaborations “Mechanisms of brain reorganisation in the language network” (01GW0661)].

References

Alexander, D.C., 2005. Multiple-fiber reconstruction algorithms for diffusion MRI. Ann. N.Y. Acad. Sci. 1064, 113–133.

Alexander, D.C., Barker, G.J., et al., 2002. Detection and modeling of non-Gaussian apparent diffusion coefficient profiles in human brain data. Magn. Reson. Med. 48 (2), 331–340.

Arfken, G.B., Weber, H.J., 1985. “Spherical Harmonics” and “Integrals of the Products of Three Spherical Harmonics”. Mathematical Methods For Physicists. Elsevier Publishing Company, Academic Press, Orlando, USA, pp. 680–685 and 698–700.

Basser, P.J., Mattiello, J., et al., 1994. MR diffusion tensor spectroscopy and imaging. Biophys. J. 66 (1), 259–267.

Behrens, T.E.J., Berg, H.J., et al., 2007. Probabilistic diffusion tractography with multiple fibre orientations: what can we gain? NeuroImage 34, 144–155.

Burges, C.J.C., 1998. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery. Boston, USA, 2. Kluwer Academic Publishers, pp. 121–167.

Cortes, C., Vapnik, V., 1995. Support Vector Networks. Mach. Learn. 20, 273–297.

Cristianini, N., Shawe-Taylor, J., 2000. An introduction to support vector machines: and other kernel-based learning methods. Cambridge University Press.

Descoteaux, M., Angelino, E., et al., 2006. Apparent diffusion coefficients from high angular resolution diffusion imaging: estimation and applications. Magn. Reson. Med. 56, 395–410.

Frank, L.R., 2002. Characterization of anisotropy in high angular resolution diffusion-weighted MRI. Magn. Reson. Med. 47, 1083–1099.

Gottrup, C., Thomsen, K., et al., 2005. Applying instance-based techniques to prediction of final outcome in acute stroke. Artif. Intell. Med. 33 (3), 223–236.

Gualtieri, J.A., Cromp, R.F., 1999. Support Vector Machines for Hyperspectral Remote Sensing Classification. SPIE.

Hasan, K.M., Narayana, P.A., 2006. Retrospective measurement of the diffusion tensor eigenvalues from diffusion anisotropy and mean diffusivity in DTI. Magn. Reson. Med. 56, 130–137.

Jones, D.K., Horsfield, M.A., et al., 1999. Optimal strategies for measuring diffusion in anisotropic systems by magnetic resonance imaging. Magn. Reson. Med. 42 (3), 515–525.

Kreher, B.W., Schneider, J.F., et al., 2005. Multitensor approach for analysis and tracking of complex fiber configurations. Magn. Reson. Med. 54, 1216–1225.

Kreher, B.W., Mader, I., et al., 2008. Gibbs tracking: a novel approach for the reconstruction of neuronal pathways. Magn. Reson. Med. 60 (4), 953–963.

Mouridsen, K., Christensen, S., et al., 2006. Automatic selection of arterial input function using cluster analysis. Magn. Reson. Med. 55, 524–531.

Nattkemper, T.W., 2004. Multivariate image analysis in biomedicine: a methodological review. J. Biomed. Informatics 37 (5), 380–391.

Nieuwenhuys, R., Voogd, J., et al., 2008. The Human Central Nervous System. Springer, Berlin Heidelberg New York.

Quddus, A., Fieguth, P., et al., 2005. Adaboost and Support Vector Machines for White Matter Lesion Segmentation in MR Images. IEEE Engineering in Medicine and Biology, Shanghai, China.

Ronneberger, O., 2004. “libSVMtl”. Albert-Ludwigs University Freiburg, Germany. From http://lmb.informatik.uni-freiburg.de/lmbsoft/libsvmtl/download.en.html.

Schölkopf, B., Smola, A.J., 2002. Learning With Kernels. MIT Press, Cambridge, Massachusetts (USA).

Tuch, D.S., Weisskoff, R.M., et al., 1999. High Angular Resolution Diffusion Imaging of the Human Brain. International Annual Meeting of the International Society for Magnetic Resonance in Medicine, Philadelphia, USA.

Webster, A., Szegö, G., 1930. Leipzig and Berlin. Teubner, Germany.

Westin, C.-F., Peled, S., et al., 1997. Geometrical Diffusion Measures For MRI From Tensor Basis Analysis. International Annual Meeting of the International Society for Magnetic Resonance in Medicine, Vancouver, Canada.

Yeo, B.T.T., 2005. Computing Spherical Transform and Convolution on the 2-Sphere. 1–8.

Zaitsev, M., Hennig, J., et al., 2006. Geometric Distortions Applied to Diffusion Tensor Imaging. International Annual Meeting of the International Society for Magnetic Resonance in Medicine, Seattle, USA.