DELIVERABLE 3.3
Workflow implementation and streamlining for high-throughput image analysis of large-scale studies
Grant agreement no.: 601055 (FP7-ICT-2011-9)
Project acronym: VPH-DARE@IT
Project title: Dementia Research Enabled by IT
Funding Scheme: Collaborative Project
Project co-ordinator: Prof. Alejandro Frangi, University of Sheffield
Tel.: +44 114 22 20153
Fax: +44 114 22 27890
E-mail: [email protected]
Project web site address: http://www.vph-dare.eu
Due date of deliverable: Month 24
Actual submission date: Month 27
Start date of project: April 1st 2013
Project duration: 48 months
Work Package & Task: WP 3, Task 3.2, 3.3
Lead beneficiary: UCL
Editor: PMO
Author(s): Nicolas Toussaint, David Cash, Wyke Huizinga
Quality reviewer: Peter Metheral, Wiro Niessen
Project co-funded by the European Union within the Seventh Framework Programme
Dissemination level:
- PU: Public [X]
- PP: Restricted to other programme participants (including Commission Services)
- RE: Restricted to a group specific by the consortium (including Commission Services)
- CO: Confidential, only for members of the consortium (including Commission Services)
2.1. GENERAL PIPELINE SPECIFICATIONS
2.2. INVENTORY OF IMAGE ANALYSIS TOOLS
2.3. IMAGE ANALYSIS WORKFLOW BLOCKS
2.4. INCORPORATION INTO VPH-DARE RESEARCH PLATFORM
3. PER BIOMARKER PIPELINE IMPLEMENTATION
3.1. BIAS CORRECTION
3.1.1. Evaluation on testing Set
3.2. WHOLE BRAIN PARCELLATION AND TISSUE SEGMENTATION
3.2.1. Testing Set
3.3. HIPPOCAMPAL VOLUME PROFILE
3.3.1. Evaluation on AD / control testing set
FP7-601055: VPH-DARE@IT D3.3 – Workflow implementation and streamlining for high-throughput image… 07/07/2015
TABLE OF FIGURES
Figure 1: Diagram of the Nipype workflow environment (http://nipy.sourceforge.net/nipype). Distinct image analysis tools are embedded into interfaces that constitute blocks of a workflow seamlessly executable and fully reproducible.
Figure 2: Data and information flow in the research platform. The pipelines are stored in the platform and are at the disposition of the user. The platform facilitates the application of validated biomarker extraction workflows on large cohorts for high throughput analysis.
Figure 3: Correlation of BSI measures when using N3 or N4 for bias correction as pre-processing, presented as Bland-Altman plots.
Figure 4: Brain Parcellation workflow. A database is used to propagate template labels to the target input structural image.
Figure 5: Example of whole brain parcellation on one of the 1000 subjects of the RSS.
Figure 6: Top: parcellation showing the hippocampus in red. Bottom: distribution of the hippocampus volume as percentage of intracranial volume as function of age.
Figure 7: Hippocampus segmentations (left) with estimated left and right long axes (middle and right) superimposed on structural T1 image.
Figure 8: (left) Kernel density estimation in 1D (black line) with sparse data points (blue markers). The blue dots represent the data points and the black continuous line represents the density estimation. (right) Kernel density estimation is applied along the principal axis of the hippocampus.
Figure 9: Two examples of output profiles from a healthy subject (left) and a patient suffering from Alzheimer’s disease (right). The graph shows the left (red) and right (green) hippocampal volume profiles.
Figure 10: Volume Profile Generation workflow. Kernel density estimation is used to estimate the continuous function from sparse voxel volumes projected along the hippocampus axis.
Figure 11: Hippocampal volume (in TIV percentage) profile on a population of 27 controls (grey) and 43 AD patients (black) between the anterior and the posterior part.
Figure 12: Workflow for diffusion weighted imaging data. The diffusion-weighted images are linearly registered to the average B0 image. Field maps and T1 images are used to estimate susceptibility distortion. The resulted corrected images are used for tensor fitting.
Figure 13: The diffusion-processing pipeline outputs maps depicting the white matter arrangement. (Left) T1 weighted image. (Right) corresponding FA map colour-coded with tissue orientation.
Figure 14: Quality Control graphs for Diffusion MRI. (Top) DWI image showing some significant signal dropouts. (Bottom) corresponding inter-slice cross-correlation for B0 (red) and DWI (blue) images, where the problematic volume is automatically detected.
Figure 15: Inter-slice cross-correlation graphs on 216 subjects of the ADNI cohort. Thresholding allows the automatic detection of 13 outliers containing significant signal dropouts.
Figure 16: The Taverna Workbench. During the integration process, the user needs to import the newly created service (top-left panel) into the workbench (main panel) and connect inputs and outputs of the pipeline (in green) with the DARE portal nodes (in blue). The pipeline can then be run from the menu.
1. INTRODUCTION
Over the first two years of the project, VPH-DARE has collected numerous retrospective
imaging studies in dementia into a single repository represented by the VPH-SHARE
infostructure. This provides the ability to determine whether data and results from multiple
datasets can be pooled together in order to provide a better understanding of disease processes
and what factors (genetic, lifestyle, environmental) could influence them. Some of the
best-established biomarkers, as well as some of the most promising for early disease detection
and differential diagnosis, come from imaging. While many of these databases have already
been analysed and contain derived imaging biomarkers for some datasets, each
database has employed different methodologies, software packages, and program settings to
obtain these values. Thus, it is important to extract the relevant imaging biomarkers from each
of these retrospective studies using standardised pipelines and to make them available to the
consortium for the purposes of mechanistic and phenomenological modelling done in WPs 5
and 6, as well as to provide normal and abnormal distributions to aid in diagnostic decisions as
part of the clinical platform being developed in WP8. This task represents a computationally
expensive endeavour, as there are multiple pipelines that require hours of computing time, and
tens of thousands of datasets from which the biomarkers need to be extracted. The VPH-DARE
research platform offers the opportunity to perform this extraction in a standardised and high-
throughput manner.
This deliverable has close ties with deliverable 3.1, in which we laid out the basic requirements
for key biomarkers we felt would be necessary for extraction, and deliverable 7.2, where we
presented methods for integrating these biomarker pipelines into the research platform. In this
document, we first focus on the design approach for constructing these biomarker pipelines
with the consideration that they will be used within the research platform. Then we discuss the
key biomarker pipelines: how we have optimised the pipeline parameters and any adjustments
that we have made to overcome various challenges that arose during the implementation. We
then show evidence that the biomarkers are performing as expected through the use of
validation test sets for each pipeline. Finally, we present the outline of a plan to complete
extraction of imaging biomarkers from the retrospective database as represented by Milestone
33.
2. BIOMARKER PIPELINE DESIGN
The goal of this deliverable and Milestone 33 is to extract imaging biomarkers from the
retrospective databases. This objective had strong implications in terms of the selection and
subsequent implementation of the pipelines. First, the most commonly used imaging
biomarkers in dementia research are based on high-resolution structural MRI using volumetric
T1-weighted imaging. These biomarkers provide quantitative volumetric assessments of key
brain structures and of the longitudinal rate of change in their volume. These are the
best-established biomarkers, with evident changes just before symptom onset and a strong
correlation with clinical severity. All of the retrospective studies in VPH-DARE that contain
imaging have some form of volumetric T1-weighted imaging as part of the protocol. As a result,
we decided to focus primarily on extracting biomarkers from these images. Second, there are
numerous publicly available software packages that perform the image processing
needed to obtain these biomarkers, and we wanted our design to be flexible and interoperable
between these packages, as well as with tools developed in-house, so that the pipeline could
take the best components from each. Finally, we wanted to provide results to the end users that
are clear and reproducible. We felt that this would involve providing reasonable provenance
information (software versions, computer hardware, etc.) as well as one “validated” version of
the pipeline and corresponding end result rather than multiple versions with different settings.
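The provenance information mentioned above (software versions, computer hardware) could be captured alongside each result in a small structured record. The following is only a minimal sketch using the Python standard library: the function name `build_provenance`, the record layout, and the pipeline and tool names are all hypothetical placeholders, not the format used by the project.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def build_provenance(pipeline_name, pipeline_version, tool_versions):
    """Collect a minimal provenance record for one pipeline run.

    tool_versions is a hypothetical mapping of tool name to version string;
    how each version is actually queried depends on the package concerned.
    """
    return {
        "pipeline": {"name": pipeline_name, "version": pipeline_version},
        # Sort tool names so that two records are easy to compare.
        "tools": dict(sorted(tool_versions.items())),
        "python": sys.version.split()[0],
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
            "system": platform.system(),
            "release": platform.release(),
        },
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


record = build_provenance(
    "bias_correction",  # hypothetical pipeline name
    "1.0",  # hypothetical label for the single validated version
    {"ToolB": "2.1", "ToolA": "0.9"},  # placeholder tool names and versions
)
print(json.dumps(record, indent=2))
```

Storing one such record next to each extracted biomarker would let an end user check which validated pipeline version and which software environment produced a given value.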
2.1. GENERAL PIPELINE SPECIFICATIONS
Current neuroimaging software provides a large variety of analysis tools that have been widely
accepted by the scientific community. A non-exhaustive list of tools commonly used in the
neuroimaging community is presented below:
- FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki) is a library and software suite containing a
wide range of basic and complex image processing and manipulation tools
- SPM (http://www.fil.ion.ucl.ac.uk/spm) is a library dedicated to statistical analysis of
imaging data
- FreeSurfer (http://surfer.nmr.mgh.harvard.edu) is a software suite used especially for
cortical surface analysis
- Camino (http://cmic.cs.ucl.ac.uk/camino) is a software package for diffusion image processing
- Slicer (http://slicer.org) is a generic software package for medical image computing
- ANTs (http://stnava.github.io/ANTs) is a processing library used for image
registration and segmentation
It is crucial to take advantage of this repertoire of existing tools. The choice of the
workflow implementation should therefore be driven by its ability to facilitate the integration
of such tools. Additionally, it is crucial for large studies that the workflows used for the
extraction of imaging biomarkers offer some degree of reproducibility. Furthermore, common
image processing techniques such as registration and segmentation will be used numerous times
in different workflows. This requires a workflow environment that facilitates the transfer of
processing blocks from workflow to workflow.
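The requirement that processing blocks be transferable from workflow to workflow can be illustrated with a small sketch. This is plain Python written for illustration only and does not use the API of any workflow engine. The helpers `make_block` and `run_workflow`, the block names, and the file names are hypothetical, and each block is a stand-in that only records which step produced which output.

```python
from typing import Callable, Dict, List

# A processing block maps a dictionary of named data to an updated dictionary.
Block = Callable[[Dict[str, str]], Dict[str, str]]


def make_block(name: str, consumes: str, produces: str) -> Block:
    """Return a stand-in block that reads one named input and adds one output."""

    def run(data: Dict[str, str]) -> Dict[str, str]:
        if consumes not in data:
            raise KeyError(f"{name} needs '{consumes}'")
        out = dict(data)  # copy so that the caller's dictionary is not modified
        out[produces] = f"{name}({data[consumes]})"
        return out

    return run


def run_workflow(blocks: List[Block], data: Dict[str, str]) -> Dict[str, str]:
    """Apply the blocks in order, passing each result on to the next block."""
    for block in blocks:
        data = block(data)
    return data


# Hypothetical blocks; the registration block is defined once and reused.
register = make_block("register", "image", "registered")
segment = make_block("segment", "registered", "labels")
measure_volume = make_block("volume", "labels", "volume")
fit_tensor = make_block("tensor_fit", "registered", "tensor")

structural_workflow = [register, segment, measure_volume]
diffusion_workflow = [register, fit_tensor]

structural = run_workflow(structural_workflow, {"image": "t1.nii"})
diffusion = run_workflow(diffusion_workflow, {"image": "dwi.nii"})
print(structural["volume"])
print(diffusion["tensor"])
```

Because each block declares what it consumes and what it produces, the same registration step can be dropped into both workflows without modification, which is the property the chosen workflow environment needs to provide.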
Such an environment could be achieved using basic shell scripts or by programming within the
command-line binaries. This would require additional time for implementation, for
correctly identifying interoperability between software packages, for managing the versions of
these packages, and for recording this information so that it can be saved with the resulting
outputs. In some cases, it would be more sensible to do this when integrating into the research
platform. However, Nipype (http://nipy.sourceforge.net/nipype) is a Python-based workflow
engine that is well adapted for neuroimaging studies, incorporates many of these ancillary
capabilities, and is therefore a natural choice for the implementation of the biomarker
extraction pipelines so that they could be incorporated as a major part of VPH-DARE@IT