Workflows and visualisation for machine learning segmentation of massive image data sets
James Lefevre
Hamilton lab
Institute for Molecular Bioscience
The University of Queensland
Lattice Light Sheet Microscopy
• Very high spatial resolution (100 x 100 x 268 nm)
• Very high temporal resolution (100 slices / second)
• Very low phototoxicity
• Opportunity to generate very rare data (very few working systems worldwide)
• 4 individual lasers: 440 nm (CFP), 488 nm (GFP), 560 nm (RFP), 640 nm (far red)
Typical capture: 300 time steps / image stacks over 20 minutes
Each stack 1113 x 1024 x 151 voxels, 688 MB (single channel)
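To put the figures above in context, the following is a back-of-envelope sketch of the data volume per capture. The 4 bytes per voxel is an assumption: the slide does not state the bit depth, but that value reproduces the quoted 688 MB per stack. The function names are illustrative only.

```python
# Back-of-envelope data volume for one capture, using the figures on the slide.
# ASSUMPTION: 4 bytes per voxel; the slide does not state the bit depth,
# but this value reproduces the quoted 688 MB per stack.
STACK_SHAPE = (1113, 1024, 151)   # voxels per stack
BYTES_PER_VOXEL = 4               # assumed, see note above
TIME_STEPS = 300                  # stacks per capture


def stack_bytes(shape=STACK_SHAPE, bytes_per_voxel=BYTES_PER_VOXEL):
    """Size in bytes of one single-channel stack."""
    nx, ny, nz = shape
    return nx * ny * nz * bytes_per_voxel


def capture_bytes(time_steps=TIME_STEPS):
    """Size in bytes of one single-channel capture (all time steps)."""
    return stack_bytes() * time_steps


if __name__ == "__main__":
    print(f"{stack_bytes() / 1e6:.0f} MB per stack")
    print(f"{capture_bytes() / 1e9:.1f} GB per channel per capture")
```

Under that assumption a single-channel capture is a little over 200 GB, which is why the later steps process one stack at a time rather than loading a whole capture.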
Approach to analysis of macrophage imaging
• Want automated semantic segmentation into:
– Tent-pole
– Ruffles
– Cell body
– Background
• Segment and analyse each stack / time step, then integrate information
• Identify and track objects
• Understand and quantify events with Stow Group
Stow Lab Imaging: http://doi.org/10.1083/jcb.201804137
Pipeline for segmentation and quantification of 4D imaging
Elements needed for pipeline
• High performance computing (thanks RCC!)
• 4D visualisation for interaction and viewing
• Methods for tracking structures across time
• Statistical analysis
Automated semantic segmentation approaches
Deep learning
• Use 3D U-Net or similar architecture
• Sophisticated segmentations rapidly calculated on GPU clusters
• Requires extensive training data and time
• Uncertain prospects for transfer learning for this problem
Limitation: lack sufficient training data (many complete segmentations), may lack computational resources for training
Traditional approach
• Work in interactive ImageJ or similar
• Clean image, e.g. noise removal via median filter
• Thresholding to separate classes, spot detection to identify objects, etc.
• Code algorithm in macro, run in batch mode
Limitation: lacks power and reliability at scale, so cannot compete with machine learning approaches
Alternative machine learning approaches?
Machine learning with Trainable Weka
• Uses existing algorithms in ImageJ and plugins to calculate useful image features
• Trains a "shallow" machine learning algorithm from Weka ML platform using these features
• Far less training data required; fast, easy training of model
https://imagej.net/Trainable_Weka_Segmentation
Statistics used:
• Local 3D mean, min, max, variance, median
• 3D derivatives
• 3D Hessians
• 3D Gaussian blur
• 3D Laplacian
• 3D Canny edge detection
• 3D Difference of Gaussians
Calculated for r = 1, 2, 4, 8, 16 pixels (less in z dimension)
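As an illustration of what "one feature per scale" means, here is a minimal NumPy sketch that computes two of the listed features (Gaussian blur and difference of Gaussians) at each radius, with a smaller scale in z. It is not the ImageJ / FeatureJ implementation used in the talk. The `z_ratio` parameter, the 3-sigma kernel truncation and all function names are assumptions made for the example.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at about 3 sigma, normalised to sum to 1."""
    half = max(1, int(round(3 * sigma)))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur_axis(volume, sigma, axis):
    """Gaussian blur along one axis, repeating edge values at the border."""
    k = gaussian_kernel(sigma)
    half = len(k) // 2
    pad = [(0, 0)] * volume.ndim
    pad[axis] = (half, half)
    padded = np.pad(volume, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), axis, padded)


def gaussian_blur_3d(volume, sigma_xy, sigma_z):
    """Separable anisotropic blur; volume is indexed (z, y, x)."""
    out = blur_axis(np.asarray(volume, dtype=np.float64), sigma_z, axis=0)
    out = blur_axis(out, sigma_xy, axis=1)
    return blur_axis(out, sigma_xy, axis=2)


def feature_stacks(volume, radii=(1, 2, 4, 8, 16), z_ratio=0.5):
    """Return {name: stack} with a blur and a difference of Gaussians per radius.

    z_ratio (assumed value) shrinks the scale in z, mirroring the
    "less in z dimension" note on the slide.
    """
    features = {}
    previous = np.asarray(volume, dtype=np.float64)
    for r in radii:
        blurred = gaussian_blur_3d(volume, r, max(r * z_ratio, 0.5))
        features[f"gauss_r{r}"] = blurred
        features[f"dog_r{r}"] = previous - blurred   # finer minus coarser scale
        previous = blurred
    return features
```

Each returned stack has the same shape as the input, so every voxel ends up with one value per feature and scale. Those per-voxel vectors are what a shallow classifier is trained on.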
Used random forest
• Fast
• >99% class-weighted cross-validation (CV) accuracy
• Good generalisability (visual assessment)
Replicated training data with scaled intensity for robustness against fluorophore response
Speed advantage in training does not extend to deployment: all features need to be calculated for each image
2D and 3D versions differ only in image features used
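The replication of training data with scaled intensity can be sketched as follows. This is an assumed reading of the slide, not the talk's code: the labelled voxels are reused once per intensity scale factor, with features recomputed on each scaled copy. The scale factors, `build_training_table` and `toy_features` are hypothetical.

```python
import numpy as np


def build_training_table(image, labels, feature_fn, scales=(0.5, 1.0, 2.0)):
    """Stack feature rows and class labels from every intensity-scaled copy.

    labels:     integer array, 0 = unlabelled, 1..K = class index.
    feature_fn: maps an image to an array of shape image.shape + (n_features,).
    scales:     intensity multipliers (assumed values, for illustration).
    """
    image = np.asarray(image, dtype=np.float32)
    labels = np.asarray(labels)
    mask = labels > 0
    rows, classes = [], []
    for s in scales:
        feats = feature_fn(image * s)   # recompute features on the scaled copy
        rows.append(feats[mask])        # keep labelled voxels only
        classes.append(labels[mask])    # class labels do not change with scale
    return np.concatenate(rows), np.concatenate(classes)


def toy_features(img):
    """Hypothetical stand-in for the real feature set: intensity and its square."""
    return np.stack([img, img ** 2], axis=-1)
```

The resulting table has one row per labelled voxel per scale factor and can be passed to any classifier that accepts a feature matrix and a label vector, such as a random forest.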
GUI features
Builds on ImageJ interface
Create classes
Select training data
Select image features and scales
Select machine learning model
Train, save, load and apply models
Export training data
Compare segmentation to image
Select additional training data -> iterate
Can look at standard ML performance measures, but primarily assess visually over whole image / stack
Trainable Weka
GUI falls short of requirements in several ways, so the API and extension code were used instead
Trainable Weka at scale
Memory constraints, stability (110 feature stacks x ~200 MB)
• Run model externally, cache features on disk
– Calculate single feature for whole stack
– Segment single slice using all features
Processing speed (some features cost O(radius cubed))
• Selectively downsample for features at larger radii
Automatic processing of large datasets
• HPC cluster, ImageJ headless mode, Research Database Management collection
Selection of training data: 3D context
• Wrote visualisation software using Processing language
– See slices in 3D context
– Compare images to segmentation, segmentation versions
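The idea of caching features on disk and segmenting one slice at a time can be sketched in NumPy. With 110 feature stacks at roughly 200 MB each, holding every feature in memory would take about 22 GB per stack. Writing each feature once and reading back a single slice keeps the working set small. This is an illustration under assumptions, not the ImageJ / Weka code used in the talk: the `.npy` cache format, the function names and the `classify` callable are all hypothetical.

```python
import os
import numpy as np


def cache_feature(cache_dir, name, stack):
    """Write one whole-stack feature to disk as a .npy file; return its path."""
    path = os.path.join(cache_dir, f"{name}.npy")
    np.save(path, np.asarray(stack, dtype=np.float32))
    return path


def slice_feature_matrix(paths, z):
    """Assemble the (n_pixels, n_features) matrix for slice z.

    Each file is memory-mapped, so only slice z is read from each feature stack.
    """
    columns = []
    for path in paths:
        stack = np.load(path, mmap_mode="r")        # indexed (z, y, x)
        columns.append(np.asarray(stack[z]).ravel())
    return np.stack(columns, axis=1)


def segment_stack(paths, shape, classify):
    """Segment slice by slice.

    classify is a hypothetical callable mapping a feature matrix to one
    integer class label per row (for example a trained model's predict method).
    """
    out = np.empty(shape, dtype=np.uint8)
    for z in range(shape[0]):
        labels = classify(slice_feature_matrix(paths, z))
        out[z] = np.asarray(labels).reshape(shape[1:])
    return out
```

Peak memory then scales with one slice times the number of features rather than with the whole stack times the number of features.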
Computational Workflow
Parallel processing ~300 stacks
• Deconvolution
• Max intensity projections and stats
• Segmentation
• Watershed split, object stats and adjacencies
• Tracking
• Quality check, cropping, intensity adjustment
• Meshes and skeletonisation
• Further analysis and visualisation
Tools used:
• Microvolution software, Wiener GPU cluster
• IMB Image portal
• Interactive ImageJ, R
• Headless ImageJ, Trainable Weka plugin, ImageScience / FeatureJ plugin
• Headless ImageJ, MorphoLibJ, 3D Objects Counter
• Headless ImageJ with 3D ImageJ Suite / mcib3d
• Skeletonize3D, AnalyseSkeleton, 3D Objects Counter
• Headless ImageJ
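Since each of the ~300 stacks can be processed independently, the work divides naturally across cluster array tasks. The sketch below shows one simple way to assign stacks to tasks. The talk does not describe its job layout, so the round-robin split, the function name and the `TASK_INDEX` / `N_TASKS` environment variables are all assumptions for illustration.

```python
import os


def stacks_for_task(task_index, n_tasks, n_stacks=300):
    """Return the stack indices handled by one array task (round-robin split)."""
    if not 0 <= task_index < n_tasks:
        raise ValueError("task_index must be in [0, n_tasks)")
    return list(range(task_index, n_stacks, n_tasks))


if __name__ == "__main__":
    # TASK_INDEX and N_TASKS are hypothetical variable names; a real scheduler
    # exposes its own array-index variable, which would be read here instead.
    task = int(os.environ.get("TASK_INDEX", "0"))
    n_tasks = int(os.environ.get("N_TASKS", "1"))
    for stack_id in stacks_for_task(task, n_tasks):
        print(f"task {task}: would process stack {stack_id}")
```

A round-robin split keeps each task's share within one stack of the others, and every stack is assigned to exactly one task.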
Object detection and tracking
Challenge
• Hundreds of objects over hundreds of time steps
• Segmentation often fails to fully separate objects
• Inevitable variation in segmentation between time steps
• Need to combine objects of different classes to understand
events
Approach
• Watershed split algorithm using edge-distance
• Selective rejoining of objects integrated into tracking
• Created class and object hierarchy – merged tentpole/ruffle class also analysed
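The talk does not spell out its tracking algorithm, so the following is only a stand-in to make the idea concrete. It links objects between consecutive time steps by voxel overlap, and flags next-frame objects that link to the same previous object as candidates for rejoining. All function names are hypothetical, and label 0 is assumed to be background.

```python
import numpy as np


def overlap_counts(prev_labels, next_labels):
    """Count shared voxels for every (previous object, next object) pair."""
    both = (prev_labels > 0) & (next_labels > 0)
    if not both.any():
        return {}
    pairs, counts = np.unique(
        np.stack([prev_labels[both], next_labels[both]], axis=1),
        axis=0, return_counts=True)
    return {(int(p), int(n)): int(c) for (p, n), c in zip(pairs, counts)}


def link_objects(prev_labels, next_labels):
    """Link each next-frame object to the previous object it overlaps most."""
    best = {}
    for (p, n), c in overlap_counts(prev_labels, next_labels).items():
        if n not in best or c > best[n][1]:
            best[n] = (p, c)
    return {n: p for n, (p, _) in best.items()}


def rejoin_candidates(links):
    """Group next-frame objects that link to the same previous object.

    Such groups may be one object that the watershed split too eagerly,
    or a genuine split; deciding which is left to later analysis.
    """
    groups = {}
    for n, p in links.items():
        groups.setdefault(p, []).append(n)
    return {p: sorted(ns) for p, ns in groups.items() if len(ns) > 1}
```

Objects in the next frame with no overlap at all are left unlinked and would be treated as new tracks.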
Visualisation
Questions:
During training and validation
• Training data selection: what class should that pixel be in?
• Assessment and comparison of base segmentations
Evaluating object segmentation and tracking
• How did my object splitting and joining algorithm go?
• How about tracking over time?
Answer: I need a 3D/4D visualiser that allows instant switching between various types of data, so I wrote a visualiser in Processing 3
Visualiser - Viewing source & segmented data
Training data selection
What class should that pixel be in? How did my last model do?
Need to put image slices into 3D/4D context
View Object Associations in Space
How did my object splitting algorithm go?
Cells separated, but 2 spurious splits
Corrected by recombination algorithm
Tracking Across Time
2 cells over 21 time points
2 cells over 3 time points
Yellow lines track tentpoles
Ruffles splitting and merging
Questions
• Understand this new way that cells internalise proteins and molecules from their environment and respond to pathogens
• Apply to tens of thousands of events
• Track each event and its components over time
• What affects their generation? What proteins are crucial?
• What defects are associated with disease?
• Apply and adapt to other cellular systems
Stow Lab Imaging: http://doi.org/10.1083/jcb.201804137
Nick Hamilton
Funding
Stow Lab
Adam Wall
Nick Condon
Yvette Koh
Institute for Molecular Bioscience Microscopy
UQ Research Computing Centre
Training data selection, figure panels: deconvolved image, semantic segmentation, segmentation probability
Segmentation of 3D LLSM imaging
Macrophage cells (Image: Nick Condon / Stow Lab)
Segmentation into “tent poles”, ruffles, cell surface (Image: James Lefevre / Hamilton Lab)
Has been applied to ~2000 3D time points and appears to work well