Computational Biology, Part 28
Automated Interpretation of Subcellular Patterns in Microscope Images III
Robert F. Murphy
Copyright 1996, 1999, 2000-2006. All rights reserved.
Results

Supervised learning of patterns
1. Create sets of images showing the location of many different proteins (each set defines one class of pattern)
2. Reduce each image to a set of numerical values ("features") that are insensitive to position and rotation of the cell (see the sketch below)
3. Use statistical classification methods to "learn" how to distinguish each class using the features
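As a minimal sketch of step 2, the snippet below computes Hu moment invariants with scikit-image. These are a classic choice of translation- and rotation-insensitive features, used here purely for illustration; the actual SLF feature sets are much richer (morphological, edge, and texture features, among others), and the input image is a random placeholder.

```python
import numpy as np
from skimage.measure import moments_central, moments_normalized, moments_hu

def invariant_features(image):
    """Return translation/rotation-invariant Hu moments of a grayscale image."""
    mu = moments_central(image)   # central moments: translation-invariant
    nu = moments_normalized(mu)   # normalized moments: also scale-invariant
    return moments_hu(nu)         # 7 Hu moments: also rotation-invariant

cell = np.random.rand(128, 128)   # stand-in for a fluorescence image
print(invariant_features(cell))   # 7-element feature vector
```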
2D Images of 10 Patterns (HeLa): ER, Tubulin, DNA, TfR, Actin, Nucleolin, Mito, LAMP, gpp130, giantin
Boland & Murphy 2001
Evaluating Classifiers
- Divide ~100 images for each class into a training set and a test set
- Use the training set to determine rules for the classes
- Use the test set to evaluate performance
- Repeat with different divisions into training and test
- Evaluate different sets of features chosen as most discriminative by feature selection methods
- Evaluate different classifiers (see the sketch below)
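To make the evaluation loop concrete, here is a hedged sketch using scikit-learn: repeated stratified train/test splits, feature selection fitted on the training set only, and two stand-in classifiers. The feature matrix is a random placeholder, and the classifiers and k=20 feature budget are illustrative choices, not necessarily those used in the cited studies.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.rand(1000, 84)          # placeholder: 100 images/class x 10 classes
y = np.repeat(np.arange(10), 100)

for name, clf in [("SVM", SVC()), ("kNN", KNeighborsClassifier())]:
    accs = []
    for train, test in StratifiedKFold(n_splits=10, shuffle=True).split(X, y):
        # select the most discriminative features using the training set only
        model = make_pipeline(StandardScaler(),
                              SelectKBest(f_classif, k=20), clf)
        model.fit(X[train], y[train])
        accs.append(model.score(X[test], y[test]))
    print(f"{name}: mean accuracy {np.mean(accs):.2%}")
```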
2D Classification Results
Overall accuracy = 92%

True class (rows) vs. output of the classifier (columns), in %:

      DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA    99    1    0    0    0    0    0    0    0    0
ER      0   97    0    0    0    2    0    0    0    1
Gia     0    0   91    7    0    0    0    0    2    0
Gpp     0    0   14   82    0    0    2    0    1    0
Lam     0    0    1    0   88    1    0    0   10    0
Mit     0    3    0    0    0   92    0    0    3    3
Nuc     0    0    0    0    0    0   99    0    1    0
Act     0    0    0    0    0    0    0  100    0    0
TfR     0    1    0    0   12    2    0    1   81    2
Tub     1    2    0    0    0    1    0    0    1   95

Murphy et al 2000; Boland & Murphy 2001; Huang & Murphy 2004
Human Classification Results
Overall accuracy = 83%

True class (rows) vs. output of the classifier (columns), in %:

      DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA   100    0    0    0    0    0    0    0    0    0
ER      0   90    0    0    3    6    0    0    0    0
Gia     0    0   56   36    3    3    0    0    0    0
Gpp     0    0   54   33    0    0    0    0    3    0
Lam     0    0    6    0   73    0    0    0   20    0
Mit     0    3    0    0    0   96    0    0    0    3
Nuc     0    0    0    0    0    0  100    0    0    0
Act     0    0    0    0    0    0    0  100    0    0
TfR     0   13    0    0    3    0    0    0   83    0
Tub     0    3    0    0    0    0    0    3    0   93

Murphy et al 2003
Computer vs. Human
Figure: per-class computer accuracy plotted against human accuracy (both axes 40-100%).
3D HeLa cell images: Nuclear, ER, Giantin, gpp130, Lysosomal, Mitoch., Nucleolar, Actin, Endosomal, Tubulin
Images collected using facilities at the Center for Biologic Imaging courtesy of Simon Watkins
Velliste & Murphy 2002
3D Classification Results
Overall accuracy = 98%

True class (rows) vs. output of the classifier (columns), in %:

      DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA    98    2    0    0    0    0    0    0    0    0
ER      0  100    0    0    0    0    0    0    0    0
Gia     0    0  100    0    0    0    0    0    0    0
Gpp     0    0    0   96    4    0    0    0    0    0
Lam     0    0    0    4   95    0    0    0    0    2
Mit     0    0    2    0    0   96    0    2    0    0
Nuc     0    0    0    0    0    0  100    0    0    0
Act     0    0    0    0    0    0    0  100    0    0
TfR     0    0    0    0    2    0    0    0   96    2
Tub     0    2    0    0    0    0    0    0    0   98

Velliste & Murphy 2002; Chen & Murphy 2004
Unsupervised Learning to Identify High-Resolution Protein Patterns
Location Proteomics
- Tag many proteins. We have used CD-tagging (developed by Jonathan Jarvik and Peter Berget): infect a population of cells with a retrovirus carrying a DNA sequence that will "tag" a random gene in each cell
- Isolate separate clones, each of which expresses one tagged protein
- Use RT-PCR to identify the tagged gene in each clone
- Collect many live cell images for each clone using spinning disk confocal fluorescence microscopy
Jarvik et al 2002
What Now?
- Group ~90 tagged clones by pattern
- Solution: group them automatically
How?
- SLF features can be used to measure similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping the two proteins whose patterns are most similar, then keep adding branches for less and less similar patterns (see the sketch below)
Chen et al 2003; Chen and Murphy 2005
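A minimal sketch of building such a tree by agglomerative clustering of per-clone SLF feature vectors (random placeholders below). The z-scoring, Euclidean metric, and average linkage are illustrative choices, not necessarily those used in the cited papers.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

clone_features = np.random.rand(90, 30)    # ~90 tagged clones x SLF features
names = [f"clone_{i}" for i in range(90)]  # hypothetical clone labels

# z-score features so no single feature dominates the distance
z = (clone_features - clone_features.mean(0)) / clone_features.std(0)

# Start with the two most similar patterns, then add branches for
# progressively less similar ones (agglomerative clustering).
tree = linkage(pdist(z, metric="euclidean"), method="average")
dendrogram(tree, labels=names)             # the Subcellular Location Tree
plt.show()
```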
Figure: Subcellular Location Tree, with each leaf labeled by protein name and a human description drawn from databases.
http://murphylab.web.cmu.edu/services/PSLID/tree.html
Tree groups (top: automated grouping and assignment; bottom: visual assignment to "known" locations):
- Nucleolar Proteins
- Punctate Nuclear Proteins
- Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining
- Nuclear and Cytoplasmic Proteins with Some Punctate Staining
- Uniform
Refining clusters using temporal textures
Incorporating Temporal Information
- Time series images could be useful for:
  - Distinguishing proteins that are not distinguishable in static images
  - Analyzing protein movement in the presence of drugs, or during different stages of the cell cycle
- Need an approach that does not require detailed understanding of the objects/organelles in which each protein is located
- Generic object tracking approach? Not all proteins are in discernible objects
- A non-tracking approach is needed
Texture Features
- Haralick texture features describe the correlation in intensity of pixels that are next to each other in space. These have been valuable for classifying static patterns.
- Temporal texture features describe the correlation in intensity of pixels in the same position in images next to each other over time.
Temporal Textures Based on the Co-occurrence Matrix
- Temporal co-occurrence matrix P: an Nlevel by Nlevel matrix whose element P[i, j] is the probability that a pixel with value i has value j in the next image (time point)
- Thirteen statistics calculated on P are used as features
Example (image that does not change):

Image at t0 = Image at t1:
4 2 2 2 4
1 2 4 1 1
3 4 4 4 2
2 2 3 3 2
3 3 3 2 4

Temporal co-occurrence matrix (counts; rows = value at t0, columns = value at t1):
     1  2  3  4
1    3  0  0  0
2    0  9  0  0
3    0  0  6  0
4    0  0  0  7
4 2 2 2 41 2 4 1 13 4 4 4 22 2 3 3 23 3 3 2 4
Temporal co-occurrence matrix (for image that changes) 1133330044
11005500335511112222002200111144332211
2 1 4 4 31 4 2 3 32 3 3 2 24 4 2 2 32 4 2 1 4
Image at t0 Image at t1
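The matrix in the examples above can be computed directly. A short sketch, assuming pixel values are integers in 1..Nlevel; the final line reproduces the counts in the second example:

```python
import numpy as np

def temporal_cooccurrence(img_t0, img_t1, n_levels):
    """Co-occurrence of pixel values between two time points (probabilities)."""
    P = np.zeros((n_levels, n_levels))
    for i, j in zip(img_t0.ravel(), img_t1.ravel()):
        P[i - 1, j - 1] += 1          # values assumed to be 1..n_levels
    return P / P.sum()                # normalize counts to probabilities

t0 = np.array([[4,2,2,2,4],[1,2,4,1,1],[3,4,4,4,2],[2,2,3,3,2],[3,3,3,2,4]])
t1 = np.array([[2,1,4,4,3],[1,4,2,3,3],[2,3,3,2,2],[4,4,2,2,3],[2,4,2,1,4]])
print(temporal_cooccurrence(t0, t1, 4) * 25)   # counts matching the example
```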
Implementation of Temporal Texture Features
- Compare image pairs at different time intervals; compute the 13 temporal texture features for each pair
- Use the average and variance of the features at each time interval, yielding 13 x 5 x 2 = 130 features (see the sketch below)
- Frame times: T = 0s, 45s, 90s, 135s, 180s, 225s, 270s, 315s, 360s, 405s, ...
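A sketch of how the 130 features could be assembled, assuming frames spaced 45 s apart and a hypothetical helper `cooc_stats` that returns the 13 co-occurrence statistics for one image pair (e.g., Haralick-style statistics computed on the temporal co-occurrence matrix from the previous sketch):

```python
import numpy as np

def temporal_texture_features(frames, intervals=(1, 2, 3, 4, 5)):
    """Aggregate 13 pairwise statistics over 5 frame spacings: 130 features."""
    feats = []
    for step in intervals:                     # step in frames (45 s each)
        # cooc_stats is a hypothetical 13-statistic helper, not a real API
        pair_stats = np.array([cooc_stats(frames[t], frames[t + step])
                               for t in range(len(frames) - step)])
        feats.extend(pair_stats.mean(axis=0))  # 13 means for this interval
        feats.extend(pair_stats.var(axis=0))   # 13 variances for this interval
    return np.array(feats)                     # 13 * 5 * 2 = 130 features
```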
Test: Evaluate ability of temporal textures to improve discrimination of similar protein patterns
Results for temporal texture and static features
Average accuracy = 85.1%

True class (rows) vs. output of the classifier (columns), in %:

         Dia1  Sdpr  Atp5a1  Adfp  Timm23
Dia1       50    35       5     0      10
Sdpr        9    87       0     4       0
Atp5a1      0     5      95     0       0
Adfp        2     0       2    92       4
Timm23      0     5       0     8      88
Conclusion
- Addition of temporal texture features improves the classification accuracy of protein locations
Generative models of subcellular patterns
Decomposing mixture patterns
- Clustering or classifying whole cell patterns will treat each combination of two or more "basic" patterns as a unique new pattern
- Desirable to have a way to decompose mixtures instead
- One approach would be to assume that each basic pattern has a recognizable combination of different types of objects
Object-based subcellular pattern models
Goals:
- to recognize "pure" patterns using only objects
- to recognize and unmix patterns consisting of two or more "pure" patterns
- to enable building of generative models that can synthesize patterns from objects: needed for systems biology
Object type determination
- Rather than specifying object types, we chose to learn them from the data
- Use a subset of SLFs to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate goodness of clustering using the Akaike Information Criterion (AIC)
- Choose the k that gives the lowest AIC (see the sketch below)
Zhao et al 2005
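A hedged sketch of this object-typing step: k-means over k = 2..40 on placeholder object features, scored with an RSS-based AIC. The AIC formula below is one common formulation for k-means; the exact criterion used in Zhao et al may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

objects = np.random.rand(5000, 11)     # placeholder: objects x SLF subset
n, d = objects.shape

best_k, best_aic = None, np.inf
for k in range(2, 41):
    km = KMeans(n_clusters=k, n_init=10).fit(objects)
    rss = km.inertia_                       # within-cluster sum of squares
    aic = n * np.log(rss / n) + 2 * k * d   # fit term + parameter penalty
    if aic < best_aic:
        best_k, best_aic = k, aic
print("chosen number of object types:", best_k)
```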
Unmixing: Learning Strategy
- Once object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence in each object type
- Learn a probability model for these vectors for each class
- Mixed images can then be represented using mixture fractions times the probability distribution of objects for each class
Figure: amount of fluorescence per object type (types 1-8) for the Nuclear, Lysosomal, and Golgi classes, shown separately and overlaid.
Figure: example images of a pure Golgi pattern, a pure lysosomal pattern, and a 50% mix of each.
Two-stage Strategy for Unmixing an Unknown Image
- Find objects in the unknown (test) image; classify each object into one of the object types using the learned object type classifier built with all objects from training images
- For each test image, make a list of how often each object type is found
- Find the fractions of each class that give the "best" match to this list (see the sketch below)
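As a simplified sketch of the matching step, one can represent each class by its mean object-type profile and solve a nonnegative least-squares problem for the mixture fractions. The actual method matches against a learned probability model per class rather than just the mean; all data below are placeholders.

```python
import numpy as np
from scipy.optimize import nnls

class_profiles = np.random.rand(8, 8)   # placeholder: 8 classes x 8 object types
test_histogram = np.random.rand(8)      # object-type amounts in the test image

# Solve min ||A f - b|| subject to f >= 0, columns of A = class profiles
fractions, _ = nnls(class_profiles.T, test_histogram)
fractions /= fractions.sum()            # normalize fractions to sum to 1
print("estimated mixture fractions:", fractions)
```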
Test of Unmixing
- Use 2D HeLa data
- Generate random mixture fractions for eight major patterns (summing to 1)
- Use these to synthesize "images" corresponding to these mixtures
- Try to estimate the mixture fractions from the synthesized images
- Compare to the true mixture fractions
Results
- Given 5 synthesized "cell images" with any mixture of the 8 basic patterns, the average accuracy of estimating the mixture coefficients is 83%
Zhao et al 2005
Overview
Pipeline: real images → object detection → objects → object type assignment → object types → object type modeling → statistical models
Yields P(objects | pattern) and P(objects | patterns, mixture coefficients)
Generating images
Same pipeline, with sampling from the statistical models to produce generated images
Yields P(image | pattern) and P(image | patterns, mixture coefficients)
Generating objects and images
Same pipeline, with an added object morphology modeling step before sampling to generate images
LAMP2 pattern
Figure: nucleus, cell membrane, and protein channels.
Nuclear Shape: Medial Axis Model
- Rotate the nucleus to a standard orientation
- Represented by two curves: the medial axis, and the width along the medial axis (see the sketch below)
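A minimal sketch of extracting the two curves from a binary nuclear mask with scikit-image. The ellipse is a stand-in nucleus, and parameterizing the axis left-to-right assumes the shape has already been rotated to a standard orientation:

```python
import numpy as np
from skimage.draw import ellipse
from skimage.morphology import medial_axis

mask = np.zeros((100, 160), dtype=bool)    # placeholder nuclear mask
rr, cc = ellipse(50, 80, 30, 60)           # ellipse standing in for a nucleus
mask[rr, cc] = True

skeleton, distance = medial_axis(mask, return_distance=True)
ys, xs = np.nonzero(skeleton)
order = np.argsort(xs)                     # parameterize the axis left-to-right
axis_curve = ys[order]                     # curve 1: medial axis height
width_curve = 2 * distance[ys, xs][order]  # curve 2: width along the axis
```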
Synthetic Nuclear Shapes
- With added nuclear texture
Cell Shape Description: Distance Ratio
- Describe the boundary by the distance ratio r = d2 / (d1 + d2), where d1 and d2 are the distances illustrated in the slide figure
- Capture variation as a principal components model
Generation
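A sketch of generation from a principal components shape model: fit PCA to training shape descriptors (placeholders below, one distance-ratio curve per cell), then sample component scores from Gaussians with the learned variances and map back to a full descriptor:

```python
import numpy as np
from sklearn.decomposition import PCA

shape_vectors = np.random.rand(200, 360)   # 200 cells x r(theta) at 360 angles

pca = PCA(n_components=10).fit(shape_vectors)

# Sample scores from independent Gaussians with the learned variances,
# then reconstruct a new, synthetic shape descriptor.
scores = np.random.randn(10) * np.sqrt(pca.explained_variance_)
new_shape = pca.mean_ + pca.components_.T @ scores
```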
Small Objects
- Approximated by a 2D Gaussian distribution
Object Positions
- Each object position described by the distance ratio r = d2 / (d1 + d2), where d1 and d2 are the distances to the two reference boundaries shown in the figure
Positions: Logistic Regression
- Generation: each pixel has a weight according to the logistic model
  P(r) = 1 / (1 + e^(-(β0 + β1 r)))
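A sketch combining the last two slides: sample object positions with per-pixel probability given by the logistic model, then stamp a small 2D Gaussian object at each sampled position. The coefficients b0, b1 and the distance-ratio map r are placeholders, not fitted values.

```python
import numpy as np

H, W = 256, 256
r = np.random.rand(H, W)                      # placeholder distance-ratio map
b0, b1 = -1.0, 2.0                            # hypothetical fitted coefficients

weights = 1.0 / (1.0 + np.exp(-(b0 + b1 * r)))  # per-pixel logistic weight
p = (weights / weights.sum()).ravel()           # normalize to a distribution

yy, xx = np.mgrid[-5:6, -5:6]
blob = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))  # small 2D Gaussian object

image = np.zeros((H, W))
for _ in range(30):                             # place 30 objects
    cy, cx = divmod(np.random.choice(H * W, p=p), W)
    if 5 <= cy < H - 5 and 5 <= cx < W - 5:     # skip borders for simplicity
        image[cy - 5:cy + 6, cx - 5:cx + 6] += blob
```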
Fully Synthetic Cell Image
Figure: real vs. synthetic cell images.
Conclusions and Future Work
- Object-based generative models are useful for communicating information about subcellular patterns
- Work continues!
Final word
- The goal of automated image interpretation should not be quantitating intensity or colocalization, or simply making it easier for biologists to see what's happening
- The goal should be generalizable, verifiable, mechanistic models of cell organization and behavior, automatically derived from images