Computational Biology, Part 28
Automated Interpretation of Subcellular Patterns in Microscope Images III
Robert F. Murphy
Copyright 1996, 1999, 2000-2006. All rights reserved.
Results

Supervised learning of patterns
1. Create sets of images showing the location of many different proteins (each set defines one class of pattern)
2. Reduce each image to a set of numerical values ("features") that are insensitive to position and rotation of the cell (see the sketch below)
3. Use statistical classification methods to "learn" how to distinguish each class using the features
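As a minimal sketch of step 2, the snippet below computes Hu moment invariants with scikit-image. These are a classic choice of translation- and rotation-insensitive features, used here purely for illustration; the actual SLF feature sets are much richer (morphological, edge, and texture features, among others), and the input image is a random placeholder.

```python
import numpy as np
from skimage.measure import moments_central, moments_normalized, moments_hu

def invariant_features(image):
    """Return translation/rotation-invariant Hu moments of a grayscale image."""
    mu = moments_central(image)   # central moments: translation-invariant
    nu = moments_normalized(mu)   # normalized moments: also scale-invariant
    return moments_hu(nu)         # 7 Hu moments: also rotation-invariant

cell = np.random.rand(128, 128)   # stand-in for a fluorescence image
print(invariant_features(cell))   # 7-element feature vector
```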
2D Images of 10 Patterns (HeLa): ER, Tubulin, DNA, TfR, Actin, Nucleolin, Mito, LAMP, gpp130, giantin
Boland & Murphy 2001
Evaluating Classifiers
- Divide ~100 images for each class into a training set and a test set
- Use the training set to determine rules for the classes
- Use the test set to evaluate performance
- Repeat with different divisions into training and test
- Evaluate different sets of features chosen as most discriminative by feature selection methods
- Evaluate different classifiers (see the sketch below)
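To make the evaluation loop concrete, here is a hedged sketch using scikit-learn: repeated stratified train/test splits, feature selection fitted on the training set only, and two stand-in classifiers. The feature matrix is a random placeholder, and the classifiers and k=20 feature budget are illustrative choices, not necessarily those used in the cited studies.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.rand(1000, 84)          # placeholder: 100 images/class x 10 classes
y = np.repeat(np.arange(10), 100)

for name, clf in [("SVM", SVC()), ("kNN", KNeighborsClassifier())]:
    accs = []
    for train, test in StratifiedKFold(n_splits=10, shuffle=True).split(X, y):
        # select the most discriminative features using the training set only
        model = make_pipeline(StandardScaler(),
                              SelectKBest(f_classif, k=20), clf)
        model.fit(X[train], y[train])
        accs.append(model.score(X[test], y[test]))
    print(f"{name}: mean accuracy {np.mean(accs):.2%}")
```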
2D Classification Results
Overall accuracy = 92%

True class (rows) vs. output of the classifier (columns), in %:

      DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA    99    1    0    0    0    0    0    0    0    0
ER      0   97    0    0    0    2    0    0    0    1
Gia     0    0   91    7    0    0    0    0    2    0
Gpp     0    0   14   82    0    0    2    0    1    0
Lam     0    0    1    0   88    1    0    0   10    0
Mit     0    3    0    0    0   92    0    0    3    3
Nuc     0    0    0    0    0    0   99    0    1    0
Act     0    0    0    0    0    0    0  100    0    0
TfR     0    1    0    0   12    2    0    1   81    2
Tub     1    2    0    0    0    1    0    0    1   95

Murphy et al 2000; Boland & Murphy 2001; Huang & Murphy 2004
Human Classification Results
Overall accuracy = 83%

True class (rows) vs. output of the classifier (columns), in %:

      DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA   100    0    0    0    0    0    0    0    0    0
ER      0   90    0    0    3    6    0    0    0    0
Gia     0    0   56   36    3    3    0    0    0    0
Gpp     0    0   54   33    0    0    0    0    3    0
Lam     0    0    6    0   73    0    0    0   20    0
Mit     0    3    0    0    0   96    0    0    0    3
Nuc     0    0    0    0    0    0  100    0    0    0
Act     0    0    0    0    0    0    0  100    0    0
TfR     0   13    0    0    3    0    0    0   83    0
Tub     0    3    0    0    0    0    0    3    0   93

Murphy et al 2003
Computer vs. Human
Figure: per-class computer accuracy plotted against human accuracy (both axes 40-100%).
3D HeLa cell images: Nuclear, ER, Giantin, gpp130, Lysosomal, Mitoch., Nucleolar, Actin, Endosomal, Tubulin
Images collected using facilities at the Center for Biologic Imaging courtesy of Simon Watkins
Velliste & Murphy 2002
3D Classification Results
Overall accuracy = 98%

True class (rows) vs. output of the classifier (columns), in %:

      DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA    98    2    0    0    0    0    0    0    0    0
ER      0  100    0    0    0    0    0    0    0    0
Gia     0    0  100    0    0    0    0    0    0    0
Gpp     0    0    0   96    4    0    0    0    0    0
Lam     0    0    0    4   95    0    0    0    0    2
Mit     0    0    2    0    0   96    0    2    0    0
Nuc     0    0    0    0    0    0  100    0    0    0
Act     0    0    0    0    0    0    0  100    0    0
TfR     0    0    0    0    2    0    0    0   96    2
Tub     0    2    0    0    0    0    0    0    0   98

Velliste & Murphy 2002; Chen & Murphy 2004
Unsupervised Learning to Identify High-Resolution Protein Patterns
Location Proteomics
- Tag many proteins. We have used CD-tagging (developed by Jonathan Jarvik and Peter Berget): infect a population of cells with a retrovirus carrying a DNA sequence that will "tag" a random gene in each cell
- Isolate separate clones, each of which expresses one tagged protein
- Use RT-PCR to identify the tagged gene in each clone
- Collect many live cell images for each clone using spinning disk confocal fluorescence microscopy
Jarvik et al 2002
What Now?
- Group ~90 tagged clones by pattern
- Solution: group them automatically
How?
- SLF features can be used to measure similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping the two proteins whose patterns are most similar, then keep adding branches for less and less similar patterns (see the sketch below)
Chen et al 2003; Chen and Murphy 2005
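A minimal sketch of building such a tree by agglomerative clustering of per-clone SLF feature vectors (random placeholders below). The z-scoring, Euclidean metric, and average linkage are illustrative choices, not necessarily those used in the cited papers.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

clone_features = np.random.rand(90, 30)    # ~90 tagged clones x SLF features
names = [f"clone_{i}" for i in range(90)]  # hypothetical clone labels

# z-score features so no single feature dominates the distance
z = (clone_features - clone_features.mean(0)) / clone_features.std(0)

# Start with the two most similar patterns, then add branches for
# progressively less similar ones (agglomerative clustering).
tree = linkage(pdist(z, metric="euclidean"), method="average")
dendrogram(tree, labels=names)             # the Subcellular Location Tree
plt.show()
```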
Figure: Subcellular Location Tree, with each leaf labeled by protein name and a human description drawn from databases.
http://murphylab.web.cmu.edu/services/PSLID/tree.html
Tree groups (top: automated grouping and assignment; bottom: visual assignment to "known" locations):
- Nucleolar Proteins
- Punctate Nuclear Proteins
- Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining
- Nuclear and Cytoplasmic Proteins with Some Punctate Staining
- Uniform
Refining clusters using temporal textures
Incorporating Temporal Information
- Time series images could be useful for:
  - Distinguishing proteins that are not distinguishable in static images
  - Analyzing protein movement in the presence of drugs, or during different stages of the cell cycle
- Need an approach that does not require detailed understanding of the objects/organelles in which each protein is located
- Generic object tracking approach? Not all proteins are in discernible objects
- A non-tracking approach is needed
Texture Features
- Haralick texture features describe the correlation in intensity of pixels that are next to each other in space. These have been valuable for classifying static patterns.
- Temporal texture features describe the correlation in intensity of pixels in the same position in images next to each other over time.
Temporal Textures Based on the Co-occurrence Matrix
- Temporal co-occurrence matrix P: an Nlevel by Nlevel matrix whose element P[i, j] is the probability that a pixel with value i has value j in the next image (time point)
- Thirteen statistics calculated on P are used as features
Example (image that does not change):

Image at t0 = Image at t1:
4 2 2 2 4
1 2 4 1 1
3 4 4 4 2
2 2 3 3 2
3 3 3 2 4

Temporal co-occurrence matrix (counts; rows = value at t0, columns = value at t1):
     1  2  3  4
1    3  0  0  0
2    0  9  0  0
3    0  0  6  0
4    0  0  0  7
4 2 2 2 41 2 4 1 13 4 4 4 22 2 3 3 23 3 3 2 4
Temporal co-occurrence matrix (for image that changes) 1133330044
11005500335511112222002200111144332211
2 1 4 4 31 4 2 3 32 3 3 2 24 4 2 2 32 4 2 1 4
Image at t0 Image at t1
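The matrix in the examples above can be computed directly. A short sketch, assuming pixel values are integers in 1..Nlevel; the final line reproduces the counts in the second example:

```python
import numpy as np

def temporal_cooccurrence(img_t0, img_t1, n_levels):
    """Co-occurrence of pixel values between two time points (probabilities)."""
    P = np.zeros((n_levels, n_levels))
    for i, j in zip(img_t0.ravel(), img_t1.ravel()):
        P[i - 1, j - 1] += 1          # values assumed to be 1..n_levels
    return P / P.sum()                # normalize counts to probabilities

t0 = np.array([[4,2,2,2,4],[1,2,4,1,1],[3,4,4,4,2],[2,2,3,3,2],[3,3,3,2,4]])
t1 = np.array([[2,1,4,4,3],[1,4,2,3,3],[2,3,3,2,2],[4,4,2,2,3],[2,4,2,1,4]])
print(temporal_cooccurrence(t0, t1, 4) * 25)   # counts matching the example
```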
Implementation of Temporal Texture Features
- Compare image pairs at different time intervals; compute the 13 temporal texture features for each pair
- Use the average and variance of the features at each time interval, yielding 13 x 5 x 2 = 130 features (see the sketch below)
- Frame times: T = 0s, 45s, 90s, 135s, 180s, 225s, 270s, 315s, 360s, 405s, ...
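A sketch of how the 130 features could be assembled, assuming frames spaced 45 s apart and a hypothetical helper `cooc_stats` that returns the 13 co-occurrence statistics for one image pair (e.g., Haralick-style statistics computed on the temporal co-occurrence matrix from the previous sketch):

```python
import numpy as np

def temporal_texture_features(frames, intervals=(1, 2, 3, 4, 5)):
    """Aggregate 13 pairwise statistics over 5 frame spacings: 130 features."""
    feats = []
    for step in intervals:                     # step in frames (45 s each)
        # cooc_stats is a hypothetical 13-statistic helper, not a real API
        pair_stats = np.array([cooc_stats(frames[t], frames[t + step])
                               for t in range(len(frames) - step)])
        feats.extend(pair_stats.mean(axis=0))  # 13 means for this interval
        feats.extend(pair_stats.var(axis=0))   # 13 variances for this interval
    return np.array(feats)                     # 13 * 5 * 2 = 130 features
```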
Test: Evaluate ability of temporal textures to improve discrimination of similar protein patterns
Results for temporal texture and static features
Average accuracy = 85.1%

True class (rows) vs. output of the classifier (columns), in %:

         Dia1  Sdpr  Atp5a1  Adfp  Timm23
Dia1       50    35       5     0      10
Sdpr        9    87       0     4       0
Atp5a1      0     5      95     0       0
Adfp        2     0       2    92       4
Timm23      0     5       0     8      88
Conclusion
- Addition of temporal texture features improves the classification accuracy of protein locations
Generative models of subcellular patterns
Decomposing mixture patterns
- Clustering or classifying whole cell patterns will treat each combination of two or more "basic" patterns as a unique new pattern
- Desirable to have a way to decompose mixtures instead
- One approach would be to assume that each basic pattern has a recognizable combination of different types of objects
Object-based subcellular pattern models
Goals:
- to recognize "pure" patterns using only objects
- to recognize and unmix patterns consisting of two or more "pure" patterns
- to enable building of generative models that can synthesize patterns from objects: needed for systems biology
Object type determination
- Rather than specifying object types, we chose to learn them from the data
- Use a subset of SLFs to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate goodness of clustering using the Akaike Information Criterion (AIC)
- Choose the k that gives the lowest AIC (see the sketch below)
Zhao et al 2005
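A hedged sketch of this object-typing step: k-means over k = 2..40 on placeholder object features, scored with an RSS-based AIC. The AIC formula below is one common formulation for k-means; the exact criterion used in Zhao et al may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

objects = np.random.rand(5000, 11)     # placeholder: objects x SLF subset
n, d = objects.shape

best_k, best_aic = None, np.inf
for k in range(2, 41):
    km = KMeans(n_clusters=k, n_init=10).fit(objects)
    rss = km.inertia_                       # within-cluster sum of squares
    aic = n * np.log(rss / n) + 2 * k * d   # fit term + parameter penalty
    if aic < best_aic:
        best_k, best_aic = k, aic
print("chosen number of object types:", best_k)
```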
Unmixing: Learning Strategy
- Once object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence in each object type
- Learn a probability model for these vectors for each class
- Mixed images can then be represented using mixture fractions times the probability distribution of objects for each class
Figure: amount of fluorescence per object type (types 1-8) for the Nuclear, Lysosomal, and Golgi classes, shown separately and overlaid.
Figure: example images of a pure Golgi pattern, a pure lysosomal pattern, and a 50% mix of each.
Two-stage Strategy for Unmixing an Unknown Image
- Find objects in the unknown (test) image; classify each object into one of the object types using the learned object type classifier built with all objects from training images
- For each test image, make a list of how often each object type is found
- Find the fractions of each class that give the "best" match to this list (see the sketch below)
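As a simplified sketch of the matching step, one can represent each class by its mean object-type profile and solve a nonnegative least-squares problem for the mixture fractions. The actual method matches against a learned probability model per class rather than just the mean; all data below are placeholders.

```python
import numpy as np
from scipy.optimize import nnls

class_profiles = np.random.rand(8, 8)   # placeholder: 8 classes x 8 object types
test_histogram = np.random.rand(8)      # object-type amounts in the test image

# Solve min ||A f - b|| subject to f >= 0, columns of A = class profiles
fractions, _ = nnls(class_profiles.T, test_histogram)
fractions /= fractions.sum()            # normalize fractions to sum to 1
print("estimated mixture fractions:", fractions)
```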
Test of Unmixing
- Use 2D HeLa data
- Generate random mixture fractions for eight major patterns (summing to 1)
- Use these to synthesize "images" corresponding to these mixtures
- Try to estimate the mixture fractions from the synthesized images
- Compare to the true mixture fractions
Results
- Given 5 synthesized "cell images" with any mixture of the 8 basic patterns, the average accuracy of estimating the mixture coefficients is 83%
Zhao et al 2005
Overview
Pipeline: real images → object detection → objects → object type assignment → object types → object type modeling → statistical models
Yields P(objects | pattern) and P(objects | patterns, mixture coefficients)
Generating images
Same pipeline, with sampling from the statistical models to produce generated images
Yields P(image | pattern) and P(image | patterns, mixture coefficients)
Generating objects and images
Same pipeline, with an added object morphology modeling step before sampling to generate images
LAMP2 pattern
Figure: nucleus, cell membrane, and protein channels.
Nuclear Shape: Medial Axis Model
- Rotate the nucleus to a standard orientation
- Represented by two curves: the medial axis, and the width along the medial axis (see the sketch below)
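A minimal sketch of extracting the two curves from a binary nuclear mask with scikit-image. The ellipse is a stand-in nucleus, and parameterizing the axis left-to-right assumes the shape has already been rotated to a standard orientation:

```python
import numpy as np
from skimage.draw import ellipse
from skimage.morphology import medial_axis

mask = np.zeros((100, 160), dtype=bool)    # placeholder nuclear mask
rr, cc = ellipse(50, 80, 30, 60)           # ellipse standing in for a nucleus
mask[rr, cc] = True

skeleton, distance = medial_axis(mask, return_distance=True)
ys, xs = np.nonzero(skeleton)
order = np.argsort(xs)                     # parameterize the axis left-to-right
axis_curve = ys[order]                     # curve 1: medial axis height
width_curve = 2 * distance[ys, xs][order]  # curve 2: width along the axis
```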
Synthetic Nuclear Shapes
- With added nuclear texture
Cell Shape Description: Distance Ratio
- Describe the boundary by the distance ratio r = d2 / (d1 + d2), where d1 and d2 are the distances illustrated in the slide figure
- Capture variation as a principal components model
Generation
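A sketch of generation from a principal components shape model: fit PCA to training shape descriptors (placeholders below, one distance-ratio curve per cell), then sample component scores from Gaussians with the learned variances and map back to a full descriptor:

```python
import numpy as np
from sklearn.decomposition import PCA

shape_vectors = np.random.rand(200, 360)   # 200 cells x r(theta) at 360 angles

pca = PCA(n_components=10).fit(shape_vectors)

# Sample scores from independent Gaussians with the learned variances,
# then reconstruct a new, synthetic shape descriptor.
scores = np.random.randn(10) * np.sqrt(pca.explained_variance_)
new_shape = pca.mean_ + pca.components_.T @ scores
```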
Small Objects
- Approximated by a 2D Gaussian distribution
Object Positions
- Each object position described by the distance ratio r = d2 / (d1 + d2), where d1 and d2 are the distances to the two reference boundaries shown in the figure
Positions: Logistic Regression
- Generation: each pixel has a weight according to the logistic model
  P(r) = 1 / (1 + e^(-(β0 + β1 r)))
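A sketch combining the last two slides: sample object positions with per-pixel probability given by the logistic model, then stamp a small 2D Gaussian object at each sampled position. The coefficients b0, b1 and the distance-ratio map r are placeholders, not fitted values.

```python
import numpy as np

H, W = 256, 256
r = np.random.rand(H, W)                      # placeholder distance-ratio map
b0, b1 = -1.0, 2.0                            # hypothetical fitted coefficients

weights = 1.0 / (1.0 + np.exp(-(b0 + b1 * r)))  # per-pixel logistic weight
p = (weights / weights.sum()).ravel()           # normalize to a distribution

yy, xx = np.mgrid[-5:6, -5:6]
blob = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))  # small 2D Gaussian object

image = np.zeros((H, W))
for _ in range(30):                             # place 30 objects
    cy, cx = divmod(np.random.choice(H * W, p=p), W)
    if 5 <= cy < H - 5 and 5 <= cx < W - 5:     # skip borders for simplicity
        image[cy - 5:cy + 6, cx - 5:cx + 6] += blob
```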
Fully Synthetic Cell Image
Figure: real vs. synthetic cell images.
Conclusions and Future Work
- Object-based generative models are useful for communicating information about subcellular patterns
- Work continues!
Final word
- The goal of automated image interpretation should not be quantitating intensity or colocalization, or simply making it easier for biologists to see what's happening
- The goal should be generalizable, verifiable, mechanistic models of cell organization and behavior, automatically derived from images