Post on 27-May-2020
Florian Hahne, Department of Molecular Genome Analysis
German Cancer Research Center Heidelberg
Introduction to the analysis of cell-based functional assays using Bioconductor
CSAMA Bressanone 2006
Candidate gene sets from microarray studies: dozens to hundreds
Capacity of detailed in vivo functional studies: one to a few
How to close the gap?
How to separate a flood of ‘significant’ secondary effects from causally relevant ones?
Why do we need functional assays?
• Means to monitor the effect of the perturbation:
  - expression or activation state of key regulatory proteins (plate reader, FACS, automated microscope)
The design: manipulate gene expression/protein function
• Means to monitor the perturbation itself (beneficial but not mandatory):
  - expression of a fluorescent protein tag
• System to deliberately manipulate the expression level of specific genes in cells:
  - upregulation (transfection of expression vectors)
  - downregulation (RNA interference, small compounds)
Features of cell-based assays: reporter system
Moffat et al. Nature Reviews Molecular Cell Biology 7, 177–187 (March 2006)
Features of cell-based assays
• What is the right assay for the question asked? In principle, any cellular function can be probed
• A cell is a complex system: the assay needs optimization
• Not all cells are the same: choose an appropriate cell system
• Different levels of complexity: cell populations, individual cells, cell components
• Different technologies: plate readers, flow cytometry, microscopy
• Large amounts of data (genome-wide): information services, visualization, data management
[Figure: RIP/IMD pathway (labels: RIP, Tak1, IKK, Rel, R, Targets), comparing genes with functional-assay phenotypes and differentially regulated genes (~70 and ~280 genes); courtesy of Michael Boutros]
Is differential expression a good predictor for ’signaling’ function?
Most pathway targets are not required for pathway function
RNAi is a post-transcriptional gene-silencing process that can be applied in a high-throughput fashion on the cell or organism scale.
RNA interference (RNAi)
[Figure: T7 transcription → precursor dsRNA → siRNAs → degradation of target message]
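C. elegans (worms): dsRNA (> 200 bp) delivered by injection and soaking, or by feeding bacteria (E. coli expressing the dsRNA)
Drosophila (cell culture): bathing in dsRNA (> 200 bp); the long dsRNA is processed by DICER
Mammals (cell culture): 21 bp siRNA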
RNAi experiments in different organisms
Moffat et al. Nature Reviews Molecular Cell Biology 7, 177–187 (March 2006)
Large-scale RNAi screens
• R/Bioconductor package
• Systematic analysis and documentation of cell-based HTS assays using RNAi or other types of perturbation libraries
• Step-by-step analysis, from raw data files to the annotated hit list
• Prerequisite for comparing multiple screens (standardization)
• Audit trail of experiment QA
• The whole experiment is contained in one object
cellHTS package
[Figure: raw data → hit identification with the cellHTS package (computational statistics using bioinformatic tools) → candidates → integration of information from external databases (Gene Ontology, expression DB, protein DB) → hit validation by secondary assays]
Screening workflow
A typical cell-based HTS assay
Import the raw data files
Plate-wise quality control
Data preprocessing
Experiment-wise quality control
Ranking of phenotypes (hit list)
HTML quality reports
Workflow
[Figure: plate plots of cell number, in a quantitative and a qualitative version]
visualization of results
plate plots as graphical representation of experimental entities
• false color coding for concise display of numeric outcomes from statistical analyses
Visualization: plate plots (in package prada)
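False color coding boils down to mapping each well's value onto a color scale. The following is a minimal Python sketch, not prada's implementation: it assumes a symmetric blue-white-red scale clipped at ±`vmax` and wells listed in row-major order, and returns hex color strings that a plotting or HTML layer could use.

```python
def false_color(value, vmax):
    """Map a value in [-vmax, vmax] to a blue-white-red hex color (clipped outside)."""
    t = max(-1.0, min(1.0, value / vmax))
    if t >= 0:  # white (0) -> red (+vmax)
        r, g, b = 255, round(255 * (1 - t)), round(255 * (1 - t))
    else:       # white (0) -> blue (-vmax)
        r, g, b = round(255 * (1 + t)), round(255 * (1 + t)), 255
    return f"#{r:02x}{g:02x}{b:02x}"


def plate_colors(values, ncol, vmax):
    """Arrange per-well colors as a plate, assuming row-major well order."""
    colors = [false_color(v, vmax) for v in values]
    return [colors[i:i + ncol] for i in range(0, len(colors), ncol)]
```

Clipping keeps a few extreme wells from washing out the color range of the rest of the plate.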
• Per-plate quality assessment:
  - dynamic range;
  - scatter plot between replicates and correlation coefficient;
  - distribution of the intensity values for each replicate;
  - “plate plots” for the replicate measurements and for the standard deviation between replicate measurements
• Per-experiment quality assessment:
  - boxplots for each replicate, grouped by plate;
  - distribution of the signal in the control wells
• Whole-screen visualization
Main features
Quality reports in the form of HTML pages
cellHTS package
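Two of the per-plate checks listed above are simple enough to state in code. This is a minimal Python sketch: the dynamic range is taken here as the ratio of the mean positive-control signal to the mean negative-control signal, which is one common definition and an assumption on our part (cellHTS may compute it differently). Replicate agreement is measured with the Pearson correlation coefficient.

```python
import math
import statistics


def dynamic_range(pos_controls, neg_controls):
    """Ratio of mean positive-control to mean negative-control signal (assumed definition)."""
    return statistics.mean(pos_controls) / statistics.mean(neg_controls)


def replicate_correlation(rep1, rep2):
    """Pearson correlation coefficient between two replicate measurements of a plate."""
    m1, m2 = statistics.mean(rep1), statistics.mean(rep2)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(rep1, rep2))
    sxx = sum((a - m1) ** 2 for a in rep1)
    syy = sum((b - m2) ** 2 for b in rep2)
    return sxy / math.sqrt(sxx * syy)
```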
• Plate effects (plate-to-plate variations)
• Edge or spatial effects within the plate (well-to-well variations)
Remove systematic biases and variations while keeping the biologically relevant information
Sources: assay formats, pipetting delivery, robotic failures, differences in compound concentrations due to evaporation of solvent, potency differences across compounds, systematic across-plate biases, within-plate spatial biases, ...
Data preprocessing
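• Percent of control:
$$x'_{ki} = 100 \times \frac{x_{ki}}{\mu_i^{\mathrm{positive}}}$$
• Normalized percent inhibition:
$$x'_{ki} = \frac{\mu_i^{\mathrm{positive}} - x_{ki}}{\mu_i^{\mathrm{positive}} - \mu_i^{\mathrm{negative}}}$$
• z score:
$$x'_{ki} = \frac{x_{ki} - \mu_i}{\sigma_i}$$
Plate effects
$x_{ki}$: value of the k-th well on the i-th plate; $\mu_i^{\mathrm{positive}}$, $\mu_i^{\mathrm{negative}}$: average of the positive and negative controls on plate i; $\mu_i$, $\sigma_i$: location and scale of the values on plate i
Plate effects & edge effects
• B score:
$$x'_{rci} = \frac{x_{rci} - (\hat{\mu}_i + \hat{R}_{ri} + \hat{C}_{ci})}{\mathrm{MAD}_i}$$
$x_{rci}$: value of the well in the r-th row and c-th column of the i-th plate; $\hat{\mu}_i$, $\hat{R}_{ri}$, $\hat{C}_{ci}$: overall, row and column effects of plate i, estimated by median polish; $\mathrm{MAD}_i$: median absolute deviation of the residuals of plate i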
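The four normalizations above can be sketched in plain Python for one plate. The sketch rests on several assumptions. Controls and samples are passed as lists. In the z score, μ_i and σ_i are taken as the plate mean and sample standard deviation; robust alternatives such as median and MAD are equally plausible. For the B score, row and column effects are fitted by Tukey's median polish, and the residuals are scaled by their MAD using the 1.4826 constant of R's `mad()`. All function names are hypothetical and are not the cellHTS API.

```python
import statistics


def percent_of_control(plate, pos):
    """x'_ki = 100 * x_ki / mean of the plate's positive controls."""
    mu_pos = statistics.mean(pos)
    return [100 * x / mu_pos for x in plate]


def normalized_percent_inhibition(plate, pos, neg):
    """x'_ki = (mu_pos - x_ki) / (mu_pos - mu_neg)."""
    mu_pos, mu_neg = statistics.mean(pos), statistics.mean(neg)
    return [(mu_pos - x) / (mu_pos - mu_neg) for x in plate]


def z_scores(plate):
    """x'_ki = (x_ki - mu_i) / sigma_i, here with plate mean and SD (assumption)."""
    mu, sigma = statistics.mean(plate), statistics.stdev(plate)
    return [(x - mu) / sigma for x in plate]


def median_polish(grid, n_iter=10):
    """Tukey's two-way median polish of a plate given as rows x columns.

    Returns (overall, row_effects, col_effects, residuals).
    """
    res = [list(row) for row in grid]
    nr, nc = len(res), len(res[0])
    overall, rows, cols = 0.0, [0.0] * nr, [0.0] * nc
    for _ in range(n_iter):
        for r in range(nr):                 # sweep row medians into row effects
            m = statistics.median(res[r])
            rows[r] += m
            res[r] = [v - m for v in res[r]]
        m = statistics.median(cols)
        overall += m
        cols = [v - m for v in cols]
        for c in range(nc):                 # sweep column medians into column effects
            m = statistics.median([res[r][c] for r in range(nr)])
            cols[c] += m
            for r in range(nr):
                res[r][c] -= m
        m = statistics.median(rows)
        overall += m
        rows = [v - m for v in rows]
    return overall, rows, cols, res


def b_scores(grid):
    """Median-polish residuals divided by their MAD (scaled by 1.4826, as in R's mad())."""
    *_, res = median_polish(grid)
    flat = [v for row in res for v in row]
    med = statistics.median(flat)
    mad = 1.4826 * statistics.median([abs(v - med) for v in flat])
    return [[v / mad for v in row] for row in res]
```

Note that `b_scores` divides by zero when more than half of the residuals are identical. A real pipeline would need to guard against that case.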
Data preprocessing
• Scale the plates by a plate-specific factor:
  - based on the intensities of the controls
  - based on the intensities of the samples
• Robust location estimators that take into account the assignment of RNAi reagents to the plates (random or non-random)
Plate effects
k-th well, i-th plate
Data preprocessing
Plate effects – plate median scaling
k-th well, i-th plate
Data preprocessing
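In its usual form, plate median scaling divides every value on a plate by that plate's median, so that all plates end up centred on 1. This is a minimal Python sketch; the helper name is hypothetical.

```python
import statistics


def plate_median_scaling(plates):
    """Divide each value by the median of its own plate (x'_ki = x_ki / median_i)."""
    scaled = []
    for plate in plates:
        med = statistics.median(plate)  # per-plate scaling factor
        scaled.append([x / med for x in plate])
    return scaled
```

Because the median uses all samples on the plate, it assumes most reagents on each plate have no effect. The next slides show where that assumption breaks down.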
Data preprocessing: plate effects due to library design
[Figure: raw data by plate, with plate 26 marked; annotated gene classes: proteasome subunits or components; ATP/GTP-binding site motifs; ribosomal proteins; like-Sm nucleoproteins and ribosomal proteins]
Data preprocessing: plate effects due to library design
• Use the shorth (the shortest interval containing half of the values) of the distribution of intensities in each plate as the per-plate scaling factor.
Plate effects & the siRNA library
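The shorth-based scaling factor can be sketched as follows. This sketch takes the shorth as the shortest window of ⌈n/2⌉ consecutive sorted values and uses the mean of the values in that window as the location estimate. Other implementations (e.g. genefilter's `shorth`) may size the window and break ties differently. Unlike the median, this estimate follows the densest half of the data, which matters when a plate is enriched for reagents with strong effects, such as proteasome or ribosomal protein genes. Function names are hypothetical.

```python
import math


def shorth(values):
    """Mean of the values in the shortest window covering half of the data."""
    x = sorted(values)
    k = math.ceil(len(x) / 2)  # number of points covering half of the data
    # start index of the narrowest window of k consecutive sorted values
    start = min(range(len(x) - k + 1), key=lambda i: x[i + k - 1] - x[i])
    return sum(x[start:start + k]) / k


def scale_by_shorth(plates):
    """Divide each plate by its own shorth."""
    return [[v / shorth(p) for v in p] for p in plates]
```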
Data preprocessing: plate effects due to library design
[Figure panels: before normalization, after normalization]
Median intensity across plates (preprocessed data)
Data preprocessing: edge effects
Look at the sample variances based on the normalized replicate values
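This is a minimal Python sketch of that check. It assumes replicate plates are given as flat lists of normalized values in row-major well order. It computes each well's standard deviation across replicates, arranges the results as a plate, and compares edge wells with interior wells. The function names are hypothetical.

```python
import statistics


def replicate_sd_map(replicates, nrow, ncol):
    """Per-well standard deviation across replicate plates, as an nrow x ncol grid."""
    sds = [statistics.stdev(values) for values in zip(*replicates)]
    return [sds[r * ncol:(r + 1) * ncol] for r in range(nrow)]


def edge_vs_interior(sd_grid):
    """Mean per-well SD for edge wells and for interior wells."""
    nrow, ncol = len(sd_grid), len(sd_grid[0])
    edge, interior = [], []
    for r, row in enumerate(sd_grid):
        for c, sd in enumerate(row):
            on_edge = r in (0, nrow - 1) or c in (0, ncol - 1)
            (edge if on_edge else interior).append(sd)
    return statistics.mean(edge), statistics.mean(interior)
```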
In some cases, a data transformation can stabilize the variance
Data preprocessing: data transformation
Data transformation: viability screen
Data transformation: treatment screen
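A transformation can help because, with multiplicative noise, the replicate SD grows with the signal level, whereas on the log scale it stays constant. This is a minimal Python sketch that uses the logarithm as an example transformation. That choice is our assumption, since the slides do not say which transformation was applied to each screen.

```python
import math
import statistics


def sd_raw_vs_log(groups):
    """For each group of replicate values return (SD on raw scale, SD on log scale)."""
    return [(statistics.stdev(g), statistics.stdev([math.log(v) for v in g]))
            for g in groups]
```

For replicates [10, 12, 8] and [1000, 1200, 800], the raw SDs are 2 and 200, while the SDs of the logs are equal.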
Toll and Imd screens in D. melanogaster cells
• Two reporters:
  - Firefly luciferase (F) for pathway activity
  - Renilla luciferase (R) for growth and viability
• How to combine the intensities of the two reporters?
Two-color data
[Figure: Toll screen; x-axis viability, y-axis pathway activity]
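One common answer to the question of how to combine the two reporters is to use the Firefly/Renilla ratio, usually on a log scale, and to flag wells with low Renilla signal as possibly toxic. This is a minimal Python sketch of that choice. The approach is our assumption rather than the method used in these screens, and the `min_viability` threshold (a fraction of the plate median) is hypothetical.

```python
import math
import statistics


def combine_reporters(firefly, renilla, min_viability=0.5):
    """Return log2(F/R) per well and a flag for wells with low Renilla signal."""
    med_r = statistics.median(renilla)  # plate-wide reference for viability
    ratios = [math.log2(f / r) for f, r in zip(firefly, renilla)]
    low_viability = [r < min_viability * med_r for r in renilla]
    return ratios, low_viability
```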
FACS (fluorescence activated cell sorting)
[Figure: laser, light scatter detector, fluorescence detectors (PMT3, PMT4, etc.)]
• measures fluorescence intensities as well as morphological parameters on the basis of emitted and scattered light
• offers single-cell resolution
• robust, reliable, versatile
[Figure: full coding cDNA clone → PCR amplification of the ORF with flanking attB1/attB2 sites → entry clone (ORF between attL1 and attL2)]
ORF cloning: The Gateway™ System
[Figure: expression constructs with N-terminal tag (N-YFP-ORF-C) and C-terminal tag (N-ORF-YFP-C)]
Package prada
Package prada provides functionality for the analysis of data derived from cell-based assays, with a strong focus on flow cytometry data
modular framework:
• data preprocessing
• data visualization
• data integration and management
for statistical inference and modeling, general-purpose tools can be used:
• linear models
• local regression
• hypothesis testing
• FCS 3.0 files
  - standardized storage format for FACS data
  - contain fluorescence values in the data segment and a wealth of metadata in the text segment
  - can be imported into R (function readFCS)
Data import and maintenance (object orientation)
• cytoFrame: R internal representation of the data from one FCS file
  - raw data matrix
  - list of metadata
• cytoSet: R internal representation of the data from several FCS files (e.g. one 96-well plate)
generic functions, class methods
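The object layout described above (a data matrix plus metadata for each file, and a container for several files) can be mirrored in a few lines. This is a hypothetical Python sketch, not prada's classes. `CytoFrameSketch` and `CytoSetSketch` are invented names, and `apply` stands in for the idea of generic functions applied across a whole plate.

```python
from dataclasses import dataclass, field


@dataclass
class CytoFrameSketch:
    """Hypothetical stand-in for a cytoFrame: the data of one FCS file."""
    exprs: list      # raw data matrix: one row per cell, one column per channel
    channels: list   # channel names, one per column of `exprs`
    description: dict = field(default_factory=dict)  # metadata from the text segment

    def column(self, name):
        """All values of one channel."""
        j = self.channels.index(name)
        return [row[j] for row in self.exprs]


@dataclass
class CytoSetSketch:
    """Hypothetical stand-in for a cytoSet: several frames, e.g. one 96-well plate."""
    frames: dict     # well identifier -> CytoFrameSketch

    def apply(self, fn):
        """Apply the same function to every frame (one result per well)."""
        return {well: fn(frame) for well, frame in self.frames.items()}
```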
Gating
Gate: selection of a subpopulation of cells with respect to one or several measurement parameters.
Objects of class gate and gateSet
interactive drawing of gates based on two-dimensional scatter plots
can be assigned to cytoFrames
Gating
[Figure: two gates G1 and G2 and their combinations]
combination of gates: G1 + G2, G1 – G2, G1 ∩ G2
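Gates and their combinations can be modelled as Boolean predicates on cells. This is a minimal Python sketch. It assumes rectangular gates and reads G1 + G2 as union, G1 – G2 as set difference and G1 ∩ G2 as intersection (our interpretation of the figure). Channel names and function names are hypothetical.

```python
def rectangle_gate(ch_x, xlim, ch_y, ylim):
    """Gate selecting cells whose two channel values fall inside a rectangle."""
    (x0, x1), (y0, y1) = xlim, ylim
    return lambda cell: x0 <= cell[ch_x] <= x1 and y0 <= cell[ch_y] <= y1


def union(g1, g2):         # G1 + G2: cells in either gate
    return lambda cell: g1(cell) or g2(cell)


def difference(g1, g2):    # G1 - G2: cells in G1 but not in G2
    return lambda cell: g1(cell) and not g2(cell)


def intersection(g1, g2):  # G1 ∩ G2: cells in both gates
    return lambda cell: g1(cell) and g2(cell)


def apply_gate(gate, cells):
    """Subpopulation of `cells` (dicts of channel -> value) selected by `gate`."""
    return [cell for cell in cells if gate(cell)]
```

Because combined gates are again predicates, they can be nested arbitrarily, e.g. `difference(union(g1, g2), g3)`.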
distinction on the basis of morphological properties
strong variation between experiments
dynamic determination
[Figure: FSC (cell size) vs. SSC (granularity)]
Data pre-processing: FSC vs. SSC plot
Data pre-processing: finding the main population
assumption: bivariate normal distribution
robust fitting
discarding cells that do not lie within some given boundary of this distribution
[Figure legend: density of distribution; discarded cells; X = midpoint of distribution]
shape and location of the main distribution can be used for quality control
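This is a minimal Python sketch of the idea, not prada's own fitting routine. It starts from a robust coordinate-wise median/MAD estimate and flags cells within Mahalanobis radius `cutoff` of the centre. It then refits mean and covariance on those cells and repeats until the selection no longer changes. The covariance is divided by the variance ratio of a bivariate normal truncated at that radius, so that the trimming does not bias it downward. The value `cutoff = 2` is an arbitrary choice.

```python
import math
import statistics


def _mad(values):
    """Median absolute deviation, scaled to estimate a normal SD."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])


def _shrink_factor(c):
    """Per-axis variance ratio of a bivariate normal truncated at Mahalanobis radius c."""
    q = math.exp(-c * c / 2)
    return 1 - (c * c / 2) * q / (1 - q)


def fit_main_population(points, cutoff=2.0, max_iter=20):
    """Return the centre of the main (FSC, SSC) population and a keep/discard flag per cell."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mx, my = statistics.median(xs), statistics.median(ys)
    sxx, syy, sxy = _mad(xs) ** 2, _mad(ys) ** 2, 0.0  # robust start, no correlation
    k = _shrink_factor(cutoff)
    inside = None
    for _ in range(max_iter):
        det = sxx * syy - sxy * sxy
        new_inside = []
        for x, y in points:
            dx, dy = x - mx, y - my
            d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
            new_inside.append(d2 <= cutoff * cutoff)
        if new_inside == inside:
            break  # same cells kept as in the previous pass: converged
        inside = new_inside
        sel = [p for p, keep in zip(points, inside) if keep]
        n = len(sel)
        mx = sum(p[0] for p in sel) / n
        my = sum(p[1] for p in sel) / n
        sxx = sum((p[0] - mx) ** 2 for p in sel) / n / k
        syy = sum((p[1] - my) ** 2 for p in sel) / n / k
        sxy = sum((p[0] - mx) * (p[1] - my) for p in sel) / n / k
    return (mx, my), inside
```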
Understanding FACS data
[Figures: parameter 1 (perturbation) on the x-axis, parameter 2 (phenotype) on the y-axis; a baseline panel followed by examples of activation and inhibition]
Dealing with correlations
cell size correlates with fluorescence intensities (FL1, FL4)
$$x_{\mathrm{total}} = \alpha + \beta s + x_{\mathrm{specific}}$$
this induces spurious correlations in the data
s: cell size (FSC); $x_{\mathrm{total}}$: measured fluorescence; $x_{\mathrm{specific}}$: actual fluorescence emitted by the dye
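Under the linear model above, one way to remove the size effect is to regress the measured fluorescence on cell size and keep the residuals. This is a minimal Python sketch that assumes ordinary least squares and a specific signal uncorrelated with cell size. Under those assumptions, the residuals estimate x_specific up to its mean, which is absorbed into α.

```python
import statistics


def remove_size_effect(size, total):
    """Fit x_total = alpha + beta * s by least squares; return alpha, beta, residuals."""
    ms, mt = statistics.mean(size), statistics.mean(total)
    sst = sum((s - ms) * (t - mt) for s, t in zip(size, total))
    sss = sum((s - ms) ** 2 for s in size)
    beta = sst / sss
    alpha = mt - beta * ms
    residuals = [t - (alpha + beta * s) for s, t in zip(size, total)]
    return alpha, beta, residuals
```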
different responses for different assays
• discrete response: on/off mechanism (e.g. apoptosis, proliferation)
[Figure: effect vs. overexpression, in theory and in FACS data]
• continuous response: concentration dependent (e.g. MAP kinase)
[Figure: effect vs. overexpression, in theory and in FACS data]
Statistical analysis: mode of response
• robust fitting of a smoothed local regression function:
$$y = m(x) + \varepsilon$$
y: response (phenotype); x: perturbation signal; m: smooth function
$$\hat{\delta} = \hat{m}'(x_0)$$
$\hat{m}'(x_0)$: robust estimator of the slope of m at point $x_0$
• z-score as a dimensionless measure of effect: ratio of the estimated slope $\hat{\delta}$ at point $x_0$ and an assay-wide scale parameter $\delta_0$
$$z = \frac{\hat{\delta}}{\delta_0}$$
[Figure: three example fits, each with a point marked t*: z = 18.1, z = 0.4, z = -40.2]
Statistical analysis: continuous response
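This is a minimal Python sketch of the slope-based z score. It fits a weighted straight line around x0 with tricube weights (the local linear step of loess). It then reweights points by Tukey's bisquare on the local residuals to reduce the influence of outliers. This is a simplification of a full robust loess. `delta0`, the assay-wide scale, is taken as given because the slides do not say how it is estimated, and the `1e-9` stopping tolerance is an arbitrary choice.

```python
import math
import statistics


def _weighted_line(xs, ys, w):
    """Weighted least-squares line; returns (weighted mean of y, slope, weighted mean of x)."""
    sw = sum(w)
    xm = sum(wi * x for wi, x in zip(w, xs)) / sw
    ym = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xm) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xm) * (y - ym) for wi, x, y in zip(w, xs, ys))
    return ym, sxy / sxx, xm


def local_slope(xs, ys, x0, span=1.0, robust_iters=3):
    """Slope at x0 of a tricube-weighted local line, with bisquare robustness passes."""
    n = len(xs)
    k = max(3, math.ceil(span * n))
    h = sorted(abs(x - x0) for x in xs)[min(k, n) - 1]  # local bandwidth
    kernel = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
    ym, slope, xm = _weighted_line(xs, ys, kernel)
    for _ in range(robust_iters):
        res = [y - (ym + slope * (x - xm)) for x, y in zip(xs, ys)]
        s = statistics.median([abs(r) for r, kw in zip(res, kernel) if kw > 0])
        if s < 1e-9:  # (near-)perfect fit: nothing to downweight
            break
        w = [kw * (1 - (r / (6 * s)) ** 2) ** 2 if abs(r) < 6 * s else 0.0
             for kw, r in zip(kernel, res)]
        ym, slope, xm = _weighted_line(xs, ys, w)
    return slope


def response_z(xs, ys, x0, delta0, **kwargs):
    """z = estimated slope at x0 divided by the assay-wide scale delta0."""
    return local_slope(xs, ys, x0, **kwargs) / delta0
```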
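Fisher’s exact test
Statistical analysis: discrete response
2 × 2 table of cells by perturbation and phenotype:
  non-perturbed, phenotype positive: a; non-perturbed, phenotype negative: b
  perturbed, phenotype positive: c; perturbed, phenotype negative: d
effect size: odds ratio; significance: p value
$$r_1 = \frac{a+1}{b+1}, \qquad r_2 = \frac{c+1}{d+1}, \qquad \text{odds ratio} = \frac{r_1}{r_2}$$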
Statistical analysis: discrete response
[Table: cell counts for a perturbation with no effect and for an activator]
17 440
9556 3247
58 64
6010 5321
no effect: -log(odds ratio) = 0.09 (p = 0.24)
activator: -log(odds ratio) = 4.33 (p = 2.2e-16)
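Both numbers can be computed directly from the four counts. This is a minimal Python sketch. It assumes the table layout a, b (non-perturbed positive/negative) and c, d (perturbed positive/negative), with the odds ratio defined as r1/r2, where r1 = (a+1)/(b+1) and r2 = (c+1)/(d+1). The two-sided p value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. A small tolerance, similar to the one used by R's `fisher.test`, treats near-ties as equal.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio r1 / r2 with +1 pseudocounts in every cell."""
    return ((a + 1) / (b + 1)) / ((c + 1) / (d + 1))


def _log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def _log_prob(x, row1, col1, n):
    """log P(top-left cell = x) for a 2x2 table with fixed margins (hypergeometric)."""
    return _log_comb(col1, x) + _log_comb(n - col1, row1 - x) - _log_comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    log_obs = _log_prob(a, row1, col1, n)
    p = 0.0
    for x in range(lo, hi + 1):
        lp = _log_prob(x, row1, col1, n)
        if lp <= log_obs + 1e-7:  # no more likely than the observed table
            p += math.exp(lp)
    return min(1.0, p)
```

Working in log space with `math.lgamma` keeps the computation stable for counts in the thousands, as in the example tables.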
visualization of results
plate plots as graphical representation of experimental entities
• false color coding for concise display of numeric outcomes from statistical analyses
• HTML image map allows for hyperlinking to further information for each well
Visualization: plate plots
[Figure panels: additional information, replicates, anything…]
Package rflowcyt
• Also deals with flow cytometry data
• Focuses more on individual FACS measurements and quality control
• Slightly different object model but essentially the same concept; conversion functions to and from cytoFrames
• Quality assessment tools
[Figure: rflowcyt quality assessment plots: box plot, contour plot, ECDF, density plot, summary plot]
[Figure: control phenotype]
What the data look like...
[Figure panels: nuclear phenotype, cytokinesis, mitotic arrest, multipolar spindles, tubulin elongation, control]
What we try to find...
dm <- sqrt(distMap(seg))
res <- objectCount(dm, gray[,,4], 100, 40)
      index   x   y size intensity
 [1,]     1 304  95 1065 221.95894
 [2,]     1 140 186  680 141.64695
 [3,]     1 222 217  786 178.99816
 [4,]     1   0 170  274  61.27550
 [5,]     1 336 139  800 148.25224
 [6,]     1 212  91  696 213.69449
 [7,]     1 290 267  664 150.84269
 [8,]     1 107 101 1102 245.86509
 [9,]     1 257   0  372  83.80994
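Conceptually, object counting labels connected foreground regions and reports a position, size and intensity for each one. This is a minimal Python sketch of that idea using 4-connectivity on a binary mask. It is not EBImage's `objectCount`, whose exact algorithm and output definitions (e.g. whether intensity is a sum) it does not reproduce.

```python
from collections import deque


def count_objects(mask, intensity):
    """Label 4-connected foreground regions of a binary mask.

    Returns one dict per object: centroid (x = column, y = row), size in
    pixels and summed intensity, in the order objects are first met
    when scanning rows top to bottom.
    """
    nrow, ncol = len(mask), len(mask[0])
    seen = [[False] * ncol for _ in range(nrow)]
    objects = []
    for r0 in range(nrow):
        for c0 in range(ncol):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one object
                r, c = queue.popleft()
                pixels.append((r, c))
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= rr < nrow and 0 <= cc < ncol and mask[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            size = len(pixels)
            objects.append({
                "x": sum(c for _, c in pixels) / size,
                "y": sum(r for r, _ in pixels) / size,
                "size": size,
                "intensity": sum(intensity[r][c] for r, c in pixels),
            })
    return objects
```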
Image processing and analysis: Bioconductor package EBImage (ImageMagick & others)
Computational statistics on vector data: clustering, classification, hypothesis testing (R)
Phenotypes, gene functions
Reproducibility, evolution of code, parallelization
Don't reinvent the wheel, stand on the shoulders of giants
Features
Supports a variety of 2D (about 40) and some 3D image formats (TIFF) in read/write mode; supports the local file system and network protocols (HTTP, FTP)
Image objects are based on native R arrays, so all functions available for arrays can be used and operations on images in R are fast
Majority of code is C/C++ for high performance
I/O is based on the ImageMagick Magick++ (C++) library, available for a number of platforms and operating systems (including both Windows and Linux)
Effective memory management to enable operations on very large data sets
Majority of the ImageMagick Magick++ 2D image processing filters: threshold, blur, noise removal, edge, sharpen, unsharp mask, etc.
Distance map filter, object counting algorithms
Manipulating Image Data
im1 <- read.image("im01.jpg"); im2 <- read.image("im02.jpg")
# addition of two images, combining features of both in one
im3 <- im1 + im2
# subtraction of images: image difference
im4 <- im1 - im2
# multiplication: amplification of common features and removal of differences
im5 <- im1 * im2
# scaling of data
im6 <- im1 * 2
# extending contrast of dark regions
im7 <- sqrt(im1)
# cropping images and subscripting
im8 <- im1[100:200, 80:180]
im9 <- im1[100:200, ]
# conditional replacement of image data: thresholding
im8[im8 > 0.5] <- 1.0
# data of one image is modified based on a condition from another one
im1[im2 <= 0.2] <- 0.0
# conversions between colour modes and summation of RGB images
rgb <- toRed(im1) + toGreen(im2); gray <- toGray(rgb)
[Figure panels: addition, subtraction, multiplication, sqrt(im), im[im>0.4]=1, im[..]]
Image filters
original image: im <- read.image(..)
normalized in [0..1]: im <- normalize(im)
false colour, 2 channels: im2 <- toRed(im) + toGreen(im1)
adaptive thresholding: seg <- thresh(im, 20, 20, 400, TRUE)
skeleton: sk <- edge(dm, 1)
edge filter: ed <- edge(im, 1)
Other filters: enhancements: blur, despeckle, enhance, medianFilter, gaussianFilter, redNoise, sharpen, spread, unsharpMask; segmentation: segment; colour: contrast, equalize, colorGamma, mod, shade; transformations: rotate, sample.image, scale.image
Distance maps: dm <- distMap(seg); display(normalize(dm))
Object count & indexing: res <- objectCount(dm)
Identifying cells
Object marking
All filters are implemented in C++ for high performance
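Two of the filters above have simple textbook forms. This is a minimal Python sketch. It assumes that adaptive thresholding compares each pixel with the mean of a local window plus an offset. It also computes a city-block distance map (the distance of each foreground pixel to the nearest background pixel) with the classic two-pass algorithm. The parameters are illustrative and do not correspond to the arguments of EBImage's `thresh` or `distMap`.

```python
def adaptive_threshold(img, half_w, half_h, offset):
    """Foreground where a pixel exceeds the mean of its local window by more than offset."""
    nrow, ncol = len(img), len(img[0])
    out = [[False] * ncol for _ in range(nrow)]
    for r in range(nrow):
        for c in range(ncol):
            rows = range(max(0, r - half_h), min(nrow, r + half_h + 1))
            cols = range(max(0, c - half_w), min(ncol, c + half_w + 1))
            window = [img[i][j] for i in rows for j in cols]  # clipped at the borders
            out[r][c] = img[r][c] > sum(window) / len(window) + offset
    return out


def distance_map(mask):
    """City-block distance of each foreground pixel to the nearest background pixel."""
    nrow, ncol = len(mask), len(mask[0])
    inf = nrow + ncol
    d = [[inf if v else 0 for v in row] for row in mask]
    for r in range(nrow):                    # forward pass: top and left neighbours
        for c in range(ncol):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    for r in reversed(range(nrow)):          # backward pass: bottom and right neighbours
        for c in reversed(range(ncol)):
            if r < nrow - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < ncol - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d
```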
Acknowledgements
• EBI:
Wolfgang Huber
Ligia Bras
Oleg Sklyar
• DKFZ:
Stefan Wiemann
Dorit Arlt
Meher Majety
Mamatha Sauerman
Michael Boutros
Florian Fuchs
Viola Gesellchen
Dierk Ingelfinger
David Kuttenkeuler
Sandra Steinbrink
• FHCRC
Robert Gentleman
Nolwenn LeMeur
Seth Falcon