Gene expression analysis
Helena Brentani
LBHC - Laboratório de Bioinformática do Hospital do Câncer
DNA
RNA
Proteins
Cellular functions
How to study the transcriptome
Large-scale analyses
- ESTs
- SAGE
- Microarray
- RNA-seq
Exons Introns
cDNA
Coding Non-coding
Expressed Sequence Tag
AAAAAAAAA
ESTs: only highly expressed genes; depends on the library type and on the availability of a genome
SAGE
Tags are isolated and concatemerized.
Relative expression levels can be compared between cells in different states.
SAGE: pros and cons
Advantages
• No hybridization and no reference: measurements are relative only to the total number of tags in the library;
• In theory, all mRNAs are measured through their tags, so they do not need to be fixed (spotted) in advance;
• Higher sensitivity.
Shui Qing Ye, Tera Lavoie, David C. Usher, Li Q. Zhang. Cell Research 2002; 12(2):105-115
Difficulties
• Mapping each tag to its transcript;
• High cost.
Microarray
Microarray: pros and cons
Advantages:
• Relatively low cost
• Many transcripts (600-25k)
• Nearly unambiguous cDNA -> gene relationship
Disadvantages:
• Low reproducibility
• Two-color: competitive hybridization; the same cDNA is labeled differently by each dye
• One-color: no reference for comparing slides
• Difficult to merge experiments into a single analysis
The concentration of the DNA "target" and the uniformity of the spots are very important for the analysis
SLIDES
TYPES OF ARRAY
Oligos
• 30 nt (low signal)
• 70 nt (high cost; negative controls required)
Immobilized cDNA
• Must be denatured at high temperature before the denatured "probe" is applied; the immobilized cDNA may renature during the process, giving low signal
• Cross-hybridization: gene families, Alu repeats
• Specific primers or vector primers (lower cost)
A comparison between cDNA and oligonucleotide arrays
cDNA arrays | Oligo arrays
Long sequences | Short sequences
Spot unknown sequences | Spot known sequences
More variability in the system | More reliable data
Easier to analyze with appropriate experimental design | More difficult to analyze
[Figure: Ank, signal intensity versus fragment position on the transcript. Fragment positions, 5' to 3': 1264-1529, 1644-1906, 1928-2202, 3711-3977, 10882-11147.]
[Figure: median intensity (0 to 25) per gene fragment, PROBE: SCL12A4 (2869bp). Fragments shown: SCL12A4 (288bp), SCL12A4 (2869bp), SCL12A4 (1918bp), SCL12A6 (252bp), SCL12A6 (3083bp), SCL12A6 (1893bp), SCL12A7 (242bp). Axis labels: A4-38, A4-46, A4-49, A6-39, A6-50, A6-53, A7-37.]
[Figure: median intensity (0 to 25) per gene fragment, PROBE: CDH12 (979bp). Fragments shown: CDH18 (272bp), CDH12 (246bp), CDH18 (2167bp), CDH12 (979bp). Axis labels: CDH1854, CDH1858, CDH1255, CDH1257.]
The five steps of gene expression analysis
1. Biological question / experimental design
2. Sample preparation
3. Biochemical reaction
4. Spot identification
5. Data analysis
Experimental design
• Number of individuals in each class
– Estimates the biological variability among individuals of the same class
• Number of replicates
– Estimates the experimental variability
• Type of experimental design
– Reference design
– Balanced block design
– Loop design
• Limited number of libraries, arrays...
Sample preparation
• Total RNA extraction
• RNA quality
• Amplification
• Amplification control
• Types of protocols
LABELING: aRNA for cDNA slides
Direct labeling method with RT (reverse transcriptase) and random primer (dN6)
[Figure: aRNA template primed with random primers (dN6); cDNA strands labeled with Cy3 and Cy5]
Degradation of the RNA molecule, purification, hybridization, washing, and detection
Spot identification
Individual spots are recognized; size and shape might be adjusted per spot (automatically, with fine adjustments by hand).
Additional manual flagging of bad (X) or non-present (NA) spots
[Figure: examples of poor and good spot quality, with spots flagged NA and X]
Different spot identification methods: fixed circles, circles with variable size, arbitrary spot shape (morphological opening)
Spot identification
Histogram of pixel intensities of a single spot
• The signal of the spots is quantified.
„Donuts“
Mean / Median / Mode / 75% quantile
Local background
GenePix
QuantArray
ScanAlyse
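The quantification step can be sketched as follows. This is a minimal illustration, assuming the image analysis software has already separated each spot's pixels into a foreground list and a local-background list. The function name `spot_signal` and the choice of the median as the summary statistic are assumptions made here; they do not describe how any of the packages listed above work internally.

```python
from statistics import median


def spot_signal(foreground_pixels, background_pixels):
    """Summarize one spot as median foreground minus median local background.

    Negative results are clipped to 0 (a hypothetical design choice);
    zero-signal spots still need to be flagged before any log transform.
    """
    fg = median(foreground_pixels)
    bg = median(background_pixels)
    return max(fg - bg, 0.0)


# One very bright outlier pixel (5000) barely moves a median-based estimate.
signal = spot_signal([100, 120, 110, 5000], [10, 12, 11])
print(signal)
```

Using the mean instead of the median would let the single outlier pixel dominate the spot value, which is why robust summaries are commonly offered.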
Raw data are not mRNA concentrations
• tissue contamination
• RNA degradation
• amplification efficiency
• reverse transcription efficiency
• hybridization efficiency and specificity
• clone identification and mapping
• PCR yield, contamination
• spotting efficiency
• DNA support binding
• other array manufacturing related issues
• image segmentation
• signal quantification
• “background” correction
Quality control: Noise and reliable signal
[Figure: expression measurements over arrays 1 ... n, annotated at probe level, array level, and gene level]
Probe level: quality of the expression measurement of one spot on one particular array
Array level: quality of the expression measurement on one particular glass slide
Gene level: quality of the expression measurement of one probe across all arrays
Probe-level quality control
• Individual spots printed on the slide
• Sources:
– faulty printing, uneven distribution, contamination with debris, magnitude of signal relative to noise, poorly measured spots
• Visual inspection:
– hairs, dust, scratches, air bubbles, dark regions, regions with haze
• Spot quality:
– Brightness: foreground/background ratio
– Uniformity: variation in pixel intensities and ratios of intensities within a spot
– Morphology: area, perimeter, circularity
– Spot size: number of foreground pixels
• Action:
– set measurements to NA (missing values)
– local normalization procedures which account for regional idiosyncrasies
– use weights for measurements to indicate reliability in later analysis
Array-level quality control
• Problems:
– array fabrication defect
– problem with RNA extraction
– failed labeling reaction
– poor hybridization conditions
– faulty scanner
• Quality measures:
– Percentage of spots with no signal (~30% excluded spots)
– Range of intensities
– (Av. Foreground)/(Av. Background) > 3 in both channels
– Distribution of spot signal area
– Amount of adjustment needed: signals have to be substantially changed to make slides comparable.
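Two of the array-level measures can be checked mechanically. The sketch below is an illustration only: the function name `array_passes_qc`, the definition of a "no signal" spot (foreground not above background in either channel), and the reading of "~30%" as an upper limit are all assumptions made here.

```python
def array_passes_qc(fg_red, bg_red, fg_green, bg_green,
                    min_ratio=3.0, max_no_signal=0.30):
    """Apply two array-level checks to one slide.

    Each argument is a list with one intensity per spot.
    Returns (passed, ratio_red, ratio_green, fraction_no_signal).
    """
    def mean(values):
        return sum(values) / len(values)

    # Check 1: (average foreground) / (average background) in each channel.
    ratio_red = mean(fg_red) / mean(bg_red)
    ratio_green = mean(fg_green) / mean(bg_green)

    # Check 2: fraction of spots with no signal (assumed definition:
    # foreground does not exceed background in either channel).
    no_signal = sum(
        1
        for fr, br, fgn, bgn in zip(fg_red, bg_red, fg_green, bg_green)
        if fr <= br and fgn <= bgn
    )
    frac_no_signal = no_signal / len(fg_red)

    passed = (ratio_red > min_ratio
              and ratio_green > min_ratio
              and frac_no_signal <= max_no_signal)
    return passed, ratio_red, ratio_green, frac_no_signal


print(array_passes_qc([400, 500, 450, 20], [50] * 4,
                      [300, 600, 500, 30], [40] * 4))
```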
Gene-level quality control
• Poor hybridization in the reference channel may introduce bias on the fold-change
• Some probes will not hybridize well to the target RNA
• Printing problems, such that all spots of a given inventory well have poor quality
• A well may be of bad quality (contamination)
• Genes with a consistently low signal in the reference channel are suspicious
Gene expression data
Rows are genes and columns are mRNA samples; each entry is the gene-expression level or ratio for gene i in mRNA sample j.

Gene  sample1  sample2  sample3  sample4  sample5  ...
1      0.46     0.30     0.80     1.51     0.90    ...
2     -0.10     0.49     0.24     0.06     0.46    ...
3      0.15     0.74     0.04     0.10     0.20    ...
4     -0.45    -1.03    -0.79    -0.56    -0.32    ...
5     -0.06     1.06     1.35     1.09    -1.09    ...

M = log2(red intensity / green intensity)
A = average of log2(red intensity) and log2(green intensity)
Alternatively: a function (PM, MM) of MAS, dChip or RMA
Data Data (log scale)
Scatterplot
Message: look at your data on log-scale!
MA plot
$A = \tfrac{1}{2}\log_2(RG)$
$M = \log_2(R/G)$
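The M and A coordinates of the MA plot can be computed directly from the two channel intensities. A minimal sketch, assuming background-corrected, strictly positive red (R) and green (G) intensities; the function name `ma_values` is made up for this illustration.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M = log2(R / G) is the log ratio; A = 0.5 * log2(R * G) is the
    average log intensity. Intensities must be strictly positive.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


m, a = ma_values([8.0, 100.0], [2.0, 100.0])
print(m, a)
```

A spot with equal intensity in both channels has M = 0 whatever its brightness, which is why M is plotted against A to reveal intensity-dependent bias.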
Median centering
[Figure: log signal of each array, centered at 0]
One of the simplest strategies is to bring the "centers" of all arrays to the same level.
Assumption: the majority of genes are unchanged between conditions.
The median is more robust to outliers than the mean.
Divide all expression measurements of each array by its median (on the log scale, subtract the median instead).
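Median centering is short enough to show in full. The sketch below works on log-scale values, where dividing by the median on the raw scale becomes subtracting the median of the logs; `median_center` is a name chosen for this illustration.

```python
from statistics import median


def median_center(log_values):
    """Shift one array's log-scale values so that their median is 0."""
    med = median(log_values)
    return [v - med for v in log_values]


# Two hypothetical arrays with different overall brightness.
arrays = {
    "array1": [1.0, 2.0, 3.0, 10.0],
    "array2": [4.0, 5.0, 6.0, 7.0],
}
centered = {name: median_center(values) for name, values in arrays.items()}
print(centered)
```

The outlier value 10.0 in the first array does not pull the center, which illustrates why the median is preferred over the mean here.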
Problem of median centering
[Figure: scatterplot of log signals after median centering (Log Green versus Log Red), and M-A plot of the same data, with A = (Log Green + Log Red) / 2 and M = Log Red - Log Green]
Median centering is a global method. It does not adjust for local effects, intensity-dependent effects, print-tip effects, etc.
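Lowess normalization
[Figure: M-A plot with a local estimate of the trend, where A = (Log Green + Log Red) / 2 and M = Log Red - Log Green]
Compute a local estimate of M as a function of A, then use the estimate to bend the "banana" straight.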
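The idea of fitting a local trend and subtracting it can be sketched in plain Python. This is a simplified, single-pass local linear fit with tricube weights, written for illustration; it omits the robustness iterations of full lowess, and the names `lowess_fit`, `lowess_normalize`, and the default `frac=0.4` are assumptions. Real analyses would normally rely on an established implementation.

```python
def lowess_fit(a, m, frac=0.4):
    """Local linear estimate of M at every A value (one pass, no robustness step)."""
    n = len(a)
    k = min(n, max(3, int(frac * n)))  # number of neighbours used per local fit
    fitted = []
    for x0 in a:
        # k nearest points along the A axis
        nearest = sorted(zip(a, m), key=lambda p: abs(p[0] - x0))[:k]
        h = abs(nearest[-1][0] - x0) or 1.0  # bandwidth: distance to farthest neighbour
        # tricube weights: 1 at x0, falling to 0 at distance h
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 for x, _ in nearest]
        sw = sum(w)
        mx = sum(wi * x for wi, (x, _) in zip(w, nearest)) / sw
        my = sum(wi * y for wi, (_, y) in zip(w, nearest)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, nearest))
        sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def lowess_normalize(a, m, frac=0.4):
    """Subtract the local trend from M, straightening the M-A cloud."""
    return [mi - fi for mi, fi in zip(m, lowess_fit(a, m, frac))]


a = [0.5 * i for i in range(20)]
m = [0.3 * x + 1.0 for x in a]      # a purely intensity-dependent bias
print(max(abs(v) for v in lowess_normalize(a, m)))
```

After normalization the remaining M values reflect deviations from the local trend rather than the dye or intensity bias itself.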
Quality control
Histogram and MA plot
Lowess
[Figure panels: not normalized, normalized]
From data to knowledge
OK, now we have made sure that our data are of high quality and that systematic, non-biological effects are removed.
The result is a gene expression matrix (genes in rows, mRNA samples in columns):

Gene  sample1  sample2  sample3  sample4  sample5  ...
1      0.46     0.30     0.80     1.51     0.90    ...
2     -0.10     0.49     0.24     0.06     0.46    ...
3      0.15     0.74     0.04     0.10     0.20    ...
4     -0.45    -1.03    -0.79    -0.56    -0.32    ...
5     -0.06     1.06     1.35     1.09    -1.09    ...

Is that already a result? No! It's just data, not knowledge. We need to use this data to answer a scientific question.
Types of experiment and the appropriate tool
Three basic types of question that microarray experiments set out to answer:
Differentially expressed genes
• Find genes that behave differently in two classes, with statistical evidence
• t-test, Fisher's exact test, chi-squared, fold change, BER
Expression patterns
• Find lists of genes that behave similarly
• Hierarchical clustering
Sample classification
• Find a reduced list of genes (1 for every ~20 samples) whose behavior predicts some information about the sample, and then build a classifier
• Finding the genes (data mining): SVM-FS, exhaustive search
• Building the classifier: SVM, Fisher's discriminant plane, CART
Experiment Design
• Type I (n = 2):
– How is this gene expressed in target 1 as compared to target 2?
– Which genes show up/down regulation between the two targets?
• Type II (n > 2):
– How does the expression of gene A vary over time, tissues, or treatments?
– Do any of the expression profiles exhibit similar patterns of expression?
Common statistical methods for differential expression analysis
• Class mean / standard deviation
• Fold change
• SAM
• Student's t-test
• Wilcoxon
• Bayes error rate
• ANOVA
• Multiple-testing correction (Bonferroni, pFDR)
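Multiple-testing correction matters because thousands of genes are tested at once. The sketch below shows Bonferroni adjustment and, as a stand-in for the pFDR approach named above, the Benjamini-Hochberg procedure (a related but different FDR method, swapped in here because it is short enough to show in full). Function names are made up for this illustration.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

Bonferroni controls the chance of any false positive and is conservative; FDR-type procedures control the expected fraction of false positives among the genes called significant.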
Permutation tests
i) For each gene, compute the d-value (analogous to a t-statistic). This is the observed d-value for that gene.
ii) Rank the genes in ascending order of their d-values.
iii) Randomly shuffle the values of the genes between groups A and B, such that the reshuffled groups A and B have the same number of elements as the original groups A and B, respectively. Compute the d-value for each randomized gene.
[Figure: Gene 1 split into Group A and Group B. Original grouping: Exp 1, Exp 2, Exp 3, Exp 4, Exp 5, Exp 6. Randomized grouping: Exp 1, Exp 4, Exp 5, Exp 2, Exp 3, Exp 6.]
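The shuffling step for a single gene can be sketched as follows. This is a simplified per-gene permutation test, not the full SAM procedure (which compares ordered observed d-values against expected ones across all genes). The d-value formula, the small constant `s0`, and the function names are assumptions made for this illustration; each group needs at least two values.

```python
import math
import random
from statistics import mean, variance


def d_value(group_a, group_b, s0=0.1):
    """t-like statistic: difference of means over (pooled standard error + s0)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_b) - mean(group_a)) / (std_err + s0)


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Observed d-value and a two-sided permutation p-value for one gene."""
    rng = random.Random(seed)
    observed = d_value(group_a, group_b)
    values = list(group_a) + list(group_b)
    na = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # random regrouping, group sizes preserved
        if abs(d_value(values[:na], values[na:])) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


observed, p = permutation_p_value([1.0, 1.2, 0.9], [3.0, 3.1, 2.9])
print(observed, p)
```

With only three samples per group there are just 20 distinct regroupings, so the smallest achievable two-sided p-value is about 0.1; this is one reason the number of replicates matters.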
SAM Two-Class Unpaired
[Figure: SAM plot of observed versus expected d-values]
• Significant positive genes (i.e., mean expression of group B > mean expression of group A) in red
• Significant negative genes (i.e., mean expression of group A > mean expression of group B) in green
• "Observed d = expected d" line
• Tuning parameter "delta" limits; can be dynamically changed by using the slider bar or entering a value in the text field
The more a gene deviates from the "observed = expected" line, the more likely it is to be significant. Any gene beyond the first gene in the positive or negative direction on the x-axis (including the first gene) whose observed d exceeds the expected d by at least delta is considered significant.
Patterns of Gene Expression
• “Eisen”ized data (dendrograms)
• Self-Organizing Maps
• Principal Component Analysis
• k-means Clustering
Expression vectors
Gene expression vectors encapsulate the expression of a gene over a set of experimental conditions or sample types.
Example vector, log2(cy5/cy3) over conditions 1 to 8: -0.8, 0.8, 1.5, 1.8, 0.5, -1.3, -0.4, 1.5
Expression vectors as points in "expression space"
[Figure: genes G1 to G5 plotted by their values in Experiment 1, Experiment 2, and Experiment 3; genes with similar expression lie close together]
Distance: a measure of similarity between genes.

        Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6
Gene A  x1A    x2A    x3A    x4A    x5A    x6A
Gene B  x1B    x2B    x3B    x4B    x5B    x6B

Some distances (MeV provides 11 metrics):
1. Euclidean: $\sqrt{\sum_{i=1}^{6} (x_{iA} - x_{iB})^2}$
2. Manhattan: $\sum_{i=1}^{6} |x_{iA} - x_{iB}|$
3. Pearson correlation
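The three metrics listed above can be written out in a few lines. A minimal sketch using only the standard library; the function names are made up for this illustration, and the example reuses the eight-condition vector shown earlier in the slides together with a hypothetical scaled copy of it.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_r(x, y):
    """Pearson correlation coefficient (profiles must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


gene_a = [-0.8, 0.8, 1.5, 1.8, 0.5, -1.3, -0.4, 1.5]
gene_b = [2 * v for v in gene_a]   # hypothetical: same shape, twice the amplitude
print(euclidean(gene_a, gene_b), manhattan(gene_a, gene_b), pearson_r(gene_a, gene_b))
```

The two profiles are far apart by Euclidean or Manhattan distance yet perfectly correlated, which shows that the choice of metric decides what "similar" means.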
Distance is Defined by a Metric
[Figure: pairs of expression profiles, log2(cy5/cy3) from -2 to 2, with the distance D under each metric. Euclidean: 4.2 and 1.4. Pearson (r × -1): -1.00 and -0.90.]
SVM Classification
• SVM attempts to find an optimal separating hyperplane between members of the two initial classifications.
Separating hyperplane
Support Vector Machines
Maximal margin separating hyperplane
Datapoints closest to separating hyperplane= support vectors
How well did we do?
Training error: how well do we do on the data we trained the classifier on?
But how well will we do in the future, on new data?
Test error: how well does the classifier generalize?
[Figure: same classifier (= line), new data from the same classes]
The classifier will usually perform worse than before: test error > training error
EASE (Expression Analysis Systematic Explorer)
EASE analysis identifies prevalent biological themes within gene clusters.
The significance of each identified theme is determined by its prevalence in the cluster and in the population of genes from which the cluster was created.
EASE File System
Consider a population of genes representing a diverse set of biological roles or themes shown below as different colors.
Diverse Biological Roles
Many algorithms can be applied to expression data to partition genes based on expression profiles over multiple conditions.
Many of these techniques work solely on expression data and disregard biological information.
What are some of the predominant biological themes represented in the cluster, and how should significance be assigned to a discovered biological theme?
Consider a particular cluster…
Example:
Population size: 40 genes
Cluster size: 12 genes
10 genes, shown in green, have a common biological theme and 8 occur within the cluster.
The frequency of the theme in the population is 10/40 = 25%
The frequency of the theme within the cluster is 8/12 = 67%
AND
80% of the genes related to the theme in the population ended up within the relatively small cluster.
Consider the Outcome
Contingency matrix:

           Cluster in  Cluster out
Theme in   8           2
Theme out  4           26
Assigning Significance to the Findings
Fisher's exact test permits us to determine whether there are non-random associations between the two variables: expression-based cluster membership and membership in a particular biological theme.
For the 2x2 contingency matrix (rows: theme in/out; columns: cluster in/out) with rows (8, 2) and (4, 26), p ≈ .0002
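Hypergeometric Distribution
For a 2x2 contingency matrix with cells a, b (first row) and c, d (second row), the row totals are a+b and c+d, the column totals are a+c and b+d, and n = a+b+c+d.
The probability of any particular matrix occurring by random selection, given no association between the two variables, is given by the hypergeometric rule:
$$p = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{n}{a+b}} = \frac{\dfrac{(a+c)!}{a!\,c!}\times\dfrac{(b+d)!}{b!\,d!}}{\dfrac{n!}{(a+b)!\,(c+d)!}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$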
Probability Computation
For our matrix, with rows (8, 2) and (4, 26), we are interested not only in the probability of getting exactly 8 annotation hits in the cluster but in the probability of having 8 or more hits. In this case the probabilities of each of the possible matrices are summed:
Rows (8, 2) and (4, 26): .0002207
Rows (9, 1) and (3, 27): 7.27x10^-6
Rows (10, 0) and (2, 28): 7.79x10^-8
.0002207 + 7.27x10^-6 + 7.79x10^-8 ≈ .000228
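The sum above can be reproduced with a few lines of standard-library Python. This is a sketch of the one-sided calculation for a 2x2 matrix with cells a, b (first row) and c, d (second row); the function names are made up for this illustration and this is not EASE's own code.

```python
from math import comb


def hypergeom_prob(a, b, c, d):
    """Probability of one particular 2x2 matrix with fixed row and column totals."""
    n = a + b + c + d
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)


def fisher_one_sided(a, b, c, d):
    """Probability of observing `a` or more hits, keeping all margins fixed."""
    total = 0.0
    while b >= 0 and c >= 0:
        total += hypergeom_prob(a, b, c, d)
        # Move one gene into the top-left cell while preserving the margins.
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total


print(hypergeom_prob(8, 2, 4, 26))    # exactly 8 theme genes in the cluster
print(fisher_one_sided(8, 2, 4, 26))  # 8 or more theme genes in the cluster
```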
Relevance Networks
Set of genes whose expression profiles are predictive of one another.
Genes with low entropy (least variable across experiments) are excluded from analysis.
$H = -\sum_{x=1}^{10} p(x)\log_2 p(x)$
Can be used to identify negative correlations between genes
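The entropy filter can be illustrated as follows. The sketch assumes that the sum over x = 1..10 refers to ten equal-width bins spanning each gene's expression range, with p(x) the fraction of experiments falling in bin x; that binning scheme and the name `expression_entropy` are assumptions made here.

```python
import math


def expression_entropy(values, n_bins=10):
    """Shannon entropy (bits) of one gene's expression values over equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant profile carries no information
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


print(expression_entropy([0.1, 0.1, 0.1, 0.1]))          # flat gene
print(expression_entropy([float(i) for i in range(10)]))  # evenly spread gene
```

A gene whose values pile up in a single bin scores near 0 and would be dropped; a gene spread evenly over all ten bins reaches the maximum of log2(10) bits.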
Relevance Networks
The expression pattern of each gene is compared to that of every other gene.
The ability of each gene to predict the expression of each other gene is assigned a correlation coefficient.
Correlation coefficients outside the boundaries defined by the minimum and maximum thresholds are eliminated (Tmin = 0.50, Tmax = 0.90).
The remaining relationships between genes define the subnets.
[Figure: genes A to E linked by pairwise correlation coefficients .28, .75, .15, .37, .40, .02, .51, .11, .63, .92, shown before and after thresholding]
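The thresholding step and the resulting subnets can be sketched as below. The ten coefficient values come from the figure, but their assignment to specific gene pairs is hypothetical (the figure layout is not recoverable), and applying the thresholds to the absolute value of the coefficient is an assumption made so that negative correlations are kept. Function names are made up for this illustration.

```python
def relevance_edges(correlations, t_min=0.50, t_max=0.90):
    """Keep gene pairs whose |correlation| lies within [t_min, t_max]."""
    return {pair: r for pair, r in correlations.items()
            if t_min <= abs(r) <= t_max}


def subnets(edges):
    """Connected components of the remaining gene-gene links."""
    neighbours = {}
    for g1, g2 in edges:
        neighbours.setdefault(g1, set()).add(g2)
        neighbours.setdefault(g2, set()).add(g1)
    seen, components = set(), []
    for gene in sorted(neighbours):
        if gene in seen:
            continue
        stack, component = [gene], set()
        while stack:
            g = stack.pop()
            if g not in component:
                component.add(g)
                stack.extend(neighbours[g] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical pairing of the coefficients shown in the figure.
corr = {
    ("A", "B"): 0.28, ("A", "C"): 0.75, ("A", "D"): 0.15, ("A", "E"): 0.37,
    ("B", "C"): 0.40, ("B", "D"): 0.02, ("B", "E"): 0.51, ("C", "D"): 0.11,
    ("C", "E"): 0.63, ("D", "E"): 0.92,
}
kept = relevance_edges(corr)
print(kept)
print(subnets(kept))
```

Under this pairing, the 0.92 link is removed for exceeding Tmax, which leaves gene D without any link and therefore outside every subnet.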
FunNet (http://www.funnet.info)
A program that builds gene networks from microarray data in two ways:
- from the direct relationship between the expression of the genes
- by searching for over-represented biological functions (GO/KEGG), grouping genes by those functions, and computing relationships between functions
FunNet: parameters
Upload of the previously mentioned files, when required
- Conventional functional analysis: only analyzes the GO and KEGG annotations of the submitted genes
- Functional analysis of transcriptional networks: builds the networks
- Co-expression threshold estimation: assesses which thresholds would be best for your data
FunNet: what is good about it?
1. Search GO and KEGG
2. Build networks using only the correlation between genes
3. Build networks using functional modules
ArrayExpress
• Public repository of microarray-based gene expression data.
• Implemented in Oracle at EBI.
• Contains:
– several curated gene expression datasets
– possible introduction of an image server to archive raw image data associated with the experiments.
• Accepts submissions in MAGE-ML format via a web-based data annotation/submission tool called MIAMExpress.
– A demo version of MIAMExpress is available at: http://industry.ebi.ac.uk/~parkinso/subtool/subtype.html
• Provides a simple web-based query interface and is directly linked to the Expression Profiler data analysis tool, which allows expression data clustering and other types of data exploration directly through the web.
Gene Expression Omnibus
• The Gene Expression Omnibus is a gene expression database hosted at the National Library of Medicine
• It supports four basic data elements
– Platform (the physical reagents used to generate the data)
– Sample (information about the mRNA being used)
– Submitter (the person and organisation submitting the data)
– Series (the relationship among the samples)
• It allows download of entire datasets, but it has no ability to query the relationships
• Data are entered as tab-delimited ASCII records, with a number of columns that depends on the kind of array selected
• Supports Serial Analysis of Gene Expression (SAGE) data
Stanford Microarray Database
• Contains the largest amount of data.
• Uses a relational database to answer queries.
• Associated with numerous clustering and analysis features.
• Users can access the data in SMD from the web interface of the package.
• Disadvantages:
– It supports only Cy3/Cy5 glass slide data
– It is designed to exclusively use an Oracle database
– Has been recently released outside without any kind of support!
Percentage of genes shared by studies used in meta-analysis
For each study (rows), the percentage of genes found in each of the other studies (columns). The actual number of genes shared is given in parentheses (Rhodes, D.R. et al. (2002) Meta-analysis of microarrays: inter-study validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res., 62, 4427-4433.)

Authors | Dhanasekaran, S.M., et al. | Luo, J., et al. | Magee, J.A., et al. | Welsh, J.B., et al.
Dhanasekaran, S.M., et al. | 100 | 61.1 (5106) | 23 (1919) | 34.8 (2906)
Luo, J., et al. | 99.7 (5106) | 100 | 30.5 (1560) | 41.6 (2132)
Magee, J.A., et al. | 69 (1919) | 56.1 (1560) | 100 | 79.9 (2221)
Welsh, J.B., et al. | 53 (2906) | 38.9 (2132) | 40.5 (2221) | 100
How to deal with such variation ?
Copyright ©2002 American Association for Cancer Research
Rhodes, D. R. et al. Cancer Res 2002;62:4427-4433
Why meta-analysis?
Rhodes et al. (2002) applied meta-analysis to combine four datasets on prostate cancer to determine genes that are differentially expressed between clinically localized prostate cancer and benign tissue.
Parmigiani et al. (2004) performed a cross-study comparison of gene expression for the molecular classification of lung cancer.
Park and Stegall (2007) combined publicly available and their own microarray datasets to investigate the detection of cytokine gene expression in human kidney.
Meta-analysis has been shown to have increased statistical power to detect small but consistent effects that might be false negatives in the individual analyses (Choi et al., 2003).
It also has significantly improved reproducibility when compared with independent studies, which may lead to improved reliability (Hong et al., 2006).
Therefore, meta-analysis provides researchers with an indispensable tool to interrogate existing databases for candidate biomarkers and biological pathways.
Biological markers for early diagnosis and prognosis in Alzheimer's disease using gene expression data
Aderbal R. T. Silva, Paulo J. S. Silva, Cecília Feio, Luis P. Camargo, Lea T. Grinberg, Renata E. L. Ferreti, Renata Leite, José M. Farfel, Cesar H. Torres, Dirce M. Carraro, Diogo Patrão, and Brazilian Aging Brain Study Group
THE PROBLEM
[Diagram: sample flow]
- Patient arrives at the VO
- Material collected: brain (hemispheres), whole blood (leukocytes, plasma), thymus, kidney, heart
- Storage: frozen tissue / blood; formaldehyde
- Derived material: DNA, RNA, K7, HE slide
- Interview with a family member: clinical data
Human Brain Bank of the Cerebral Aging Study Group of FMUSP
COMPUTERIZED SYSTEM
Initial registration of patient and samples
- VO number, patient name, questionnaires (optional)
- Number of tubes per brain region; number of tubes of blood, thymus, kidney, etc.
Barcode generation
- For each registered sample, the system generates and prints a unique barcode
Sample storage
- A dedicated freezer for each sample type; position tracked by barcode
Sample retrieval
- Which formaldehyde vat holds the left hemisphere? Which freezer holds the blood tube?
Sample transformation
- Hemisphere in formaldehyde vat => K7
- K7 => HE slide
- HE slide => stained slide
- Tissue/blood => RNA and/or DNA
- Human Brain Bank of the Cerebral Aging Study Group (FMUSP)
- SVOC
Frontal cortex
Hippocampus
CASE SERIES
MATERIALS AND METHODS
Experimental tissue: AD/nsAD/OD/N. Reference tissue: pool of 15 cell lines.
1. Total RNA isolation (RNeasy Mini Kit, QIAGEN)
2. mRNA amplification: 2 rounds, T7 polymerase approach
3. Reverse transcription, fluorescently labeled with Cy3 (green) and Cy5 (red)
4. Customized cDNA platform: 4,800 cDNA sequences (4,608 human genes; 192 positive and negative controls)
5. Signal intensity capture (scan)
6. Statistical analysis
- ANOVA (p < 0.05)
- Biological processes (GO) (p < 0.05), WebGestalt
- Linear classifiers, with 3 genes
Introduction
Clinical presentation of dementia (rows) versus neuropathology of Alzheimer's disease (AD) (columns):

                       Neuropathology +            Neuropathology -
Clinical dementia +    Definite AD (symptomatic)   Other dementias
Clinical dementia -    Asymptomatic AD             Normal individuals

Cognitive reserve
The capacity to tolerate age- or disease-related changes in the brain without developing clear symptoms or signs
Early diagnosis
AW842619 ABHD3 | AW369078 CRY2 | AW369582 GTF2H1 | BE831904 LOC348262 | BF935252 PSTPIP1 | BE814002 TPST1 | AW820222 ZNF655
AW938004 ACP5 | BE815274 CX3CL1 | BE937637 GUK1 | AW896256 LOC399947 | BG001122 PTPN12 | BE167402 TRAPPC6B | BG009053 ZNF84
BE004154 ACTN1 | AW890224 DEF6 | BE833329 H2AFZ | BF884010 LOC440104 | BE070870 PTPRO | BE812623 TRRAP | AW385146
BF923962 AFF3 | BF958558 DFNA5 | BF357100 HIP1R | BE703406 LRP1 | BF876532 RAB15 | AW892064 TTC13 | BF332048
BF735483 AGPAT4 | AW890450 DMN | BF736934 HK1 | BF154636 LTF | BQ332146 RAP1GAP | AW608814 TXNDC5 | BF376871
BF813119 AKAP8L | BE698761 DNAJA2 | BE768509 HLTF | BE181083 MAP3K4 | BF962405 RICS | AW884244 UBE2L3 | BQ356322
BQ377544 ATP6AP2 | AW362589 DNAJC13 | BE089922 HPCAL4 | BE829325 MMP19 | AW379479 SCARA3 | BF814397 UBE2O | BE841175
Seq_inconsistente ATP6V0B | BQ300355 DNAJC5 | BE073224 ICMT | AW854050 MORC3 | BF960958 SLC25A38 | BF085922 UBP1 | BQ367273
BF756552 AXIN2 | BF758919 ENTPD6 | BF881076 IFNGR1 | BE811192 MRPL27 | AW937427 SMAD7 | AW389806 UCKL1 | BQ339374
BE828861 AYTL1 | AW858463 EPSTI1 | BE698667 IL15 | BE841209 MTERF | BF926012 SNRPN | BF806367 UROD | BF752100
BF805451 BAMBI | AW177711 ETS2 | AW883691 IQCE | AW384219 MYC | BF806819 SPG11 | BF879244 USP14 | BF092878
BE168732 BSCL2 | AW900852 EXOC2 | BF371888 JMJD2C | AW384904 MYO5A | BQ345409 SPOCK2 | BF930903 USP20 | BF336377
BF798182 C17orf68 | BE832888 EXTL3 | BF761806 KIAA0146 | BE708226 NNT | BE926073 SPON1 | BF803592 VPS39 | BQ330558
BE719302 C1orf165 | BE832956 FAM125B | BE771837 KIAA0427 | AW576992 NUDCD3 | AW370650 SPRYD4 | AW367978 WBP5 | BF359711
BF359220 C20orf7 | BF829157 FBXL10 | BF842373 KIAA0895 | BF919192 OIT3 | BQ329616 SPTBN5 | BF817369 YTHDF2 | BQ309790
BE695309 C7orf44 | BF920328 FBXL15 | BF996596 KISS1 | BF809799 PANK1 | BE769762 SRPK1 | BG003841 ZBTB7B | BQ367012
AW752098 C9orf125 | BF930310 FLJ14803 | BE713379 KLK7 | AW372944 PATZ1 | BE930145 STK24 | BE144581 ZFYVE21 | BQ364322
BQ365457 CBLL1 | BE005088 FLJ32549 | AW378473 KPNA6 | BF765614 PCTK1 | BQ367148 TBRG1 | BE836738 ZMYM4 | BE844225
BF802140 CDC16 | BE167409 FOXE1 | BF739130 KRT1 | BG004607 PLEKHO2 | BQ320141 TEC | AW842249 ZNF266 | BF329787
BE168329 CLN5 | BE835720 FREQ | AW863981 LIPF | BE175902 POLK | AW582658 TIMM9 | BE141963 ZNF394 | BQ311508
AW995807 CPSF3 | AW995236 GALNAC4S-6ST | BF998151 LOC148696 | BG003089 PPIL2 | BE002927 TMC5 | AW379158 ZNF559 | BQ360637
BF920912 CRKL | BF935421 GRIK5 | BF753140 LOC285628 | BE703712 PPP2R2C | BF806472 TMEM2 | AW364861 ZNF576 | BQ355034
ANOVA RESULT (1x3) = 154 GENES
[Figure: Venn diagram with region counts 107, 18, 12, 2, 4, 9]
24 GENES DIFFERENTIALLY EXPRESSED BETWEEN nsAD AND N
EARLY DIAGNOSIS???
BIOLOGICAL MARKERS FOR EARLY DIAGNOSIS
Results
• Differentially expressed genes: asymptomatic AD (DAa) → definite AD (DAd)
• Biological processes (GO)
• Network of biological processes [Figure: Module 1 and Module 2; up-regulated and down-regulated]
• Signaling pathways (KEGG)
• Networks of signaling pathways [Figure: Module 1 and Module 2; up-regulated and down-regulated]
• Hierarchical clustering (genes with |fold| ≥ 1.8)
• Classifiers
Conclusions
• Genes that distinguish asymptomatic AD (DAa) from definite AD (DAd) are mainly involved in the cell cycle and synaptic plasticity.
• HYPOTHESIS
Neurons erroneously convert signals that would be used for synaptic plasticity into activation of the cell cycle, which subsequently leads to their death.
Asymptomatic AD
Differentiated neurons, after withdrawing from the cell cycle, are able to use this mechanism, essentially developed to control proliferation, as an alternative way to control synaptic plasticity.
'Symptomatic' AD
Arendt T, Bruckner NK. Biochim Biophys Acta. 2007; 1772:413-421.
Frank CL, Tsai LH. Neuron. 2009; 62:312-326.
Biological question
Experimental design
Differentially expressed genes, tumor markers
Gene expression patterns, classifiers
Metabolic pathways and biological systems
Biological validation
Technical validation
THE MICROARRAY RESEARCHER'S VIEW
1. Experimental design
2. QUALITY (noise): mRNA (tissues, kits, procedures); transcription (reaction, enzyme); labeling (type, time); amplification; pin type, surface, probe volume; slide production, fixation; hybridization parameters (time, temperature, buffer); nonspecific hybridization; background; artifacts; image capture; segmentation; pixel quantification...
3. Normalization
4. ANALYSIS: many genes (multiple testing???); significance: the number of variables (genes) is much larger than the number of experiments
5. Replicability
THE ANALYST'S VIEW
[Flowchart: Biological question (hypothesis-driven or explorative) → Experimental design → Microarray experiment → Image analysis → Quality measurement (failed / pass) → Normalization (pre-processing) → Analysis (testing, estimation, discrimination, clustering) → Biological verification and interpretation]
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination:
He may be able to say what the experiment died of.
Ronald Fisher
Books
Gentleman, Carey, Huber, “Bioinformatics and Computational Biology Solutions Using R and Bioconductor”, Springer
David W. Mount, "Bioinformatics", Cold Spring Harbor
Terry Speed, "Statistical Analysis of Gene Expression Microarray Data", Chapman & Hall/CRC
Pierre Baldi & G. Wesley Hatfield, "DNA Microarrays and Gene Expression", Cambridge
Giovanni Parmigiani et al., "The Analysis of Gene Expression Data", Springer
And how do I analyze my own data?
www.r-project.org
www.bioconductor.org
• Open source
• Free
• Easy installation
• Helpful community
• High quality standards
• Regularly maintained and updated
• Tons of documentation
• Every package comes with example vignettes to walk you through standard tasks.
Acknowledgments: to the LBHC team