Gene expression analysis

Helena Brentani

LBHC - Laboratório de Bioinformática do Hospital do Câncer (Bioinformatics Laboratory of the Hospital do Câncer)


DNA → RNA → Proteins → Cellular functions

How to study the transcriptome

Large-scale analyses

- ESTs

- SAGE

- Microarray

- RNA-seq

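Expressed Sequence Tag (EST)

[Figure: a gene with exons and introns; its cDNA with coding and non-coding regions and a poly(A) tail (AAAAAAAAA); the expressed sequence tag.]

ESTs: only highly expressed genes are captured; results depend on the type of library

Genome availability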

SAGE

Tags are isolated and concatemerized.

Relative expression levels can be compared between cells in different states.

SAGE: pros and cons

Advantages

• No hybridization and no reference: measurements are relative only to the total number of tags in the library;

• In theory, all mRNAs are measured through their tags; there is no need to immobilize them;

• Higher sensitivity

Shui Qing Ye, Tera Lavoie, David C. Usher, Li Q. Zhang. Cell Research 2002; 12(2):105-115

Difficulties

• Relating a tag to a transcript;

• High cost;

Microarray

Microarray: pros and cons

Advantages:

• Relatively low cost

• Many transcripts (600 to 25k)

• Nearly unambiguous cDNA -> gene relationship

Disadvantages:

• Low reproducibility

• Two-color: competitive hybridization, differences in labeling of the same cDNA

• One-color: no common reference for comparing slides

• Difficult to merge experiments into a single analysis

The concentration of the DNA "target" and the uniformity of the spots are very important for the analysis

SLIDES

TYPES OF ARRAY

Oligos

• 30 nt (low signal)

• 70 nt (high cost; negative controls required)

Immobilized cDNA

• Must be denatured at high temperature before the denatured "probe" is applied; the immobilized cDNA may renature during the process, giving low signal

• Cross-hybridization: gene families, Alu repeats

• Specific primers or vector primers (lower cost)

A comparison between cDNA and oligonucleotide arrays

cDNA arrays                       Oligo arrays
Long sequences                    Short sequences
Spot unknown sequences            Spot known sequences
More variability in the system    More reliable data
More difficult to analyze         Easier to analyze with appropriate experimental design

[Figure: Ank, signal intensity versus fragment position on the transcript (5' to 3'; fragment positions 1264-1529, 1644-1906, 1928-2202, 3711-3977, 10882-11147). Two bar charts of median intensity (axis 0 to 25) per gene fragment (gene_frag): probe SCL12A4 (2869 bp) against fragments A4-38, A4-46, A4-49, A6-39, A6-50, A6-53 and A7-37; probe CDH12 (979 bp) against fragments CDH1854, CDH1858, CDH1255 and CDH1257.]

The five steps of gene expression analysis

1. Biological question / experimental design

2. Sample preparation

3. Biochemical reaction

4. Spot identification

5. Data analysis

Experimental design

• Number of individuals in each class
– Estimates the biological variability among individuals of the same class

• Number of replicates
– Estimates the experimental variability

• Type of experimental design
– Reference design
– Balanced block design
– Loop design

• Limited number of libraries, arrays...

Sample preparation

• Total RNA extraction
• RNA quality
• Amplification
• Amplification control
• Types of protocols

LABELING of aRNA for cDNA slides

Direct labeling method with RT (reverse transcriptase) and random primers (dN6)

[Figure: aRNA template (5' to 3') primed with random hexamers and reverse-transcribed into strands labeled with Cy3 or Cy5.]

Degradation of the RNA molecule, purification, hybridization, washing and detection

Spot identification

Individual spots are recognized; size and shape may be adjusted per spot (automatically, with fine adjustments by hand).

Additional manual flagging of bad (X) or non-present (NA) spots.

Different spot identification methods: fixed circles, circles with variable size, arbitrary spot shape (morphological opening).

[Figure: examples of poor and good spot quality, with spots flagged X and NA.]

Spot identification

• The signal of the spots is quantified.

• Summary statistics: mean / median / mode / 75% quantile

• Local background

• "Donuts"

[Figure: histogram of pixel intensities of a single spot.]

Software: GenePix, QuantArray, ScanAlyze

Raw data are not mRNA concentrations

• tissue contamination

• RNA degradation

• amplification efficiency

• reverse transcription efficiency

• Hybridization efficiency and specificity

• clone identification and mapping

• PCR yield, contamination

• spotting efficiency

• DNA support binding

• other array manufacturing related issues

• image segmentation

• signal quantification

• “background” correction

Quality control: Noise and reliable signal

[Figure: arrays 1 ... n, with quality assessed at the probe level, the array level and the gene level.]

Probe level: quality of the expression measurement of one spot on one particular array

Array level: quality of the expression measurement on one particular glass slide

Gene level: quality of the expression measurement of one probe across all arrays

Probe-level quality control

• Individual spots printed on the slide

• Sources:
– faulty printing, uneven distribution, contamination with debris, magnitude of signal relative to noise, poorly measured spots

• Visual inspection:
– hairs, dust, scratches, air bubbles, dark regions, regions with haze

• Spot quality:
– Brightness: foreground/background ratio
– Uniformity: variation in pixel intensities and ratios of intensities within a spot
– Morphology: area, perimeter, circularity
– Spot size: number of foreground pixels

• Action:
– set measurements to NA (missing values)
– local normalization procedures which account for regional idiosyncrasies
– use weights for measurements to indicate reliability in later analysis

Array-level quality control

• Problems:
– array fabrication defect
– problem with RNA extraction
– failed labeling reaction
– poor hybridization conditions
– faulty scanner

• Quality measures:
– Percentage of spots with no signal (~30% excluded spots)
– Range of intensities
– (Av. foreground)/(Av. background) > 3 in both channels
– Distribution of spot signal area
– Amount of adjustment needed: how much the signals have to be changed to make slides comparable
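Two of the quality measures above (percentage of spots with no signal, and average foreground over average background in both channels) can be sketched as follows. This is a minimal illustration, not the procedure of any particular image-analysis package; the function name `array_qc`, its arguments and the returned dictionary keys are hypothetical.

```python
import numpy as np

def array_qc(fg_red, bg_red, fg_green, bg_green, no_signal, min_ratio=3.0):
    """Summarise two array-level quality measures for one two-colour slide.

    fg_* / bg_* are per-spot foreground and background intensities;
    no_signal is a boolean flag per spot (True = spot has no signal).
    """
    fg_red, bg_red = np.asarray(fg_red, float), np.asarray(bg_red, float)
    fg_green, bg_green = np.asarray(fg_green, float), np.asarray(bg_green, float)
    ratio_red = fg_red.mean() / bg_red.mean()        # (Av. foreground)/(Av. background), red
    ratio_green = fg_green.mean() / bg_green.mean()  # same for the green channel
    return {
        "pct_no_signal": 100.0 * float(np.mean(np.asarray(no_signal, bool))),
        "ratio_red": float(ratio_red),
        "ratio_green": float(ratio_green),
        # the slide passes only if the ratio exceeds the threshold in BOTH channels
        "passes_ratio": bool(ratio_red > min_ratio and ratio_green > min_ratio),
    }

qc = array_qc([900, 1100, 1000, 1000], [100, 100, 100, 100],
              [250, 250, 250, 250], [100, 100, 100, 100],
              [True, False, False, False])
print(qc)
```

In this toy slide the red channel is well above background but the green channel is not, so the slide fails the ratio criterion.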

Gene-level quality control

For a given gene g:

• Poor hybridization in the reference channel may introduce bias in the fold change

• Some probes will not hybridize well to the target RNA

• Printing problems, such that all spots from a given inventory well have poor quality

• A well may be of bad quality (contamination)

• Genes with a consistently low signal in the reference channel are suspicious

Gene expression data

The data form a matrix with genes as rows and mRNA samples as columns. Each entry is the gene-expression level or ratio for gene i in mRNA sample j.

Gene   sample1  sample2  sample3  sample4  sample5  …
1        0.46     0.30     0.80     1.51     0.90   ...
2       -0.10     0.49     0.24     0.06     0.46   ...
3        0.15     0.74     0.04     0.10     0.20   ...
4       -0.45    -1.03    -0.79    -0.56    -0.32   ...
5       -0.06     1.06     1.35     1.09    -1.09   ...

Two-color arrays:

M = log2(red intensity / green intensity)

A = average of log2(red intensity) and log2(green intensity)

Oligonucleotide arrays with PM/MM probes: function (PM, MM) of MAS, dChip or RMA

[Figure: scatterplots of the data on the raw scale and on the log scale.]

Message: look at your data on log-scale!

MA plot

M = log2(R/G)

A = 1/2 log2(RG)
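As a small illustration of the two definitions, the sketch below computes M and A from red and green spot intensities. The function name `ma_values` and the example intensities are made up for the illustration.

```python
import numpy as np

def ma_values(red, green):
    """Return (M, A) for paired red/green intensities.

    M = log2(R/G) is the log ratio; A = 1/2 log2(R*G) is the average log intensity.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    m = np.log2(red / green)
    a = 0.5 * np.log2(red * green)
    return m, a

# Hypothetical intensities for two spots
m, a = ma_values([1024, 256], [256, 256])
print(m, a)
```

Plotting M against A (rather than log R against log G) rotates the scatterplot so that intensity-dependent bias shows up as a curved trend around M = 0.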

Median centering

One of the simplest strategies is to bring the "centers" of all arrays to the same level.

Assumption: the majority of genes are unchanged between conditions.

The median is more robust to outliers than the mean.

Divide all expression measurements of each array by the median (on the log scale, subtract the median).

[Figure: log signal of each array, centered at 0.]
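A minimal sketch of median centering on log-scale data, assuming a matrix with genes as rows and arrays as columns. The function name `median_center` is illustrative; the numbers are the five example rows of the expression matrix shown earlier in this deck.

```python
import numpy as np

def median_center(log_expr):
    """Subtract each array's (column's) median from its log-scale values."""
    log_expr = np.asarray(log_expr, dtype=float)
    return log_expr - np.median(log_expr, axis=0, keepdims=True)

# Genes as rows, samples as columns (log ratios)
expr = np.array([
    [ 0.46,  0.30,  0.80,  1.51,  0.90],
    [-0.10,  0.49,  0.24,  0.06,  0.46],
    [ 0.15,  0.74,  0.04,  0.10,  0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06,  1.06,  1.35,  1.09, -1.09],
])
centered = median_center(expr)
print(np.median(centered, axis=0))  # every array is now centered at 0
```

Subtracting on the log scale is the same operation as dividing by the median on the raw intensity scale.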

Problem of median-centering

[Figure: scatterplot of log signals after median-centering (Log Green versus Log Red), and M-A plot of the same data, with A = (Log Green + Log Red) / 2 and M = Log Red - Log Green.]

Median-centering is a global method. It does not adjust for local effects, intensity-dependent effects, print-tip effects, etc.

Lowess normalization

[Figure: M-A plot, with A = (Log Green + Log Red) / 2 and M = Log Red - Log Green, and a local estimate of M as a function of A.]

Compute a local estimate, then use the estimate to bend the banana straight.
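The idea can be sketched with a small, self-contained local regression: estimate the trend of M as a function of A, then subtract it. This is a simplified stand-in for lowess (tricube-weighted local linear fits, without the robustness iterations of the full algorithm), written with NumPy only; the names `lowess_fit` and `lowess_normalize` and the `frac` default are assumptions for the illustration, not a specific package's API.

```python
import numpy as np

def lowess_fit(x, y, frac=0.3):
    """Locally weighted linear fit of y on x, evaluated at each x (tricube weights)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(2, int(np.ceil(frac * n))))  # number of points in each local window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]                # window half-width: k-th nearest distance
        if h == 0:
            w = (dist == 0).astype(float)       # degenerate window: identical x values only
        else:
            w = (1.0 - np.clip(dist / h, 0.0, 1.0) ** 3) ** 3
        xm = np.sum(w * x) / np.sum(w)
        ym = np.sum(w * y) / np.sum(w)
        sxx = np.sum(w * (x - xm) ** 2)
        slope = np.sum(w * (x - xm) * (y - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (x[i] - xm)
    return fitted

def lowess_normalize(m, a, frac=0.3):
    """Subtract the intensity-dependent trend of M (estimated along A) from M."""
    return np.asarray(m, dtype=float) - lowess_fit(a, m, frac)

# Toy data with an intensity-dependent bias in M
a = np.linspace(6, 14, 50)
m = 0.2 * (a - 10)
print(np.abs(lowess_normalize(m, a)).max())  # close to 0 after normalization
```

The loop is quadratic in the number of spots, which is fine for a sketch but slow for a full array; production code would normally rely on an established lowess implementation.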

Quality control

[Figure: histogram and MA plot before and after lowess normalization (not normalized versus normalized).]

From data to knowledge

Ok, now we have made sure that our data are of high quality and that systematic, non-biological effects are removed.

The result is a gene expression matrix (genes as rows, mRNA samples as columns):

Gene   sample1  sample2  sample3  sample4  sample5  …
1        0.46     0.30     0.80     1.51     0.90   ...
2       -0.10     0.49     0.24     0.06     0.46   ...
3        0.15     0.74     0.04     0.10     0.20   ...
4       -0.45    -1.03    -0.79    -0.56    -0.32   ...
5       -0.06     1.06     1.35     1.09    -1.09   ...

Is that already a result? No! It's just data, not knowledge. We need to use this data to answer a scientific question.

Types of experiments and the appropriate tool

Three basic types of questions that microarray experiments set out to answer:

Differentially expressed genes

• Find genes that behave differently in two classes, with statistical evidence

• t-test, Fisher's exact test, chi-square, fold change, BER

Expression patterns

• Find lists of genes with similar behavior

• Hierarchical clustering

Sample classification

• Find a reduced list of genes (1 for every ~20 samples) whose behavior allows some information about the sample to be predicted, and then build a classifier

• Finding the genes (data mining): SVM-FS, exhaustive search

• Building the classifier: SVM, Fisher's linear discriminant, CART

Experiment design

• Type I (n = 2)
– How is this gene expressed in target 1 as compared to target 2?
– Which genes show up/down regulation between the two targets?

• Type II (n > 2)
– How does the expression of gene A vary over time, tissues, or treatments?
– Do any of the expression profiles exhibit similar patterns of expression?

Common statistical methods for differential expression analysis

• Class mean / standard deviation
• Fold change
• SAM
• Student's t-test
• Wilcoxon
• Bayes Error Rate
• ANOVA
• Multiple-testing correction (Bonferroni, pFDR)
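To make the multiple-testing item concrete, here is a minimal sketch of two p-value adjustments. Bonferroni is the method named in the list. The second function is the Benjamini-Hochberg step-up procedure, a related false discovery rate method that is used here as an assumption for illustration; it is not the pFDR estimator named above. Function names and example p-values are hypothetical.

```python
import numpy as np

def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]   # one hypothetical p-value per gene
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

With thousands of genes tested at once, Bonferroni becomes very conservative, which is one reason FDR-type corrections are popular in expression studies.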

Permutation tests

i) For each gene, compute the d-value (analogous to the t-statistic). This is the observed d-value for that gene.

ii) Rank the genes in ascending order of their d-values.

iii) Randomly shuffle the values of the genes between groups A and B, such that the reshuffled groups A and B have the same number of elements as the original groups A and B, respectively. Compute the d-value for each randomized gene.

Original grouping (Gene 1): Group A = Exp 1, Exp 2, Exp 3; Group B = Exp 4, Exp 5, Exp 6

Randomized grouping (Gene 1): Group A = Exp 1, Exp 4, Exp 5; Group B = Exp 2, Exp 3, Exp 6
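The reshuffling step can be sketched for a single gene as below. Assumptions for the illustration: the d-value is taken as a pooled-variance t-like statistic with an optional small constant `s0` added to the denominator, and instead of random shuffles the code enumerates every possible regrouping, which is feasible for small groups. The function names are hypothetical, and this is not the SAM implementation itself.

```python
import itertools
import numpy as np

def d_value(a, b, s0=0.0):
    """t-like statistic: (mean of B - mean of A) / (pooled standard error + s0)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = (np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)) / (na + nb - 2)
    se = np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (b.mean() - a.mean()) / (se + s0)

def permutation_pvalue(a, b, s0=0.0):
    """Fraction of regroupings whose |d| is at least as large as the observed |d|."""
    values = np.concatenate([a, b]).astype(float)
    na = len(a)
    observed = abs(d_value(a, b, s0))
    count = 0
    total = 0
    for chosen in itertools.combinations(range(values.size), na):
        mask = np.zeros(values.size, dtype=bool)
        mask[list(chosen)] = True            # these experiments form the reshuffled group A
        d = abs(d_value(values[mask], values[~mask], s0))
        if d >= observed - 1e-12:            # small tolerance for floating-point ties
            count += 1
        total += 1
    return count / total

# One gene measured in Exp 1-3 (group A) and Exp 4-6 (group B); values are made up
print(permutation_pvalue([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]))
```

With three experiments per group there are only 20 possible groupings, so the smallest attainable two-sided p-value is 2/20; this is one reason the number of replicates matters.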

SAM Two-Class Unpaired

[Figure: SAM plot of observed versus expected d-values, with the following annotations.]

• Significant positive genes (i.e., mean expression of group B > mean expression of group A) in red

• Significant negative genes (i.e., mean expression of group A > mean expression of group B) in green

• "Observed d = expected d" line

• Tuning parameter "delta" limits, which can be changed dynamically by using the slider bar or entering a value in the text field.

The more a gene deviates from the "observed = expected" line, the more likely it is to be significant. Starting from the first gene, in the positive or negative direction on the x-axis, whose observed d exceeds the expected d by at least delta, that gene and every gene beyond it are considered significant.













Patterns of Gene Expression

• “Eisen”ized data (dendrograms)

• Self-Organizing Maps

• Principal Component Analysis

• k-means Clustering

Expression vectors

Gene expression vectors encapsulate the expression of a gene over a set of experimental conditions or sample types.

Example, log2(cy5/cy3) over conditions 1 to 8: -0.8, 0.8, 1.5, 1.8, 0.5, -1.3, -0.4, 1.5

[Figure: the same vector plotted on a log2(cy5/cy3) axis from -2 to 2.]

Expression vectors as points in 'expression space'

[Figure: genes G1 to G5 with their values in Exp 1, Exp 2 and Exp 3, plotted as points on axes Experiment 1, Experiment 2 and Experiment 3; genes with similar expression lie close together.]

Distance: a measure of similarity between genes.

         Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6
Gene A   x1A    x2A    x3A    x4A    x5A    x6A
Gene B   x1B    x2B    x3B    x4B    x5B    x6B

Some distances (MeV provides 11 metrics):

1. Euclidean: $\sqrt{\sum_{i=1}^{6} (x_{iA} - x_{iB})^2}$

2. Manhattan: $\sum_{i=1}^{6} |x_{iA} - x_{iB}|$

3. Pearson correlation
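The three measures can be sketched for two expression vectors as follows. The function names and the example vectors are made up; expressing the Pearson-based distance as r multiplied by -1 is an assumption chosen for the illustration (1 - r is another common convention).

```python
import numpy as np

def euclidean(xa, xb):
    """Square root of the summed squared differences across experiments."""
    xa, xb = np.asarray(xa, dtype=float), np.asarray(xb, dtype=float)
    return float(np.sqrt(np.sum((xa - xb) ** 2)))

def manhattan(xa, xb):
    """Sum of absolute differences across experiments."""
    xa, xb = np.asarray(xa, dtype=float), np.asarray(xb, dtype=float)
    return float(np.sum(np.abs(xa - xb)))

def pearson_distance(xa, xb):
    """Pearson correlation turned into a distance as -r (assumed convention)."""
    r = np.corrcoef(xa, xb)[0, 1]
    return float(-r)

gene_a = [1, 2, 3, 4, 5, 6]       # hypothetical values over Exp 1..6
gene_b = [2, 4, 6, 8, 10, 12]     # same shape as gene A, twice the amplitude
print(euclidean(gene_a, gene_b), manhattan(gene_a, gene_b), pearson_distance(gene_a, gene_b))
```

The example shows why the choice of metric matters: the two genes are far apart in Euclidean and Manhattan terms, yet perfectly correlated, so the Pearson-based distance treats them as identical in shape.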

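Distance is defined by a metric

[Figure: pairs of expression profiles (p0, p1) on a log2(cy5/cy3) axis from -2 to 2, with the distance D between them under two metrics, Euclidean and Pearson (r × -1); values shown: 4.2, 1.4, -1.00, -0.90.]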













SVM Classification

• SVM attempts to find an optimal separating hyperplane between members of the two initial classifications.

Separating hyperplane

Support Vector Machines

Maximal margin separating hyperplane

Datapoints closest to separating hyperplane= support vectors

How well did we do?

Training error: how well do we do on the data we trained the classifier on?

But how well will we do in the future, on new data?

Test error: how well does the classifier generalize?

[Figure: the same classifier (= line) applied to new data from the same classes.]

The classifier will usually perform worse than before: test error > training error
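The difference between training and test error can be illustrated with any classifier. The sketch below uses a nearest-centroid rule as a simple stand-in (it is not an SVM), with made-up two-gene data; all function names are hypothetical.

```python
import numpy as np

def fit_centroids(x, y):
    """Learn one centroid (mean expression vector) per class."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    classes = np.unique(y)
    centroids = np.array([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, x):
    """Assign each sample to the class with the closest centroid."""
    classes, centroids = model
    x = np.asarray(x, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def error_rate(model, x, y):
    """Fraction of samples whose predicted class differs from the true class."""
    return float(np.mean(predict(model, x) != np.asarray(y)))

# Training samples (rows) described by two genes (columns); labels are the classes
x_train = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
y_train = [0, 0, 0, 1, 1, 1]
# New data from the same classes; the second sample lies close to the other class
x_test = [[0.5, 0.5], [2.5, 2.5], [3.5, 3.5], [4.0, 4.0]]
y_test = [0, 0, 1, 1]

model = fit_centroids(x_train, y_train)
print("training error:", error_rate(model, x_train, y_train))
print("test error:", error_rate(model, x_test, y_test))
```

The test set must not be used when fitting the classifier or when selecting genes; otherwise the test error is as optimistic as the training error.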

EASE (Expression Analysis Systematic Explorer)

EASE analysis identifies prevalent biological themes within gene clusters.

The significance of each identified theme is determined by its prevalence in the cluster and in the population of genes from which the cluster was created.

EASE File System

Consider a population of genes representing a diverse set of biological roles or themes shown below as different colors.

Diverse Biological Roles

Many algorithms can be applied to expression data to partition genes based on expression profiles over multiple conditions.

Many of these techniques work solely on expression data and disregard biological information.

Consider a particular cluster…

What are some of the predominant biological themes represented in the cluster, and how should significance be assigned to a discovered biological theme?

Example:

Population size: 40 genes
Cluster size: 12 genes

10 genes, shown in green, have a common biological theme and 8 occur within the cluster.

The frequency of the theme in the population is 10/40 = 25%

The frequency of the theme within the cluster is 8/12 = 67%

AND

80% of the genes related to the theme in the population ended up within the relatively small cluster.

Consider the outcome

Contingency matrix:

              Cluster in   Cluster out
Theme in          8             2
Theme out         4            26

Assigning significance to the findings

Fisher's exact test permits us to determine whether there are non-random associations between the two variables: expression-based cluster membership and membership in a particular biological theme.

(2x2 contingency matrix)

              Cluster in   Cluster out
Theme in          8             2
Theme out         4            26

p ≈ .0002

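Hypergeometric distribution

The probability of any particular matrix occurring by random selection, given no association between the two variables, is given by the hypergeometric rule.

    a      b    |  a+b
    c      d    |  c+d
   a+c    b+d   |   n

$$p = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{n}{a+b}} = \frac{\dfrac{(a+c)!}{a!\,c!} \times \dfrac{(b+d)!}{b!\,d!}}{\dfrac{n!}{(a+b)!\,(c+d)!}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$

where n = a + b + c + d.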

Probability computation

For our matrix (8 2 / 4 26), we are interested not only in the probability of getting exactly 8 annotation hits in the cluster but in the probability of having 8 or more hits. In this case the probabilities of each of the possible matrices are summed.

  8  2        9  1       10  0
  4 26        3 27        2 28

.0002207 + 7.27×10^-6 + 7.79×10^-8 ≈ .000228
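The sum above can be reproduced with a short sketch that applies the hypergeometric rule to the observed matrix and to every more extreme matrix with the same margins. Only the standard library is used; the function names are illustrative, and the cell order (a, b, c, d) follows the 2x2 matrix read row by row.

```python
from math import comb

def hypergeom_prob(a, b, c, d):
    """Probability of one 2x2 matrix [[a, b], [c, d]] with fixed margins."""
    n = a + b + c + d
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)

def fisher_one_sided(a, b, c, d):
    """Probability of the observed matrix or any matrix with more hits in cell a."""
    total = 0.0
    while b >= 0 and c >= 0:
        total += hypergeom_prob(a, b, c, d)
        # move one gene into cell a while keeping all row and column totals fixed
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total

print(hypergeom_prob(8, 2, 4, 26))    # probability of exactly 8 hits
print(fisher_one_sided(8, 2, 4, 26))  # probability of 8 or more hits
```

The loop visits exactly the three matrices listed above (8 2 / 4 26, 9 1 / 3 27 and 10 0 / 2 28) because the theme contains only 10 genes.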

Relevance Networks

Set of genes whose expression profiles are predictive of one another.

Genes with low entropy (least variable across experiments) are excluded from analysis.

$H = -\sum_{x=1}^{10} p(x) \log_2 p(x)$
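A minimal sketch of the entropy filter, assuming that the sum over x = 1..10 refers to ten equal-width bins of a gene's expression values and that p(x) is the fraction of experiments falling in bin x. The function name `expression_entropy` and the example values are hypothetical.

```python
import numpy as np

def expression_entropy(values, bins=10):
    """Shannon entropy (in bits) of one gene's expression values after binning."""
    counts, _ = np.histogram(np.asarray(values, dtype=float), bins=bins)
    p = counts[counts > 0] / counts.sum()   # empty bins contribute nothing to the sum
    return float(-np.sum(p * np.log2(p)))

variable_gene = np.arange(10)          # values spread evenly over the range
flat_gene = [5.0, 5.0, 5.0, 5.0]       # no variation across experiments
print(expression_entropy(variable_gene), expression_entropy(flat_gene))
```

A gene whose values all fall into a single bin has entropy 0 and would be excluded; the maximum with ten bins is log2(10), about 3.32 bits.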

Can be used to identify negative correlations between genes

Relevance Networks

The expression pattern of each gene is compared to that of every other gene.

The ability of each gene to predict the expression of each other gene is assigned a correlation coefficient.

Correlation coefficients outside the boundaries defined by the minimum and maximum thresholds are eliminated.

Tmin = 0.50, Tmax = 0.90

The remaining relationships between genes define the subnets.

[Figure: network of genes A to E before and after thresholding; edge coefficients shown: .28, .75, .15, .37, .40, .02, .51, .11, .63, .92.]
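The thresholding step can be sketched as a filter over a gene-by-gene correlation matrix. Assumptions for the illustration: thresholds are applied to the absolute value of the coefficient (so that negative correlations are retained), the function name `relevance_edges` is hypothetical, and the three-gene matrix is invented rather than taken from the figure.

```python
import numpy as np

def relevance_edges(corr, names, t_min=0.50, t_max=0.90):
    """Keep gene pairs whose |correlation| lies within [t_min, t_max]."""
    corr = np.asarray(corr, dtype=float)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if t_min <= abs(corr[i, j]) <= t_max:
                edges.append((names[i], names[j], float(corr[i, j])))
    return edges

# Hypothetical correlation matrix; in practice it could come from np.corrcoef(expr)
corr = np.array([
    [ 1.00,  0.75,  0.92],
    [ 0.75,  1.00, -0.28],
    [ 0.92, -0.28,  1.00],
])
print(relevance_edges(corr, ["A", "B", "C"]))
```

Only the A-B pair survives here: 0.92 exceeds Tmax and 0.28 is below Tmin. The connected groups of the remaining edges are the subnets.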

FunNet (http://www.funnet.info)

A program that builds gene networks from microarray data in 2 ways:

- from the direct relationship between the expression of the genes

- by searching for over-represented biological functions (GO/KEGG), grouping genes by these functions and computing relationships between functions

FunNet: parameters

Upload of the files mentioned earlier, when needed

- Conventional functional analysis: only analyzes the GO and KEGG annotations of the submitted genes

- Functional analysis of transcriptional networks: builds the networks

- Co-expression threshold estimation: assesses which thresholds would be best to use on your data

FunNet: what is good about it?

1. Searching GO and KEGG

2. Building networks using only the correlation between genes

3. Building networks using functional modules

ArrayExpress

• Public repository of microarray-based gene expression data.
• Implemented in Oracle at EBI.
• Contains:
– several curated gene expression datasets
– possible introduction of an image server to archive raw image data associated with the experiments.
• Accepts submissions in MAGE-ML format via a web-based data annotation/submission tool called MIAMExpress.
– A demo version of MIAMExpress is available at: http://industry.ebi.ac.uk/~parkinso/subtool/subtype.html
• Provides a simple web-based query interface and is directly linked to the Expression Profiler data analysis tool, which allows expression data clustering and other types of data exploration directly through the web.

Gene Expression Omnibus

• The Gene Expression Omnibus is a gene expression database hosted at the National Library of Medicine

• It supports four basic data elements

– Platform (the physical reagents used to generate the data)

– Sample (information about the mRNA being used)

– Submitter (the person and organisation submitting the data)

– Series (the relationship among the samples).

• It allows download of entire datasets, but it has no ability to query the relationships

• Data are entered as tab-delimited ASCII records, with a number of columns that depends on the kind of array selected.

• Supports Serial Analysis of Gene Expression (SAGE) data.

Stanford Microarray Database

• Contains the largest amount of data.
• Uses a relational database to answer queries.
• Associated with numerous clustering and analysis features.
• Users can access the data in SMD from the web interface of the package.
• Disadvantages:
– It supports only Cy3/Cy5 glass slide data
– It is designed to exclusively use an Oracle database
– Has been recently released outside without any kind of support!

Percentage of genes shared by studies used in meta-analysis

For each study (rows), the percentage of genes found in all other studies (columns). The actual number of genes shared is given in parentheses (Rhodes, D.R. et al. (2002) Meta-analysis of microarrays: inter-study validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res., 62, 4427–4433.)

Authors                       Dhanasekaran, S.M., et al.   Luo, J., et al.   Magee, J.A., et al.   Welsh, J.B., et al.
Dhanasekaran, S.M., et al.    100                          61.1 (5106)       23 (1919)             34.8 (2906)
Luo, J., et al.               99.7 (5106)                  100               30.5 (1560)           41.6 (2132)
Magee, J.A., et al.           69 (1919)                    56.1 (1560)       100                   79.9 (2221)
Welsh, J.B., et al.           53 (2906)                    38.9 (2132)       40.5 (2221)           100

How to deal with such variation?

Copyright ©2002 American Association for Cancer Research

Rhodes, D. R. et al. Cancer Res 2002;62:4427-4433

Why meta-analysis?

Rhodes et al. (2002) applied meta-analysis to combine four datasets on prostate cancer to determine genes that are differentially expressed between clinically localized prostate and benign tissue.

Parmigiani et al. (2004) performed a cross-study comparison of gene expression for the molecular classification of lung cancer.

Park and Stegall (2007) combined publicly available and their own microarray datasets to investigate the detection of cytokine gene expression in human kidney.

Meta-analysis has been shown to have increased statistical power to detect small but consistent effects that might be false negatives in the individual analyses (Choi et al., 2003).

It also has significantly improved reproducibility when compared with independent studies, which may lead to improved reliability (Hong et al., 2006).

Therefore, meta-analysis provides researchers with an indispensable tool to interrogate existing databases for candidate biomarkers and biological pathways.

Biological markers for early diagnosis and prognosis in Alzheimer's disease using gene expression data

Aderbal R. T. Silva, Paulo J. S. Silva, Cecília Feio, Luis P. Camargo, Lea T. Grinberg, Renata E. L. Ferreti, Renata Leite, José M. Farfel, Cesar H. Torres, Dirce M. Carraro, Diogo Patrão, and Brazilian Aging Brain Study Group

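THE PROBLEM

[Figure: sample workflow. The patient arrives at the VO; brain (two hemispheres), whole blood (leukocytes, plasma), thymus, kidney and heart are collected; tissue and blood are frozen or placed in formaldehyde; the derived materials are DNA, RNA, K7 and HE slides; an interview with a family member provides the clinical data.]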

Human Brain Bank of the Cerebral Aging Study Group of FMUSP

COMPUTERIZED SYSTEM

Initial registration of patient and samples: VO number, patient name, questionnaires (optional); number of tubes per brain region; number of tubes of blood, thymus, kidney, etc.

Barcode generation: for each registered sample, the system generates and prints a unique barcode.

Sample storage: a specific freezer for each sample type, with position tracking by barcode.

Sample retrieval: In which formaldehyde vat is the left hemisphere? In which freezer is the blood tube?

Sample transformation: hemisphere in FA vat => K7; K7 => HE slide; HE slide => stained slide; tissue/blood => RNA and/or DNA

- Banco de Encéfalos Humanos do Grupo de Estudos de Envelhecimento Cerebral (FMUSP), the Human Brain Bank of the Cerebral Aging Study Group of FMUSP

- SVOC

Frontal cortex

Hippocampus

CASE SERIES

MATERIALS AND METHODS

Experimental tissue: AD / nsAD / OD / N
Reference tissue: pool of 15 cell lines

Total RNA isolation: RNeasy Mini Kit (QIAGEN)

mRNA amplification: 2 rounds, T7 polymerase approach

Reverse transcription, fluorescently labeled with Cy3 (green) and Cy5 (red)

Customized cDNA platform:
- 4,800 cDNA sequences
- 4,608 human genes
- 192 positive and negative controls

Signal intensity capture (scan)

Statistical analysis
- ANOVA (p < 0.05)
- Biological processes (GO) (p < 0.05), WebGestalt
- Linear classifiers, with 3 genes

Introduction

Neuropathology of Alzheimer's disease (AD) (columns) versus clinical presentation of dementia (rows):

                  Neuropathology +             Neuropathology -
Dementia +        Definite (symptomatic) AD    Other dementias
Dementia -        Asymptomatic AD              Normal individuals

Cognitive reserve

The capacity to tolerate age- or disease-related changes in the brain without developing clear symptoms or signs

Early diagnosis

Accession number and gene symbol pairs, 7 per row; some accessions have no gene symbol:

AW842619 ABHD3 AW369078 CRY2 AW369582 GTF2H1 BE831904 LOC348262 BF935252 PSTPIP1 BE814002 TPST1 AW820222 ZNF655
AW938004 ACP5 BE815274 CX3CL1 BE937637 GUK1 AW896256 LOC399947 BG001122 PTPN12 BE167402 TRAPPC6B BG009053 ZNF84
BE004154 ACTN1 AW890224 DEF6 BE833329 H2AFZ BF884010 LOC440104 BE070870 PTPRO BE812623 TRRAP AW385146
BF923962 AFF3 BF958558 DFNA5 BF357100 HIP1R BE703406 LRP1 BF876532 RAB15 AW892064 TTC13 BF332048
BF735483 AGPAT4 AW890450 DMN BF736934 HK1 BF154636 LTF BQ332146 RAP1GAP AW608814 TXNDC5 BF376871
BF813119 AKAP8L BE698761 DNAJA2 BE768509 HLTF BE181083 MAP3K4 BF962405 RICS AW884244 UBE2L3 BQ356322
BQ377544 ATP6AP2 AW362589 DNAJC13 BE089922 HPCAL4 BE829325 MMP19 AW379479 SCARA3 BF814397 UBE2O BE841175
Seq_inconsistente ATP6V0B BQ300355 DNAJC5 BE073224 ICMT AW854050 MORC3 BF960958 SLC25A38 BF085922 UBP1 BQ367273
BF756552 AXIN2 BF758919 ENTPD6 BF881076 IFNGR1 BE811192 MRPL27 AW937427 SMAD7 AW389806 UCKL1 BQ339374
BE828861 AYTL1 AW858463 EPSTI1 BE698667 IL15 BE841209 MTERF BF926012 SNRPN BF806367 UROD BF752100
BF805451 BAMBI AW177711 ETS2 AW883691 IQCE AW384219 MYC BF806819 SPG11 BF879244 USP14 BF092878
BE168732 BSCL2 AW900852 EXOC2 BF371888 JMJD2C AW384904 MYO5A BQ345409 SPOCK2 BF930903 USP20 BF336377
BF798182 C17orf68 BE832888 EXTL3 BF761806 KIAA0146 BE708226 NNT BE926073 SPON1 BF803592 VPS39 BQ330558
BE719302 C1orf165 BE832956 FAM125B BE771837 KIAA0427 AW576992 NUDCD3 AW370650 SPRYD4 AW367978 WBP5 BF359711
BF359220 C20orf7 BF829157 FBXL10 BF842373 KIAA0895 BF919192 OIT3 BQ329616 SPTBN5 BF817369 YTHDF2 BQ309790
BE695309 C7orf44 BF920328 FBXL15 BF996596 KISS1 BF809799 PANK1 BE769762 SRPK1 BG003841 ZBTB7B BQ367012
AW752098 C9orf125 BF930310 FLJ14803 BE713379 KLK7 AW372944 PATZ1 BE930145 STK24 BE144581 ZFYVE21 BQ364322
BQ365457 CBLL1 BE005088 FLJ32549 AW378473 KPNA6 BF765614 PCTK1 BQ367148 TBRG1 BE836738 ZMYM4 BE844225
BF802140 CDC16 BE167409 FOXE1 BF739130 KRT1 BG004607 PLEKHO2 BQ320141 TEC AW842249 ZNF266 BF329787
BE168329 CLN5 BE835720 FREQ AW863981 LIPF BE175902 POLK AW582658 TIMM9 BE141963 ZNF394 BQ311508
AW995807 CPSF3 AW995236 GALNAC4S-6ST BF998151 LOC148696 BG003089 PPIL2 BE002927 TMC5 AW379158 ZNF559 BQ360637
BF920912 CRKL BF935421 GRIK5 BF753140 LOC285628 BE703712 PPP2R2C BF806472 TMEM2 AW364861 ZNF576 BQ355034

ANOVA RESULT (1X3) = 154 GENES

[Figure: diagram with gene counts 107, 18, 12, 2, 4 and 9.]

24 GENES DIFFERENTIALLY EXPRESSED BETWEEN nsAD AND N: EARLY DIAGNOSIS???

BIOLOGICAL MARKERS FOR EARLY DIAGNOSIS

Results

• Differentially expressed genes: DAa → DAd

• Biological processes (GO)

• Network of biological processes [Figure: Module 1 and Module 2; up-regulated and down-regulated genes]

• Signaling pathways (KEGG)

• Networks of signaling pathways [Figure: Module 1 and Module 2; up-regulated and down-regulated genes]

• Hierarchical clustering (genes with |fold change| ≥ 1.8)

• Classifiers

Conclusions

• Genes that distinguish DAa from DAd are mainly involved in the cell cycle and synaptic plasticity.

• HYPOTHESIS

Neurons erroneously convert signals that would be used for synaptic plasticity into activation of the cell cycle, which subsequently leads to their death.

Asymptomatic AD

Differentiated neurons, after withdrawing from the cell cycle, are able to use this mechanism, essentially developed to control proliferation, as an alternative way to control synaptic plasticity.

'Symptomatic' AD

Arendt T, Bruckner NK. Biochim Biophys Acta. 2007; 1772:413-421.
Frank CL, Tsai LH. Neuron. 2009; 62:312-326.

THE RESEARCHER'S VIEW OF MICROARRAYS

Biological question

Experimental design

Differentially expressed genes, tumor markers

Gene expression patterns, classifiers

Metabolic pathways and biological systems

Biological validation

Technical validation

THE ANALYST'S VIEW

1- Experimental design

2- QUALITY: noise from mRNA (tissues, kits, procedures); transcription (reaction, enzyme); labeling (type, time); amplification; pin type, surface, probe volume; slide production, fixation; hybridization parameters (time, temperature, buffer); nonspecific hybridization; background; artifacts; image capture; segmentation; pixel quantification...

3- Normalization

4- ANALYSIS: many genes (multiple testing?); significance; the number of variables (genes) is much larger than the number of experiments

5- Replicability

[Figure: workflow of a microarray study. Biological question (hypothesis-driven or explorative) → Experimental design → Microarray experiment → Image analysis → Quality measurement (failed or pass) → Pre-processing: Normalization → Analysis (Estimation, Testing, Clustering, Discrimination) → Biological verification and interpretation.]

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination:

He may be able to say what the experiment died of.

Ronald Fisher

Books

Gentleman, Carey, Huber, "Bioinformatics and Computational Biology Solutions Using R and Bioconductor", Springer

David W. Mount, "Bioinformatics", Cold Spring Harbor

Terry Speed, "Statistical Analysis of Gene Expression Microarray Data", Chapman & Hall/CRC

Pierre Baldi & G. Wesley Hatfield, "DNA Microarrays and Gene Expression", Cambridge

Giovanni Parmigiani et al., "The Analysis of Gene Expression Data", Springer

And how do I analyze my own data?

www.r-project.org
www.bioconductor.org

• Open source
• Free
• Easy installation
• Helpful community
• High quality standards
• Regularly maintained and updated
• Tons of documentation
• Every package comes with example vignettes to walk you through standard tasks.

Acknowledgments to the LBHC team
