Analysis of High-throughput Gene Expression Profiling
Page 1: Analysis of High-throughput Gene Expression Profiling.

Analysis of High-throughput Gene Expression Profiling

Page 2: Analysis of High-throughput Gene Expression Profiling.

Why Measure Gene Expression?

1. Determines which genes are induced/repressed in response to a developmental phase or to an environmental change.
2. Sets of genes whose expression rises and falls under the same condition are likely to have a related function.
3. Features such as a common regulatory motif can be detected within co-expressed genes.
4. A pattern of gene expression may be used as an indicator of abnormal cellular regulation.

• A useful tool for cancer diagnosis

Page 3: Analysis of High-throughput Gene Expression Profiling.

Why Measure Gene Expression on a Large Scale?

Traditional vs. High-throughput Approaches

Page 4: Analysis of High-throughput Gene Expression Profiling.

Techniques Used to Detect Gene Expression Level

• Microarray (single or dual channel)
• SAGE
• EST/cDNA library
• Northern blots
• Subtractive hybridisation
• Differential hybridisation
• Representational difference analysis (RDA)
• DNA/RNA fingerprinting (RAP-PCR)
• Differential display (DD-PCR)
• aCGH: array CGH (DNA level)

High-throughput

Page 5: Analysis of High-throughput Gene Expression Profiling.

Basic Information of Microarray, SAGE and cDNA Library

Page 6: Analysis of High-throughput Gene Expression Profiling.

(DNA) Microarray

1. Developed around 1987.
2. Employs methods previously exploited in the immunoassay context: specific binding and marking techniques.
3. Two types of probes:

Format I: probe cDNA (500~5,000 bases long) is immobilized on a solid surface such as glass; widely considered to have been developed at Stanford University; traditionally called DNA microarrays.

Format II: an array of oligonucleotide probes (20~80-mer oligos) is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization; developed at Affymetrix, Inc. Many companies manufacture oligonucleotide-based chips using alternative in situ synthesis or deposition technologies. Historically called DNA chips.

Page 7: Analysis of High-throughput Gene Expression Profiling.

Microarray

• Single Channel: sub-type classification

• Dual Channel: differential expression gene screening

• Tissue microarray

• Protein microarray

• ……

Page 8: Analysis of High-throughput Gene Expression Profiling.

Array CGH

• Detects DNA copy number variation using a microarray approach

• An active area of recent research, especially in cancer research

Page 9: Analysis of High-throughput Gene Expression Profiling.

Microarray Analysis

gene discovery

pattern discovery

inferences about biological processes

classification of biological processes

Which genes are up-regulated, down-regulated, co-regulated, not-regulated?

Page 10: Analysis of High-throughput Gene Expression Profiling.

SAGE

• Experimental technique designed to obtain a quantitative measure of gene expression.

• ~10-20 base “tags” are produced (immediately adjacent to the 3’ end of the 3’-most NlaIII restriction site).

• The SAGE technique measures not the expression level of a gene, but quantifies a "tag" which represents the transcription product of a gene.
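
The tag definition above can be sketched in a few lines. This is only an illustration, not the SAGE protocol itself: it assumes the transcript is given as a plain 5'-to-3' string, uses CATG (the NlaIII recognition sequence), and picks a tag length of 10 bases from the ~10-20 range. The function names `extract_sage_tag` and `count_tags` are hypothetical.

```python
NLAIII_SITE = "CATG"  # NlaIII recognition sequence


def extract_sage_tag(transcript, tag_length=10):
    """Return the bases right after the 3'-most NlaIII site, or None.

    tag_length=10 is an assumed value within the ~10-20 base range.
    """
    seq = transcript.upper()
    site_start = seq.rfind(NLAIII_SITE)  # last (3'-most) occurrence
    if site_start == -1:
        return None  # no NlaIII site, so no tag
    tag_start = site_start + len(NLAIII_SITE)
    tag = seq[tag_start:tag_start + tag_length]
    # A site too close to the 3' end cannot yield a full-length tag.
    return tag if len(tag) == tag_length else None


def count_tags(transcripts, tag_length=10):
    """Count how often each tag occurs in a list of transcript strings."""
    counts = {}
    for transcript in transcripts:
        tag = extract_sage_tag(transcript, tag_length)
        if tag is not None:
            counts[tag] = counts.get(tag, 0) + 1
    return counts
```

The counts are per tag, not per gene, which is the point made in the last bullet: a tag only stands in for the transcript it came from.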

Page 11: Analysis of High-throughput Gene Expression Profiling.

SAGE

Tags are isolated and concatenated.

Relative expression levels can be compared between cells in different states.

Page 12: Analysis of High-throughput Gene Expression Profiling.

SAGEmap (http://cgap.nci.nih.gov)

Page 13: Analysis of High-throughput Gene Expression Profiling.

SAGE: comparing two relational libraries

Page 14: Analysis of High-throughput Gene Expression Profiling.

EST library (UniGene)

Page 15: Analysis of High-throughput Gene Expression Profiling.

Gene expression information from the UniGene library

Page 16: Analysis of High-throughput Gene Expression Profiling.

An Example of In-house EST Library Analysis

Page 17: Analysis of High-throughput Gene Expression Profiling.

The Algorithms and Challenges of High-throughput Gene Expression Analysis

Page 18: Analysis of High-throughput Gene Expression Profiling.

Seeing is believing?

No, errors need to be corrected first.

Page 19: Analysis of High-throughput Gene Expression Profiling.

SAGE:

• A typical experiment requires ~30,000 gene expression comparisons between a normal and a diseased cell.

• The results depend on the size and reliability of the SAGE libraries.

• Statistical measures are used to filter out candidate genes and reduce the dimensionality of the data, but it is tedious and time-consuming to adjust these measures until a good set is found.

Page 20: Analysis of High-throughput Gene Expression Profiling.

SAGE

• TPM (tags per million): a simple normalization method. TPM = Count × 1,000,000 / TotalCount

• Bayesian approach http://cancerres.aacrjournals.org/cgi/content/full/59/21/5403
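
As a small worked example of the TPM formula, the sketch below normalizes a library of raw tag counts. The dictionary layout and the function name `tags_per_million` are assumptions made for illustration; only the formula itself comes from the slide.

```python
def tags_per_million(counts):
    """Scale raw tag counts to tags per million (TPM).

    counts maps tag -> raw count for one SAGE library.
    TPM = Count * 1,000,000 / TotalCount
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


# Two libraries of different size become comparable after scaling.
normal = tags_per_million({"ACGTACGTAC": 5, "TTTTGGGGCC": 15})
disease = tags_per_million({"ACGTACGTAC": 30, "TTTTGGGGCC": 30})
```

Because TPM only rescales by library size, it does not account for sampling noise in small libraries; that is the gap the Bayesian approach cited above is meant to address.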

Page 21: Analysis of High-throughput Gene Expression Profiling.

Microarray: Sources of errors

• systematic

• random

[Figure: log signal intensity vs. log RNA abundance]

Page 22: Analysis of High-throughput Gene Expression Profiling.

Sources of Errors (Cont.)

• Printing and/or tip problems
• Labeling and dye effects (differing amounts of RNA labeled between the 2 channels)
• Differences in the power of the two lasers (or other scanner problems)
• Difference in DNA concentration on arrays (plate effects)
• Spatial biases in ratios across the surface of the microarray due to uneven hybridization
• cDNA array cannot distinguish alternatively spliced forms

Page 23: Analysis of High-throughput Gene Expression Profiling.

Errors that cannot be corrected by statistics

• Competitive hybridization of different targets on the chip

• Failure to distinguish different splicing forms

• Misinterpretation of time course data when there are not sufficient points

• Misinterpretation of relative intensity

Page 24: Analysis of High-throughput Gene Expression Profiling.

Does clustered time course really mean co-expression?

Picture taken from http://genomics.stanford.edu/yeast/additional_figures_link.html

Yes, you can study a known system (such as the cell cycle) this way; but what about unknown systems?

Page 25: Analysis of High-throughput Gene Expression Profiling.

Normalization by iterative linear regression

1. Fit a line (y = mx + b) to the data set.
2. Set aside outliers (residuals > 2 × s.e.).
3. Repeat until r² changes by < 0.001.
4. Then apply the slope and intercept to the original dataset.

D Finkelstein et al. http://www.camda.duke.edu/CAMDA00/abstracts.asp
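
The iterative procedure on this slide can be sketched as follows. This is a minimal reading of the steps, not the cited authors' implementation. Two points are assumptions: "s.e." is taken to be the residual standard error of the current fit, and "apply slope and intercept" is taken to mean mapping every y value back onto the x scale as (y - b) / m. The names `fit_line` and `iterative_linear_normalize` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, r squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    m = sxy / sxx
    b = mean_y - m * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return m, b, r2


def iterative_linear_normalize(xs, ys, tol=0.001, max_iter=50):
    """Fit, drop outliers beyond 2 x s.e., refit until r2 is stable."""
    keep = list(zip(xs, ys))
    m, b, r2 = fit_line(*zip(*keep))
    for _ in range(max_iter):
        residuals = [y - (m * x + b) for x, y in keep]
        dof = len(keep) - 2
        if dof <= 0:
            break
        se = math.sqrt(sum(r * r for r in residuals) / dof)  # assumed s.e.
        inliers = [p for p, r in zip(keep, residuals) if abs(r) <= 2 * se]
        if len(inliers) < 3 or len(inliers) == len(keep):
            break  # nothing left to set aside, or too few points
        keep = inliers
        m_new, b_new, r2_new = fit_line(*zip(*keep))
        converged = abs(r2_new - r2) < tol
        m, b, r2 = m_new, b_new, r2_new
        if converged:
            break
    if m == 0:
        raise ValueError("fitted slope is zero; cannot rescale")
    # Assumed meaning of "apply": rescale ALL original points, outliers too.
    normalized = [(y - b) / m for y in ys]
    return m, b, normalized
```

Outliers are excluded only from the fit. They stay in the normalized output, since they are often the differentially expressed genes of interest.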

Page 26: Analysis of High-throughput Gene Expression Profiling.

Normalization (Curvilinear)

[Figure: ratio {log2 (Cy5 / Cy3)} plotted against average signal {log2 (Cy3 + Cy5)/2}, with a loess function fit line]

G Tseng et al., NAR 2001
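
To make the curvilinear idea concrete, here is a rough pure-Python sketch of loess-style normalization: fit a smooth curve of log ratio against average signal, then subtract it so the ratios center on 0. It is not the method of the cited paper. The assumptions are: average signal is computed as the mean of the two log2 intensities, the smoother is a local linear fit with tricube weights, and `span=0.4` is an arbitrary choice. All function names are hypothetical, and the quadratic running time makes this suitable for illustration only.

```python
import math


def ma_values(cy3, cy5):
    """Return (M, A): log2 ratio and average log2 intensity per spot."""
    m = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    a = [(math.log2(g) + math.log2(r)) / 2 for g, r in zip(cy3, cy5)]
    return m, a


def loess_at(a, m, a0, span=0.4):
    """Tricube-weighted local linear estimate of M at average signal a0."""
    k = min(len(a), max(3, math.ceil(span * len(a))))
    # Bandwidth: distance to the k-th nearest neighbour of a0.
    h = sorted(abs(ai - a0) for ai in a)[k - 1] or 1e-12
    sw = swx = swy = swxx = swxy = 0.0
    for ai, mi in zip(a, m):
        u = abs(ai - a0) / h
        if u >= 1.0:
            continue  # outside the local window
        w = (1.0 - u ** 3) ** 3
        dx = ai - a0  # centre on a0 so the intercept is the fitted value
        sw += w
        swx += w * dx
        swy += w * mi
        swxx += w * dx * dx
        swxy += w * dx * mi
    if sw == 0.0:
        return sum(m) / len(m)
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate window: fall back to weighted mean
    return (swxx * swy - swx * swxy) / denom


def loess_normalize(cy3, cy5, span=0.4):
    """Subtract the fitted curve from each log ratio."""
    m, a = ma_values(cy3, cy5)
    return [mi - loess_at(a, m, ai, span) for ai, mi in zip(a, m)]
```

Unlike the single straight line of the previous slide, the local fit can follow a dye bias that changes with intensity.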

Page 27: Analysis of High-throughput Gene Expression Profiling.

After Normalization ……

• Differentially Expressed (DE) gene screening
– T-test
– T-statistics
– SVM

• Clustering
– Hierarchical
– SOM
– K-means

• Network (Pathway) analysis
– BioCarta, KEGG, GO databases
– Bayesian network learning
– Topology
– …
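
As an illustration of the first item, DE gene screening with a t-statistic, the sketch below scores each gene by comparing its normalized expression in two groups of samples and ranks genes by the absolute score. The slide does not say which t-test variant is meant, so Welch's unequal-variance form is an assumption here, as are the function names and the dictionary-of-lists input layout.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two groups of expression values."""
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    if se == 0:
        # No within-group spread: undefined unless the means also agree.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(expr_a, expr_b):
    """Rank genes by |t|; expr_a and expr_b map gene -> list of values."""
    scores = {gene: welch_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Each group needs at least two samples per gene, because the sample variance is undefined for a single value. Turning the statistic into a P value additionally requires the t distribution, which is left out here.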

Page 28: Analysis of High-throughput Gene Expression Profiling.

Bioinformatics challenges

1. data management

2. utilizing data from multiple experiments

3. utilizing data from multiple groups

* with different technologies

* with only processed data available

Page 29: Analysis of High-throughput Gene Expression Profiling.

Integrated Bioinformatics Analysis of Gene Expression Profiling

Page 30: Analysis of High-throughput Gene Expression Profiling.

Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression

Daniel R. Rhodes et al., PNAS, 2004, 101: 9309-9314

T-test Q values (estimated false discovery rates) were calculated as

Q = (P × n) / i

where P is the P value, n is the total number of genes, and i is the sorted rank of the P value.
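
The Q value calculation can be sketched as below, assuming the formula is Q = (P × n) / i, which is the form that matches the variable definitions above, and assuming ranks are assigned in ascending order of P value (smallest P gets i = 1). The function name `q_values` is hypothetical.

```python
def q_values(p_values):
    """Estimate false discovery rates as Q = P * n / i.

    n is the number of genes and i is the rank of each P value
    when sorted in ascending order (assumed ranking direction).
    Results are returned in the same order as the input.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda idx: p_values[idx])
    q = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        q[idx] = p_values[idx] * n / rank
    return q
```

With P values 0.01, 0.04, 0.03 and 0.5, the gene with P = 0.03 has rank 2 of 4, giving Q = 0.03 × 4 / 2 = 0.06. Q computed this way is not forced to be monotone in P or capped at 1; any such adjustment would be an extra step beyond the formula on the slide.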

Page 31: Analysis of High-throughput Gene Expression Profiling.

Cont. Meta-Profiling.

The purpose of meta-profiling is to address the hypothesis that a selected set of differential expression signatures shares a significant intersection of genes (a meta-signature), thus inferring a biological relatedness.

Page 32: Analysis of High-throughput Gene Expression Profiling.

67 genes were screened by meta-analysis

Page 33: Analysis of High-throughput Gene Expression Profiling.

Integrated Cancer Gene Expression Map

Page 34: Analysis of High-throughput Gene Expression Profiling.

7 genes were discovered by the system

Page 35: Analysis of High-throughput Gene Expression Profiling.

THANX!!