Introduction to Microarray and Related High-Throughput Analysis, BMI 705, Kun Huang, Department of Biomedical Informatics, Ohio State University

Dec 22, 2015



What is microarray?

• Affymetrix-like arrays – single channel (background-green, foreground-red)

• cDNA arrays – two channel (red, green, yellow)

• CGH array, DNA methylation array, SNP array, etc.

• ChIP-on-chip

• Tissue microarray

• Future: sequencing


How is microarray manufactured?


How does two-channel microarray work?

• Printed microarrays

• Long oligonucleotide probes (80-100 nucleotides long) are “printed” on the glass chip


How does two-channel microarray work?

• Printing process introduces errors and larger variance

• Comparative hybridization experiment


How does microarray work?


How is microarray manufactured?

• Affymetrix GeneChip

• Silicon chip

• Oligonucleotide probes lithographically synthesized on the array

• cRNA is used instead of cDNA


How does Affymetrix microarray work?


How does microarray work?


How does microarray work?

• Fabrication expense and frequency of error increase with probe length; therefore, 25-mer oligonucleotide probes are used.

• Problem: cross-hybridization

• Solution: introduce a mismatch (MM) probe that differs from the perfect-match (PM) probe at one (central) position. The PM − MM difference gives a more accurate reading.
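The PM/MM idea above can be illustrated with a minimal sketch. It is a simplified summary (the mean of the PM − MM differences over the probe pairs of one probe set), not the summarization algorithm of any particular Affymetrix software release; the function name and the intensities are hypothetical, and background correction is assumed to have been done already.

```python
from statistics import mean


def average_difference(pm, mm):
    """Simplified probe-set summary: mean of PM - MM over all probe pairs.

    pm, mm: perfect-match and mismatch intensities listed in the same
    probe-pair order (hypothetical input format).
    """
    if len(pm) != len(mm):
        raise ValueError("each PM probe needs a matching MM probe")
    # The MM probe estimates cross-hybridization for its own pair, so the
    # difference is taken pair by pair before averaging.
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for one 4-pair probe set
pm = [1200.0, 950.0, 1100.0, 1010.0]
mm = [300.0, 250.0, 400.0, 310.0]
print(average_difference(pm, mm))  # 750.0
```

Plain subtraction can produce negative values when MM > PM, which is a known drawback of this naive form.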


How do we use microarray?

• Profiling

• Clustering


Spatial Images of the Microarrays

• Data for the same brain voxel, but for the untreated control mouse

• Background levels are much higher than those for the Parkinson’s disease model mouse

• There appears to be something non-random affecting the background of the green channel of this slide


How do we take readings from microarray (measurement)?

cDNA array – ratio, log ratio

Affymetrix array
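For cDNA arrays, a minimal sketch of the log-ratio reading follows. It assumes background-corrected, positive spot intensities, with red as the test channel and green as the reference (the channel assignment is an assumption). Taking log2 makes up- and down-regulation symmetric: a 4-fold increase gives +2 and a 4-fold decrease gives −2, whereas the raw ratios are 4 and 0.25.

```python
import math


def log_ratios(red, green):
    """Per-spot log2(R / G) for a two-channel cDNA array.

    red, green: background-corrected intensities (assumed positive),
    listed in the same spot order.
    """
    return [math.log2(r / g) for r, g in zip(red, green)]


red = [800.0, 200.0, 500.0]
green = [200.0, 800.0, 500.0]
print(log_ratios(red, green))  # [2.0, -2.0, 0.0]
```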


How do we process microarray data?

(McShane, NCI)


How do we process microarray data? • Normalization

• Intensity imbalance between RNA samples

– Affects all genes

– Not due to the biology of the samples, but to technical reasons

– Reasons include differences in photodetector voltage settings, imbalance in the total amount of RNA in each sample, differences in dye uptake, etc.

• The objective is to adjust the gene expression values of all genes so that the ones that are not really differentially expressed have similar values across the array(s).


Normalization • Two major issues to consider

• Which genes to use for normalization

• Which normalization algorithm to use

• Housekeeping genes: genes involved in essential activities of cell maintenance and survival, but not in cell function and proliferation. These genes are expected to be similarly expressed in all samples, but they can be difficult to identify and need to be confirmed. Affymetrix GeneChip provides a set of housekeeping genes (but still with no guarantee).

• Spiked controls: genes that are not usually found in the samples (either control or test), e.g., a yeast gene in human tissue samples. Note: the Affy GeneChip protocol includes spiking control oligonucleotides into each sample. These are NOT for normalization; instead, they serve other purposes, such as gridding of the slide by the image analysis software.

• Using all genes: the simplest approach is to use all adequately expressed genes for normalization. The assumption is that the majority of genes on the array behave like housekeeping genes (i.e., are not differentially expressed) and that the proportion of over-expressed genes is similar to that of under-expressed genes. If the genes on the chip are specially selected, this method will not work.


Normalization • Which normalization algorithm to use

• For two-color cDNA arrays: intra-slide normalization

[Figure: scatter plot with a reference line of slope = 1; ratio-intensity (RI) or MA plot]
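The MA plot can be computed directly from the two channels: M = log2(R/G) is the log ratio (vertical axis) and A = ½·log2(R·G) is the average log intensity (horizontal axis). Since M = log2 R − log2 G and A = (log2 R + log2 G)/2, the MA plot is the log-log scatter plot rotated by 45 degrees (and rescaled), so a slope-1 line in the scatter plot becomes a horizontal line in the MA plot. The sketch below assumes background-corrected, positive intensities; the function name is hypothetical.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists for an MA (ratio-intensity) plot.

    M = log2(R / G): log ratio, vertical axis
    A = 0.5 * log2(R * G): average log intensity, horizontal axis
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


m, a = ma_values([1024.0, 64.0], [256.0, 64.0])
print(m, a)  # [2.0, 0.0] [9.0, 6.0]
```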


Normalization • Linear (global) normalization

• Simplest but most consistent

• Move the median log ratio to zero (the slope stays 1 in the scatter plot; only the intercept changes)

• Suitable when the MA plot shows no clear nonlinearity or slope
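Global normalization amounts to subtracting one constant from every log ratio, as in the minimal sketch below. It assumes the input is M = log2(R/G) for adequately expressed genes (the "use all genes" assumption from the earlier slide); the function name is hypothetical. Subtracting a constant from M rescales one channel by a fixed factor, which is why only the intercept moves.

```python
from statistics import median


def global_median_normalize(m_values):
    """Linear (global) normalization: shift all log ratios so their median is 0."""
    center = median(m_values)
    return [m - center for m in m_values]


print(global_median_normalize([0.5, 1.0, 1.5, 3.0, -0.5]))
# [-0.5, 0.0, 0.5, 2.0, -1.5]
```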


Normalization • Intensity-based (Lowess) normalization

• Overall magnitude of the spot intensity has an impact on the relative intensity between the channels.

• “Straighten” the Lowess fit line in MA plot to horizontal line and move it to zero
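A minimal sketch of intensity-based normalization follows. It is a simplified stand-in for lowess: a tricube-weighted local linear fit of M on A over roughly the nearest `frac` fraction of spots, without the robustness iterations of full lowess. The function names and the `frac` default are hypothetical choices. Subtracting the fitted curve from M straightens the fit line into the horizontal line M = 0.

```python
def local_linear_fit(a, m, frac=0.4):
    """Trend of M as a function of A from a tricube-weighted local linear fit.

    Simplified lowess (assumption): each fit uses about frac * n nearest
    spots along A, with no robustness iterations.
    """
    n = len(a)
    k = min(n, max(3, round(frac * n)))  # neighborhood size
    fitted = []
    for x0 in a:
        # Radius that reaches the k-th nearest spot along the intensity axis
        h = sorted(abs(x - x0) for x in a)[k - 1] or 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(a, m):
            u = abs(x - x0) / h
            if u >= 1.0:
                continue  # outside the window: zero weight
            w = (1.0 - u ** 3) ** 3  # tricube weight
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


def lowess_normalize(a, m, frac=0.4):
    """Subtract the intensity-dependent trend so the fit becomes M = 0."""
    return [mi - fi for mi, fi in zip(m, local_linear_fit(a, m, frac))]
```

Because every spot's correction depends on its own neighborhood, this is the gene-by-gene, nonlinear adjustment the next slide warns can introduce bias.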


Normalization • Intensity-based (Lowess) normalization

• Nonlinear

• Gene-by-gene, could introduce bias

• Use only when there is a compelling reason

(McShane, NCI)


Normalization • Other normalization methods

• Combination of location and intensity-based normalization

• Location

• Quantile

• …


Normalization • Which normalization algorithm to use

• Inter-slide normalization

• Not just for Affymetrix arrays


Normalization • Box plot

[Figure: box plots marking the median, lower quartile, and upper quartile]


Normalization • Linear (global) – the chips have equal median (or mean) intensity

• Intensity-based (Lowess) – the chips have equal medians (means) at all intensity values

• Quantile – the chips have identical intensity distributions

• Quantile is the “best” in terms of normalizing the data to a desired distribution; however, it also changes individual gene expression levels

• Avoid overfitting

• Avoid bias
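Quantile normalization can be sketched in a few lines: average the k-th smallest value across chips, then give each gene on each chip the average for its rank. The sketch assumes each chip is a list of intensities for the same genes in the same order, and it breaks ties by position (a simplification); the function name is hypothetical.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    chips: list of chips, each a list of intensities for the same genes
    in the same order. Ties are broken by position (simplification).
    """
    n_genes = len(chips[0])
    if any(len(c) != n_genes for c in chips):
        raise ValueError("all chips must measure the same genes")
    # Reference distribution: mean of the k-th smallest value across chips
    sorted_chips = [sorted(c) for c in chips]
    reference = [sum(col) / len(chips) for col in zip(*sorted_chips)]
    normalized = []
    for c in chips:
        # Gene indices ordered from lowest to highest intensity on this chip
        order = sorted(range(n_genes), key=lambda i: c[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Gene 0 on the first chip moves from 5.0 to 5.5 even though nothing about that gene changed, which is the sense in which quantile normalization changes individual expression levels.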


Gene Discovery and t-Tests

Student’s t-test
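For each gene, the t-test compares expression between two groups of samples. The sketch below computes the two-sample Student's t statistic with pooled variance (the equal-variance assumption of Student's test). Turning it into a p-value requires the t distribution with n1 + n2 − 2 degrees of freedom, which the Python standard library does not provide; SciPy's `scipy.stats.ttest_ind` returns both the statistic and the p-value. The function name and the group values (taken to be log-expression values) are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(group1, group2):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    n1, n2 = len(group1), len(group2)
    # Pooled estimate of the common variance (sample variances, n - 1)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se


# Hypothetical log-expression values of one gene in two conditions
treated = [2.0, 4.0, 6.0]
control = [1.0, 2.0, 3.0]
print(student_t(treated, control))
```

In a screen, this statistic is computed once per gene, which is what leads to the multiple-testing problem on the next slide.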


Gene Discovery and Multiple t-Tests: Controlling False Positives

• Statistical tests to control false positives

– Controlling for no false positives (very stringent, e.g., Bonferroni correction)

– Controlling the number of false positives

– Controlling the proportion of false positives

• Note that in the screening stage, a false positive is better than a false negative, as the latter means missing a possibly important discovery.
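Two of these strategies can be contrasted in a short sketch. Bonferroni (named on the slide) controls the chance of any false positive by testing each gene at alpha / m. For controlling the proportion of false positives (the false discovery rate), this sketch uses the Benjamini-Hochberg step-up procedure, an illustrative choice not named in the slides. The function names and p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices rejected while controlling the chance of any false positive."""
    m = len(pvalues)
    return {i for i, p in enumerate(pvalues) if p <= alpha / m}


def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected while controlling the expected proportion of false positives."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * q / m; reject the k smallest
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvalues[i] <= k * q / m:
            k_max = k
    return set(order[:k_max])


p = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
print(sorted(bonferroni(p)))          # [0, 3]
print(sorted(benjamini_hochberg(p)))  # [0, 1, 2, 3]
```

The FDR procedure keeps more candidate genes, which matches the slide's point that false positives are more tolerable than false negatives during screening.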


Microarray Databases

• Gene Expression Omnibus (GEO) database – NCBI

– http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed

• EMBL-EBI microarray database

– http://www.ebi.ac.uk/Databases/microarray.html

• ArrayExpress

• Stanford Microarray Database (SMD)

– http://genome-www5.stanford.edu/

• Other specialized, regional and aggregated databases

– http://psi081.ba.ars.usda.gov/SGMD/

– http://www.oncomine.org/main/index.jsp

– http://ihome.cuhk.edu.hk/~b400559/arraysoft_public.html

– …


Microarray Software

• dChip

• Open source R, Bioconductor

• BRB-ArrayTools (NCI Biometric Research Branch)

• Affymetrix

• GeneSpring GX

• GenePattern

• …


How do we use microarray (clustering)?


How do we process microarray data (clustering)?

• Unsupervised Learning – Hierarchical Clustering
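Agglomerative hierarchical clustering repeatedly merges the two closest clusters until one remains; the sequence of merges is what a dendrogram draws. The naive sketch below assumes Euclidean distance between expression profiles and average linkage (both illustrative choices; correlation-based distances are also common for expression data), and it scales poorly, so it is only for small examples. The function names and profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(profiles):
    """Return merges as (cluster_a, cluster_b, distance).

    Clusters are tuples of row indices into profiles; the distance between
    two clusters is the mean pairwise distance between their members.
    """
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(euclidean(profiles[i], profiles[j]) for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Replace the two closest clusters by their union
        clusters = [c for c in clusters if c != a and c != b]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


# Hypothetical expression profiles (genes x conditions)
genes = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, 0.0, 5.0]]
for a, b, d in average_linkage(genes):
    print(a, b, round(d, 3))
```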


ChIP-on-chip, “also known as genome-wide location analysis, is a technique for isolation and identification of the DNA sequences occupied by specific DNA binding proteins in cells.” (http://www.chiponchip.org)

• Identify protein binding sites on DNA

• Study transcription factors (TFs): identify the genes that are controlled by specific TFs

• Identify TFs

• Identify regulatory regions such as promoters, enhancers, repressors, silencing elements, insulators, and boundary elements

• Determine sequences controlling DNA replication (e.g., histone binding sites)


ChIP-on-Chip

ChIP – Chromatin immunoprecipitation

Chip – Microarray


ChIP-on-Chip – Example

Simon I., Barnett J., Hannett N., Harbison C.T., Rinaldi N.J., Volkert T.L., Wyrick J.J., Zeitlinger J., Gifford D.K., Jaakkola T.S., et al. "Serial regulation of transcriptional regulators in the yeast cell cycle", Cell, Volume: 106, (2001), pp. 697-708.

Figure 2. Genome-wide Location of the Nine Cell Cycle Transcription Factors

(A) 213 of the 800 cell cycle genes whose promoter regions were bound by a myc-tagged version of at least one of the nine cell cycle transcription factors (p < 0.001) are represented as horizontal lines. The weight-averaged binding ratios are displayed using a blue and white color scheme (genes with p value < 0.001 are displayed in blue). The expression ratios of an α factor synchronization time course from Spellman et al. (1998) are displayed using a red (induced) and green (repressed) color scheme.

(B) The circle represents a smoothed distribution of the transcription timing (phase) of the 800 cell cycle genes (Spellman et al., 1998). The intensity of the red color, normalized by the maximum intensity value for each factor, represents the fraction of genes expressed at that point that are bound by a specific activator. The similarity in the distribution of color for specific factors (with Swi4, Swi6, and Mbp1, for example) shows that these factors bind to genes that are expressed during the same time frame.


ChIP-on-Chip – Example

Simon I., et al. "Serial regulation of transcriptional regulators in the yeast cell cycle", Cell, Volume: 106, (2001), pp. 697-708.


ChIP-on-Chip

Problem: probe design

1. Most TF binding sites are not in exons

2. Binding sequences are short

3. Cover the entire genome?

4. Signal may be small

Tiling array – divide the sequence into chunks (probes) that together form a tiling path. The distance between the centers of neighboring chunks is called the resolution. Chunks in a path can overlap or be spaced apart.

Affymetrix tiling array for yeast – 5 bp resolution, 3.2 million probes

Affymetrix tiling array for human – 35 bp spacing, 90 million probes
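The geometry of a tiling path can be sketched as follows. For probes of equal length, the distance between neighboring centers equals the distance between their start positions, so the resolution is simply the step between starts; when it is smaller than the probe length the probes overlap, otherwise they leave gaps. The 25 bp probe length below is borrowed from the 25-mer GeneChip probes mentioned earlier and is an assumption for tiling arrays; the function name and sequence lengths are hypothetical.

```python
def tiling_path(seq_len, probe_len, resolution):
    """Probe start positions along a sequence, plus the overlap between neighbors.

    resolution: distance between the centers of neighboring probes.
    Returns (starts, overlap); overlap > 0 means overlapping probes,
    overlap < 0 means a gap of that many bases (spaced probes).
    """
    starts = list(range(0, seq_len - probe_len + 1, resolution))
    overlap = probe_len - resolution
    return starts, overlap


print(tiling_path(100, 25, 35))  # ([0, 35, 70], -10): spaced, 10 bp gaps
print(tiling_path(40, 25, 5))    # ([0, 5, 10, 15], 20): overlapping by 20 bp
```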


Sequencing

• Solexa http://www.illumina.com/pages.ilmn?ID=203

• SOLiD

Mikkelsen et al.