Microarrays
Wednesday, March 1, 2006
Dr. Tim Hughes
CCBR – 160 College St. – Room 1302
[email protected]
Outline:
• Microarray experiments
• Normalization
• Different types of microarrays
• Other applications besides expression profiling
• Clustering and interpretation
Suggested reading
Eisen et al., 1998
Hartigan, J.A., Clustering Algorithms, Wiley, New York and London (1975). My understanding is that it is no longer in print, but is available on CD.
Jain et al., “Data Clustering: a review”. ACM Computing Surveys, 31(3) 1999. (http://www.amk.alt-neustadt.at/diplom/papers/Clustering/p264-jain.pdf)
Hegde et al., A concise guide to cDNA microarray analysis. Biotechniques. 2000 Sep;29(3):548-50, 552-4, 556.
Sherlock G. Analysis of large-scale gene expression data. Curr Opin Immunol. 2000 Apr;12(2):201-5.
www.accessexcellence.org/AB/GG/nucleic.html
Nucleic Acid Hybridization
Microarray expression profiling by 2-color assay (“cDNA arrays”)
Array: PCR products; 6250 yeast ORFs
Hybridized cDNAs: green = control, red = experiment
*Schena et al., 1995
“cDNA microarrays” are essentially dot-blots on glass slides
• This slide was made with 16 pins
• 4.5 mm pin spacing matches 384-well plates (16 x 24)
• Done with robotics
• Slides usually coated with poly-lysine
• Spots are usually 100-150 microns
• Spot spacing is usually 200-300 microns
• Slides are 25 x 75 mm
• Easy to deposit 20K spots/slide
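The "20K spots/slide" figure can be checked against the slide dimensions and spot spacing quoted above. The following is a back-of-the-envelope sketch only: it treats the whole 25 x 75 mm surface as printable unless a margin is given, and the `margin_um` parameter is a hypothetical addition, not something stated on the slide.

```python
# Rough capacity estimate for a spotted slide, using the numbers quoted above.
# All lengths are in microns so the arithmetic stays in integers.
SLIDE_WIDTH_UM = 25_000   # 25 mm
SLIDE_HEIGHT_UM = 75_000  # 75 mm

def max_spots(pitch_um, margin_um=0):
    """Grid positions that fit at a given centre-to-centre spacing.

    margin_um is a hypothetical unprinted border on every edge.
    """
    cols = (SLIDE_WIDTH_UM - 2 * margin_um) // pitch_um
    rows = (SLIDE_HEIGHT_UM - 2 * margin_um) // pitch_um
    return cols * rows

for pitch in (200, 250, 300):
    print(pitch, "micron spacing:", max_spots(pitch), "positions")
```

Even at the widest quoted spacing (300 microns) the full slide holds just over 20,000 positions, which is consistent with 20K spots being easy to reach; real layouts lose some of this area to margins and gaps between pin blocks.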
0.45 mm
Common ways to “label” nucleic acids
• Random priming of double-stranded DNA: reaction contains labelled nucleotides
• Poly-T primed cDNA synthesis: reaction contains labelled nucleotides
• Direct labelling (fluors only)
• Amplification: poly-T primer carrying a T7 promoter, “second strand” synthesis, then a T7 reaction that contains labelled nucleotides
Typical use of cDNA microarrays: “Internal” normalization using two colors
[Figure: cDNA pools from a control sample and a treatment sample (drug, mutation) hybridized together; transcripts x, y, z illustrate up, down, unchanged, and not present]
Excitation
Emission
532 nm laser (green) excites Cy3; Cy3 is detected with an emission filter that passes 557-592 nm.
635 nm laser (red) excites Cy5; Cy5 is detected with an emission filter that passes 650-690 nm.
Image processing and normalization: what is microarray data?
Microarray data is summary information from image files that come out of the scanner.
Image processing: line up grids, flag bad spots, quantitate.
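As an illustration of the "quantitate" and "flag" steps, the sketch below turns per-spot foreground and background intensities into a log ratio. It assumes the convention from the earlier slide (green = control, red = experiment); the function name, the argument names, and the `floor` threshold are hypothetical, and real image-analysis software uses richer flagging criteria.

```python
import math

def log2_ratio(red_fg, red_bg, green_fg, green_bg, floor=1.0):
    """log2(experiment / control) for one spot after background subtraction.

    Returns None (a flagged spot) when either channel falls below `floor`
    after subtraction. `floor` is an illustrative threshold, not a standard.
    """
    red = red_fg - red_bg        # Cy5 signal, experiment
    green = green_fg - green_bg  # Cy3 signal, control
    if red < floor or green < floor:
        return None
    return math.log2(red / green)

# A spot four times brighter in red than in green after background subtraction
print(log2_ratio(2100, 100, 600, 100))
# A spot with no red signal above background is flagged
print(log2_ratio(100, 100, 600, 100))
```

Working in log ratios makes a two-fold increase and a two-fold decrease symmetric around zero, which is why most downstream steps use them.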
Lowess smoothing: The names "lowess" and "loess" are derived from the term "locally weighted scatter plot smooth," as both methods use locally weighted linear regression to smooth data.
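One common use of this smoother on two-color data is to fit the log ratio M = log2(R/G) as a function of average log intensity A = 0.5·log2(R·G) and subtract the fitted trend. The sketch below is a minimal pure-Python version of locally weighted linear regression with tricube weights. It omits the robustness iterations of the full lowess procedure, and the function names and the `frac` default are assumptions chosen for illustration.

```python
import math

def lowess(x, y, frac=0.5):
    """Locally weighted linear fit, evaluated at every x[i] (no robustness passes)."""
    n = len(x)
    k = min(n, max(2, int(round(frac * n))))   # neighbours used in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (x0 itself counts)
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        # Tricube weights; points at or beyond the bandwidth get weight 0
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def lowess_normalize(red, green, frac=0.5):
    """Subtract the intensity-dependent trend from the log ratios."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Toy data: red is uniformly twice as bright as green (a pure dye bias)
green = [100, 200, 400, 800, 1600, 3200]
red = [2 * g for g in green]
print(lowess_normalize(red, green))   # values very close to 0
```

The implicit assumption is that most genes are unchanged, so any smooth trend of M against A reflects dye or scanner bias rather than biology.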
(1) High-pass spatial detrending. See: O. Shai, Q. Morris, and B.J. Frey, (2003) Spatial Bias Removal in Microarray Images, University of Toronto Technical Report PSI-2003-21, http://www.psi.utoronto.ca/~ofer/detrendingReport.pdf
Huber, W., Von Heydebreck, A., Sultmann, H., Poustka, A., & Vingron, M. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18, Suppl 1, S96-S104 (2002).
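The variance-stabilizing transformation in the Huber et al. paper is based on the inverse hyperbolic sine, arsinh(a + b·x), in place of a plain logarithm. The sketch below only illustrates the shape of that function; in the paper the calibration parameters are estimated from the data, whereas the defaults for `a` and `b` here are placeholders, and the function name is an assumption.

```python
import math

def arsinh_transform(intensity, a=0.0, b=1.0):
    """arsinh(a + b * intensity); a and b are placeholder calibration values."""
    return math.asinh(a + b * intensity)

# Near zero the transform is roughly linear and accepts negative
# background-subtracted values, where a logarithm would be undefined.
print(arsinh_transform(-5.0), arsinh_transform(0.0), arsinh_transform(5.0))

# For large intensities it tracks the logarithm: arsinh(x) is close to ln(2x).
x = 1000.0
print(arsinh_transform(x), math.log(2 * x))
```

This behaviour is what makes the transform attractive for low-intensity spots, where log ratios become very noisy.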
Photolithographic arrays (Affymetrix)
Probes are typically 25-mers, each paired with a “mismatch” control for specificity
Advantages:
Density is limited essentially by the 5 micron resolution of scanners (solution: larger arrays).
Well-developed protocols.
“Industry standard” (largely self-driven).
Disadvantages:
Not all probes work well. Affymetrix has evolved a complicated system to compensate for this, but even “believers” use at least four probes per gene, and usually more.
Single color.
Sample preparation typically requires amplification.
Single supplier; historically intellectual property issues. (i.e. comparisons)
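To make the “mismatch” control concrete: on Affymetrix expression arrays the mismatch (MM) probe is the perfect-match (PM) 25-mer with its central (13th) base replaced by the complement. The sketch below builds an MM sequence and computes a simple average of PM minus MM over a probe set. The average-difference summary is only one of several ways to combine the multiple probes per gene, and the function names and numbers are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm):
    """Return the PM sequence with its central base swapped for the complement."""
    mid = len(pm) // 2               # index 12 for a 25-mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

def average_difference(pm_signals, mm_signals):
    """Mean of PM minus MM across the probes of one gene (one possible summary)."""
    diffs = [p - m for p, m in zip(pm_signals, mm_signals)]
    return sum(diffs) / len(diffs)

pm = "ACGTACGTACGTACGTACGTACGTA"     # made-up 25-mer
print(pm)
print(mismatch_probe(pm))

# Four made-up probe pairs for a single gene
print(average_difference([100, 200, 300, 400], [50, 50, 100, 100]))
```

Averaging over several probe pairs is what absorbs the "not all probes work well" problem noted above.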
Ink-jet arrays (Agilent)
• 25,000 oligos / 1 x 3 inches
• Sequence completely flexible
• 60-mers
[Figure: schematic of oligonucleotide sequences on the array]
Hughes TR et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol. 2001 Apr;19(4):342-7.
Ink-jet arrays generally agree with spotted cDNA arrays
Yeast IJS array: ~8 oligos per gene; Spo vs. SC
[Figure: scatter plots of cDNA array measurements against multiple-oligo and single-oligo measurements; reported correlations r = 0.96 and r = 0.97; labelled genes: HXT1, HXT3, HXT4]
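The r values quoted for the platform comparison are correlation coefficients between measurements of the same genes on the two array types. As a reminder of what is being computed, here is a minimal Pearson correlation sketch; the log-ratio values are invented for illustration and are not data from the slide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented log ratios for the same five genes on two platforms
cdna_array = [-2.1, -0.4, 0.1, 1.3, 2.8]
oligo_array = [-1.8, -0.5, 0.3, 1.1, 3.0]
print(round(pearson_r(cdna_array, oligo_array), 3))
```

A high r indicates that the two platforms rank and scale expression changes similarly, although a few strongly regulated genes can dominate the coefficient.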
Ink-jet arrays (Agilent)
Advantages:
User-specified sequences; “no questions asked”
Sensitivity and specificity are defined and exceed the requirements of most expression profiling applications; no amplification required
Virtually every 60-mer is functional
Data correlates well with spotted cDNA arrays
Disadvantages:
Density currently limited to ~45,000 spots per array.
Single supplier (although a protocol is in press for making your own synthesizer!)
“Maskless” arrays (Nimblegen)
Advantages:
Density is limited essentially by the 5 micron resolution of scanners.
Disadvantages:
New to arena. Performance in initial publication (Nuwaysir et al., Genome Research, 2002) suggests that sensitivity and specificity may be lower than that of Agilent arrays.
Single supplier – although all the parts are there for academics to build one.
Possible IP issues. Hybs are done in Iceland to bypass Affy IP. Nimblegen web site boasts of new partnership with Affymetrix.
Applications beyond expression profiling
• DNA copy number
• Genotyping
• Protein-DNA associations
• Molecular “Barcoding”
• Protein arrays
• Transformation arrays
Identifying DNA binding sites
Science 2000 Dec 22;290(5500):2306-9 Genome-wide location and function of DNA binding proteins.
Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA.
Results of individual studies (localization, 2-hybrid screens, protein complexes, etc.)
Sequence motifs, structural domains (pfam, SMART)
Other people’s microarray clusters
etc.
**When testing clusters against many different types of categorical annotations, one should consider correcting for multiple testing, and also keep in mind that categories are often not independent.
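A common way to test a cluster against a categorical annotation is a hypergeometric (one-sided Fisher) test, followed by a correction for the number of categories tried. The sketch below uses a Bonferroni correction, which remains valid when categories overlap but becomes conservative. The choice of test and correction is an assumption for illustration, and the gene counts and p-values are made up.

```python
from math import comb

def hypergeom_pvalue(total, annotated, cluster_size, hits):
    """P(at least `hits` annotated genes in a random cluster of this size)."""
    upper = min(annotated, cluster_size)
    favourable = sum(
        comb(annotated, i) * comb(total - annotated, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(total, cluster_size)

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Made-up example: 6000 genes, 40 in a category, cluster of 50 with 8 hits
p = hypergeom_pvalue(6000, 40, 50, 8)
print(p)

# Made-up p-values from testing one cluster against three categories
print(bonferroni([p, 0.04, 0.5]))
```

Because annotation categories are often nested or overlapping, the effective number of independent tests is smaller than the number of categories, so the corrected values err on the cautious side.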
[Figure: schematic of a cell with labels mRNA, protein, nucleus, cell]
Big questions:
To what degree are functional pathways coordinately regulated?
What controls the observed regulation?
Exploring mouse gene expression using Inkjet Oligonucleotide Arrays
• 22,000 oligos / 1 x 3 inches
• Sequence completely flexible
• Mouse “42K” array: NCBI GenomeScan predictions (“XM”) on mouse draft sequence
• Includes:
  25K with cDNA (75% of 18K RefSeq genes)
  30K with cDNA or EST
  12K potential new genes
**Wen Zhang
Exploring mouse gene expression using Inkjet Oligonucleotide Arrays
Collect 55 different mouse tissues from experts:
Janet Rossant, Jane Aubin, Derek van der Kooy, Michael Fehlings, Benoit Bruneau**
**Wen Zhang
Analyze mRNA levels on arrays (1 µg poly-A)
Analysis of 55 mouse tissues: QC
**Malina Bakowski, Blencowe lab
[Figure: expression heat map of genes across tissues]
Gene categories: Unchar., cDNA, EST, Gene trap, Transcription factor, RNA binding/RS domain
Tissues: Testis, Olfactory bulb, Brain, Eye, ES, Skel. Muscle, Liver, Femur, Teeth, Placenta, Prostate, Lymph node, Spleen, Digit, Tongue, Trachea, Large intestine, Colon
Labelled genes:
Hypothetical protein FLJ20519
Testis nuclear RNA binding ptn (Tenr)
DEAD box polypeptide 4 (Ddx4)
Deleted in azoospermia-like (Dazl)
RIKEN cDNA 1700001N01
LOC235045
Sim. to serine protease inhibitor
RIKEN cDNA 1700067I02