Page 1
Thanks to: Washington U, Harvard-MIT
Broad Inst., DARPA-BioSpice, DOE-GTL, EU-MolTools,
NHGRI-CEGS, NHLBI-PGA, NIGMS-SysBio, PhRMA, Lipper Foundation
Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, SynBioCorp, ThermoFinnigan, Xeotron/Invitrogen
For more info see: arep.med.harvard.edu
CHI Microarrays in Medicine, 4-May-2005, 9:20-9:50 AM
Synthesis & Analysis on Molecular Arrays
Page 2
Systems Biology Loop
Syntheses & Perturbations
Models
Experimental designs
(Systematic)
Data
Analysis & Synthesis Tools
Genome engineering
DNA & RNA
Polony Sequencing
Page 3
Why Synthetic Genomes & Proteomes?
• Test array hypotheses, e.g. cis-DNA/RNA-elements
• Multi-epitopes, vaccines, protein design
• Mass spectrometry & array standards
• Access to any protein (complex) including post-transcriptional modifications
• Utility of molecular biology DNA-RNA-Protein in vitro "kits" (e.g. PCR, T7, Roche)
Whole genome or part?
Whole if major redesign, e.g. changing the genetic code and stability.
Page 4
Up to 760K Oligos/Chip
18 Mbp for $1K raw (6-18K genes)
Oligos/chip  Source                      Chemistry; authors
<1K          Oxamer                      Electrolytic acid/base
8K           Atactic/Xeotron/Invitrogen  Photo-Generated Acid; Sheng, Zhou, Gulari, Gao (U. Houston)
24K          Agilent                     Ink-jet standard reagents
48K          Febit
100K         Metrigen
380K         Nimblegen                   Photolabile 5' protection; Nuwaysir, Smith, Albert
Tian, Gong, Church
Page 5
Improve DNA Synthesis Cost
Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12) and bimolecular kinetics slow with the square of the concentration decrease!
Solution: Amplify the oligos then release them.
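The kinetic penalty mentioned above can be put into rough numbers. This is a minimal sketch, assuming simple second-order kinetics (rate = k[A][B]) with both partners diluted by the same factor in the same volume; `relative_rate` is a hypothetical helper, not something from the talk.

```python
def relative_rate(molecules_low: float, molecules_high: float) -> float:
    """Ratio of bimolecular reaction rates at low vs. high oligo amounts.

    Assumption: rate = k * [A] * [B], same volume and rate constant,
    so the rate scales with the square of the per-oligo amount.
    """
    fold = molecules_low / molecules_high
    return fold ** 2


chip_yield = 1e6      # molecules per oligo from a chip pool (slide value)
usual_yield = 1e12    # molecules per oligo from conventional synthesis (slide value)

slowdown = 1.0 / relative_rate(chip_yield, usual_yield)
print(f"Concentration drop: {usual_yield / chip_yield:.0e}-fold")
print(f"Bimolecular rate drop: {slowdown:.0e}-fold")
```

Under these assumptions a million-fold drop in amount costs roughly a trillion-fold in assembly rate, which is the motivation for amplifying the oligos before releasing them.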
10 + 50 + 10 => ss-70-mer (chip)
20-mer PCR primers with restriction sites at the 50-mer junctions => ds-90-mer
=> ds-50-mer
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Page 6
Improve DNA Synthesis Accuracy via mismatch selection
Tian & Church
Other mismatch methods: MutS (& H, L)
Page 7
Improving DNA synthesis accuracy
Method                    Bp/error
Chip assembly only        160
Hybridization-selection   1,400
MutS-gel-shift            10,000
PCR 35 cycles             10,000
MutHLS cleavage           100,000
Tian & Church 2004 Nature
Carr & Jacobson 2004 NAR
Smith & Modrich 1997 PNAS
http://www.invitrogen.com/content.cfm?pageid=453
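To see what these bp/error figures mean for a finished construct, the sketch below estimates the fraction of error-free molecules. It assumes errors are independent and uniformly distributed along the sequence, which the slide does not state; `error_free_fraction` and the 1,000 bp example length are illustrative choices.

```python
# Bp/error values as listed on the slide.
BP_PER_ERROR = {
    "Chip assembly only": 160,
    "Hybridization-selection": 1_400,
    "MutS-gel-shift": 10_000,
    "PCR 35 cycles": 10_000,
    "MutHLS cleavage": 100_000,
}


def error_free_fraction(bp_per_error: float, length_bp: int) -> float:
    """Probability that a construct of length_bp contains no error.

    Assumption: each base is wrong independently with probability
    1 / bp_per_error.
    """
    return (1.0 - 1.0 / bp_per_error) ** length_bp


length_bp = 1_000  # hypothetical gene-sized construct
for method, bp_per_error in BP_PER_ERROR.items():
    fraction = error_free_fraction(bp_per_error, length_bp)
    print(f"{method:<24} {fraction:.3f}")
```

Under this model, unselected chip oligos give almost no perfect 1 kb constructs, while hybridization-selection alone brings the perfect fraction to roughly half.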
Page 8
Computer aided Design Polymerase Assembly Multiplexing (CAD-PAM)
For tandem, inverted and dispersed repeats: Focus on 3' ends, hierarchical assembly, size-selection and scaffolding.
Mullis 1986 CSHSQB; Dillon 1990 BioTech; Stemmer 1995 Gene; Tian et al. 2004 Nature; Kodumal et al. 2004 PNAS
50, 75, 125, 225, 425, 825, … 100*2^(n-1)
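The length series above is consistent with each assembly cycle joining two products that share a fixed overlap. The sketch below reproduces it under that assumption; the 25 bp overlap is inferred from the numbers and is not stated on the slide, and `pam_lengths` is a hypothetical helper.

```python
def pam_lengths(start_bp: int = 50, overlap_bp: int = 25, cycles: int = 5) -> list[int]:
    """Product length after each pairwise assembly cycle.

    Assumption: every cycle fuses two equal-length fragments that
    overlap by overlap_bp, so new_length = 2 * length - overlap_bp.
    """
    lengths = [start_bp]
    for _ in range(cycles):
        lengths.append(2 * lengths[-1] - overlap_bp)
    return lengths


series = pam_lengths()
print(series)  # [50, 75, 125, 225, 425, 825]

# Closed form under the same assumption: 25 + 25 * 2**k after k cycles.
assert all(length == 25 + 25 * 2**k for k, length in enumerate(series))
```

Once the constant 25 is negligible, this grows as 25 * 2^k, which matches the slide's 100*2^(n-1) if n is counted from the 125 bp step.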
Page 9
Genome assembly
PAM cycle#:  0   1   2    3    4    5
#bp:         50  75  125  225  425  825

50 -> [HS, PAM] -> 425 -> [MutS, PAM] -> 10K -> [anneal] -> 100K -> [red] -> 5 Mbp
USER | USER-S1 | USER-5'only
One pool of 117K (universal primers) | 480 pools (480 genomic primer pairs) | 48 | 1
HS = Hybridization-Selection
USER = Uracil DNA glycosylase & EndoVIII remove flanking primer pairs
Stage brackets: PCR; in vitro
Page 10
All 30S-Ribosomal-protein DNAs (codon re-optimized)
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
[Gel image: size markers 1.7 kb and 0.3 kb; s19 at 0.3 kb]
Nimblegen 95K chip
Atactic <4K chip
Page 11
Extreme mRNA makeover for protein expression in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.
RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.
Solution: Iteratively resynthesize all mRNAs with less mRNA structure.
Tian & Church
[Western blot lanes: 20w, 20m, 17w, 17m, 16w, 16m; 10 kd marker]
w: wild-type; m: modified
Western blot based on His-tags
Page 12
3 Exponential technologies
Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews Genetics; Carlson 2003; Kurzweil 2002
[Chart: log-scale y-axis from 1E-3 to 1E+13, x-axis 1830 to 2010. Series: Computation & communication (bits/sec); Synthesis (daltons); Analysis (bp/$). Point labels: urea, E. coli, B12, tRNA, operons, telegraph.]
Page 13
Why Personal Genomics?
• Pathogen rapid response: emerging disease & biowarfare
• B & T-cell diversity: clinical temporal profiling
• Proteomics: antibodies & aptamers
• RNA & methylation: quantitate splicing & chromatin
• Preventative medicine: genotype–phenotype association
• Cancer: drug targets, loss-of-heterozygosity
• Synthetic biology: laboratory selections
• Phylogenetic: footprinting, biodiversity
Shendure et al. 2004 Nature Reviews Genetics
Page 14
Cancer Genome Project: diagnosis, prognosis, therapies
Mutations G719S, L858R, Del746ELREA in red.
EGFR Mutations in lung cancer: correlation with clinical response to gefitinib [Iressa] therapy.
Paez, … Meyerson (Apr 2004) Science 304: 1497
Lynch … Haber (Apr 2004) New Engl J Med. 350:2129.
Pao … Mardis, Wilson, Varmus H (Aug 2004) PNAS 101:13306-11.
Dulbecco R. (1986) A turning point in cancer research: sequencing the human genome. Science 231:1055-6.
Page 15
[Figure: Single Molecule From Library, flanked by primer sites A’ and B. 1st Round of PCR: Primer is Extended by Polymerase.]
Polymerase colony (polony) PCR in a gel
Primer A has 5’ immobilizing Acrydite
Mitra & Church Nucleic Acids Res. 27: e34
Page 16
Polymerase clones Plone sequencing
Polony-slides vs. Plone-beads
1 vs. 2 immobilized primers
dNTP extension vs. ligation
Single molecule vs. multi-molecule detection
Page 17
Cleavable dNTP-Fluorophore (& terminators)
Mitra,RD, Shendure,J, Olejnik,J, Olejnik,EK, and Church,GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65
Reduce or photo-cleave
Page 18
Polony Bead Sequencing Pipeline
In vitro libraries via paired tag manipulation
Bead polonies via emulsion PCR [Dre03]
Monolayered immobilization in acrylamide
Enrichment of amplified beads
SOFTWARE: Images → Tag Sequences; Tag Sequences → Genome
FISSEQ or “wobble” sequencing
Epifluorescence Scope with Integrated Flow Cell
Mitra, Shendure, Porreca, Rosenbaum, Church unpub.
Page 19
Haplotypes inferred by pedigree vs. direct single molecule measures
Homozygous in the parents (GM12248, GM12249); heterozygous in the son (GM10835)
SNP          Position  Haplotype 1  Haplotype 2
rs3778973    1.8Mb     C            T
rs997906     79.9Mb    A            T
rs1557917    88.2Mb    G            A
rs39284      89.4Mb    C            T
rs10500042   114Mb     G            A
rs4717028    155Mb     C            T
Parents: C-A-G-C-G-C / C-A-G-C-G-C and T-T-A-T-A-T / T-T-A-T-A-T
Son: C-A-G-C-G-C / T-T-A-T-A-T
Page 20
GM10835
SNP          Position  Haplotype 1  Haplotype 2
rs3778973    1.8Mb     C            T
rs997906     79.9Mb    A            T
rs1557917    88.2Mb    G            A
rs39284      89.4Mb    C            T
rs10500042   114Mb     G            A
rs4717028    155Mb     C            T
1Mb haplotypes: AT=198 GT=0 GC=45
Page 21
GM10835
SNP          Position  Haplotype 1  Haplotype 2
rs3778973    1.8Mb     C            T
rs997906     79.9Mb    A            T
rs1557917    88.2Mb    G            A
rs39284      89.4Mb    C            T
rs10500042   114Mb     G            A
rs4717028    155Mb     C            T
75Mb haplotypes: TT=8 TC=0 AC=23
Page 22
GM10835
SNP          Position  Haplotype 1  Haplotype 2
rs3778973    1.8Mb     C            T
rs997906     79.9Mb    A            T
rs1557917    88.2Mb    G            A
rs39284      89.4Mb    C            T
rs10500042   114Mb     G            A
rs4717028    155Mb     C            T
153Mb haplotypes: TT=72 CT=15 CC=28
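One way to read the counts on the three haplotype slides is as tallies of two-SNP allele combinations observed on single molecules. That reading is an assumption, as is the summary statistic used here. The sketch reports, for each slide, the two most frequent combinations and the fraction of observations they account for; `phase_support` is a hypothetical helper.

```python
def phase_support(counts: dict[str, int]) -> tuple[list[str], float]:
    """Return the two most frequent haplotypes and their share of all counts.

    Assumption: in a heterozygous diploid sample only two haplotypes are
    real, so anything outside the top two is treated as discordant.
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top_two = ranked[:2]
    total = sum(counts.values())
    support = sum(n for _, n in top_two) / total
    return [hap for hap, _ in top_two], support


# Counts copied from the 1Mb, 75Mb and 153Mb haplotype slides.
slides = {
    "1Mb": {"AT": 198, "GT": 0, "GC": 45},
    "75Mb": {"TT": 8, "TC": 0, "AC": 23},
    "153Mb": {"TT": 72, "CT": 15, "CC": 28},
}

for distance, counts in slides.items():
    haplotypes, support = phase_support(counts)
    print(f"{distance:>6}: {haplotypes} support={support:.2f}")
```

By this measure the 1Mb and 75Mb pairs are fully concordant with the pedigree-inferred phase, while the 153Mb pair shows a minority of discordant molecules.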
Page 23
Plone-bead Fluorescent In Situ Sequencing in vitro Libraries
Greg Porreca, Abraham Rosenbaum
1 to 100kb Genomic
[Figure labels: L, M, R; PCR bead; Sequencing primers; Selector bead]
2x20bp after MmeI
Dressman et al. 2003 PNAS (emulsion)
Page 24
Plone-FISSeq: up to 1 billion beads/slide
White = Fe-core pixels; Cy5 primer (570nm); Cy3 dNTP (666nm)
Jay Shendure, Greg Porreca
Page 25
Plone-bead FISSeq                                 '04        '05
• # of bases sequenced (total Mbp)                23 (no)    10.8 (yes)
• # bases sequenced (unique)                      73 b       4.7 Mb (72%)
• Avg fold coverage                               324K       2.3
• Pixels used per bead (analysis)                 3.6        3.6
• Read Length (bp)                                14         24
• Indels                                          0.6%       ?
• Substitutions (raw error-rate)                  4e-5       1e-2
• Throughput (kb/min)                             360        10
• Speed/cost ratio relative to current ABI
  capillary sequencing @ 0.75 kb/min/device       1100       32
Consider amplification, homopolymer, context errors?
Shendure & Porreca
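Two of the rows above can be cross-checked with simple arithmetic. This sketch assumes fold coverage is total bases divided by unique bases, and that the speed part of the speed/cost ratio is throughput divided by the 0.75 kb/min ABI reference; the leftover factor is reported as an "implied cost factor", which is an interpretation and not a number from the slide.

```python
ABI_KB_PER_MIN = 0.75  # reference capillary throughput per device (slide value)


def fold_coverage(total_bases: float, unique_bases: float) -> float:
    """Average coverage, assuming coverage = total / unique bases."""
    return total_bases / unique_bases


def speed_ratio(kb_per_min: float, reference: float = ABI_KB_PER_MIN) -> float:
    """Throughput relative to the reference instrument."""
    return kb_per_min / reference


# Values copied from the two columns of the slide ('04 and '05).
runs = {
    "'04": {"total": 23e6, "unique": 73, "kb_per_min": 360, "speed_cost": 1100},
    "'05": {"total": 10.8e6, "unique": 4.7e6, "kb_per_min": 10, "speed_cost": 32},
}

for label, run in runs.items():
    coverage = fold_coverage(run["total"], run["unique"])
    speed = speed_ratio(run["kb_per_min"])
    implied_cost_factor = run["speed_cost"] / speed  # interpretation, see lead-in
    print(f"{label}: coverage={coverage:,.1f} speed={speed:.1f}x "
          f"implied cost factor={implied_cost_factor:.2f}")
```

The '05 coverage comes out at about 2.3, matching the slide; the '04 figure lands near, but not exactly on, the quoted 324K, presumably because the 23 Mbp total is rounded.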
Page 26
CD44 Exon Combinatorics (Zhu & Shendure)
• Alternatively Spliced Cell Adhesion Molecule
• Specific variable exons are up-or-down-regulated in various cancers (>2000 papers)
• v6 & v7 enable direct binding to chondroitin sulfate, heparin…
Zhu,J, et al. Science. 301:836-8.
Page 27
Zhu J, Shendure J, Mitra RD, Church GM. Science 301:836-8. Single molecule profiling of alternative pre-mRNA splicing.
EXON PATTERN           Eph4  Eph4bDD  TOTAL   RATIO  P-V
------------7-8-9-10    609      764   1373    1.17  1E-4
--------------8-9-10    320      390    710    1.13  3E-2
----------6-7-8-9-10    431      251    682   -1.85  4E-18
------4-5-6-7-8-9-10    218      216    434   -1.08  2E-1
----------------9-10     68      143    211    1.96  7E-7
--------5-6-7-8-9-10     86       39    125   -2.37  2E-6
----3-4-5-6-7-8-9-10     40       56     96    1.30  9E-2
------4-5---7-8-9-10     16       74     90    4.30  2E-9
--2-3-4-5-6-7-8-9-10     44       28     72   -1.69  1E-2
1-2-3-4-5-6-7-8-9-10     22        5     27   -4.73  3E-4
--------5---7-8-9-10      5       19     24    3.53  3E-3
----3-4-5---7-8-9-10      1       15     16   13.95  4E-4
--2-3-4-5---7-8-9-10      1       10     11    9.30  5E-3
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
CD44 RNA splicing isoforms
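The slide does not say how the RATIO column was computed. One guess that approximately reproduces it is a signed fold change of isoform frequencies, with each count normalized by its library total and decreases reported as negative reciprocals. The sketch below implements that guess; `signed_fold_change` is a hypothetical helper and the totals are summed only over the isoforms listed.

```python
# (exon pattern, Eph4 count, Eph4bDD count), copied from the slide.
ROWS = [
    ("------------7-8-9-10", 609, 764),
    ("--------------8-9-10", 320, 390),
    ("----------6-7-8-9-10", 431, 251),
    ("------4-5-6-7-8-9-10", 218, 216),
    ("----------------9-10", 68, 143),
    ("--------5-6-7-8-9-10", 86, 39),
    ("----3-4-5-6-7-8-9-10", 40, 56),
    ("------4-5---7-8-9-10", 16, 74),
    ("--2-3-4-5-6-7-8-9-10", 44, 28),
    ("1-2-3-4-5-6-7-8-9-10", 22, 5),
    ("--------5---7-8-9-10", 5, 19),
    ("----3-4-5---7-8-9-10", 1, 15),
    ("--2-3-4-5---7-8-9-10", 1, 10),
]


def signed_fold_change(a: int, b: int, total_a: int, total_b: int) -> float:
    """Fold change of b's frequency over a's; decreases are negative.

    Assumption: counts are normalized by library totals, and a ratio
    below 1 is reported as -1 / ratio.
    """
    ratio = (b / total_b) / (a / total_a)
    return ratio if ratio >= 1 else -1.0 / ratio


total_eph4 = sum(a for _, a, _ in ROWS)
total_bdd = sum(b for _, _, b in ROWS)

for pattern, a, b in ROWS:
    print(f"{pattern}  {signed_fold_change(a, b, total_eph4, total_bdd):7.2f}")
```

The values land within a few percent of the slide's RATIO column; the small differences may reflect isoforms that were counted in the totals but are not shown in the table.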
Page 28
Soluble CD44
Zhu & Varma
Page 29
Systems Biology Loop
Syntheses & Perturbations
Models
Experimental designs
(Systematic)
Data
Analysis & Synthesis Tools
Genome engineering
DNA & RNA
Polony Sequencing
Page 30
Molecular Systems Biology
Transcriptomics
Proteomics
Metabolomics
Functional genomics
Structural genomics
Computational biology
Theoretical biology
Mathematical biology
Synthetic biology
An open access journal
www.nature.com/msb/