Thanks to: DARPA BioComp
DNA & RNA Polonies: Mitra, Shendure, Zhu
Protein MS: Jaffe, Leptos
Metabolism/Proliferation models: Segre, Vitkup, Badarinarayana
20-Mar-2003
New Methods for Genomic Systems Biology
gggatttagctcagttgggagagcgccagactgaa gatttg gaggtcctgtgttcgatccacagaattcgcacca
Modeling successes:
3D & Sequence alignment
Agenda for March 20
15' George Church -- proteomics & polonies
20' Daniel Segre -- Metabolic modeling
10' Matt Wright -- 3D & 4D modeling
25' Jingdong Tian -- minigenome
10' Wayne Rindone -- BioSpice
Discussion throughout is welcome.
10’ Financial, etc.
DNA RNA Proteins
Metabolites
Replication rate
Environment
Biosystems Integrating Measures & Models
Microbes; Cancer & stem cells; Darwinian optima; In vitro replication; Small multicellular organisms
RNAi, Insertions, SNPs
interactions
Improving Models & Measures
Why model?
“Killer Applications”: Share, Search, Merge, Check, Design
The issue is not speed, but integration.
Cost per 99.99% bp: Including Reagents, Personnel, Equipment/5yr, Overhead/sq.m
• Sub-mm scale: 1 µm = femtoliter (10^-15)
• Instruments $2-50K per CPU
Why improve measurements?
Human genomes (6 billion)^2 = 10^19 bp
Immune & cancer genome changes >10^10 bp per time point
RNA ends & splicing: in situ 10^12 bits/mm^3
Biodiversity: Environmental & lab evolution
Compact storage 10^5 now to 10^17 bits/mm^3 eventually
& How? ($1K per genome, 10^8-10^13 bits/$)
Projected costs determine when biosystems data overdetermination is feasible.
In 1984, pre-HGP (X, pBR322, etc.) 0.1 bp/$, would have been $30B per human genome.
In 2002, (de novo full vs. resequencing) ABI/Perlegen/Lynx: $300M vs. $3M
10^3 bp/$ (4 log improvement)
Other data I/O (e.g. video) 10^13 bits/$
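The per-genome costs quoted above follow from dividing genome size by throughput per dollar. A minimal sketch of that arithmetic, assuming a human genome of about 3e9 bp (this size is implied by the $30B and $3M figures but is not stated on the slide; the function and variable names are illustrative):

```python
import math

HUMAN_GENOME_BP = 3e9  # assumed genome size, implied by the slide's figures


def cost_per_genome(bp_per_dollar, genome_bp=HUMAN_GENOME_BP):
    """Dollars needed to sequence genome_bp bases at a given bp/$ rate."""
    return genome_bp / bp_per_dollar


cost_1984 = cost_per_genome(0.1)       # pre-HGP rate of 0.1 bp/$
cost_2002 = cost_per_genome(1e3)       # 2002 resequencing rate of 10^3 bp/$
log_gain = math.log10(1e3 / 0.1)       # orders of magnitude gained, 1984 to 2002
target_rate = HUMAN_GENOME_BP / 1e3    # bp/$ needed to reach a $1K genome

print(f"1984: about ${cost_1984:,.0f} per genome")
print(f"2002: about ${cost_2002:,.0f} per genome")
print(f"improvement: about {log_gain:.0f} logs")
print(f"$1K genome needs about {target_rate:.0e} bp/$")
```

Under the same assumption, a $1K genome needs roughly another three to four logs beyond the 2002 resequencing rate.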
Steeper than exponential growth
[Figure: bp/$ (log scale, 0.001 to 10000) vs. year, 1970 to 2010; fits shown with R^2 = 0.985 and R^2 = 0.992]
[Figure: log(IPS/$K) and log(bits/sec transmit) vs. year, 1830 to 2010; Kurzweil/Moore's law of ICs 1965]
http://www.faughnan.com/poverty.html
http://www.kurzweilai.net/meme/frame.html?main=/articles/art0184.html
New sequencing approaches in commercial R&D
Method             liter/bp  Length  Error   Test-set  $/device  bp/hr
Capil fluidics     e-6       600     <0.1%   1e11      350k      80k
ABI, Amersham, GenoMEMS, Caliper*, RTS*
SeqByHyb           e-12      1       <5%     1e9       200k      1M
Perlegen-Affymetrix*, Xeotron*
Mass Spectrometry
Sequenom, Bruker*
Single molecule    >e-24     >>40    ?       >80       30k-1M    180k
Pore (Agilent*), Fluor (USGenomics, Solexa), FRET (VisiGen, Mobious)
In vitro DNA-Amplification (e.g. Polonies) -- Multiplex cycles:
Lynx*              e-15      20      <3%     1e7       ?         1M
Pyroseq.*          e-6       >40     <1%     1e6       100k      5k
HMS*               e-13 <1% 40 90k >1M?
ParAllele, 454, RTS*
*GMC has a potential financial interest (or Harvard license)
Why single molecules?
Integration from cells/genomes/RNAs to data
Geometric constraints: Who's "in cis" on a molecule, complex, or cell.
e.g. DNA Haplotypes & RNA splice-forms
Polymerase colonies (Polonies) along a DNA or RNA molecule
[Figure: Single Molecule From Library, with primer sites A, A' and B; 1st Round of PCR; Primer is Extended by Polymerase]
Polymerase colony (polony) PCR in a gel
Primer A has 5’ immobilizing Acrydite
Mitra & Church Nucleic Acids Res. 27: e34
Sequence polonies by sequential, fluorescent single-base extensions
• Hybridize Universal Primer
• Add Red (Cy3) dTTP. Wash.
• Add Green (FITC) dCTP
• Wash; Scan
[Figure: two primed templates (B / B', 3' to 5') showing single-base extensions: AGT. / TC and GCG.. / C]
Inexpensive, off-the-shelf equipment
MJR in situ Cycler: $10K
Automated slide fluidics: $4K
Microarray Scanner: $26K+
Human Haplotype: CFTR gene, 45 kbp
Rob Mitra, Vincent Butty, Jay Shendure, Ben Williams
Quantitative removal of Fluorophores
Rob Mitra
Template ST30: 3' TCACGAGT
Base added: (C) A G T (C)
(A) G (T) C (A)
(G) T C A
3' TCACGAGT AGTGCTCA
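The "Base added" rows can be read as successive single-base addition cycles, with parentheses marking cycles in which nothing is incorporated. A minimal sketch that simulates this for template ST30, assuming the bases are offered in the repeating order C, A, G, T (inferred from the rows above, not stated explicitly) and that one position is filled per cycle; the function names are illustrative:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expected_product(template_3to5):
    """Strand synthesized 5'->3' when the template is read from its 3' end."""
    return "".join(COMPLEMENT[b] for b in template_3to5)


def simulate_cycles(template_3to5, base_order="CAGT"):
    """Offer one base type per cycle, cycling through base_order.

    Returns (base_added, incorporated) pairs. A base is incorporated only
    if it matches the next position of the growing strand.
    Assumption: one position is filled per cycle, which is enough for this
    template because its product has no homopolymer runs.
    """
    product = expected_product(template_3to5)
    pos = 0
    cycle = 0
    cycles = []
    while pos < len(product):
        base = base_order[cycle % len(base_order)]
        hit = product[pos] == base
        if hit:
            pos += 1
        cycles.append((base, hit))
        cycle += 1
    return cycles


def render(cycles):
    """Format cycles the way the slide does: (X) means no incorporation."""
    return " ".join(b if hit else f"({b})" for b, hit in cycles)


cycles = simulate_cycles("TCACGAGT")
pattern = render(cycles)
called = "".join(b for b, hit in cycles if hit)
print(pattern)
print(called)
```

Reading only the incorporated bases in order recovers the extension product, which is the complement of the template.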
Sequencing multiple polonies
Rob Mitra
Multiple Image Alignment
Metric based on optimal coincidence of high intensity noise pixels over a matrix of local offsets (0.4 pixel precision)
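A minimal sketch of a coincidence-based alignment metric, under simplifying assumptions: it searches one global, whole-pixel offset by brute force, whereas the slide describes a matrix of local offsets with 0.4 pixel precision. The threshold, image size, and all function names are hypothetical.

```python
def bright_pixels(image, threshold):
    """Coordinates of pixels whose intensity exceeds threshold."""
    return {(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value > threshold}


def coincidence(ref, moving, dy, dx):
    """Count bright pixels of `moving` that land on bright pixels of `ref`
    after shifting `moving` by (dy, dx)."""
    return sum((r + dy, c + dx) in ref for r, c in moving)


def best_offset(ref_img, mov_img, threshold, max_shift):
    """Whole-pixel offset (dy, dx) that maximizes bright-pixel coincidence."""
    ref = bright_pixels(ref_img, threshold)
    mov = bright_pixels(mov_img, threshold)
    offsets = [(dy, dx)
               for dy in range(-max_shift, max_shift + 1)
               for dx in range(-max_shift, max_shift + 1)]
    return max(offsets, key=lambda o: coincidence(ref, mov, o[0], o[1]))


def make_image(points, size=10, bright=255, background=10):
    """Toy image: bright spots at `points` on a dim background."""
    img = [[background] * size for _ in range(size)]
    for r, c in points:
        img[r][c] = bright
    return img


# Toy data: the second scan is the first one displaced by (-1, +2) pixels.
ref_points = [(2, 3), (5, 1), (6, 6), (1, 6)]
mov_points = [(r - 1, c + 2) for r, c in ref_points]
offset = best_offset(make_image(ref_points), make_image(mov_points),
                     threshold=100, max_shift=3)
print(offset)
```

The returned offset is the shift that maps the second scan back onto the first; a sub-pixel version would need interpolation, which this sketch does not attempt.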
Polony exclusion principle & Single pixel sequences
Mitra & Shendure
Alternatively Spliced Cell Adhesion Molecule
Specific variable exons are up-or-down-regulated in various cancers
Controversial prospective diagnostic / prognostic marker (>1000 papers)
Can full isoforms resolve controversy and/or act as superior markers?
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
[Figure: CD44 exon map showing primers F and R, variable exons v1 to v10, and the label TMA]
CD44 Exon Combinatorics (Zhu & Shendure)
Trial & Error Derived Algorithm for Polony Finding
1. Search Signature Image for qualified 'objects':
   a. > 50 connected pixels with same signature value
   b. 'solidity' of > 0.50
   c. long axis / short axis ratio < 3
   OR
   a. > 25 connected pixels with same signature value
   b. 'solidity' of > 0.80
   c. long axis / short axis ratio < 1.5
2. Search for internal regional maxima within each object (lest two adjacent polonies with same signature get counted as one)
3. Assign centroid locations as qualified individual 'polonies'
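The qualification thresholds in step 1 and the centroid assignment in step 3 can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes 4-connectivity and a background value of 0 (neither is stated on the slide), takes solidity and axis ratio as already-measured inputs, and omits step 2 (regional maxima). All names are hypothetical.

```python
from collections import deque


def qualifies(n_pixels, solidity, axis_ratio):
    """Step 1: accept an object under either the large or the compact rule."""
    large = n_pixels > 50 and solidity > 0.50 and axis_ratio < 3
    compact = n_pixels > 25 and solidity > 0.80 and axis_ratio < 1.5
    return large or compact


def connected_objects(signature):
    """Group 4-connected pixels that share the same nonzero signature value.

    Assumptions: 0 marks background; 4-connectivity is used.
    Returns a list of (signature_value, [(row, col), ...]).
    """
    rows, cols = len(signature), len(signature[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if signature[r][c] == 0 or (r, c) in seen:
                continue
            value = signature[r][c]
            queue = deque([(r, c)])
            seen.add((r, c))
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and signature[ny][nx] == value):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append((value, pixels))
    return objects


def centroid(pixels):
    """Step 3: mean (row, col) position of an object's pixels."""
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)


# Toy signature image: one 6x6 object (value 3) and one 2x2 speck (value 5).
grid = [[0] * 10 for _ in range(10)]
for y in range(1, 7):
    for x in range(2, 8):
        grid[y][x] = 3
for y in (8, 9):
    for x in (0, 1):
        grid[y][x] = 5

objects = connected_objects(grid)
# A filled square has solidity 1.0 and axis ratio 1.0, supplied here by hand.
polonies = [centroid(px) for _, px in objects if qualifies(len(px), 1.0, 1.0)]
print(polonies)
```

In practice, solidity and axis lengths would come from an image-analysis library's region measurements rather than being supplied by hand.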
[Figure: panels labeled V1 through V10]
Examples of Counts (isoforms) of 8000 analyzed
Jun Zhu
Summary of Counts (isoforms)
EXON PATTERN          Eph4  Eph4bDD  TOTAL  Eph4 FRATIO  LSTP-PV
------------7-8-9-10   609      764   1373         1.17     1E-4
--------------8-9-10   320      390    710         1.13     3E-2
----------6-7-8-9-10   431      251    682        -1.85    4E-18
------4-5-6-7-8-9-10   218      216    434        -1.08     2E-1
----------------9-10    68      143    211         1.96     7E-7
--------5-6-7-8-9-10    86       39    125        -2.37     2E-6
----3-4-5-6-7-8-9-10    40       56     96         1.30     9E-2
------4-5---7-8-9-10    16       74     90         4.30     2E-9
--2-3-4-5-6-7-8-9-10    44       28     72        -1.69     1E-2
1-2-3-4-5-6-7-8-9-10    22        5     27        -4.73     3E-4
--------5---7-8-9-10     5       19     24         3.53     3E-3
----3-4-5---7-8-9-10     1       15     16        13.95     4E-4
--2-3-4-5---7-8-9-10     1       10     11         9.30     5E-3
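One way to read the table: each exon pattern lists the variable exons present, and the ratio column behaves like a signed fold change from Eph4 to Eph4bDD, with decreases reported as negative folds. The sketch below is an interpretation, not the authors' method. The normalization factor of about 1.075 is inferred from the tabulated values (for example 15 / 13.95 and 10 / 9.30) and is not stated in the source; the function names are hypothetical.

```python
import re


def exons_present(pattern):
    """Variable exon numbers present in a pattern such as '------4-5---7-8-9-10'."""
    return [int(tok) for tok in re.findall(r"\d+", pattern)]


def signed_fold(eph4, eph4bdd, norm=1.075):
    """Fold change from Eph4 to Eph4bDD after dividing by `norm`.

    Increases are returned as positive folds, decreases as negative folds.
    Assumption: norm ~ 1.075 is back-calculated from the table, not given.
    """
    ratio = (eph4bdd / eph4) / norm
    return ratio if ratio >= 1 else -1 / ratio


print(exons_present("------4-5---7-8-9-10"))
print(round(signed_fold(609, 764), 2))
print(round(signed_fold(431, 251), 2))
print(round(signed_fold(1, 15), 2))
```

With this assumed factor the rounded results agree with the corresponding table rows, which supports the reading but does not confirm how the column was actually computed.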
Polony Flavors
1. Replica Plating of DNA images [Mitra et al. NAR 1999]
2. Long Range Haplotyping [Mitra et al. PNAS 2003]
3. Allelic mRNA Quantitation (HEP) [Mitra et al. 2003]
4. Alternative Splicing Combinatorics [Zhu et al. 2003]
5. Precise SNP-mutant & mRNA ratios [Merrill et al. 2003]
6. Fluor in situ Sequencing (FISSEQ 1) [Mitra et al. 2003]
7. Multiplex Genotyping (ApoE, Hyman, Shendure & Williams)
8. In situ / single-cell extensions of the above (Zhu & Williams)
Link et al. 1997 Electrophoresis 18:1259-313 (Pub)
Comparison of predicted with observed protein properties (abundance, localization, postsynthetic modifications)
E. coli
Circadian Cycle Proteogenomic Map (parts 1/4 to 4/4)
Numbers on top in basepairs. 1700 ORFs are predicted. Proteomic Model is based on mass spectrometry of peptides at 24h time points. DifferenceMap indicates new peptide regions. The 6 colors represent ORFs in the 6 reading frames. (Harvard-MIT GtL: Jaffe, Church, Lindell, Chisholm, et al.)
Circadian & Cell Cycle Proteogenomic Map (zoom)
Circadian time-series (Prochlorococcus) RNA & protein quantitation:
[Figure: scatter plots with linear regression, R^2 = 0.992, R^2 = 0.635, and R^2 = 0.1; axes labeled RNA (3 AM)]
(Harvard-MIT GtL: Jaffe, Church, Lindell, Chisholm, et al.)