Genome sequencing MUPGRET Workshop Joe Polacco
Dec 22, 2015
Size of human genome
  23 pairs of chromosomes
  3.1 billion bp
  If the code were written in NYC phone books and stacked up, it would reach the top of the Washington Monument.
Human Genome Project
  Began as an academic effort.
  Initially involved 5 research centers in the US and England.
  Soon joined by Celera, a spin-off company.
Some surprises
  Initial estimate was 100,000 to 150,000 genes, but found to be 35,000 to 50,000. (C. elegans: ~19,000 genes)
  Fraction of the genome that codes for protein originally estimated as 5% but found to be 1.5%.
Some completely sequenced genomes
  Mycoplasma genitalium: 578,000 bp, 400 genes
  Haemophilus influenzae: 1,830,138 bp, 1738 genes
  E. coli: 4,639,221 bp, 4377 genes
  S. cerevisiae: 12 x 10^6 bp, 5885 genes
More genomes
  C. elegans: 95.5 x 10^6 bp, 19,820 genes
  D. melanogaster: 1.8 x 10^8 bp, 13,601 genes
  A. thaliana: 1.17 x 10^8 bp, 25,498 genes
More genomes
  M. musculus: 3 x 10^9 bp, ~30,000 genes
  H. sapiens: 3.3 x 10^9 bp, 30,000-50,000 genes
  O. sativa: 4.3 x 10^8 bp, 30,000-63,000 genes
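The genome sizes and gene counts listed on these slides can be turned into a rough gene density for comparison. The sketch below is illustrative only: the figures are copied from the slides, ranges are kept as a low and a high value, and the names `GENOMES`, `genes_per_mbp`, and `density_table` are made up for this example.

```python
# (size in bp, low gene estimate, high gene estimate), copied from the slides.
# Where a slide gives a single gene count, low and high are the same.
GENOMES = {
    "M. genitalium": (578_000, 400, 400),
    "H. influenzae": (1_830_138, 1738, 1738),
    "E. coli": (4_639_221, 4377, 4377),
    "S. cerevisiae": (12_000_000, 5885, 5885),
    "C. elegans": (95_500_000, 19_820, 19_820),
    "D. melanogaster": (180_000_000, 13_601, 13_601),
    "A. thaliana": (117_000_000, 25_498, 25_498),
    "M. musculus": (3_000_000_000, 30_000, 30_000),
    "H. sapiens": (3_300_000_000, 30_000, 50_000),
    "O. sativa": (430_000_000, 30_000, 63_000),
}


def genes_per_mbp(size_bp, genes):
    """Gene density in genes per million base pairs."""
    return genes / (size_bp / 1_000_000)


def density_table(genomes=GENOMES):
    """Map each organism to its (low, high) gene density."""
    return {
        name: (genes_per_mbp(size, low), genes_per_mbp(size, high))
        for name, (size, low, high) in genomes.items()
    }


if __name__ == "__main__":
    for name, (low, high) in density_table().items():
        print(f"{name:16s} {low:8.1f} to {high:8.1f} genes/Mbp")
```

On these numbers the bacterial genomes come out at several hundred genes per Mbp, while mouse and human come out at roughly 10 to 15.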
The beginning
  Human Genome Project initially discussed at a UC-Santa Cruz meeting in 1985.
What were the concerns?
  What will it do to biology?
  How will we pay for it?
  Is this really science?
  Why bother to sequence it all?
  All vs. just the genes (skim sequencing)
Dept. of Energy
  Initially funded project in 1987.
  $5.3 million
  Study radiation-induced mutations, repair, and effect on humans.
NIH
  Joined in 1988.
  James Watson leader
  3% of research budget devoted to examining the ethical, legal, and social implications of gene research (ELSI)
Other genomes
  Parallel sequencing of E. coli, S. cerevisiae, C. elegans, D. melanogaster, and M. musculus
  Why? Work out the technology and methods.
Watson’s vision
  Sequence it all, not just genes.
  Use genetic maps and markers to help assemble the pieces.
Academic players
  Wash U
  Baylor
  Whitehead
  Wellcome Trust
  Joint Genome Institute (DOE Center)
$1 to 10 cents per finished bp
  Automated processing of cloned DNA
  Automated DNA sequencing
  Computer system to support sequence data
  Algorithms to assess sequence fidelity, assemble sequences, and “find” genes
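To see why the cost per finished base pair mattered, the arithmetic can be worked through for the 3.1 billion bp genome size quoted earlier. This is a rough sketch that assumes the slide's figures are a cost range per finished bp (from $1 down to 10 cents) and that every base costs the same. The function name `project_cost_dollars` is made up for this example.

```python
GENOME_SIZE_BP = 3_100_000_000  # human genome size quoted earlier in the slides


def project_cost_dollars(genome_bp, cents_per_bp):
    """Total sequencing cost in whole dollars.

    Works in integer cents to avoid floating-point rounding.
    """
    return genome_bp * cents_per_bp // 100


if __name__ == "__main__":
    for cents in (100, 10):
        cost = project_cost_dollars(GENOME_SIZE_BP, cents)
        print(f"{cents:3d} cents/bp -> ${cost:,}")
```

Under these assumptions the total falls from $3.1 billion at $1 per bp to $310 million at 10 cents per bp.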
Maps
  Thomas Hunt Morgan (early 1900s): low-resolution phenotypic markers
  1970s: restriction maps
  1980s: RFLPs
  1989: Maynard Olson, Leroy Hood, Charles Cantor, and David Botstein: the sequence itself is a marker! (STS)
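The idea that a short stretch of sequence can serve as a map marker can be sketched in a few lines: if two clones contain the same sequence-tagged site (STS), they probably overlap. The toy example below is a simplification. Real STSs are detected by PCR with a primer pair, both strands matter, and the clone sequences, marker sequences, and function names here are invented for illustration.

```python
def sts_content(clones, markers):
    """Return {clone name: set of marker names whose sequence occurs in the clone}."""
    return {
        name: {m for m, tag in markers.items() if tag in seq}
        for name, seq in clones.items()
    }


def overlapping_pairs(content):
    """List (clone_a, clone_b, shared markers) for clones sharing at least one STS."""
    names = sorted(content)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = content[a] & content[b]
            if shared:
                pairs.append((a, b, shared))
    return pairs


# Invented toy data: three clones and three STS markers.
CLONES = {
    "cloneA": "ATGCCGTTAGGCTAACGT",
    "cloneB": "GGCTAACGTTTGACCAGG",
    "cloneC": "TTGACCAGGATCC",
}
MARKERS = {"sts1": "CCGTTAG", "sts2": "GCTAACG", "sts3": "TTGACCA"}

if __name__ == "__main__":
    for a, b, shared in overlapping_pairs(sts_content(CLONES, MARKERS)):
        print(a, "overlaps", b, "via", sorted(shared))
```

Chaining the shared markers (cloneA to cloneB via sts2, cloneB to cloneC via sts3) gives the clone order without any genetic cross or restriction digest.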
PCR
  Polymerase Chain Reaction
  http://www.dnai.org/b/index.html
  Techniques > Amplifying > Making copies of DNA
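A quick calculation shows why PCR is such an effective way of making copies: in the idealized case each cycle doubles the number of target molecules. The sketch below assumes perfect or fixed per-cycle efficiency, which real reactions do not sustain. The function name and the `efficiency` parameter are illustrative, not taken from the slides.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized number of target copies after a given number of PCR cycles.

    efficiency=1.0 means every molecule is copied in every cycle (perfect doubling).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return start_copies * (1 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n} cycles: {copies_after(n):,.0f} copies from one molecule")
```

With perfect doubling, 30 cycles turn a single molecule into 2^30 copies, a little over a billion.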
The PCR revolution
  1985: Kary Mullis, Cetus Corporation
  No need to send clones back and forth
  Allowed automated DNA sequencing
  No need for a large clone repository for all human genes
  Unrestricted access to genes via public sequence databases.
Kary Mullis talks about PCR
  http://www.dnai.org/b/index.html
  Techniques > Amplifying > Interviews: Making DNA copies, Naming PCR
Sequencing: the old way
  Maxam and Gilbert or Sanger methods
  http://www.dnai.org/b/index.html
  Techniques > Sorting and Amplifying
Early DNA sequencing
  http://www.dnai.org/b/index.html
  Techniques > Sorting and Amplifying > Interviews: Dideoxy method of sequencing
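The logic of the dideoxy (Sanger) method can be illustrated with a small simulation. In each of four reactions, synthesis stops wherever one particular base is incorporated as a chain terminator, so each reaction yields fragments whose lengths mark the positions of that base. Sorting all fragments by length and noting which reaction each came from spells out the sequence. This is a simplified sketch: it models the newly synthesized strand directly and ignores the template, primer, and gel chemistry. The function names are made up for this example.

```python
def chain_termination_fragments(synthesized_strand):
    """For each base, list the lengths of fragments that terminate at that base.

    Each list corresponds to one of the four lanes of a classic sequencing gel.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence from shortest fragment to longest across the four lanes."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    lanes = chain_termination_fragments("GATTACA")
    print(lanes)
    print(read_gel(lanes))
```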
Automated Sequencing
  Automation made possible by new dye chemistry developed by Leroy Hood and Lloyd Smith at Cal. Inst. Tech. in 1986.
  http://www.dnai.org/b/index.html
  Techniques > Sorting and Amplifying > Cycle Sequencing
Inside the automated sequencer
  Collaboration with ABI produced the first automated sequencer.
  Laser detection of each bp.
  http://www.dnai.org/b/index.html
  Techniques > Sorting and Amplifying > Interviews: Making sequencing automated, Inside an automated sequencer
Sequencing
  Detect all 4 nucleotides in one lane, which quadrupled the output from a single sequencing gel.
  DuPont dye terminators: dye attached to the terminal nucleotide, allowing all four nucleotides to be labeled in the same sequencing reaction.
  Capillary electrophoresis eliminated the need to cast gels.
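Reading all four nucleotides in one lane works because each base carries a different dye, so the detector records four fluorescence signals at every peak and the strongest one identifies the base. The sketch below is a deliberately naive base caller under that assumption. Real base-calling software also handles noise, dye mobility shifts, and overlapping peaks, and the intensity values here are invented.

```python
def call_bases(trace):
    """Call one base per peak by picking the dye channel with the highest intensity.

    trace is a list of dicts mapping each base to its fluorescence intensity
    at that peak.
    """
    return "".join(max(peak, key=peak.get) for peak in trace)


# Invented intensities for four consecutive peaks in a single lane.
EXAMPLE_TRACE = [
    {"A": 12, "C": 8, "G": 950, "T": 20},
    {"A": 880, "C": 15, "G": 30, "T": 10},
    {"A": 9, "C": 22, "G": 14, "T": 910},
    {"A": 18, "C": 870, "G": 25, "T": 11},
]

if __name__ == "__main__":
    print(call_bases(EXAMPLE_TRACE))
```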
Sequencing the Genome: an Overview
  Show sequencing.exe file containing movie about sequencing the human genome.
Two approaches to sequence the genome
  Hierarchical shotgun clone libraries: use map to pick pieces of genome in order, break them, sequence, and reassemble. (Watson)
  Whole genome shotgun: break up genomic DNA randomly, sequence several genome equivalents, and reassemble. (Venter)
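The phrase "several genome equivalents" refers to coverage: the total number of bases sequenced divided by the genome size. One common idealization, not stated on the slides, treats read start positions as uniformly random, in which case the expected fraction of the genome hit by at least one read is 1 - e^(-c) at coverage c. The sketch below uses that model. The 500 bp read length in the usage example is an assumption chosen for illustration.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Average number of times each base is sequenced (genome equivalents)."""
    return n_reads * read_len / genome_len


def expected_fraction_covered(c):
    """Expected fraction of the genome covered at coverage c (random-start model)."""
    return 1 - math.exp(-c)


def reads_needed(c, read_len, genome_len):
    """Number of reads required to reach coverage c."""
    return math.ceil(c * genome_len / read_len)


if __name__ == "__main__":
    genome = 3_100_000_000  # human genome size quoted earlier in the slides
    for c in (1, 3, 5, 8):
        n = reads_needed(c, 500, genome)  # 500 bp reads: assumed for illustration
        print(f"{c}x: {n:,} reads, {expected_fraction_covered(c):.4%} covered")
```

Even in this idealized model one genome equivalent leaves over a third of the bases unsequenced, which is why random shotgun approaches need several-fold redundancy.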
Hierarchical Shotgun Clone Libraries
  Top-down strategy
  Ordered library of clones based on large-scale maps.
  Subclone larger inserts into sequencing vector.
  Reassemble sequence based on order.
ESTs
  Expressed sequence tags
  Reverse transcribe mRNA and sequence.
  Venter used a nonspecific primer to randomly amplify 150-400 bp fragments of genes.
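The two EST steps, reverse transcribing mRNA and then taking a short random piece for sequencing, can be mimicked on strings. This is a toy sketch: first-strand cDNA is modeled simply as the reverse complement of the mRNA written in DNA letters, and the tag is a random 150-400 bp substring rather than the product of a real priming reaction. The function names and the example mRNA are invented.

```python
import random

# RNA base -> complementary DNA base
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return first-strand cDNA (5' to 3') for an mRNA given 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna.upper()))


def random_tag(cdna, rng, min_len=150, max_len=400):
    """Pick a random fragment of min_len..max_len bases from the cDNA."""
    if len(cdna) < min_len:
        raise ValueError("cDNA is shorter than the minimum tag length")
    length = rng.randint(min_len, min(max_len, len(cdna)))
    start = rng.randrange(len(cdna) - length + 1)
    return cdna[start:start + length]


if __name__ == "__main__":
    rng = random.Random(0)         # fixed seed so the run is repeatable
    mrna = "AUGGCCAUUGUAAUG" * 60  # invented 900-nt message
    tag = random_tag(first_strand_cdna(mrna), rng)
    print(len(tag), tag[:30] + "...")
```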
Patent controversy
  NIH announced it would seek a patent on Venter’s ESTs.
  Very controversial since functionally unknown.
  More appropriate to a private company.
  Watson said it was “sheer lunacy” and resigned due to conflict with Bernadine Healy, NIH director.
More on patents
  Many biotech companies arose at the time to mine ESTs and applied for patents on the genes for diagnostics and pharmaceuticals.
  NIH withdrew patent application.
  ESTs must be novel to be patented.
  ESTs must be useful to be patented.
The result
  No patents granted thus far on genes without known function.
Whole genome shotgun
  Break the genome into a bunch of pieces, often by mechanical shearing.
  Sequence pieces and reassemble.
  Weber (Marshfield Medical Research Foundation) and Myers (U of AZ) proposed the method to speed sequencing.
1998
  Venter leaves NIH to head Celera and promises to sequence the human genome in 3 years for $300 million.
  Accelerated the public project.
  Whole genome method was tested by sequencing 120 Mbp of the Drosophila genome.
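The break, sequence, and reassemble cycle of whole genome shotgun can be sketched with a toy greedy assembler: repeatedly merge the two reads with the longest suffix-to-prefix overlap until nothing overlaps any more. This is a classroom illustration, not the method Celera or the public project used. It assumes error-free reads from one strand, and repeated sequence in a real genome would mislead it. All function names are made up for this example.

```python
import random


def shear(genome, n_reads, read_len, rng):
    """Simulate random shearing: return n_reads substrings of length read_len."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and sequences wholly contained in another sequence."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlap of min_len remains."""
    contigs = drop_contained(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTGA"]))
```

The hand-picked reads in the usage line overlap by four bases each and merge back into a single contig. Reads produced by `shear` from a longer sequence may leave gaps or several contigs if coverage is too low.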