BioSci 203 blumberg lecture 6 page 1 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Bio Sci 203 bb-lecture 6 – DNA sequence analysis
• Bruce Blumberg ([email protected])
  – office: 2113E McGaugh Hall
  – 824-8573
  – office hours MWF 11-12
• Today
  – Characterization of Selected DNA Sequences
    • DNA sequence analysis
How to identify your gene of interest (contd)
• You have one protein and want to identify proteins that interact with it
  – some sort of interaction screen is indicated
    • straight biochemistry
    • phage display
    • two hybrid
    • in vitro expression cloning
How to identify your gene of interest (contd)
• biochemical approach
  – purify cellular proteins that interact with your protein
    • co-immunoprecipitation
    • affinity chromatography
    • biochemical fractionation
  – pure protein(s) are microsequenced
    • if not in database then make oligonucleotides and screen cDNA library from appropriate tissues
  – advantages
    • functional approach
    • stringency can be manipulated
    • can identify multimeric proteins or complexes
    • will work if you can purify proteins
  – disadvantages
    • much skill required
    • low throughput
    • considerable optimization required
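The step "make oligonucleotides and screen" usually means back-translating part of the microsequenced peptide into a degenerate oligonucleotide. The following is a minimal sketch, assuming the standard genetic code; the function names and the example peptide are hypothetical and only for illustration. It counts how many distinct coding sequences a peptide stretch could have and picks the least degenerate window.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of different DNA sequences that could encode this peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def best_window(peptide, width=6):
    """Return the stretch of `width` residues with the lowest degeneracy."""
    if len(peptide) < width:
        raise ValueError("peptide is shorter than the requested window")
    windows = [peptide[i:i + width] for i in range(len(peptide) - width + 1)]
    return min(windows, key=degeneracy)


if __name__ == "__main__":
    peptide = "LSRMWKDEHLSR"  # hypothetical microsequence, not from the lecture
    window = best_window(peptide, width=6)
    print(window, degeneracy(window))
```

Windows rich in Met and Trp (one codon each) give the least degenerate probes; Leu, Ser and Arg (six codons each) give the most.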
How to identify your gene of interest (contd)
• Phage display screening (a.k.a. panning)
  – requires a library that expresses inserts as fusion proteins with a phage capsid protein
    • most are M13 based
    • some lambda phages used
  – prepare target protein
    • as affinity matrix
    • or as radiolabeled probe
  – test for interaction with library members
    • if using affinity matrix you purify phages from a mixture
    • if labeling protein one plates fusion protein library and probes with the protein
  – called receptor panning based on similarity with panning for gold
How to identify your gene of interest (contd)
• Phage display screening (a.k.a. panning) (contd)
  – advantages
    • stringency can be manipulated
    • if the affinity matrix approach works the cloning could go rapidly
  – disadvantages
    • fusion proteins bias the screen against full-length cDNAs
    • multiple attempts required to optimize binding
    • limited targets possible
    • may not work for heterodimers
    • unlikely to work for complexes
    • panning can take many months for each screen
How to identify your gene of interest (contd)
• Two hybrid screening
  – originally used in yeast, now other systems possible
  – prepare bait: target protein fused to a DNA-binding domain (DBD; GAL4 is usual)
    • stable cell line is commonly used
  – prepare fusion protein library with an activation domain
  – What is the key factor required for success?
    • No activation domain in bait!
  – approach
    • transfect library into cells and either select for survival or for activation of a reporter gene
    • purify and characterize positive clones
How to identify your gene of interest (contd)
• Two hybrid screening (contd)
  – advantages
    • seems simple and inexpensive on its face
      – in materials
    • functional assay
  – disadvantages
    • fusion proteins bias the screen against full-length cDNAs
    • binding parameters not manipulable
    • difficult or impossible to detect interactions between proteins and complexes
    • doesn’t work for secreted proteins
    • many months to screen
      – savings in materials are eaten up by salaries
      – avg grad student costs $30k/year
      – avg postdoc or tech costs $40k/year
    • MANY false positives
How to identify your gene of interest (contd)
• In vitro interaction screening
  – based on in vitro expression cloning (IVEC)
    • transcribe and translate cDNA libraries in vitro into small pools of proteins (~100)
    • test these proteins for their ability to interact with your protein of interest
      – EMSA, co-ip, FRET, SPA
  – advantages
    • functional approach
    • smaller pools increase sensitivity
    • automated variant allows diversity of targets
      – proteins, protein complexes, nucleic acids, protein/nucleic acid complexes, small molecule drugs
      – very fast
  – disadvantages
    • can’t detect heterodimers unless 1 partner known
    • expensive consumables (but cheap salaries)
      – typical screen will cost $10-15K
    • expense of automation
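To get a feel for the scale of a pooled screen, one can estimate how many clones must be sampled to include a given cDNA and how many ~100-clone pools that implies. This is a rough sketch using the standard library-sampling formula N = ln(1 - P) / ln(1 - f); the abundance value and the function names are assumptions for illustration, not figures from the lecture.

```python
import math


def clones_needed(abundance, prob=0.99):
    """Clones to sample so that a cDNA present at fractional `abundance`
    is included at least once with probability `prob`.

    Uses N = ln(1 - P) / ln(1 - f), which assumes independent random sampling.
    """
    if not (0 < abundance < 1 and 0 < prob < 1):
        raise ValueError("abundance and prob must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - prob) / math.log(1 - abundance))


def pools_needed(n_clones, pool_size=100):
    """Number of pools when clones are split into pools of `pool_size`."""
    return math.ceil(n_clones / pool_size)


if __name__ == "__main__":
    # Hypothetical rare message: 1 in 100,000 clones.
    n = clones_needed(abundance=1e-5, prob=0.99)
    print(n, "clones in", pools_needed(n, pool_size=100), "pools")
```

Under these assumptions a rare message requires thousands of pools, which is one reason automation of the assay matters.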
Analysis of genes and cDNAs
• Characterization of cloned DNA (what do we want to know about a new gene?)
  – Complete DNA sequence
    • cDNA sequence
    • genomic sequence? (promoters, introns and exons)
    • Restriction enzyme maps?
  – where are the promoter(s)?
    • Alternative promoter use?
    • Mapping transcription start(s)
  – where and when is mRNA expressed?
    • How abundantly is it expressed in each place?
    • association between expression levels and putative function?
  – What is the function of this gene?
    • Loss-of-function analysis decisive
      – Knockout or mutation
      – Knockdown (morpholino antisense, siRNA)
      – mutant mRNA, e.g. dominant negative
    • gain of function may be helpful
      – transgenic
      – mutant mRNA, e.g. constitutively active transcription factor
DNA Sequence analysis
• Complete DNA sequence (all nts both strands, no gaps)
  – complete sequence is desirable but takes time
    • how long depends on size and strategy employed
  – which strategy to use depends on various factors
    • how large is the clone?
      – cDNA
      – genomic
    • How fast is sequence required?
• sequencing strategies
  – primer walking
  – cloning and sequencing of restriction fragments
  – progressive deletions
    • Bidirectional, unidirectional
  – Shotgun sequencing
    • whole genome
    • with mapping
      – map first (C. elegans)
      – map as you go (many)
DNA Sequence analysis (contd)
• Primer walking - walk from the ends with oligonucleotides
  – sequence, back up ~50 nt from end, make a primer and continue
  – Why back up?
    • To get adequate overlap
    • Current sequencing may not give readable sequence within 50 nt of the primer
DNA Sequence analysis (contd)
• Primer walking (contd)
  – advantages
    • very simple
    • no possibility of losing bits of DNA, as can happen with
      – restriction mapping
      – deletion methods
    • no restriction map needed
    • best choice for short DNA
  – disadvantages
    • slowest method
      – about a week between sequencing runs
    • oligos are not free (and not reusable)
    • not feasible for large sequences
  – applications
    • cDNA sequencing when time is not critical
    • targeted sequencing
      – verification
      – closing gaps in sequences
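The walk itself (sequence, back up ~50 nt, make a primer, continue) and its roughly one-week cycle time can be sketched as a small calculation. This is an illustrative model only: the 600 nt read length is an assumed value, the first read is assumed to come from an existing vector primer, and the walk runs along one strand from one end.

```python
def primer_walk(insert_len, read_len=600, backup=50):
    """Return the start position of each read when walking along one strand.

    Each new primer is placed `backup` nt before the end of the previous read,
    so consecutive reads overlap by `backup` nt.
    """
    if not 0 <= backup < read_len:
        raise ValueError("backup must be non-negative and smaller than read_len")
    starts = []
    pos = 0
    while True:
        starts.append(pos)
        end = pos + read_len
        if end >= insert_len:      # this read reaches the end of the insert
            break
        pos = end - backup         # back up before designing the next primer
    return starts


def walk_weeks(starts, weeks_per_round=1):
    """Waiting time for custom primers; the first read needs no new primer."""
    return (len(starts) - 1) * weeks_per_round


if __name__ == "__main__":
    starts = primer_walk(2000)          # [0, 550, 1100, 1650]
    print(starts, walk_weeks(starts), "weeks")
```

Because each round advances by only the read length minus the back-up, the elapsed time grows linearly with insert size, which is why the method suits short DNA and not large sequences.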
DNA Sequence analysis (contd)
• Cloning and sequencing of restriction fragments
  – once the most popular method
    • make a restriction map, subclone fragments
    • sequence
  – advantages
    • straightforward
    • directed approach
    • can go quickly
    • cloned fragments often useful otherwise
      – RNase protection, nuclease mapping, in situ hybridization
  – disadvantages
    • possible to lose small fragments
      – must run high quality analytical gels
    • depends on quality of restriction map
      – mistaken mapping -> wrong sequence
    • restriction site availability
  – applications
    • sequencing small cDNAs
    • isolating regions to close gaps
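Once some sequence is known, a restriction map can be checked in silico. This is a minimal sketch, assuming a linear molecule and a single enzyme; EcoRI (G^AATTC) is used as the example, and the function name, test sequence and 5 nt size threshold are hypothetical. It lists the fragments a digest would produce, so that small fragments, which are easy to lose on a gel, stand out.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the position of the cut within the site on the top strand
    (1 for EcoRI, which cuts after the first G). Returns the fragments in order.
    """
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical toy sequence with two EcoRI sites.
    seq = "AAA" + "GAATTC" + "CCCC" + "GAATTC" + "TT"
    fragments = digest(seq)
    for frag in fragments:
        flag = "  <- small, easy to lose" if len(frag) < 5 else ""
        print(len(frag), frag + flag)
```

The fragment lengths always sum to the length of the input, which gives a quick consistency check against sizes estimated from a gel.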
DNA Sequence analysis (contd)
• nested deletion strategies - sequential deletions from one end of the clone
  – cut, close and sequence
    • Approach
      – make restriction map
      – use enzymes that cut in polylinker and insert
      – religate, sequence from end with restriction site
      – repeat until finished, filling in gaps with oligos
    • advantages
      – fast, simple, efficient
    • disadvantages
      – limited by restriction site availability in vector and insert
      – need to make a restriction map
DNA Sequence analysis (contd)
• nested deletion strategies (contd)
  – Exonuclease III-mediated deletion
    • cut with polylinker enzyme
      – protect ends
        » 3’ overhang
        » phosphorothioate
    • cut with enzyme between first cut and the insert
      – can’t leave 3’ overhang
    • timed digestions with Exonuclease III
    • stop reactions, blunt ends
    • ligate and size select recombinants
    • sequence
    • advantages
      – unidirectional
      – processivity of enzyme gives nested deletions
DNA Sequence analysis (contd)
• Nested deletion strategies
  – Exonuclease III-mediated deletion (contd)
    • disadvantages
      – need two unique restriction sites flanking insert on each side
      – best used successively to get > 10 kb total deletions
      – may not get complete overlaps of sequences
        » fill in with restriction fragments or oligos
    • applications
      – method of choice for moderate size sequencing projects
        » cDNAs
        » genomic clones
      – good for closing larger gaps
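The "timed digestions" step amounts to choosing time points so that successive deletions end about one sequencing read apart. This is a rough planning sketch, not a protocol: the digestion rate is a placeholder that has to be calibrated for the actual enzyme lot, temperature and buffer, and the function name and numbers are hypothetical.

```python
def exo3_timepoints(insert_len, step, rate_nt_per_min):
    """Return (minutes, expected_deletion_nt) pairs for a timed digestion.

    Deletions are spaced `step` nt apart and stop before the whole insert
    is removed. `rate_nt_per_min` is an assumed, empirically calibrated rate.
    """
    if step <= 0 or rate_nt_per_min <= 0:
        raise ValueError("step and rate_nt_per_min must be positive")
    return [(d / rate_nt_per_min, d) for d in range(step, insert_len, step)]


if __name__ == "__main__":
    # Hypothetical 2 kb insert, 500 nt steps, placeholder rate of 250 nt/min.
    for minutes, deleted in exo3_timepoints(2000, 500, 250.0):
        print(f"remove aliquot at {minutes:.1f} min: ~{deleted} nt deleted")
```

Choosing a step somewhat shorter than the usable read length leaves overlap between neighbouring deletion clones, which reduces the need to fill in with restriction fragments or oligos.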
Large-Scale DNA Sequence analysis
• Shotgun sequencing NOT invented by Craig Venter
  – Messing 1981 first description of shotgun
  – Sanger lab developed current methods in 1983
  – approach
    • blast genome into small chunks
    • clone these chunks
      – 3-5 kb, 8 kb plasmid
      – 40 kb fosmid (to jump repetitive sequences)
    • sequence + assemble by computer
  – A priori difficulties
    • how to get nice uniform distribution
    • how to assemble fragments
    • what to do about repeats?
    • How to minimize sequence redundancy?
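The "sequence + assemble by computer" step can be illustrated with a toy greedy overlap assembler. This is a teaching sketch under strong assumptions (error-free reads, all from the same strand, a made-up 21 nt sequence); it is not how production assemblers work, and all names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the remaining contigs; more than one contig means there are gaps.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that are entirely contained in another read.
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: the contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three overlapping reads from a made-up sequence, given out of order.
    reads = ["TAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGC"]
    print(greedy_assemble(reads))
```

Reads that share a repeat longer than the overlap threshold can be joined in the wrong place by this greedy rule, which is one way to see why repeats are listed among the a priori difficulties.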
Large-Scale DNA Sequence analysis (contd)
• Shotgun sequencing (contd)
  – How to minimize sequence redundancy?
    • Best way to minimize redundancy is to map before you start
      – C. elegans was done this way - when the sequence was finished, it was FINISHED
        » mapping took almost 10 years
      – mapping much too tedious and nonprofitable for Celera
        » who cares about redundancy, let’s sequence and make $$
    • why does redundancy matter?
      – Finished sequence today costs about $0.50/base
Large-Scale DNA Sequence analysis (contd)
– Mapping by fingerprinting
– Mapping by hybridization
Large-Scale DNA Sequence analysis (contd)
– Map as you go
Large-Scale DNA Sequence analysis (contd)
• Whole genome shotgun sequencing (Celera)
  – premise is that rapid generation of draft sequence is valuable
  – why bother trying to clone and sequence difficult regions?
    • Basically just forget regions of repetitive DNA - not cost effective
  – using this approach, it is easy to get to 90% finished
    • rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95%
  – problems
    • sequences done this way may never be complete as is C. elegans
    • much redundant sequence with many sparse regions and lots of gaps
    • fragment assembly for regions of highly repetitive DNA is dubious at best
    • “Finished” fly and human genomes lack more than a few already characterized genes
Large-Scale DNA Sequence analysis (contd)
• How to approach a large new genome, knowing what we know now?
  – Xenopus tropicalis 1.7 Gb (about ½ human)
    • Whole genome shotgun
    • BAC end sequencing
    • EST sequencing
  – 8x coverage currently
  – How to finish?
    • Gaps could be closed with BACs
    • Finishing dependent on additional funding
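What 8x coverage of a 1.7 Gb genome implies can be estimated with the idealized Lander-Waterman model, in which reads land uniformly at random. This is a back-of-the-envelope sketch: the 700 nt read length is an assumed value, the function name is hypothetical, and the model ignores repeats, cloning bias and the minimum overlap needed to join reads, so real projects leave more gaps than it predicts.

```python
import math


def shotgun_stats(genome_size, coverage, read_len):
    """Idealized shotgun statistics for uniformly random reads.

    Fraction of bases left unsequenced is exp(-coverage); the expected number
    of gaps is approximately (number of reads) * exp(-coverage).
    """
    n_reads = math.ceil(coverage * genome_size / read_len)
    frac_uncovered = math.exp(-coverage)
    return {
        "reads": n_reads,
        "fraction_uncovered": frac_uncovered,
        "bases_uncovered": frac_uncovered * genome_size,
        "expected_gaps": n_reads * frac_uncovered,
    }


if __name__ == "__main__":
    # Xenopus tropicalis figures from the slide; read length is an assumption.
    stats = shotgun_stats(genome_size=1.7e9, coverage=8, read_len=700)
    for name, value in stats.items():
        print(f"{name}: {value:,.6g}")
```

Even in this best case, thousands of small gaps are expected to remain at 8x, consistent with the point that finishing needs a directed effort such as closing gaps with BACs.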