
Schedule: Tue Sep 19 DNA 1: Life & computers; comparative genomics, databases; model utility Tue Sep 26 DNA 2: Polymorphisms, populations, statistics,

Dec 18, 2015

Page 1

Schedule:
Tue Sep 19 DNA 1: Life & computers; comparative genomics, databases; model utility
Tue Sep 26 DNA 2: Polymorphisms, populations, statistics, pharmacogenomics
Tue Oct 03 DNA 3: Dynamic programming, Blast, Multi-alignment, Hidden Markov Models
Tue Oct 10 RNA 1: Microarrays, library sequencing & quantitation concepts
Tue Oct 17 RNA 2: Clustering by gene or condition & other regulon data sources
Tue Oct 24 RNA 3: Nucleic acid motifs; the nature of biological "proofs".
Tue Oct 31 Protein 1: 3D structural genomics, homology, dynamics, function & drug design
Tue Nov 07 Protein 2: Mass spectrometry, post-synthetic modifications,
Tue Nov 14 Protein 3: Quantitation of proteins, metabolites, & interactions
Tue Nov 21 Network 1: Metabolic kinetic & flux balance optimization methods
Tue Nov 28 Network 2: Molecular computing, self-assembly, genetic algorithms, neural nets
Tue Dec 05 Network 3: Cellular, developmental, social, ecological & commercial models
Tue Dec 12 Team Project presentations
Tue Dec 19 Project Presentations
Tue Jan 02 Project Presentations
Tue Jan 09 Project follow-up & course synthesis

Biophysics 101 Genomics and Computational Biology

Page 2

101 Section meetings
Tue 3:00 - 4:00 Haley HMS MEC 342
Wed 7:00 - 8:00 pm Jason HMS MEC 342
Thu 12:00 - 1:00 Dan HMS MEC 342 (Except on 12-Oct & 9-Nov he will use MEC 338)
Thu 12:00 - 1:00 Nick HMS MEC 340
Tue 7:30 - 9:00 pm Doug Science Cntr 110
Tue 7:30 - 8:30 pm Allegra Science Cntr 101B
Tue 7:30 - 8:30 pm Yonatan Science Cntr 102B
Wed 6:00 - 7:00 pm Peter Science Cntr 112
Thu 8:00 - 9:00 pm Adnan Science Cntr 209

Despite recruitment of new TFs, the sections are crowded so there are no auditor sections. Anyone registered who did not receive email should check the list at the break. (Email to schedule another "Biology tutorial" for Math/CS experts)

Page 3

Last week's take home lessons

Life & computers: Self-assembly
Math: be suspicious of approximations
Catalysis by RNA & proteins
"The Code": treasure (but don't memorize) exceptions
Replication
Differential equation: dx/dt = kx
Mutation & the single molecule: Noise is overcome
Human disease: SNPs <1 ppb & 1.5 fold dosage
Directed graphs & pedigrees
Bell curve statistics: Binomial & Poisson
Selection

Page 4

Today's story, logic & goals

Types of mutants
Mutation, drift, selection: Binomial & exponential dx/dt = kx
Association studies: χ² statistic
Linked and causative alleles
Haplotypes
Computing the first genome, the second ...
New technologies
Random and systematic errors

Page 5

Types of Mutants

Null: PKU
Dosage: Trisomy 21
Conditional (e.g. temperature or chemical)
Gain of function: HbS
Altered ligand specificity

Page 6

Altered specificity mutants

A consensus motif in the RFX DNA binding domain and binding domain mutants with altered specificity.
A mutant Escherichia coli sigma 70 subunit of RNA polymerase with altered promoter specificity.
A mutant of Escherichia coli with altered inducer specificity for the fad regulon.
A mutation in the xanthine dehydrogenase (purine hydroxylase I) of Aspergillus nidulans resulting in altered specificity. Implications for the
A point mutation in the gamma2 subunit of gamma-aminobutyric acid type A receptors results in altered benzodiazepine binding site
A point mutation leads to altered product specificity in beta-lactamase catalysis.
A site-specific endonuclease derived from a mutant Trp repressor with altered DNA-binding specificity.
A spontaneous point mutation in the aac(6')-Ib' gene results in altered substrate specificity of aminoglycoside 6'-N-acetyltransferase of a
A streptavidin mutant with altered ligand-binding specificity.
A structural model for the HIV-1 Rev-RRE complex deduced from altered-specificity rev variants isolated by a rapid genetic strategy.
A technique for the isolation of yeast alcohol dehydrogenase mutants with altered substrate specificity.
A U1 small nuclear ribonucleoprotein particle with altered specificity induces alternative splicing of an adenovirus E1A mRNA precursor.
Amino acid substrate specificity of Escherichia coli phenylalanyl-tRNA synthetase altered by distinct mutations.
An altered specificity mutation in the lambda repressor induces global reorganization of the protein-DNA interface.
An altered-specificity mutation in a human POU domain demonstrates functional analogy between the POU-specific subdomain and
Analysis of estrogen response element binding by genetically selected steroid receptor DNA binding domain mutants exhibiting altered
Antiprotease targeting: altered specificity of alpha 1-antitrypsin by amino acid replacement at the reactive centre.
AraC proteins with altered DNA sequence specificity which activate a mutant promoter in Escherichia coli.
Assessment of the role of an omega loop of cholesterol oxidase: a truncated loop mutant has altered substrate specificity.
Butyramide-utilizing mutants of Pseudomonas aeruginosa 8602 which produce an amidase with altered substrate specificity.
Carboxyl-terminal domain dimer interface mutant 434 repressors have altered dimerization and DNA binding specificities.
Characterization of the nuclear protein import mechanism using Ran mutants with altered nucleotide binding specificities.
Computational method for the design of enzymes with altered substrate specificity.
Crystallographic analysis of trypsin-G226A. A specificity pocket mutant of rat trypsin with altered binding and catalysis.
Designing zinc-finger ADR1 mutants with altered specificity of DNA binding to T in UAS1 sequences.
Dinitrogenase with altered substrate specificity results from the use of homocitrate analogues for in vitro synthesis of the iron-molybdenum
Dissecting Fas signaling with an altered-specificity death-domain mutant: requirement of FADD binding for apoptosis but not Jun
DNA-binding-defective mutants of the Epstein-Barr virus lytic switch activator Zta transactivate with altered specificities.
E461H-beta-galactosidase (Escherichia coli): altered divalent metal specificity and slow but reversible metal inactivation.
EcoRV-T94V: a mutant restriction endonuclease with an altered substrate specificity towards modified oligodeoxynucleotides.
Engineering proteases with altered specificity.
Engrailed (Gln50-->Lys) homeodomain-DNA complex at 1.9 A resolution: structural basis for enhanced affinity and altered specificity.
Enhanced activity and altered specificity of phospholipase A2 by deletion of a surface loop.
Escherichia coli hemolysin mutants with altered target cell specificity.
Evidence for an altered operator specificity: catabolite repression control of the leucine operon in Salmonella typhimurium.
Evidence that HT mutant strains of bacteriophage P22 retain an altered form of substrate specificity in the formation of transducing
Ferrichrome transport in Escherichia coli K-12: altered substrate specificity of mutated periplasmic FhuD and interaction of FhuD with the
Generation of estrogen receptor mutants with altered ligand specificity for use in establishing a regulatable gene expression system.

Page 7

Altered specificity mutants (continued)

Genetic strategy for analyzing specificity of dimer formation: Escherichia coli cyclic AMP receptor protein mutant altered in dimerization
Immunoglobulin V region variants in hybridoma cells. I. Isolation of a variant with altered idiotypic and antigen binding specificity.
In vitro selection for altered divalent metal specificity in the RNase P RNA.
In vitro selection of zinc fingers with altered DNA-binding specificity.
In vivo selection of basic region-leucine zipper proteins with altered DNA-binding specificities.
Isolation and properties of Escherichia coli ATPase mutants with altered divalent metal specificity for ATP hydrolysis.
Isolation of altered specificity mutants of the single-chain 434 repressor that recognize asymmetric DNA sequences containing TTAA
Mechanisms of spontaneous mutagenesis: clues from altered mutational specificity in DNA repair-defective strains.
Molecular basis of altered enzyme specificities in a family of mutant amidases from Pseudomonas aeruginosa.
Mutants in position 69 of the Trp repressor of Escherichia coli K12 with altered DNA-binding specificity.
Mutants of eukaryotic initiation factor eIF-4E with altered mRNA cap binding specificity reprogram mRNA selection by ribosomes in
Mutational analysis of the CitA citrate transporter from Salmonella typhimurium: altered substrate specificity.
Na+-coupled transport of melibiose in Escherichia coli: analysis of mutants with altered cation specificity.
Nuclease activities of Moloney murine leukemia virus reverse transcriptase. Mutants with altered substrate specificities.
Probing the altered specificity and catalytic properties of mutant subtilisin chemically modified at position S156C and S166C in the S1
Products of alternatively spliced transcripts of the Wilms' tumor suppressor gene, wt1, have altered DNA binding specificity and regulate
Proline transport in Salmonella typhimurium: putP permease mutants with altered substrate specificity.
Random mutagenesis of the substrate-binding site of a serine protease can generate enzymes with increased activities and altered
Redesign of soluble fatty acid desaturases from plants for altered substrate specificity and double bond position.
Selection and characterization of amino acid substitutions at residues 237-240 of TEM-1 beta-lactamase with altered substrate specificity
Selection strategy for site-directed mutagenesis based on altered beta-lactamase specificity.
Site-directed mutagenesis of yeast eEF1A. Viable mutants with altered nucleotide specificity.
Structure and dynamics of the glucocorticoid receptor DNA-binding domain: comparison of wild type and a mutant with altered specificity.
Structure-function analysis of SH3 domains: SH3 binding specificity altered by single amino acid substitutions.
Sugar-binding and crystallographic studies of an arabinose-binding protein mutant (Met108Leu) that exhibits enhanced affinity & altered
T7 RNA polymerase mutants with altered promoter specificities.
The specificity of carboxypeptidase Y may be altered by changing the hydrophobicity of the S'1 binding pocket.
The structural basis for the altered substrate specificity of the R292D active site mutant of aspartate aminotransferase from E. coli.
Thymidine kinase with altered substrate specificity of acyclovir resistant varicella-zoster virus.
U1 small nuclear RNAs with altered specificity can be stably expressed in mammalian cells and promote permanent changes in
Use of altered specificity mutants to probe a specific protein-protein interaction in differentiation: the GATA-1:FOG complex.
Use of Chinese hamster ovary cells with altered glycosylation patterns to define the carbohydrate specificity of Entamoeba histolytica
Using altered specificity Oct-1 and Oct-2 mutants to analyze the regulation of immunoglobulin gene transcription.
Variants of subtilisin BPN' with altered specificity profiles.
Yeast and human TFIID with altered DNA-binding specificity for TATA elements.

Page 8

From genomics to public health
Vaccines, drugs, lifestyle, public health measures

Pharmacogenomics

Targets (proteins or phenotypes)

Chemical diversity

Gene therapy, DNA vaccines, ribozymes, nutrition

High-throughput screening of compounds

Animal testing

Clinical trials phase 1,2,3

Formulation: Bioavailability

Toxicity

Delivery: time release, feedback

Marketing and societal priorities

Page 9

Pharmacogenomics

Gene/Enzyme | Drug | Quantitative effect
CYP2C9 | Tolbutamide, warfarin, phenytoin, nonsteroidal anti-inflammatories | Anticoagulant effect of warfarin
CYP2D6 | Beta blockers, antidepressants, antipsychotics, codeine, debrisoquin, dextromethorphan, encainide, flecainide, guanoxan, methoxyamphetamine, N-propylajmaline, perhexiline, phenacetin, phenformin, propafenone, sparteine | Tardive dyskinesia from antipsychotics; narcotic side effects, efficacy, and dependence; imipramine dose requirement; beta-blocker effect
Dihydropyrimidine dehydrogenase | Fluorouracil | Fluorouracil neurotoxicity
ACE | Enalapril, lisinopril, captopril | Renoprotective effects, cardiac indices, blood pressure, immunoglobulin A nephropathy
Thiopurine methyltransferase | Mercaptopurine, thioguanine, azathioprine | Thiopurine toxicity and efficacy; risk of second cancers
Potassium channels:
HERG | Quinidine | Drug-induced long QT syndrome
 | Cisapride | Drug-induced torsade de pointes
KvLQT1 | Terfenadine, disopyramide, mefloquine | Drug-induced long QT syndrome
hKCNE2 | Clarithromycin | Drug-induced arrhythmia

Examples of clinically relevant genetic polymorphisms influencing drug metabolism and effects. Additional data

Page 11

Page 12

A significant basepair

aggtcatctgagGtcaggagttca

ANALYSIS: ALU repeat found upstream of Iodothyronine deiodinase, Myeloperoxidase, Keratin K18, HoxA1, etc.

"-463 G creates a stronger SP1 binding site & retinoic acid response element (RARE) in the allele... overrepresented in acute promyelocytic leukemia" Piedrafita FJ, et al. 1996 JBC 271: 14412

Page 13

Critique of a basepair

1. 97% of the genome is noncoding.
2. Even repeats have regulatory & health relevance.
3. H. sapiens as a model system: Saturation mutagenesis screen of $6 \times 10^9$ heterozygotes; many hits per basepair on average.
4. One key basepair may be too reductionistic. Whole genome, whole population, whole network analyses are becoming increasingly feasible.


Page 15

Where do allele frequencies come from?
Mutation (T), Migration (M), Drift (D), Selection (S), …

$T_j = S_j + \sum_{i=0}^{j-1} (S_i F_{j-i} - S_j R_{j-i}) + \sum_{i=j+1}^{N} (S_i R_{i-j} - S_j F_{i-j})$

$M_j = T_j$ + analogous to above

$D_j = \sum_{i=0}^{N} M_i \, B(N, j, i/N)$

$S_j = D_j \cdot w$ (w = relative fitness of i mutants to N-i original).

__________________________________

$T_i, M_i, D_i, S_i$ = frequency of i mutants in a pop. size N

$F_i$ = forward rate = $B(N, i, P_F)$, $R_i$ = reverse

$B(N, i, p)$ = Binomial = $C(N, i)\, p^i (1-p)^{N-i}$

(ref)

Page 16

Random Genetic Drift very dependent upon population size
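The drift term of the allele-frequency model, D_j = sum over i of M_i * B(N, j, i/N), can be explored numerically. The sketch below is only an illustration under stated assumptions: it tracks the probability distribution over the number of mutants in a population of fixed size N, ignores mutation, migration, and selection, and starts every population at 50% mutants. The function names `drift_step` and `fixed_or_lost` and the population sizes 10 and 40 are hypothetical choices, not taken from the slides.

```python
from math import comb

def binomial(n, i, p):
    """B(N, i, p) = C(N, i) * p^i * (1-p)^(N-i)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def drift_step(m):
    """One generation of drift: D_j = sum_i M_i * B(N, j, i/N)."""
    n = len(m) - 1
    return [sum(m[i] * binomial(n, j, i / n) for i in range(n + 1))
            for j in range(n + 1)]

def fixed_or_lost(n, generations):
    """Probability that the allele is lost (0 mutants) or fixed (N mutants)."""
    dist = [0.0] * (n + 1)
    dist[n // 2] = 1.0          # assumption: start at 50% mutants
    for _ in range(generations):
        dist = drift_step(dist)
    return dist[0] + dist[n]

small = fixed_or_lost(10, 20)
large = fixed_or_lost(40, 20)
print(f"after 20 generations: N=10 -> {small:.3f}, N=40 -> {large:.3f}")
```

Under these assumptions the smaller population reaches loss or fixation much sooner, which is one way to see the dependence of drift on population size.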

Page 17

Directional & Stabilizing Selection

• codominant mode of selection (genic selection)
– fitness of heterozygote is the mean of the fitness of the two homozygotes
AA = 1; Aa = 1 + s; aa = 1 + 2s
– always increases frequency of one allele at expense of the other

• overdominant mode
– heterozygote has highest fitness
AA = 1; Aa = 1 + s; aa = 1 + t, where s > 0 and s > t
– reaches equilibrium where two alleles coexist
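The two fitness schemes can be compared with the standard one-locus, two-allele recursion for a large random-mating population. This is a minimal sketch, not the lecture's own code: the function names, the starting frequency 0.01, and the values s = 0.1 and t = 0.05 are assumptions chosen for illustration. Setting the marginal fitnesses of the two alleles equal gives the overdominant equilibrium frequency of allele a as s / (2s - t).

```python
def next_freq(q, w_AA, w_Aa, w_aa):
    """Frequency of allele a after one generation of selection."""
    p = 1.0 - q
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * q * w_Aa + q * q * w_aa) / mean_w

def iterate(q, fitnesses, generations):
    """Apply the selection recursion for a number of generations."""
    for _ in range(generations):
        q = next_freq(q, *fitnesses)
    return q

s, t = 0.1, 0.05   # illustrative values with s > 0 and s > t

# Codominant: AA = 1, Aa = 1 + s, aa = 1 + 2s
codominant = iterate(0.01, (1.0, 1.0 + s, 1.0 + 2 * s), 2000)

# Overdominant: AA = 1, Aa = 1 + s, aa = 1 + t
overdominant = iterate(0.01, (1.0, 1.0 + s, 1.0 + t), 2000)
equilibrium = s / (2 * s - t)

print(f"codominant:   q -> {codominant:.4f}")
print(f"overdominant: q -> {overdominant:.4f} (predicted {equilibrium:.4f})")
```

In the codominant case allele a heads toward fixation, while in the overdominant case both alleles persist at the predicted intermediate frequency.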

Page 18

Fixation Times

• for neutral mutations, K = µ

• for advantageous mutations, K = 4Nsµ
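A short worked derivation may help connect the two formulas, reading K as the rate of substitution per generation. It assumes a diploid population with 2N gene copies, a per-copy mutation rate µ, a fixation probability of 1/(2N) for a new neutral mutation, and the classical approximation of about 2s for a new advantageous mutation with small s; these assumptions are standard but are not spelled out on the slide.

```latex
\begin{align*}
K_{\text{neutral}}
  &= \underbrace{2N\mu}_{\text{new mutations per generation}}
     \times \underbrace{\frac{1}{2N}}_{\text{fixation probability}}
   = \mu \\[4pt]
K_{\text{advantageous}}
  &\approx 2N\mu \times 2s = 4Ns\mu
\end{align*}
```

The population size cancels in the neutral case, which is why the neutral rate depends only on µ.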

Page 19

Role of Genetic Exchange

• Effect on distribution of fitness in the whole population

• Can accelerate rate of evolution at high cost (50%)

Page 20

DNA RNA Protein

Metabolites

Growth rate
Expression

Interactions

Environment

Network genomics

stem cells
cancer cells
viruses
organisms

Page 21

Multiplex Competitive Growth Experiments

In-frame mutants + wild-type

Pool Select

Multiplex PCR size-tag or chip readout

40° pH5 NaCl Complex

t=0

Page 22

64 Conditions

48 (to 600) Strains

Intensity calibrated to strain abundance (selection coefficient)

Page 23

Ratio of strains over environments, e, times, $t_e$, selection coefficients, $s_e$:
$R = R_0 \exp[-s_e t_e]$

80% of 34 random yeast insertions have s < -0.3% or s > 0.3% (t = 160 generations, e = 1 (rich media)); ~50% for t = 15, e = 7.
Should allow comparisons with population allele models.
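The relation R = R0 exp[-s t] can be inverted to estimate a selection coefficient from two abundance ratios. The sketch below assumes a single environment and uses the 0.3% and 160-generation figures quoted above purely as illustrative inputs; the function names are hypothetical.

```python
import math

def ratio_after(s, t, r0=1.0):
    """R = R0 * exp(-s * t) for selection coefficient s over t generations."""
    return r0 * math.exp(-s * t)

def selection_coefficient(r, r0, t):
    """Invert R = R0 * exp(-s * t) to recover s from the ratio change."""
    return -math.log(r / r0) / t

r = ratio_after(0.003, 160)
s_est = selection_coefficient(r, 1.0, 160)
print(f"R/R0 after 160 generations at s = 0.3%: {r:.3f}")
print(f"recovered s: {s_est:.4f}")
```

A coefficient as small as 0.3% still shifts the strain ratio substantially over 160 generations, which is why long competitive growth experiments can resolve small fitness differences.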

Other multiplex competitive growth experiments:
Thatcher, et al. (1998) PNAS 95:253.
Link AJ (1994) thesis; (1997) J Bacteriol 179:6228.
Smith V, et al. (1995) PNAS 92:6479.
Shoemaker D, et al. (1996) Nat Genet 14:450.


Page 25

Caution: phases of human genetics

Monogenic vs. Polygenic dichotomy

Method | Problems
Mendelian Linkage | need large families
Common direct (causative) | 3% coding + ?non-coding
Common indirect (LD) | recombination & new alleles
All alleles (causative) | expensive

LD= linkage disequilibrium = non-random association of k alleles

Page 26

Electron magnetic moment to Bohr magneton ratio $\mu_e/\mu_B$ = 1.0011596521869(41), $u_r = 4.1 \times 10^{-12}$
Peter J. Mohr and Barry N. Taylor, CODATA & Reviews of Modern Physics, Vol. 72, No. 2, 2000.
physics.nist.gov/cuu/Constants

"99.5%… to accept unambiguously that the Higgs has been spotted, the chances … have to be reduced to one in ten million"
Nature 407: 118

Number of genes in the human genome

34,000 to 120,000

Nature Genetics July 2000

Page 27

False negatives & positive rates

Page 28

One form of HIV-1 Resistance

Page 29

An association test for CCR-5 & HIV resistance

Alleles | SeroNeg | SeroPos | total | ExpecNeg | ExpecPos
CCR-5+ | 1278 | 1368 | 2646 | 1305 | 1341
ccr-5 | 130 | 78 | 208 | 103 | 105
total | 1408 | 1446 | 2854 | |

dof = (r-1)(c-1) = 1
ChiSq = sum[(o-e)^2/e]

ChiSq | P
12.047374 | 0.00052
15.122772 | 0.00010
 | 0.00008
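The chi-square statistic for this 2x2 table can be recomputed directly from the observed counts. This is a sketch under stated assumptions: expected counts come from the row and column totals without rounding, so the statistic comes out somewhat higher than the 15.12 listed above, which appears to use the rounded expected counts. The P value uses the identity P = erfc(sqrt(ChiSq/2)), which holds for 1 degree of freedom. The function names are hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def p_value_1dof(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

observed = [[1278, 1368],   # CCR-5+ : SeroNeg, SeroPos
            [130, 78]]      # ccr-5  : SeroNeg, SeroPos
chi2 = chi_square(observed)
p = p_value_1dof(chi2)
print(f"ChiSq = {chi2:.3f}, P = {p:.1e}")
```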

Page 30

But what if we test more than one locus?

The future of genetic studies of complex human diseases. ref

[Figure: three panels, each based on Risch & Merikangas (1996) Science 273: 1516, with Y = Number of Sib Pairs (Association).
Panel 1: X = Population frequency (p); GRR = 1.5, #alleles = 1E6.
Panel 2: X = Genotypic Relative Risk (GRR); #alleles = 1E6, p = 0.5.
Panel 3: X = Number of Alleles (Hypotheses) Tested; GRR = 1.5, p = 0.5.]
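One common way to account for testing many loci is a Bonferroni correction, which divides the desired overall significance level by the number of hypotheses. The sketch below is an assumption-laden illustration rather than the method behind the figure: the overall level of 0.05 is an assumed convention, the 1E6 alleles come from the panel labels, and the single-test P of 0.00008 is the smallest value listed for the CCR-5 table.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall level at alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 1_000_000)
single_test_p = 0.00008
survives = single_test_p < threshold

print(f"per-test threshold for 1E6 alleles: {threshold:.1e}")
print(f"P = {single_test_p} still significant: {survives}")
```

A result that looks convincing for one pre-chosen locus would not pass this threshold in a scan of a million alleles, which is why the required number of sib pairs grows with the number of hypotheses.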

Page 31

How many "new" polymorphisms?

G = generations of exponential population growth = 5000
N' = population size = $6 \times 10^9$ now; N = $10^4$ pre-G
m = mutation rate per bp per generation = $10^{-8}$ to $10^{-9}$ (ref)
L = diploid genome = $6 \times 10^9$ bp
$e^{kG} = N'/N$; so k = 0.0028
Av # new mutations < $L e^{kt} m$ = $4 \times 10^3$ to $4 \times 10^4$ per genome, t = 1 to 5000

Take home: "High genomic deleterious mutation rates in hominids" accumulate over 5000 generations & confound LD.
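The growth constant k follows from solving e^(kG) = N'/N for k. The sketch below plugs in the population sizes and generation count quoted above; taken at face value they give a k of about 0.0027, in the same range as the 0.0028 on the slide. It also computes L * m, the expected number of new mutations per diploid genome per generation. The variable names are hypothetical, and the sketch does not attempt to reproduce the slide's bound on the accumulated number of new mutations.

```python
import math

G = 5000            # generations of exponential growth
N_now = 6e9         # population size now (N')
N_before = 1e4      # population size before the expansion (N)
L = 6e9             # diploid genome size in bp
m_low, m_high = 1e-9, 1e-8   # mutation rate per bp per generation

# e^(kG) = N'/N  =>  k = ln(N'/N) / G
k = math.log(N_now / N_before) / G

# New mutations per diploid genome per generation: L * m
per_generation = (L * m_low, L * m_high)

print(f"k = {k:.5f} per generation")
print(f"new mutations per genome per generation: "
      f"{per_generation[0]:.0f} to {per_generation[1]:.0f}")
```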

Page 32

How well linked?

G = generations of exponential population growth = 5000
N' = population size = $6 \times 10^9$ now; N = $10^4$ pre-G
for each haplotype H,
frequency of H on the variant gametes = $n_{vH}/n_v$
frequency of H on the + gametes = $n_{+H}/n_+$

linkage disequilibrium: $d^2 = (n_{vH}/n_v - n_{+H}/n_+)^2$ = 0 to 1
marker separation: 1% recomb = 1 Mbp

If S = sample size needed to detect variant & disease assoc., then approx. $S/d^2$ is required for the LD marker. (Kruglyak ref)
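The disequilibrium measure and the sample-size inflation can be written out directly from the definitions on this slide. This is a minimal sketch; the gamete counts and the baseline sample size of 1000 are made-up illustrative numbers, and the function names are hypothetical.

```python
def ld_d2(n_vH, n_v, n_plusH, n_plus):
    """d^2 = (n_vH/n_v - n_+H/n_+)^2, ranging from 0 to 1."""
    return (n_vH / n_v - n_plusH / n_plus) ** 2

def marker_sample_size(S, d2):
    """Approximate sample size needed when testing the LD marker: S / d^2."""
    return S / d2

# Illustrative counts: haplotype H on 90 of 100 variant gametes
# and on 30 of 100 wild-type (+) gametes.
d2 = ld_d2(90, 100, 30, 100)
needed = marker_sample_size(1000, d2)

print(f"d^2 = {d2:.2f}")
print(f"sample size for the marker: about {needed:.0f}")
```

Because the required sample scales as 1/d^2, a marker in weak disequilibrium with the causative variant demands a much larger study than testing the variant itself.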

Page 33

LD as a function of marker spacing & population expansion times

Variant at 50%

Variant at 10%

Page 34

LD as a function of recombination and population size

Page 35

Finding & Creating mutants

Isogenic
Proof of causality: Find > Create a copy > Revert

Caution:
Effects on nearby genes
Aneuploidy (ref)

Page 36

Lesch KP, et al. Science 1996 274:1527-31. Association of anxiety-related traits with a polymorphism in the serotonin transporter gene regulatory region. Pubmed

Pharmacogenomics Example

5-hydroxytryptamine transporter

Page 37

Caution: phases of human genetics

Monogenic vs. Polygenic dichotomy

Method | Problems
Mendelian Linkage (300 bp) | need large families
Common indirect/LD ($10^6$ bp) | recombination & new alleles
Common direct (causative) | 3% coding + ?non-coding
All alleles ($10^9$) | expensive ($0.20 per SNP)


Page 39

New Genotyping & haplotyping technologies
de novo sequencing > scanning > selected sequencing > diagnostic methods

Sequencing by synthesis
• 1-base Fluorescent, isotopic or Mass-spec* primer extension (Pastinen97)
• 30-base extension Pyrosequencing (Ronaghi99)*
• 700-base extension, capillary arrays dideoxy* (Tabor95, Nickerson97, Heiner98)

SNP & mapping methods
• Sequencing by hybridization on arrays (Hacia98, Gentalen99)*
• Chemical & enzymatic cleavage: (Cotton98)
• SSCP, D-HPLC (Gross 99)*

Femtoliter scale reactions ($10^5$ molecules)
• 20-base restriction/ligation MPSS (Gross 99)
• 30-base fluorescent in situ amplification sequencing (Mitra 1999)

Single molecule methods (not production)
• Fluorescent exonuclease (Davis91)
• Patch clamp current during ss-DNA pore transit (Kasianowicz96)
• Electron, STM, optical microscopy (Lagutina96, Lin99)

Page 40

Anal Biochem 1997 Oct 1;252(1):78-88. Optimization of spectroscopic and electrophoretic properties of energy transfer primers. Hung SC, Mathies RA, Glazer AN

http://www.pebio.com/ab/apply/dr/dra3b1b.html

Fluorescent primers or ddNTPs

Page 41

Ewing, Hillier, Wendl, & Green 1998

Indel = I + D
Total = I + D + N + S

Page 42

For (clone) template isolation?

For sequencing?

For assembly?

What are examples of random & systematic errors?

Page 43

For (clone) template isolation: restriction sites, repeats

For sequencing: hairpins, tandem repeats

For assembly: repeats, errors, polymorphisms, chimeric clones, read mistracking

Examples of systematic errors

Page 44

Project completion % vs coverage redundancy (see Roach 1995)

[Figure: X = mean coverage (0 to 12); Y = project completion (0% to 160%). Curves: Closure Probab. 1939; Av Island length 1995; Island Length 1988. Annotation: Whole-genome shotgun.]
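A rough feel for completion versus redundancy comes from the usual Poisson approximation for random shotgun reads, in which the expected fraction of bases covered at mean coverage c is 1 - e^(-c). This sketch is a swapped-in textbook approximation, not a reconstruction of the curves in the figure, and it assumes reads land uniformly at random with no repeats or cloning bias. The function name is hypothetical.

```python
import math

def fraction_covered(c):
    """Expected fraction of bases sequenced at least once at mean coverage c."""
    return 1.0 - math.exp(-c)

for c in (1, 2, 5, 8, 10):
    print(f"coverage {c:>2}x: {fraction_covered(c):.4%} of bases expected to be covered")
```

Under this model the last few percent of a project cost far more redundancy than the first ninety, one reason closure is treated separately from raw coverage.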

Page 45

Weber & Myers 1997

Page 46

Sequential dNTP addition (pyrosequencing)
> 30 base reads; no hairpin artefacts

Conventional dideoxy gel with 2 hairpin

[Diagram labels: A T A T A; B B’; 3’ 5’; CTA; GA]

Systematic errors

Page 47

Use of DNA Chips for SNP ID & Scoring

Wang et al., Science 280 (1998): 1077

• already used for mutation detection with HIV-1, BRCA1, mitochondria

• higher detection rate than gel-based assays

• higher throughput and potential for automation

• ID of > 2000 SNPs in 2 Mb of human DNA

• can multiplex reactions

Page 48

Use of Mass Spec for Analysis and Scoring

A single nucleotide primer extension assay

Haff and Smirnov, Genome Research 7 (1997): 378

Page 49

Mass Spectrometry for Analysis and Scoring

Use mass spec to score which base was added

Can also multiplex as long as primer masses are known

Haff and Smirnov, Genome Res. 7 (1997): 378

Page 50

Searching for Perls

(If only finding mutations were as easy as finding words.)

#!/usr/local/bin/perl
# Count how often the word "mutation" appears in the input text.
undef $/;                    # slurp mode: read the whole input at once
$dnatext = <>;
$dnatext =~ s/\>.+?\n//g;    # remove everything from a ">" to the end of its line (FASTA-style headers)
# s///gi returns the number of substitutions made (empty string if none).
$mutation = ($dnatext =~ s/mutation/mutation/gi) || 0;
print " found: $mutation\n";


Page 52

END Sep 26, 2000


Please fill out your questionnaires and hand them to a teaching fellow now.