Using DNA sequences to identify target organisms

Dec 20, 2015


• Obtain sequence

• Align sequences; count the parsimony-informative sites

• Gap handling

• Picking sequences (order)

• Analyze sequences (similarity/parsimony/exhaustive/Bayesian)

• Analyze output: CI, HI, bootstrap/decay indices
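To make the "parsimony informative sites" item concrete, here is a minimal sketch that counts them in a toy alignment. It assumes the usual definition (a site is informative when at least two character states each occur in at least two sequences); the function names and the toy alignment are illustrative and not taken from the slides.

```python
from collections import Counter

def is_parsimony_informative(column):
    """True if at least two states each occur in at least two sequences."""
    # Gaps and ambiguity codes are ignored in this sketch (an assumption).
    counts = Counter(base for base in column if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2

def count_informative_sites(alignment):
    """Count informative columns in a list of equal-length aligned sequences."""
    return sum(is_parsimony_informative(col) for col in zip(*alignment))

# Hypothetical toy alignment: columns 2, 4 and 5 are informative.
toy_alignment = ["ACGTA", "ACGTT", "AGGCA", "AGGCT"]
print(count_informative_sites(toy_alignment))
```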


Sequencing reaction (a)

Sequencing reaction requires:

• PCR amplification product as template

• 1 oligonucleotide (primer)

• Nucleotides: dATP, dCTP, dGTP, dTTP

• Taq polymerase

• Modified nucleotides: ddATP, ddCTP, ddGTP, ddTTP

– ddNTPs are incorporated into the polynucleotide chain and block further elongation

– ddNTPs are fluorescently labeled, each with a different fluorochrome


Sequencing reaction (b)

1. Annealing

2. Elongation

3. Incorporation of ddNTP and stop of the elongation

Figure labels: ddATP (FAM), ddCTP (HEX)
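The three steps above can be sketched as a toy simulation: a ddNTP can stop the growing chain at any position, so the reaction yields one product per possible length, and sorting the products by size while reading the terminal (dye-labeled) base recovers the sequence. The function names and the six-base template are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def terminated_fragments(template_3_to_5):
    """Return every chain-terminated product for a template written 3'->5'."""
    new_strand = "".join(COMPLEMENT[b] for b in template_3_to_5)
    # Each prefix of the new strand is a product that ended with a ddNTP.
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_sequence(fragments):
    """Sort products by size (as electrophoresis does) and read the last base of each."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

fragments = terminated_fragments("TACGGT")  # hypothetical template
print(read_sequence(fragments))
```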


Alignment of the 2 sequences obtained using the Forward and the Reverse primers on the same PCR amplification product
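Before the two reads can be aligned, the reverse read has to be reverse-complemented so that both are in the same orientation. A minimal sketch, assuming both reads are already trimmed to the same region and length (real reads usually are not); the function names and example reads are illustrative.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def reads_agree(forward_read, reverse_read):
    """Compare the reads base by base after reorienting the reverse read."""
    reoriented = reverse_complement(reverse_read)
    if len(reoriented) != len(forward_read):
        return False
    # An N (uncalled base) in either read is treated as compatible.
    return all(a == b or "N" in (a, b) for a, b in zip(forward_read, reoriented))

print(reads_agree("ATGCCA", "TGGCAT"))
```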


Alignment of several sequences showing a T/C substitution (homozygote)


Good chromatogram!

Bad chromatogram…

Pull-up (too much signal)

Loss of fidelity leads to slips, skips and mixed signals

Reverse reaction suffers same problems in opposite direction


Alignments (Se-Al)


Using DNA sequences

• Bootstrap: a branch separating two groups of microbial strains could be real, or it could be just one of the possible ways of visualizing the microbial populations. The bootstrap tests whether the branch is real. It does so by checking, over many resampling iterations, whether a similar branch could arise by chance for the given dataset

• BS values: over 80 is good, over 65 is OK, under 60 is bad


Statistical support

Re-sampling (~10,000 times)

Bootstrap analysis

The original loci are randomly re-sampled with replacement

Jackknife analysis

From the original data 1 locus is randomly removed
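The two re-sampling schemes can be sketched on a toy alignment, treating each alignment column as one locus (an assumption). Only the re-sampling step is shown; building a tree from each replicate and counting how often a branch reappears is left out. Function names and data are illustrative.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Draw as many columns as the original has, with replacement."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def jackknife_replicate(alignment, rng):
    """Remove one randomly chosen column."""
    drop = rng.randrange(len(alignment[0]))
    return [seq[:drop] + seq[drop + 1:] for seq in alignment]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
toy_alignment = ["ACGTAC", "ACGTTC", "AGGCAC"]  # hypothetical data
boot = bootstrap_replicate(toy_alignment, rng)
jack = jackknife_replicate(toy_alignment, rng)
print(boot, jack)
```

In a real analysis the replicate step would be repeated many times (the slide suggests about 10,000).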


Using DNA sequences

• Testing alternative trees: Kishino-Hasegawa

• Molecular clock

• Outgroup

• Spatial correlation (Mantel)

• Networks and coalescence approaches


Genotype

• A unique individual as defined by an array of genetic markers (the more markers you have, the lower the chance of mistaken identity).

• Blonde

• Blue-eyed

• Hairy

• 6 feet tall

• Missing two molars

In the case of microbes it will probably be something like

• Genotype A= 01010101

• Genotype B= 00110101

• Genotype C= 00010101
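One way to compare such binary genotypes is to count the marker positions at which two of them differ. The sketch below does this for the three genotypes listed above; the function name is illustrative.

```python
from itertools import combinations

genotypes = {"A": "01010101", "B": "00110101", "C": "00010101"}

def mismatches(profile_1, profile_2):
    """Number of marker positions at which two binary profiles differ."""
    return sum(a != b for a, b in zip(profile_1, profile_2))

for name_1, name_2 in combinations(sorted(genotypes), 2):
    print(name_1, name_2, mismatches(genotypes[name_1], genotypes[name_2]))
```

With these eight markers, A and B differ at two positions, while C differs from each of A and B at a single position.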


Dominant vs. co-dominant markers

• Flowers are red or white or yellow; DNA sequence is agg, agt, or agc; DNA fragment is 10, 12 or 14 bp long (CO-DOMINANT: we know what the alternative alleles are)

• Flowers are red or non-red; DNA is agg or not; size is 10 bp or not. We only see the dominant allele (DOMINANT), and we express it in binary code: 1 (present), 0 (absent)


Limitations of dominant markers

• Not all non-red flowers are the same, but we assume they are (non-red flowers can be orange or yellow)

• If at one locus we have a dominant A allele and a recessive a allele, using a dominant marker we would say AA = Aa, and only aa is different. We know that in reality AA and Aa are quite different.
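The difference between the two kinds of scoring can be shown with a small sketch for a single locus with alleles A and a. The function names are hypothetical; the point is only that presence/absence scoring gives AA and Aa the same score, while scoring both alleles keeps all three genotypes apart.

```python
def score_codominant(genotype):
    """Report both alleles, so AA, Aa and aa all look different."""
    return "/".join(sorted(genotype))

def score_dominant(genotype, dominant_allele="A"):
    """Report only presence (1) or absence (0) of the dominant allele."""
    return 1 if dominant_allele in genotype else 0

for genotype in [("A", "A"), ("A", "a"), ("a", "a")]:
    print(genotype, score_codominant(genotype), score_dominant(genotype))
```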


Study the genetic structure of a population in an area

• Number of different genotypes

• Determine gene flow between two populations

• Determine if there is an ongoing invasion

• Duration of infestation


Some Considerations in Choosing a Genotyping Method

• Level of taxonomic resolution desired (Populations? Species? Phyla?)

• Level of genotypic resolution desired

– Dominant vs. codominant markers

– Fine (e.g., nucleotide-level) data vs. coarse (e.g., fragment size) genomic scale

• Previous sequence knowledge

• Cost and labor constraints


Genetic Markers

• SNPs: Single Nucleotide Polymorphisms

– Substitution of a nucleotide: 4 alleles (adenine, guanine, cytosine, thymine)

– Insertion/deletion of a nucleotide: 2 alleles (presence or absence of the nucleotide)

– Approximately every 200–300 bp

– Different degrees of variability

• Microsatellites

– Variation in the number of short tandem repeats

– Unknown number of alleles

– High variability


Choice of genetic marker (a)

• Comparison of isolated individuals of the same species requires markers with a low level of variability

– No microsatellites

– SNPs in genes necessary for the survival of the cell: ATPase (cellular energy), Cyt b (cytochrome b), Cox1 (cytochrome c oxidase subunit 1)


Choice of genetic marker (b)

• Comparison of closely related individuals requires markers with a high level of variability

– Microsatellites

– SNPs in non-coding regions of genes

– Anonymous SNPs in the genome


PCR amplification (a)

PCR amplification requires:

• DNA template

• 2 oligonucleotides (primers)

• Nucleotides: dATP, dCTP, dGTP, dTTP

• Taq polymerase


PCR reaction (b)

1. Double strand denaturation

2. Annealing of the primers

3. Elongation


Restriction Enzymes

• Found in bacteria

• Cut DNA within the molecule (endonuclease)

• Cut at sequences that are specific for each enzyme (restriction sites)

• Leave either blunt or sticky ends, depending upon the specific enzyme

http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/R/RestrictionEnzymes.html

Tobin & Dusheck, Asking About Life, 2nd ed. Copyright 2001, Harcourt, Inc.


Microsatellites: short tandem repeats

DNA 1: ACT ACT ACT ACT ACT (5 repeats)

DNA 2: ACT ACT ACT ACT (4 repeats)

Microsatellites are located in non-coding regions


Fluorescent genotyping of microsatellites

1. PCR amplification using 1 fluorescently labeled primer

2. PCR amplification product mixed with a size marker

3. PCR fragments separated by capillary electrophoresis


Size of the amplification product is variable and corresponds to the length of the flanking sequences plus a multiple of the size of the repeat

Co-dominant: distinguishes a homozygote for allele 1, a homozygote for allele 2, and a heterozygote

Tetra repeat: allele 1, 486 bp

Tetra repeat: allele 2, 490 bp
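The size rule for microsatellite products (flanking sequence plus a whole number of repeats) can be written as a short sketch. The 486 bp and 490 bp alleles come from the slides; the flank length and repeat counts used below are hypothetical values chosen only so that the arithmetic reproduces those sizes.

```python
def amplicon_size(flank_bp, repeat_unit_bp, n_repeats):
    """Product size = flanking sequence + n copies of the repeat unit."""
    return flank_bp + repeat_unit_bp * n_repeats

def repeat_difference(size_a_bp, size_b_bp, repeat_unit_bp):
    """How many repeat units separate two alleles of the same locus."""
    diff = abs(size_a_bp - size_b_bp)
    if diff % repeat_unit_bp:
        raise ValueError("size difference is not a multiple of the repeat unit")
    return diff // repeat_unit_bp

# Hypothetical flank of 446 bp with 10 or 11 tetranucleotide repeats.
print(amplicon_size(446, 4, 10), amplicon_size(446, 4, 11))
print(repeat_difference(486, 490, 4))
```

Under these assumptions the two tetra-repeat alleles differ by exactly one repeat unit.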


PCR-RFLP: Restriction Fragment Length Polymorphism

• Restriction enzymes cut the DNA at specific sequences

• DNA fragment containing a restriction sequence (EcoRI, GAATTC):

AGGTGAATTCAAATTT

• DNA fragment after restriction digestion:

AGGTG AATTCAAATTT
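The digestion step can be sketched in a few lines. EcoRI recognizes GAATTC and cuts after the first G; the sketch looks only at the top strand and ignores the sticky-end overhang. The function name and the `cut_offset` parameter are illustrative, and the input is the two post-digestion pieces shown above joined back together.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of site, cut_offset bases into the site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever is left after the last cut
    return fragments

print(digest("AGGTGAATTCAAATTT"))
```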


Scoring PCR-RFLP

PCR amplification of the region containing the restriction sites

Electrophoresis to identify presence or absence of bands

Figure: gel lanes for the size marker, Sample 1 and Sample 2


PCR-RFLP: fluorescent electrophoresis

P. ramorum CoxI PCR-RFLP

PCR amplification of a 972 bp portion of the CoxI gene, followed by restriction digestion with Apo I

EU isolates (mating type A1) have a C at position 377 of the amplicon: Apo I cuts

US isolates (mating type A2) have a T at position 377 of the amplicon: Apo I does not cut


PCR-SSCP: Single Strand Conformation Polymorphisms

• Denatured DNA (single strand) can be differentiated using electrophoresis on the basis of a single nucleotide difference

1. PCR amplification of the region containing the polymorphism

2. Denaturation

3. Gel electrophoresis


T-RFLP: Terminal Restriction Fragment Length Polymorphisms

• PCR amplification of a selected gene, with one primer labeled with a fluorophore.

• Digestion of DNA with a restriction enzyme; the number and length of the resulting fragments are determined by the presence/absence of appropriate restriction sites (i.e., they depend upon the underlying DNA sequence).

• Because the fluorophore is bound to the 5’ end of the PCR product, only the fragment that occurs 5’ to the restriction site will appear when run on an automated DNA sequencer.

• Size of the fragment may be specific to a certain genotype (though resolution is limited!)

Figure: example fragment sizes (bp): 436, 281, 160, 303, 468, 485
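The T-RFLP idea, that only the labeled 5’ terminal fragment is detected, can be sketched as follows. The function name, the choice of recognition site, the `cut_offset` parameter, and both toy amplicons are hypothetical; only the first restriction site from the 5’ end matters, because everything after it carries no label.

```python
def terminal_fragment_length(amplicon, site, cut_offset):
    """Length of the 5' labeled fragment after digestion (top strand only)."""
    pos = amplicon.find(site)
    if pos == -1:
        return len(amplicon)  # no site: the full-length labeled product is seen
    return pos + cut_offset

# Two hypothetical genotypes with the same site at different distances from the label.
genotype_1 = "ACGT" * 3 + "GAATTC" + "ACGT" * 2
genotype_2 = "ACGT" * 5 + "GAATTC"
print(terminal_fragment_length(genotype_1, "GAATTC", 1))
print(terminal_fragment_length(genotype_2, "GAATTC", 1))
```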


T-RFLP Analysis I: Hierarchical Clustering

Grouping by overall similarity (distance) calculated between plots or communities -- e.g., Jaccard’s index: J=M/(M+N), where M = #matches and N= #mismatches; followed by clustering (e.g., UPGMA)

Figure: Plots clustered by bacterial community composition. Groupings do not correspond to carbon dioxide enrichment treatment (Osmundson, Naeem et al., in prep.)
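A minimal sketch of the Jaccard index J = M/(M+N) on presence/absence fragment profiles. It assumes the usual convention that M counts fragments present in both plots, N counts fragments present in only one, and fragments absent from both are ignored; the plot profiles are made up for illustration, and the clustering step (e.g., UPGMA) is not shown.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two 0/1 presence/absence profiles."""
    pairs = list(zip(profile_a, profile_b))
    m = sum(1 for a, b in pairs if a == 1 and b == 1)  # shared presences
    n = sum(1 for a, b in pairs if a != b)             # mismatches
    return m / (m + n) if (m + n) else 0.0

# Hypothetical T-RFLP profiles (one column per fragment size).
plot_1 = [1, 1, 0, 1, 0]
plot_2 = [1, 0, 0, 1, 1]
plot_3 = [1, 1, 0, 1, 0]
print(jaccard(plot_1, plot_2), jaccard(plot_1, plot_3))
```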


T-RFLP Analysis II: MRPP & Indicator Species Analysis

Multiresponse permutation procedure (MRPP): Do a priori groups (in this example, based on carbon dioxide treatment) differ significantly in their biotic (in this example, microbial) communities?

Indicator Species Analysis: Are there species that discriminate between groups?

(Osmundson, Naeem et al., in prep)


T-RFLP Analysis III: NMS (Nonmetric Multidimensional Scaling)

Ordination based on community presence/absence matrix


Random Genomic Markers

• DNA sequence of suitable SNPs is not available

• Relatively inexpensive

• Scan the entire genome, producing information on several variations in the same reaction

RAPD: Random Amplification of Polymorphic DNA

AFLP: Amplified Fragment Length Polymorphism


RAPD: Random Amplification of Polymorphic DNA

Amplification of genomic DNA included between 2 identical short sequences (random)

Genomic DNA is amplified with 1 pair of identical (complementary) primers (generally 10 bp and GC rich). Example: 5’ AATCGGTACA 3’ and 5’ TGTACCGATT 3’

Amplification using a low annealing temperature (increased amplification for sequences not exactly complementary to the primer sequence)

The primers amplify or not depending on the presence or absence of the short sequence used to design the primers
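As a rough illustration, the sketch below searches one strand of a template for the primer sequence followed, further along, by its reverse complement, and reports the size of each product that could form. The two example sequences in the slide (AATCGGTACA and TGTACCGATT) are reverse complements of each other, which is what the sketch relies on. It requires exact matches, so the tolerance for mismatches at a low annealing temperature is not modeled; the function names, the `max_len` limit, and the toy template are assumptions.

```python
def reverse_complement(seq):
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def rapd_products(template, primer, max_len=2000):
    """Sizes of products bounded by the primer and its reverse complement."""
    rc = reverse_complement(primer)
    starts = [i for i in range(len(template)) if template.startswith(primer, i)]
    ends = [j + len(rc) for j in range(len(template)) if template.startswith(rc, j)]
    # Keep pairs whose two priming sites do not overlap and that are short enough.
    return sorted(e - s for s in starts for e in ends
                  if 2 * len(primer) <= e - s <= max_len)

primer = "AATCGGTACA"
template = "CC" + primer + "G" * 30 + reverse_complement(primer) + "CC"
products = rapd_products(template, primer)
print(products, 1 if products else 0)  # sizes, then the 1/0 score
```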


Scoring RAPD

Presence (1) or absence (0) of amplification product = Dominant marker

Mismatches between primer and template might also result in decreased amount of PCR product

Nucleotide substitution at the 3’ end of the primer: no annealing = no amplification

Nucleotide substitution at the 5’ end of the primer: reduced annealing = reduced amplification


AFLP: Amplified Fragment Length Polymorphisms

(Vos et al., 1995)

Genomic DNA digested with 2 restriction enzymes:

– EcoRI (6 bp restriction site) cuts infrequently

5’ GAATTC 3’
3’ CTTAAG 5’

– MseI (4 bp restriction site) cuts frequently

5’ TTAA 3’
3’ AATT 5’
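Why a 6 bp site cuts less often than a 4 bp site can be seen from a back-of-the-envelope calculation. The sketch assumes a random sequence in which all four bases are equally frequent and independent, which real genomes only approximate; the function name is illustrative.

```python
def expected_spacing_bp(site):
    """Average distance between occurrences of a site in a random sequence."""
    return 4 ** len(site)

print("EcoRI", expected_spacing_bp("GAATTC"))  # 6 bp site
print("MseI", expected_spacing_bp("TTAA"))     # 4 bp site
```

Under these assumptions MseI is expected to cut about 16 times more often than EcoRI.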


Fragments of DNA resulting from restriction digestion are ligated with end-specific adaptors (a different one for each enzyme) to create a new PCR priming site

Pre-selective PCR amplification is done using primers complementary to the adaptor + 1 bp (chosen by the user)


Selective amplification using primers complementary to the adaptor (+1 bp) + 2 bp
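The effect of the selective bases can be estimated with simple arithmetic: each extra base on a primer is expected to match about one fragment end in four. The sketch assumes equal base frequencies and the same number of selective bases on both primers; the function name and parameters are illustrative.

```python
from fractions import Fraction

def amplified_fraction(selective_bases_per_primer, n_primers=2):
    """Expected fraction of adaptor-ligated fragments that still amplify."""
    return Fraction(1, 4) ** (selective_bases_per_primer * n_primers)

print(amplified_fraction(1))  # pre-selective step: adaptor + 1 bp
print(amplified_fraction(3))  # selective step: adaptor + 1 bp + 2 bp
```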


AFLP genotyping

PCR amplification using primers corresponding to the new sequence

If there are 2 new priming sites within 400–1600 bp, there is amplification

The result is: presence or absence of amplification (1 or 0)

Dominant marker: does not distinguish between heterozygote and homozygote

Due mostly to SNPs but also to deletions/insertions


AFLP OVERVIEW (VOS ET AL., 1995)

AFLP: fluorescent electrophoresis