cobamide2.bio.pitt.edu/core/overheads.pdf
Genetics to Genomics (From Basics to Buzzwords)
• Genetics: understanding the role of heritable material in shaping organismal phenotypes
• Genomes are more than collections of genes:
• Chromosomes and episomes
• Gene clusters within chromosomes
• Genes and associated control elements
• Complex Exon/Intron organization
• Functional domains organized within coding regions
• Functional domains positioned outside coding regions
The fundamental task of genomics is understanding what information is important, and what is not.
• Genomes result from the accumulation of changes over time (evolution).
• Therefore, an understanding of genomes must be grounded in an understanding of how constituent domains, genes, gene clusters and chromosomes evolve.
• This leads to an understanding of patterns of information within and between genes and genomes.
A History of Genomic Data
• Richard Roblin’s Ph.D. thesis in 1967 was the determination of the identity of a single nucleotide (1 base is not a sequence): the 5’ end of bacteriophage R17, a 3-kb RNA phage. That base was a guanosine (pppGp…)
• In 1970, the 12-bp cohesive ends of bacteriophage lambda were determined by Ray Wu
• In 1977, two methods were introduced for rapid DNA sequencing (Walter Gilbert and Frederick Sanger shared in the 1980 Nobel Prize in Chemistry for this work):
• The Maxam-Gilbert chemical degradation method
• The Sanger primer extension method
• In 1977, the 5,386-base sequence of E. coli bacteriophage φX-174 was published
• In 1983, the 48,502-base-pair sequence of bacteriophage λ was published
• In 1995, the 1,830,137-base-pair sequence of the free-living bacterium Haemophilus influenzae was published
• In late 1996, the 12,052,000-base-pair sequence of the yeast Saccharomyces cerevisiae was published
• In late 1998, the 97,000,000-bp sequence of the nematode Caenorhabditis elegans was published
• In 2000, a draft of the 3,000,000,000-base-pair human genome was completed
• By early 2003, the genomes of nearly 100 species of Bacteria and Archaea, and 10 species of eukaryotes, had been completely sequenced.
Mendel and Darwin: More than two dead white guys?
• What was “Blending Inheritance”?
• How did they view mutations?
• What was the influence of Aristotle?
• What was the influence of Malthus?
• What was the influence of Geologists?
• Natural variation: the results of genetic experiments played out over long periods of time
• Similarity & difference: provide clues to relative importance; they are the results of (more or less) an infinite number of genetic experiments
Pillars of molecular evolution (How we make our models)
Empirical Data
• Direct experimentation in laboratory environments
• Direct experimentation in natural environments
• Observation of natural variation within species
• Observation of differences between species
Integrative Analysis
• Mathematical Modeling
• Cogitation
• Extrapolation and integration
Classification of similarity
Criteria for Classification
• By what criteria are features similar?
• By what criteria are features different?
• What processes lead to similarity?
• What processes lead to differences?
Types of Similarity
• Homology: identity by descent
• Orthology: homologs separated by a speciation event; encoded functions are usually the same
• Paralogy: homologs separated by a gene duplication; encoded functions are often different
• Convergence: identity by state
• Chance: identity by state
Methods for Assessing Molecular Similarity
• DNA-DNA Hybridization
• Isozyme analysis and MLEE
• Library overlap (SAB)
• RFLP Analysis
• DNA sequence divergence
Measuring Mutation Rates
Mutation rates
• Luria-Delbrück fluctuation tests
• Targets used in laboratory experiments
• Phage resistance
• Antibiotic resistance
• lacZ
• Lessons
• Mutations occur almost at random
• Probability matrix is organism-dependent
• There are context effects
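The fluctuation test contrasts two hypotheses. If resistance were induced only on exposure to the selective agent, resistant counts across parallel cultures would be roughly Poisson-distributed (variance about equal to the mean); if mutations arise at random during growth, early mutations produce "jackpot" cultures and the variance greatly exceeds the mean. The Python sketch below simulates only the second case, under simplifying assumptions not taken from the slides: synchronous doublings from a single cell, a hypothetical mutation rate of 1e-6 per new daughter cell, and a Poisson approximation for the number of new mutants per generation. The function names are illustrative.

```python
import math
import random
import statistics


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def resistant_cells(generations: int, mu: float, rng: random.Random) -> int:
    """Resistant cells in one culture grown from a single sensitive cell.

    Mutation-before-selection model: at each synchronous doubling, the number of
    new daughters of sensitive cells that become resistant is approximated as
    Poisson(sensitive * mu); resistant cells breed true.
    """
    sensitive, resistant = 1, 0
    for _ in range(generations):
        new_mutants = min(poisson_draw(sensitive * mu, rng), sensitive)
        sensitive = 2 * sensitive - new_mutants
        resistant = 2 * resistant + new_mutants
    return resistant


rng = random.Random(1943)                  # fixed seed so the run is reproducible
counts = [resistant_cells(20, 1e-6, rng) for _ in range(1000)]
mean = statistics.mean(counts)
ratio = statistics.variance(counts) / mean
print(f"mean = {mean:.2f}, variance/mean = {ratio:.1f}")
```

A variance-to-mean ratio far above 1 is the signature Luria and Delbrück used to argue that mutations arise before selection rather than in response to it.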
Substitution rates
• A mutation is a lesion
• A substitution is a variant allele observed in nature
• Not all mutations become substitutions
Fate of Mutations
A. Mutations originate at particular frequencies
i Variable exposure to mutagens
ii Context for polymerase error
iii Likelihood of replication slippage causing frameshifts
B. However, lesions are repaired at different frequencies
i Different mismatch repair systems
ii Transcription-coupled repair
C. Mutation is not repaired, but has lethal effects
D. Mutation is disadvantageous and eventually lost
i Though not lost, the mutation remains infrequent in the population
ii Mutation is frequent in the population
E. Mutation becomes ubiquitous in the population (fixation)
i For a neutral mutation, $P = 1/(2N)$
ii Average time to fixation is $T = 4N$ generations
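The neutral expectations for case E, a fixation probability of 1/(2N) and a mean time to fixation of about 4N generations, can be checked with a small Wright-Fisher simulation. This Python sketch assumes a diploid population of constant size N (2N gene copies), binomial sampling of gene copies each generation, no selection and no further mutation; N = 10 is chosen only to keep the run short, and the function name is illustrative.

```python
import random
import statistics


def neutral_fixation(N: int, replicates: int, rng: random.Random):
    """Simulate single new neutral mutations in a diploid Wright-Fisher population.

    Returns the fraction of replicates in which the mutant fixed, and the number
    of generations each of those fixations took.
    """
    copies = 2 * N
    fixation_times = []
    for _ in range(replicates):
        count, generation = 1, 0              # one new mutant copy: frequency 1/(2N)
        while 0 < count < copies:
            freq = count / copies
            # Binomial sampling: each of the 2N copies in the next generation
            # carries the mutant with probability equal to the current frequency.
            count = sum(rng.random() < freq for _ in range(copies))
            generation += 1
        if count == copies:
            fixation_times.append(generation)
    return len(fixation_times) / replicates, fixation_times


rng = random.Random(42)
N = 10
p_fix, times = neutral_fixation(N, 20_000, rng)
print(f"P(fix) = {p_fix:.4f} (expected {1 / (2 * N):.4f})")
print(f"mean fixation time = {statistics.mean(times):.1f} (expected about {4 * N})")
```

Most replicates are lost within a few generations; the rare ones that fix take on the order of 4N generations to do so.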
Random Genetic Drift
What is the probability that a variant allele becomes fixed?
• $P = \frac{1 - e^{-4 N_e s q}}{1 - e^{-4 N_e s}}$
• Use the approximation $e^{-x} \approx 1 - x$ when x is small
• Consider a newly arisen allele; in a diploid population its frequency is $q = 1/(2N)$
• $P = \frac{1 - e^{-2 N_e s / N}}{1 - e^{-4 N_e s}}$
• s = 0 for a neutral mutation
• Therefore, taking the limit $s \to 0$ with the approximation above, $P = 1/(2N)$; this should be intuitive
• If $N_e = N$, then $P = \frac{1 - e^{-2s}}{1 - e^{-4Ns}}$
• If s is small, then $P \approx \frac{2s}{1 - e^{-4Ns}}$
• For s > 0 and large N, $P \approx 2s$
• Neutral alleles go to fixation in $t \approx 4N$ generations
• For $s \neq 0$, alleles fix in $t \approx (2/|s|)\ln(2N)$ generations
Definitions
q = initial frequency of the variant allele
s = selection coefficient
Ne = effective population size
N = actual population size
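Kimura's fixation formula can be evaluated directly to confirm its limiting cases: the neutral value 1/(2N) and the value of about 2s for an advantageous allele in a large population. A minimal Python sketch; the function name is illustrative, and because math.exp overflows when 4*Ne*s is a large negative number, this version is only meant for moderate values.

```python
import math
from typing import Optional


def fixation_probability(s: float, N: int, Ne: Optional[int] = None,
                         q: Optional[float] = None) -> float:
    """Kimura's approximation P = (1 - e^{-4 Ne s q}) / (1 - e^{-4 Ne s}).

    Defaults: Ne = N, and q = 1/(2N), the frequency of a single new copy in a
    diploid population.
    """
    if Ne is None:
        Ne = N
    if q is None:
        q = 1 / (2 * N)
    if s == 0:
        return q                           # neutral limit of the formula
    return (1 - math.exp(-4 * Ne * s * q)) / (1 - math.exp(-4 * Ne * s))


N = 10_000
print(fixation_probability(0.0, N))        # neutral: exactly 1/(2N)
print(fixation_probability(1e-9, N))       # nearly neutral: very close to 1/(2N)
print(fixation_probability(0.01, N))       # advantageous, large N: close to 2s
```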
Effectively Neutral Mutations
Are mutations always either beneficial or detrimental?
As we saw earlier, that depends on what phenotype one is examining.
Even more insidious, that depends on population size and population structure.
In small populations, it takes a mighty big change in fitness (either positive or negative) to counteract the stochastic process of genetic drift. “Detrimental” mutations can sweep a population even if they confer a disadvantageous phenotype.
In larger populations, these same mutations could be eliminated quickly, since genetic drift has a smaller impact.
This interaction between population size and the effect of a mutation delineates a zone of fitness effects termed “effectively neutral,” in which the fitness impact is not statistically different from zero. The width of this zone is a function of population size.
This is why it is difficult to proclaim “conserved” sequences as important and nonconserved sequences as unimportant. “Conservation” (that is, the elimination of deleterious mutations from the population) is a function of population size.
It is also a function of population structure (subdivision, migration, etc.) and sexual exchange (obligate or infrequent).
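The dependence on population size can be made concrete by comparing the same mildly deleterious mutation (an illustrative s = -0.001, with Ne = N assumed) in a small and a large population, expressed relative to the neutral fixation probability 1/(2N). The function name is illustrative.

```python
import math


def relative_to_neutral(s: float, N: int) -> float:
    """Fixation probability of a new mutant divided by the neutral value 1/(2N).

    Uses P = (1 - e^{-2s}) / (1 - e^{-4Ns}), i.e. Kimura's formula with
    Ne = N and initial frequency q = 1/(2N).
    """
    if s == 0:
        return 1.0
    p_fix = (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))
    return p_fix * 2 * N


s = -0.001                                  # mildly deleterious (illustrative value)
for N in (100, 10_000):
    rel = relative_to_neutral(s, N)
    print(f"N = {N:>6}: 4Ns = {4 * N * s:6.1f}, relative fixation = {rel:.3g}")
```

With these values the mutant in the population of 100 still fixes at about 0.81 times the neutral rate (4Ns = -0.4), whereas in the population of 10,000 (4Ns = -40) its relative chance is on the order of 1e-16: the same mutation is effectively neutral in one population and efficiently purged in the other.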
Selectionism vs Neutralism
What is the significance of natural variation?
Selectionism
Selectionism argues that most variants have adaptive value, and that variation is maintained through a variety of mechanisms:
• Selection/mutation balance
• Heterosis
• Frequency dependence
• Spatial and temporal heterogeneity
Neutralism
Neutralism argues that most variation is effectively neutral, and primarily reflects genetic drift
The Poisson Distribution
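• Predicts how occurrences are distributed among discrete categories (for example, how many sites carry 0, 1, or 2 or more mutations)
• Derived from the binomial distribution as the limit of many trials, each with the same small probability of occurrence
• $P_\mu(x) = \frac{\mu^x e^{-\mu}}{x!}$, where µ is the mean number of occurrences
• So, for µ = 1, $P(x) = \frac{1}{e \cdot x!}$
• So, the probability of zero occurrences is ~37%
• The probability of exactly 1 occurrence is ~37%
• The probability of 2 or more occurrences is ~26%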
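The percentages for µ = 1 follow from evaluating the Poisson formula at x = 0 and x = 1 and taking the complement for 2 or more occurrences. A minimal Python sketch (the function name is illustrative):

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """P_mu(x) = mu^x e^{-mu} / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)


mu = 1.0
p0 = poisson_pmf(0, mu)
p1 = poisson_pmf(1, mu)
p2_or_more = 1 - p0 - p1
print(f"P(0) = {p0:.3f}, P(1) = {p1:.3f}, P(>=2) = {p2_or_more:.3f}")
# P(0) = 0.368, P(1) = 0.368, P(>=2) = 0.264
```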
When did they diverge?
ACTGTAGGAATCGC
 *  * *
AATGAAAGAATCGC
If the probability of mutation is $10^{-9}$ per bp per generation, how many generations have these two sequences been diverging?
Naïve answer
• Let p be the probability of a mutation arising
• This can be (and has been) measured in the laboratory
• p = $10^{-9}$ per bp per generation
• For 14 bp, p = $1.4 \times 10^{-8}$ per generation
• Therefore 1 mutation arises, on average, every $7.14 \times 10^{7}$ generations
• Therefore 3 mutations arise, on average, in $2.14 \times 10^{8}$ generations
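The naïve arithmetic can be reproduced in a few lines of Python. Like the slide, this assumes every observed difference is exactly one mutation; the variable names are illustrative.

```python
seq1 = "ACTGTAGGAATCGC"
seq2 = "AATGAAAGAATCGC"

mu_per_site = 1e-9                        # mutations per bp per generation (from the slide)
differences = sum(a != b for a, b in zip(seq1, seq2))
rate_per_sequence = mu_per_site * len(seq1)
generations_per_mutation = 1 / rate_per_sequence
naive_generations = differences / rate_per_sequence

print(differences)                        # 3
print(f"{rate_per_sequence:.2e}")         # 1.40e-08
print(f"{generations_per_mutation:.3e}")  # 7.143e+07
print(f"{naive_generations:.3e}")         # 2.143e+08
```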
What is missing here?
When did they diverge - Part II
First, many substitutions go unnoticed
[Figure: examples of single, multiple, coincidental, parallel, convergent, and back substitutions]
Only the “single substitution” leads to differences that accurately reflect the number of mutational events
Jukes and Cantor Model
• Probability of a base changing to any one specific other base during one time step t is set to be α
• Probability of a base being equal to its original state at time T = t is $P_1 = 1 - 3\alpha$
• At time T = 2t, the probability of the original state is: $P_2 = (1 - 3\alpha)P_1 + \alpha(1 - P_1)$
• This can be formulated as a first-order differential equation: $\frac{dP_t}{dt} = -4\alpha P_t + \alpha$
• This can be solved as $P_t = \tfrac14 + (P_0 - \tfrac14)e^{-4\alpha t}$
• For a site that starts in the given state, $P_0 = 1$, so $P_t = \tfrac14 + \tfrac34 e^{-4\alpha t}$
• For a site that starts in a different state, $P_0 = 0$, so $P_t = \tfrac14 - \tfrac14 e^{-4\alpha t}$
• Notice that both equations converge to ¼ at equilibrium
• Under the Jukes-Cantor model, all bases have the same
frequency and interchange with equal likelihood
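A short Python sketch of the Jukes-Cantor probabilities, with α set to an arbitrary illustrative value per time unit. It also iterates the discrete recursion P(n+1) = (1 - 3α)P(n) + α(1 - P(n)) to show that it tracks the continuous solution closely when α is small. Function names are illustrative.

```python
import math


def jc_same(t: float, alpha: float) -> float:
    """P(base at time t equals its original state) = 1/4 + 3/4 e^{-4 alpha t}."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


def jc_different(t: float, alpha: float) -> float:
    """P(base at time t is one specific other base) = 1/4 - 1/4 e^{-4 alpha t}."""
    return 0.25 - 0.25 * math.exp(-4 * alpha * t)


def jc_recursion(steps: int, alpha: float) -> float:
    """Iterate P_{n+1} = (1 - 3 alpha) P_n + alpha (1 - P_n) starting from P_0 = 1."""
    p = 1.0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


alpha = 1e-4                               # illustrative rate per time unit
for t in (0, 1_000, 5_000, 50_000):
    print(t, round(jc_same(t, alpha), 4), round(jc_different(t, alpha), 4))
print(jc_recursion(1_000, alpha), jc_same(1_000, alpha))  # nearly identical for small alpha
```

Both columns approach 0.25 as t grows, and at every t the identity probability plus three times the "specific other base" probability equals 1.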
Convergence of the Jukes & Cantor Model
[Figure: probability of having an 'A' (y-axis, 0.00 to 1.00) versus time (x-axis, 0 to 200 million years)]
Kimura’s Two-parameter Model
• Separates transition probability from transversion probability
• A transition substitution occurs with probability α
• A transversion substitution occurs with probability β
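• Probability of identity over time is calculated as: $X_t = \tfrac14 + \tfrac14 e^{-4\beta t} + \tfrac12 e^{-2(\alpha+\beta)t}$
• Probability of difference by transition is $Y_t = \tfrac14 + \tfrac14 e^{-4\beta t} - \tfrac12 e^{-2(\alpha+\beta)t}$
• Probability of difference by each of the two possible transversions is $Z_t = \tfrac14 - \tfrac14 e^{-4\beta t}$
• Note that $X_t + Y_t + 2Z_t = 1$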
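The Kimura two-parameter formulas can be checked numerically: the identity, transition and two transversion outcomes must sum to 1, and setting α = β should recover the Jukes-Cantor identity probability ¼ + ¾e^(-4αt). The rates below are illustrative values only, and the function name is ours.

```python
import math


def k2p_probabilities(t: float, alpha: float, beta: float):
    """Kimura two-parameter probabilities for one lineage after time t.

    Returns (X, Y, Z): identity, difference by transition, and difference by
    each of the two possible transversions.
    """
    e_tv = math.exp(-4 * beta * t)
    e_all = math.exp(-2 * (alpha + beta) * t)
    x = 0.25 + 0.25 * e_tv + 0.5 * e_all
    y = 0.25 + 0.25 * e_tv - 0.5 * e_all
    z = 0.25 - 0.25 * e_tv
    return x, y, z


x, y, z = k2p_probabilities(t=1_000, alpha=2e-4, beta=5e-5)
print(round(x + y + 2 * z, 12))            # 1.0: the four outcomes are exhaustive
```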
Justification for the Kimura Model
Relative substitution rates in mammalian pseudogenes (rows: mutant nucleotide; columns: original nucleotide)

Mutant \ Original   A            T            C            G
A                   -            4.4 ± 1.1    6.5 ± 1.1    20.1 ± 2.2
T                   4.7 ± 1.3    -            21.0 ± 2.1   7.2 ± 1.1
C                   5.0 ± 0.7    8.2 ± 1.3    -            5.3 ± 1.0
G                   9.4 ± 1.3    3.3 ± 1.2    4.2 ± 0.5    -
• Notice that transition rates are higher than individual
transversion rates
• Notice also that the rates of substitution are not symmetrical
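Pooling the table's entries shows the transition bias quantitatively. Of the 12 possible changes only 4 are transitions, so with equal rates transversions would account for two thirds of all substitutions. This Python sketch keys each rate by (original, mutant), following our reading of the table's row and column labels; the transition and transversion totals do not depend on that orientation. The entries sum to about 99.3, so they appear to be percentages, although the slide does not say so.

```python
# Relative substitution rates from the table, keyed by (original, mutant).
rates = {
    ("A", "T"): 4.7, ("A", "C"): 5.0, ("A", "G"): 9.4,
    ("T", "A"): 4.4, ("T", "C"): 8.2, ("T", "G"): 3.3,
    ("C", "A"): 6.5, ("C", "T"): 21.0, ("C", "G"): 4.2,
    ("G", "A"): 20.1, ("G", "T"): 7.2, ("G", "C"): 5.3,
}

PURINES = {"A", "G"}


def is_transition(a: str, b: str) -> bool:
    """A<->G and C<->T changes stay within purines or within pyrimidines."""
    return (a in PURINES) == (b in PURINES)


transitions = [r for (a, b), r in rates.items() if is_transition(a, b)]
transversions = [r for (a, b), r in rates.items() if not is_transition(a, b)]
print(round(sum(transitions), 1), round(sum(transversions), 1))  # 58.7 40.6
print(min(transitions) > max(transversions))                     # True
```

Four transition types carry about 59% of the total while eight transversion types share about 41%, and even the slowest transition (8.2) exceeds the fastest transversion (7.2).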
Correcting for multiple substitutions
• Let’s start with the Jukes & Cantor one-parameter model
• Probability of identity at a site for sequences in TWO lineages, each diverging for time t, is $P_I = \tfrac14 + \tfrac34 e^{-8\alpha t}$
• Probability of difference is $P_D = 1 - P_I = \tfrac34(1 - e^{-8\alpha t})$, or $8\alpha t = -\ln(1 - \tfrac43 P_D)$
• Since t is unknown, we cannot estimate α. Instead, we compute K, the number of substitutions per site
• For 2 lineages, $K = 2 \times 3\alpha t$
• So, $K = -\tfrac34 \ln(1 - \tfrac43 P)$, where P is the proportion of sites at which the two sequences differ
• For sequences of length L, the sampling variance is $V(K) = \frac{P(1-P)}{L(1 - \tfrac43 P)^2}$
• For the Kimura model, let P be the proportion of bases as
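The Jukes-Cantor distance and its sampling variance can be computed directly. Here, for illustration, they are applied to the 3 differences in 14 sites from the earlier divergence example (P ≈ 0.214); the function name is ours.

```python
import math


def jukes_cantor_distance(differences: int, length: int):
    """Return (K, V(K)) for two aligned sequences under the Jukes-Cantor model.

    K = -3/4 ln(1 - 4P/3) and V(K) = P(1 - P) / (L (1 - 4P/3)^2),
    where P is the proportion of differing sites and L the sequence length.
    """
    p = differences / length
    if p >= 0.75:
        raise ValueError("P >= 3/4: the Jukes-Cantor correction is undefined")
    w = 1 - 4 * p / 3
    k = -0.75 * math.log(w)
    variance = p * (1 - p) / (length * w ** 2)
    return k, variance


k, v = jukes_cantor_distance(3, 14)
print(f"K = {k:.4f} substitutions/site, V(K) = {v:.5f}")
# K = 0.2524 substitutions/site, V(K) = 0.02357
```

The corrected estimate (about 0.252 substitutions per site) exceeds the raw proportion of differences (0.214), reflecting substitutions that went unnoticed. The correction is undefined once P reaches 3/4, the level of difference expected between unrelated random sequences.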
Like distance methods, parsimony will give you a tree,
although you may not get a “most-parsimonious” tree.
How good is it? Consider these two data sets:
Taxon   Data Set 1    Data Set 2
A       GGGCCAATTAA   GGGCCAATTAA
B       GGGCCAATGCC   GGGCCTTGGCC
C       CAATTTTGTCC   AAATTAATTCC
D       CAATTTTGGAA   AAATTTTGGAA
Steps   14            17
Chars   8 of 11       5 of 11
[Figure: tree relating taxa A, B, C and D]
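The "Steps" and "Chars" rows can be reproduced with Fitch's algorithm, a standard way to count the minimum number of changes a site needs on a given tree. This Python sketch assumes, as our reading of the table, that "Steps" is the length of the most-parsimonious of the three possible unrooted trees for four taxa, and that "Chars" counts the characters needing only their minimum possible number of changes on that tree. Function and variable names are illustrative.

```python
def fitch(tree, states):
    """Fitch's algorithm for one site on a rooted binary tree of nested tuples.

    Returns (possible ancestral states at this node, number of changes below it).
    """
    if isinstance(tree, str):                  # a leaf: the taxon's observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def score(tree, alignment):
    """Total steps, and number of characters fitting the tree with minimum changes."""
    steps = consistent = 0
    length = len(next(iter(alignment.values())))
    for i in range(length):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        _, cost = fitch(tree, states)
        steps += cost
        consistent += int(cost == len(set(states.values())) - 1)
    return steps, consistent


# The three unrooted four-taxon trees; rooting does not change the Fitch score.
TREES = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]

data_set_1 = {"A": "GGGCCAATTAA", "B": "GGGCCAATGCC", "C": "CAATTTTGTCC", "D": "CAATTTTGGAA"}
data_set_2 = {"A": "GGGCCAATTAA", "B": "GGGCCTTGGCC", "C": "AAATTAATTCC", "D": "AAATTTTGGAA"}

for name, data in (("Data Set 1", data_set_1), ("Data Set 2", data_set_2)):
    best = min(TREES, key=lambda tree: score(tree, data)[0])
    print(name, best, score(best, data))
# Data Set 1 (('A', 'B'), ('C', 'D')) (14, 8)
# Data Set 2 (('A', 'B'), ('C', 'D')) (17, 5)
```

Both data sets favor the same tree, ((A,B),(C,D)), but in data set 2 only 5 of 11 characters fit it without extra changes, so that most-parsimonious tree is far less well supported.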
History of Classification
• God (4500 B.C.): not-A groups
• Noah (3500 B.C.): cladistic characters
• Plato (427-347 B.C.): idealized form
• Aristotle (384-322 B.C.): Scala Naturae
• Hans and Zacharias Janssen (1600): microscope
• Marcello Malpighi (1628-1694): cellular organization
• Robert Hooke (1635-1703): cells
• Anton van Leeuwenhoek (1632-1723): described bacteria
• Carl von Linné (1707-1778): Systema Naturae
• Otto F. Müller (1730-1784): 379 animalcule descriptions
• Antoine-Laurent de Jussieu (1748-1836): major divisions of plants
• Georges Cuvier (1769-1832): major animal phyla
• Christian Ehrenberg (1795-1876): included bacteria in systematics
• Georges-Louis Buffon (1707-1788): not all species present at the Creation
• Thomas Malthus (1766-1834): exponential growth
• Georges Cuvier (1769-1832): catastrophes
• Louis Agassiz (1807-1873): serial Creation
• James Hutton (1726-1797): old Earth
• Charles Lyell (1797-1875): old Earth
• Jean-Baptiste Lamarck (1744-1829): inheritance of acquired characters
• Charles Darwin (1809-1882): natural selection
• Louis Pasteur (1822-1895): microbial processes
• Ernst Haeckel (1834-1919): evolutionary classification
• Edouard Chatton (1883-1947): prokaryote/eukaryote dichotomy
• Herbert Copeland (1902-1968): reclassification
• Robert Whittaker (1920-1980): “modern” classification
• Emil Zuckerkandl & Linus Pauling: molecular clocks
• Motoo Kimura and Tom Jukes: neutral theory
• Carl Woese and George Fox: molecular phylogeny
• Naoyuki Iwabe & Takashi Miyata: rooting the tree of life via elongation factors (EFs)
• Peter Gogarten: rooting the tree of life via ATPases
• Brian Golding & Radhey Gupta: Eukarya by fusion
God (4500 BC)
Heavens Earth
Noah (3500 BC)
Animals Plants
Phylogeny I
Yet the “Heavens” have no defining characteristics
Potential Introduction of Hierarchy in Classification
Living Things
Plants Animals
Aristotle (350 BC)
Animal Vegetable
Mineral
Aristotle (350 BC)
Air Earth
Fire Water
Phylogeny II
Classification, but lacking Hierarchy
Classification, but lacking Hierarchy
Linnaeus (AD 1743)
Animalia Plantae
Infusoria
Chatton (AD 1937)
Prokaryotes
Eukaryotes
Phylogeny III
Completely Hierarchical (even to non-living things!)
Animals Vertebrates
Invertebrates
Birds Mammals
Introduced Polarity, or Time, Into Lines of Phylogenetic Descent
Copeland (AD 1956)
Animalia Plantae Fungi
Protista Monera
Whittaker (AD 1959)
Animalia Plantae Protista Fungi Monera
Eukaryotes Prokaryotes
- Incorporation of Chatton’s distinction
Association Coefficients Between Representative Members of the Three Primary Kingdoms
Organism 1 2 3 4 5 6 7 8 9 10 11 12 13