Evolution of eukaryotic genomes: remarkable conservation and massive loss of genes and introns
Eugene V. Koonin
National Center for Biotechnology Information, NIH, Bethesda, MD
“In my own subjects, genetics and molecular biology, research has become so directed toward medical problems and the needs of the pharmaceutical companies that most people do not recognize that the most challenging intellectual problem of all time, the reconstruction of our biological past, can now be tackled with some hope of success.”
Sydney Brenner, Science 282, 1411-1412 (20 Nov 1998)
Comprehensive evolutionary classification of genes from sequenced genomes
Ancient conserved eukaryotic genes
Current status of evolutionary classification of proteins from 7 complete eukaryotic genomes:
112920 proteins = 65170 in KOGs (eukaryotic orthologous groups) + 23436 in LSEs (lineage-specific expansions) + 24314 singletons
Tatusov et al., BMC Bioinformatics, 2003 Sep 11;4(1):41.
Breakdown of eukaryotic proteins into KOGs, LSEs and singletons
[Figure: stacked bars (0-100%) for E. cuniculi, S. cerevisiae, S. pombe, A. thaliana, C. elegans, D. melanogaster, and H. sapiens; categories: singletons, LSEs, 2-species KOGs, >3 species KOGs. Current status of evolutionary classification of proteins from 7 complete genomes.]
Define a phyletic pattern
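A phyletic pattern is the pattern of presence and absence of a KOG across the compared genomes. As a minimal sketch, it can be encoded as one character per species. The species order, the helper name `phyletic_pattern`, and the example memberships below are assumptions made for illustration; they are not taken from the KOG database.

```python
# Assumed species order, using the abbreviations that appear in this talk.
SPECIES = ["Ec", "Sc", "Sp", "At", "Ce", "Dm", "Hs"]


def phyletic_pattern(present_in):
    """Return a presence/absence string with one character per species."""
    present = set(present_in)
    unknown = present - set(SPECIES)
    if unknown:
        raise ValueError(f"unknown species: {sorted(unknown)}")
    return "".join("1" if s in present else "0" for s in SPECIES)


# Hypothetical KOG memberships, chosen only to illustrate the encoding.
all_species = phyletic_pattern(SPECIES)
all_but_ec = phyletic_pattern(["Sc", "Sp", "At", "Ce", "Dm", "Hs"])
animals_only = phyletic_pattern(["Ce", "Dm", "Hs"])
print(all_species, all_but_ec, animals_only)
```

With this encoding, KOGs can be grouped by pattern simply by counting identical strings.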
[Figure: Phyletic patterns of eukaryotic KOGs. Pie chart with the categories All, All-Ec, Animals-Fungi, Plant+fungi, Plant+animals, All animals, All fungi, and Other patterns.]
[Figure: Phyletic patterns of KOGs and phenotypic effect of knockouts in S. cerevisiae. Stacked bars (0-100%) of non-essential and essential genes for the KOG classes 1, 2-5, 6, and 7.]
Essential genes tend not to be lost during evolution
[Figure: Phyletic patterns of KOGs and phenotypic effect of knockouts in C. elegans. Stacked bars (0-100%) of non-essential and essential genes for the KOG classes 1, 2-5, 6, and 7.]
Essential genes tend not to be lost during evolution
The traditional application of the evolutionary parsimony principle:
Given the distribution of a set of binary characters in a set of species, construct the shortest tree (maximum parsimony tree)
A 10111100
B 00110111
C 00010111
D 10111010
[Figure: the resulting tree, with leaves in the order A, D, B, C]
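The tree search described above can be sketched for the four example species. The code below is an illustration, not the procedure used in the talk: it assumes unordered binary characters, scores each candidate tree with Fitch's small-parsimony algorithm, and simply enumerates the three possible four-taxon topologies. The function names are hypothetical.

```python
# Character strings for the four species, copied from the slide.
TAXA = {
    "A": "10111100",
    "B": "00110111",
    "C": "00010111",
    "D": "10111010",
}


def fitch_sets(node, taxa, i):
    """Return (state set, change count) for character i in the subtree at node."""
    if isinstance(node, str):
        return {taxa[node][i]}, 0
    (ls, lc), (rs, rc) = (fitch_sets(child, taxa, i) for child in node)
    common = ls & rs
    if common:
        return common, lc + rc
    # Disjoint child sets: one state change is needed at this node.
    return ls | rs, lc + rc + 1


def fitch_score(tree, taxa):
    """Minimum total number of state changes over all characters."""
    n_chars = len(next(iter(taxa.values())))
    return sum(fitch_sets(tree, taxa, i)[1] for i in range(n_chars))


# The three distinct topologies for four taxa, written as nested pairs.
TOPOLOGIES = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

scores = {name: fitch_score(tree, TAXA) for name, tree in TOPOLOGIES.items()}
best = min(scores, key=scores.get)
print(scores)
print("shortest tree:", best)
```

For this data set the shortest tree pairs A with D and B with C, which agrees with the leaf order on the slide.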
However, parsimony can be used with equal ease to address the reverse task: given the distribution of a set of binary characters in a set of species AND the *true* tree topology, construct the most parsimonious scenario of evolution (which, of course, might include many more events than the overall most economical scenario)
A 10111100
B 00110111
C 00010111
D 10111010
[Figure: tree with leaves in the order A, B, C, D, annotated with numbers of events along the branches and with the strings 10111010 and 00010111]
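To illustrate the reverse task, the sketch below fixes a topology and counts the minimum number of changes per character. The "true" topology ((A,B),(C,D)) is an assumption read off the leaf order in the figure, and Fitch's algorithm for unordered characters is used as a stand-in; the event counts drawn on the slide may follow a different convention.

```python
# Character strings for the four species, copied from the slide.
TAXA = {
    "A": "10111100",
    "B": "00110111",
    "C": "00010111",
    "D": "10111010",
}

# Assumed "true" topology, plus the shortest tree for comparison.
TRUE_TREE = (("A", "B"), ("C", "D"))
SHORTEST_TREE = (("A", "D"), ("B", "C"))


def changes(node, taxa, i):
    """Return (state set, change count) for character i below node (Fitch)."""
    if isinstance(node, str):
        return {taxa[node][i]}, 0
    (ls, lc), (rs, rc) = (changes(child, taxa, i) for child in node)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1


def per_character_changes(tree, taxa):
    """List the minimum number of changes for each character on a fixed tree."""
    n_chars = len(next(iter(taxa.values())))
    return [changes(tree, taxa, i)[1] for i in range(n_chars)]


on_true_tree = per_character_changes(TRUE_TREE, TAXA)
on_shortest_tree = per_character_changes(SHORTEST_TREE, TAXA)
print("fixed topology:", on_true_tree, "total", sum(on_true_tree))
print("shortest tree: ", on_shortest_tree, "total", sum(on_shortest_tree))
```

The fixed topology needs more events in total than the shortest tree, which is the point made in parentheses above.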
[Figure: Maximum parsimony (Dollo) tree for eukaryotes based on the phyletic patterns of KOGs. Leaves: Ec, Sc, Sp, Ce, Dm, Hs, At; two internal branches are labeled 100%.]
The phylogenetic parsimony tree built on the basis of KOG phyletic patterns did not follow the species tree. However, the parsimony principle can be applied in the opposite direction: given a species tree topology, construct the most parsimonious scenario for the evolution of the eukaryotic gene repertoire (mapping of gene (KOG) gain and loss events onto the tree branches).
[Figure legend: 1/0, 0/1; gain, loss]
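Mapping gains and losses onto a fixed species tree can be sketched under the Dollo assumption (a gene is gained once and may be lost many times). The species tree below is a hypothetical topology chosen for illustration, the example phyletic pattern is invented, and the function names are not from the original analysis.

```python
# Hypothetical rooted species tree written as nested pairs (an assumption).
TREE = ("Ec", ("At", (("Sc", "Sp"), ("Ce", ("Dm", "Hs")))))


def leaves(node):
    """Set of species below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return leaves(node[0]) | leaves(node[1])


def dollo_events(tree, present):
    """Return (gain clade, list of lost clades) for one gene under Dollo parsimony."""
    present = frozenset(present)
    if not present or not present <= leaves(tree):
        raise ValueError("pattern must name at least one species of the tree")
    # The single gain is placed on the branch leading to the smallest clade
    # that contains every species possessing the gene.
    node = tree
    while not isinstance(node, str):
        child = next((c for c in node if present <= leaves(c)), None)
        if child is None:
            break
        node = child
    losses = []

    def collect(n):
        if not (leaves(n) & present):
            losses.append(leaves(n))  # the whole clade lacks the gene: one loss
        elif not isinstance(n, str):
            for c in n:
                collect(c)

    collect(node)
    return leaves(node), losses


# Invented example: a gene found only in the plant and in human.
gain, losses = dollo_events(TREE, ["At", "Hs"])
print("gained in ancestor of:", sorted(gain))
print("lost in:", [sorted(clade) for clade in losses])
```

Summing such events over all KOGs gives per-branch gain and loss counts, and the genes inferred to be present at each internal node form the reconstructed ancestral gene set.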
[Figure: The most parsimonious scenario of gene loss and birth in eukaryotic evolution and ancestral gene sets. Species tree with leaves Dm, Hs, Ce, Sc, Sp, At, and Ec; branches are annotated with gene gain and gene loss counts.]
Koonin et al. 2004. Genome Biol. 5: R7.
Exon/intron structure of eukaryotic genes
Eukaryotic nuclear, protein-coding genes usually contain multiple spliceosomal introns that are spliced out of pre-mRNAs by an RNA-protein complex, the spliceosome.
[Diagram: exon1 | GU - intron - AG | exon2]
Evolution of introns and the exonic structure of eukaryotic genes
• Tempo and mode of intron evolution remain poorly understood.
• When did introns invade eukaryotic genes: prior to the origin of eukaryotes (introns early), early in eukaryotic evolution, or late?
• The common ancestor of animals, plants and fungi: intron-rich or intron-poor?
• What fraction of introns is conserved over long evolutionary spans?
Origin of introns
• The "intron-early" hypothesis suggests that introns existed before the divergence of prokaryotes and eukaryotes (W. Gilbert).
• The "intron-late" hypothesis posits that introns were inserted into eukaryotic genes after this divergence (T. Cavalier-Smith, Doolittles, J. Palmer).
Loss and sliding
Gain and loss
Mechanisms of intron evolution
Three mechanisms of intron evolution have been invoked by proponents of both theories:
- intron loss
- intron gain
- intron sliding
Mechanisms of intron evolution: intron loss
Complete loss of introns: re-integration of reverse-transcribed mRNAs into the genome
Loss of one or a few introns: recombination/gene conversion between cDNAs and genomic sequences (Feiber et al. 2002)
Mechanisms of intron evolution: intron gain
A common event?
Mechanisms of intron evolution
Why is our understanding of intron evolution so limited?
- Lack of information on exon/intron structure of orthologous genes
Can we use completely sequenced genomes?
- This is a great source of information but… they are not necessarily easy to work with...
Analysis of introns in completely sequenced genomes
We used sets of orthologous genes which contained a member from each of 8 eukaryotic genomes:
The only intron among 684 genes conserved in 7 species
Matrices for all analyzed genes were concatenated and employed to build a single tree - 684 KOGs, 7236 intron positions
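The concatenation step can be pictured as joining per-gene presence/absence matrices of intron positions into one long row per species. The sketch below assumes that representation; the species abbreviations are those shown on the intron-based tree in this talk, while the two toy genes and the function name `concatenate` are invented for illustration.

```python
# Abbreviations of the 8 genomes, as shown on the intron-based tree.
SPECIES = ["Dm", "Ag", "Hs", "Ce", "Sc", "Sp", "At", "Pf"]


def concatenate(gene_matrices):
    """Join per-gene presence/absence strings into one row per species."""
    rows = {s: [] for s in SPECIES}
    for matrix in gene_matrices:
        widths = {len(matrix[s]) for s in SPECIES}
        if len(widths) != 1:
            raise ValueError("all rows of one gene must have the same length")
        for s in SPECIES:
            rows[s].append(matrix[s])
    return {s: "".join(parts) for s, parts in rows.items()}


# Two invented genes: 1 = intron present at an aligned position, 0 = absent.
gene1 = {"Dm": "100", "Ag": "100", "Hs": "111", "Ce": "010",
         "Sc": "000", "Sp": "001", "At": "101", "Pf": "100"}
gene2 = {"Dm": "01", "Ag": "00", "Hs": "11", "Ce": "00",
         "Sc": "00", "Sp": "10", "At": "11", "Pf": "01"}

combined = concatenate([gene1, gene2])
for species in SPECIES:
    print(species, combined[species])
```

Each column of the combined matrix is then one binary character for tree building, in the same way as a phyletic pattern of a gene.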
[Figure: Phylogenetic tree of crown group eukaryotes based on conservation of intron positions: parsimony. Leaves: Dm, Ag, Hs, Ce, Sc, Sp, At, Pf; internal branches are labeled 100%, 100%, 100%, 100%, and 99%.]
The topology of this tree is a bit unexpected...
The phylogenetic parsimony tree built on the basis of the pattern of intron conservation did not follow the species tree. However, the parsimony principle can be applied in the opposite direction: given a species tree topology, construct the most parsimonious scenario for the evolution of eukaryotic gene structure: distribution of intron gain and loss events over the tree branches.
[Figure legend: 1/0, 0/1; gain, loss]
Parsimonious evolutionary scenario for the most realistic topology of the eukaryotic tree
Roy SW, Fedorov A, Gilbert W. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci U S A. 2003 Jun 10;100(12):7158-62.
A. S. Kondrashov, personal communication
There seems to have been virtually no intron gain and limited intron loss during mammalian evolution
[Figure: tree of human, mouse, rat, and fish; ~100 Mya; ~100 introns lost, ~0 introns gained]
A conundrum of intron evolution:
• practically no intron gain during (at least) ~100 million years of mammalian evolution
• apparent massive gain during evolution of animal phyla (e.g., chordates) on a ~500-700 million year scale
Are major transitions in eukaryotic evolution associated with bursts of intron insertion?
[Figure: Correlation between gain of genes and introns. Scatter plot of gene gain (y axis) against intron gain (x axis); R=0.96]
[Figure: Correlation between the loss of genes and introns. Scatter plot of gene loss (y axis) against intron loss (x axis); R=0.93]
Koonin, 2004, Cell Cycle 3, 280
Gain/loss of genes and gain/loss of introns in conserved genes occur in parallel in eukaryotic evolution – probably a manifestation of the same general lineage-specific trends
‘…by magnifying the power of random genetic drift, reduced population size provides a permissive environment for the proliferation of various genomic features that would otherwise be eliminated by purifying selection.’
Lynch, M., Conery, J.S. (2003) The Origins of Genome Complexity. Science 302, 1401-4.
Comparing old and new introns: gaining insight into the origin of introns
Sverdlov, Babenko, Rogozin, Koonin. Curr. Biol. (2003); Gene (2004, in press)
Distribution of old and new introns along the gene length
All genomes pooled
[Figure: numbers of old and new introns (y axis, 0-1600) in gene-length bins 1-8 (x axis), with linear trend lines y = -42.679x + 937.18 and y = 6.4048x + 1308.4]
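The two trend lines reported for the pooled data can be evaluated at the first and last bins to see how strongly each distribution changes along the gene. This is a small sketch under two assumptions: the first equation belongs to the old introns and the second to the new ones (the slide lists them in that order), and bins 1 to 8 run along the gene length.

```python
# (slope, intercept) of the reported trend lines; the old/new labels are
# an assumption based on the order in which the slide lists them.
FITS = {
    "old": (-42.679, 937.18),
    "new": (6.4048, 1308.4),
}


def predict(slope, intercept, x):
    """Value of the fitted line y = slope * x + intercept at bin x."""
    return slope * x + intercept


summary = {}
for label, (slope, intercept) in FITS.items():
    first = predict(slope, intercept, 1)
    last = predict(slope, intercept, 8)
    summary[label] = (first, last, last / first)
    print(f"{label}: bin 1 = {first:.1f}, bin 8 = {last:.1f}, "
          f"ratio = {last / first:.2f}")
```

Under these assumptions the fitted count of old introns falls by roughly a third from the first bin to the last, while the fitted count of new introns rises slightly.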
Distribution of old and new introns along the gene length
S. pombe – an intron-poor genome – nearly identical distributions of old and new introns
[Figure: numbers of old and new S. pombe introns (y axis, 0-100) in gene-length bins 1-8 (x axis), with linear trend lines y = -5.4643x + 63.714 and y = -5.9762x + 58.893]
Distribution of old and new introns along the gene length
H. sapiens – an intron-rich genome – enrichment for new introns in the 3’-region
[Figure: numbers of old and new H. sapiens introns (y axis, 0-600) in gene-length bins 1-8 (x axis), with linear trend lines y = -11.679x + 293.18 and y = 7.6071x + 422.89]
[Figure: diagram of the model, with the labels: reverse transcription, duplication, TTTTTTTT, GT and AG, AAAAAAAAA, 5’ and 3’, genomic DNA, homologous recombination, new intron]
A reverse-transcription based model of intron insertion – almost the same as for intron loss (Fink, 1987) but includes an error of reverse transcription
Introns seem to be preferentially lost AND inserted near the 3’-end of the coding region – could there be similar mechanisms for intron loss AND insertion?
Role of duplication in the origin of alternative exons has been demonstrated
Kondrashov, F.A., Koonin, E.V. Hum. Molec. Genet., 2001
Letunic, I. et al., Hum. Molec. Genet., 2002
Conclusions
• Evolutionary classification of genes from sequenced genomes (orthologs and paralogs) allows us to address genome-wide evolutionary trends by applying rather straightforward adaptations of known phylogenetic approaches
• Introns invaded protein-coding genes very early in the evolution of eukaryotes - prior to the origin of multicellular forms - and many of these ancient introns survive to this day
• Remarkable conservation of ancestral introns in some eukaryotic lineages, with as many as 25-30% of the introns in humans and Arabidopsis being apparently inherited from the common ancestor of animals, fungi and plants, and ~30% of Plasmodium introns conserved in the crown group. Even the earliest ancestral eukaryotes seem to have had many genes and introns.
• Massive gene and intron loss occurred on multiple, independent occasions during eukaryotic evolution, especially in fungi, but also in arthropods and nematodes (and probably many more lineages).
• Classification of introns by age allows one to follow the evolution of splice signals, intron sequences themselves… and might even suggest mechanisms of intron insertion
• Lineage-specific expansion of paralogous gene families is accompanied by substantial loss and even more extensive acquisition of introns
• Loss and gain of introns and genes occur in parallel, reflecting the same lineage-specific trends in genome evolution – perhaps largely dramatic changes in characteristic population sizes entailing changes in selection strength
Acknowledgments
Igor Rogozin (NCBI)
Yuri Wolf (NCBI)
Boris Mirkin (Birkbeck College, London)
Alexander Sorokin (NCBI)
Alexander Sverdlov (NCBI, now Columbia U)
Vladimir Babenko (NCBI)
Fyodor Kondrashov (NCBI, now UC Davis)
Alexei Kondrashov (NCBI)
The COG group (NCBI):
Natalie D. Fedorova, John D. Jackson, Aviva R. Jacobs, Dmitri M. Krylov, Kira S. Makarova, Raja Mazumder, Sergei L. Mekhedov, Anastasia N. Nikolskaya, B. Sridhar Rao, Sergei Smirnov, Alexander V. Sverdlov, Roman L. Tatusov, Sona Vasudevan, Jodie J. Yin, Darren A. Natale