The Virus World, its evolution, evolution of antiviral defense, and the role of viruses in the evolution of cells Eugene V. Koonin National Center for Biotechnology Information, NIH, Bethesda KITP, Santa Barbara , February 17, 2011
What is a virus?
A virus is a small infectious agent that can replicate only inside the living cells of organisms (http://en.wikipedia.org/wiki/Virus)
Raoult D, Forterre P. Redefining viruses: lessons from Mimivirus. Nat Rev Microbiol 2008; 6(4):315-9
Viruses and virus-like agents possess:
• genomes
• very often, though not always, capsids that encase the genome
but lack:
• functional translation machinery
• membranes with transport/secretion systems
• Viruses are the most abundant biological entities in the biosphere: there are 10-100 virus particles per cell
• The pangenomes of viruses and cellular organisms have [at least] comparable complexities
1 cm^3 of seawater contains 10^6-10^9 virus particles
There are millions of diverse bacteriophage species in the water, soil and gut
Suttle CA (2005) Nature 437:356
Edwards and Rohwer (2005) Nat Rev Microbiol 3:504
Viruses are the dominant entities in the biosphere, physically and genetically, as shown by viral metagenomics (virome studies)
[Figure: mean % of sequences with matches to major functional categories, microbial metagenomes versus viral metagenomes (both axes 0-20). Categories: nucleosides and nucleotides; cell wall and capsule; fatty acids and lipids; membrane transport; stress response; aromatic compound; cell division and cell cycle; nitrogen metabolism; sulphur metabolism; motility and chemotaxis; phosphorus metabolism; potassium metabolism; cell signalling; secondary metabolism; DNA metabolism; carbohydrates; amino acids; virulence; protein metabolism; respiration; photosynthesis; cofactors etc.; RNA metabolism]
Kristensen, Mushegian, Dolja, Koonin. Trends Microbiol 2010
Most of the viromes might not even consist of typical viruses but rather of pseudovirus particles that carry microbial genes (GTAs)
Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions of years ago
Viruses accompany the evolving cellular life throughout its history and
might even predate it
Some viruses are comparable to cellular life forms in size and genetic complexity
Mimivirus genome (~1.2 Mbp, ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp, ~500 genes)
The largest, most complex viruses: NCLDV (Nucleo-Cytoplasmic Large DNA Viruses of eukaryotes); this is where the smallpox virus belongs
Raoult et al. Science 2004 Nov 19; 306(5700):1344. Suzan-Monti et al. PLoS ONE 2007 Mar 28; 2(3)
Iyer, Aravind, Koonin. Common origin of four diverse families of large eukaryotic DNA viruses. J Virol 2001; 75:11720
Iyer et al. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res 2006 Apr; 117(1):156
The largest, most complex viruses: the Nucleocytoplasmic Large DNA Viruses (NCLDV)
(this is where the smallpox virus AND the mimivirus belong)
6 families of NCLDV… and counting
Family | # | Size, kb | Hosts
Poxviridae | 26 | 134-360 | vertebrates, insects
Asfarviridae | 1 | 170 | vertebrates, protists(?)
Iridoviridae | 8 | 103-212 | vertebrates, insects, protists(?)
Ascoviridae | 4 | 119-174 | insects
Phycodnaviridae | 9 | 155-407 | algae, haptophytes, stramenopiles
Mimiviridae | 2 | 1181-1200 | amoebozoa, algae(?)
[Marseille virus] | 1 | 368 | amoebozoa (new family)
The case for the monophyly (common origin) of NCLDV
9 universally conserved hallmark genes (vaccinia gene names):
- primase (D5-N)
- helicase (D5-C)
- DNA polymerase (E9)
- packaging ATPase (A32)
- major capsid protein (D13; non-capsid in poxviruses)
- thiol-oxidoreductase (E10)
- helicase (D6, D11)
- S/T protein kinase (F10)
- transcription factor VLTF2 (A1)
47 genes mapped to the last common ancestor of NCLDV by maximum likelihood: all main functional systems represented
Iyer et al. J Virol 2001; Virus Res 2006. Yutin et al. Virol J 2009
[Figure: phylogenetic tree of NCLDV with support values on the branches (scale bar 0.5). Clades labelled: Phycodnaviridae; Poxviridae; Ascoviridae & Iridoviridae; Mimiviridae; Marseillevirus; Asfarviridae. Individual taxon labels and support values are not recoverable from the extracted text]
Phylogeny of NCLDV based on concatenation of 4 universal genes: primase-helicase, DNA polymerase, packaging ATPase, VLTF2 transcription factor
HOST labels on the tree: animals + diverse protists; Amoebozoa, algae, animals(?); chlorophytes, haptophytes, stramenopiles; animals, haptophytes; animals
• Divergence of NCLDV most likely antedates divergence of eukaryotic supergroups
• Alternative/addition: extensive horizontal transfer of viruses
Boyer, Yutin et al. PNAS 2009; Koonin, Yutin. Intervirology 2010
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009; 106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitats show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of Marseille virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins, and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are "melting pots" of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree: intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there: some NCLDV host their own parasites
La Scola et al. The virophage as a unique parasite of the giant mimivirus. Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genes labelled by apparent origin: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon]
La Scola et al. Nature 2008; 455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
[Figure labels: phage scaffold (virus hallmark genes); eukaryotic additions/displacements]
Koonin, Yutin. Intervirology 2010
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
[Figure: poliovirus genome map, 7.4 kb. Elements labelled: VPg, IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol, An. Viral hallmark genes: jelly-roll CPs, Superfamily 3 helicase (S3H), RdRp. Picornaviral 'signature' genes: 3C-Pro, 3D-Pol (RdRp), VPg, 2C (S3H). Shading: protein sequence conservation]
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec; 6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
[Figure: genome maps of marine picorna-like viruses: HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb), SssRNAV (9.0 kb). Elements labelled: Hel3, S-Pro, C-Pro, Pol, VP, VP1-3, sg, An]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus (PolioV, 7.4 kb) genome map annotated with the inferred sources of its hallmark and signature genes. Genes labelled: jelly-roll CPs, Superfamily 3 helicase, RdRp, 3C-Pro. Sources labelled: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar, rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20; 2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec; 6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms: viruses are the biosphere's laboratory of genomic strategies
Genetic cycles of RNA viruses (Class: genome size)
1. Positive-strand RNA viruses: 3-30 kb
2. Double-strand RNA viruses: 4-25 kb
3. Negative-strand RNA viruses: 11-20 kb
Genetic cycles of retroid viruses and retroelements
4. Retroid RNA viruses: 7-12 kb
5. Retroid DNA viruses and elements: 2-10 kb
Genetic cycles of DNA viruses and plasmids
6. ssDNA viruses and plasmids: 2-11 kb
7. dsDNA viruses and plasmids: 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE (the dsDNA cycle)
[Figure: replication-cycle diagram for each class, drawn with the symbols +, -, ±, R, Tr, T, E, RT, RCR and the proteins RdRp, RT, CP, Pr-Pol; the diagrams are not recoverable from the extracted text]
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
   1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus), present in a narrow group of viruses
   2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
   Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
   3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
   4. Virus-specific genes that are conserved within a virus lineage
   Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
   5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
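The five gene classes above amount to a small decision procedure, which can be sketched in Python. This is only an illustration of the slide's scheme: the `GeneProfile` fields and the labels "close", "moderate", "distant" and "none" are assumptions made for this sketch, not terms or thresholds from the talk, and a real analysis would derive them from sequence-similarity searches.

```python
from dataclasses import dataclass

@dataclass
class GeneProfile:
    """Hypothetical summary of homology-search results for one viral gene."""
    cellular_homolog: str                     # "close", "moderate", "distant" or "none"
    conserved_in_lineage: bool = False        # conserved within one virus lineage
    shared_by_diverse_lineages: bool = False  # found in many diverse virus groups

CLASS_NAMES = {
    1: "recent acquisition from host",
    2: "ancient acquisition from host",
    3: "virus-specific ORFan",
    4: "virus-specific conserved gene",
    5: "viral hallmark gene",
}

def classify(gene):
    """Return the class number (1-5) from the slide, or None if no class fits."""
    if gene.cellular_homolog == "close":
        return 1
    if gene.cellular_homolog == "moderate":
        return 2
    # From here on the gene has only distant cellular homologs, or none.
    if gene.shared_by_diverse_lineages:
        return 5
    if gene.cellular_homolog == "none":
        return 4 if gene.conserved_in_lineage else 3
    return None  # distant homologs but narrow distribution: not covered by the slide

if __name__ == "__main__":
    example = GeneProfile("distant", shared_by_diverse_lineages=True)
    print(CLASS_NAMES[classify(example)])
```

The order of the checks encodes one reading of the slide: similarity to cellular genes is tested first, and the hallmark class is reserved for widely shared genes whose cellular homologs are at best distant.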
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: x-axis, genome size (log), 0 to 3. Gene classes plotted: recent acquisitions; old acquisitions; virus-specific ORFans; virus-specific conserved; virus hallmark. Virus groups along the axis: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive – Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are:
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios, labelled Cell degeneration, Escaped genes and Primordial genetic systems. Diagram labels: CELL, SMALL PARASITIC CELL, VERY SMALL VIRUS; CHROMOSOME, PLASMID, mRNA, DNA VIRUS, RNA VIRUS; PRE-CELLULAR LIFE FORMS, CELLS, VIRUS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
Replicon fusion yields chromosomes
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
The ancient Virus World (VW)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006. Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15; 30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. […] The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
# | Family | Subfamily [A] | Phyletic distribution [B] | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family, possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
…and ~15 other, less common proteins. Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure labels: CRISPR repeats; TB and TA; Strepto-like; E. coli-like; Pasteurella-like; Cas2 (COG1343); Cas1 (COG1518); Cas4 (COG1468); Cas3 (COG1203)]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species"
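Features (i) and (ii) of the quoted definition, identical direct repeats separated by non-repetitive spacers, can be illustrated with a short Python sketch that cuts a toy array at its repeats. The sequences and the names `extract_spacers`, `REPEAT`, `SPACERS` and `ARRAY` are invented for this illustration; real repeats vary slightly within a locus, so an exact string split is a deliberate simplification rather than how CRISPR-finding tools work.

```python
def extract_spacers(array, repeat):
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    parts = array.split(repeat)
    # parts[0] is the leader-side flank and parts[-1] the trailing flank;
    # every piece in between sits between two repeats, i.e. is a spacer.
    return [p for p in parts[1:-1] if p]

# Toy data (assumed for this sketch, not taken from any genome).
REPEAT = "GTTCACTGCCGTACAGG"
SPACERS = ["ATGGCTAGCTAAGCTTACGA", "TTGACCGATAGGCATCATCC", "CCATAGGTACGATTTGCAAG"]
LEADER = "ATATATTTAAATATAT"

# Build leader + repeat + spacer + repeat + spacer + ... + repeat.
ARRAY = LEADER + REPEAT + "".join(s + REPEAT for s in SPACERS)

if __name__ == "__main__":
    for i, spacer in enumerate(extract_spacers(ARRAY, REPEAT), start=1):
        print(i, spacer, len(spacer))
```

In this toy array all spacers have the same length, which mirrors the "similar size" wording of the quoted definition.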
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Specific MOTIFs I-V (the assignment of each motif to a column I-V is not recoverable from the extracted text):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs | ust-lKGhh+hh | hhGtt | hD | lGhttsGh
y1726-like: slhlpEKuVRGT | lRTIDs | YGuVTsGhuh
COG1851: hGphpGpsaFh | hGFGRh
BH0337-like: TpA-h+GIh-uIh | hhLpDV | LGsREhuht
COG1567: hhhpp | ups-lhtAhh | luscoGhGh
COG1769: hhhh+Ph-hhh | ss-hhGhlhsh | hGh | hGtcp+hsthchp
COG1688 (Cas5): hhhhhts | sssshhGhlsh | lGttphh
COGs 1583, 5551: hhhhoPhhl | hGtppshGFGl
YgcH-like: hHphlh | hGu+uhGhGhh
y1727-like: LHphLh | GFsthGLStss
MJ0978-like: hHNH | lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable. Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
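How strongly two aligned repeats are conserved can be quantified as percent identity over the alignment columns in which neither sequence has a gap. The Python sketch below applies this to the first two rows of the alignment (Arcful and Metthe); the function name `aligned_identity` and the choice to ignore gapped columns are assumptions of this sketch, not a method stated in the talk.

```python
def aligned_identity(a, b):
    """Fraction of identical bases over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gapped columns are ignored in this simple measure
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0

# First two rows of the repeat alignment shown on the slide.
ARCFUL = "GTTGAAATC-------AGACCAAAATGGGATTGAAAG-"
METTHE = "GTTAAAATC-------AGACCAAAATGGGATTGAAAT-"

if __name__ == "__main__":
    print(f"Arcful vs Metthe: {100 * aligned_identity(ARCFUL, METTHE):.1f}% identity")
```

These two repeats differ at 2 of 30 compared positions. Other definitions of identity (for example, counting gaps as mismatches) would give different numbers for rows with unequal gap patterns.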
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb; 60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
[…] The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure; step order inferred from the labels: transcription of the CRISPR locus by cellular RNA pol with a specific transcription factor (COG1517) → polycistronic pre-psiRNA → (protein-guided) RNA folding → RNA processing (fast) by p-dicer (helicase + HD-hydrolase; COG1203, COG2254) → pre-psiRNA, 75-100 nt → RNA processing (slow) by p-dicer → psiRNA, 25-45 nt → p-RISC with RAMP and p-slicer (COG1468, COG4343, COG1857) → annealing to RNA target (plasmid or phage mRNA) → target RNA cleavage]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure; step order inferred from the labels: psiRNA, 25-45 nt, with RAMP → RAMP-RNA complex (unstable) → annealing to RNA target (plasmid or phage mRNA) → primer elongation by CASS RNA pol (COG1353), amplified annealing → long dsRNA (stable) → duplex degradation by p-dicer → RAMP binding → cycle continues]
New psiRNA generation
[Figure; step order inferred from the labels: plasmid or phage mRNA and pre-psiRNA, 75-100 nt → random RNA recombination and reverse transcription, OR reverse transcription with random copy choice, by CASS RNA pol (COG1353) or other RT → dsDNA with CRISPRs and target-derived spacers → homologous recombination with genomic CRISPR region, integrase (COG1518) → genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23; 315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
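The sequence specificity reported above can be illustrated with a toy model in Python: a host counts as resistant if any of its spacers matches the phage genome exactly, so a single substitution inside the matched region restores sensitivity. The sequences and the names `is_resistant` and `point_mutation` are invented for this sketch, and exact single-strand string matching is an assumption; it stands in for the real interference machinery only at the level of the logic.

```python
def is_resistant(spacers, phage_genome):
    """Toy rule: resistance requires an exact match of at least one spacer."""
    return any(spacer in phage_genome for spacer in spacers)

def point_mutation(seq, pos, base):
    """Return `seq` with the base at index `pos` replaced by a different `base`."""
    if seq[pos] == base:
        raise ValueError("replacement base must differ from the original")
    return seq[:pos] + base + seq[pos + 1:]

# Toy phage genome (assumed for this sketch).
PHAGE = "ATGCGTACCGTTAGCTAGGCTTAACGGATCCGATTACGCTAGCATCGGAT"

# Adaptation: the host acquires a 30-nt spacer copied from the phage.
SPACER = PHAGE[10:40]

# An escape mutant carrying one substitution inside the targeted region.
MUTANT = point_mutation(PHAGE, 20, "A")

if __name__ == "__main__":
    print("wild-type phage, host with spacer:", is_resistant([SPACER], PHAGE))
    print("SNP mutant phage, host with spacer:", is_resistant([SPACER], MUTANT))
    print("wild-type phage, host without spacer:", is_resistant([], PHAGE))
```

The same function also reflects the observation that removing a spacer changes the phenotype: with an empty spacer list the host is sensitive to the wild-type phage.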
[Figure: the three stages of CRISPR/Cas action in a host cell challenged by an invading virus or plasmid. (1) Adaptation: the spacer array next to the CRISPR leader grows from 5 4 3 2 1 to 6 5 4 3 2 1. (2) Expression & processing. (3) Interference. Other labels: host; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al. TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon maps of representative systems grouped as Type I (Helicase cassette; includes Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Systems shown: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5. Gene labels: Cas1 to Cas13; COG numbers 1343, 1203, 1468, 1857, 1688, 1583, 1567, 1769, 1421, 1517, 3513, 3337; domain tags HD-f, RecB-f, RuvC-f, HNH, Helicase, RAMP; other labels y1724, ygcK, ygcL, BH0338, MTH1090, LA3191, SPy1049, AF0070. The assignment of genes to individual operons is not recoverable from the extracted text]
Makarova et al. NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure, panels (A), (B), (C): species and locus tags (for example Esccol b2755, Strthe str0658, Myctub Rv2817c, Sulsol SSO1450) shown with the gene neighbourhood of each locus (COG numbers, RAMP, Helicase, RecB-f, HD-f, RuvC-f, HNH and HTH labels) and grouped into the "Helicase" cassette and the "Polymerase" cassette. The layout and numeric annotations are not recoverable from the extracted text]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10; 329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19; 322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15; 321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25; 139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 2007; 315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labelled Type I, Type II and Type III and the systems Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase-RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5. Representative operons (gene order not recoverable): I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas9, cas1, cas2, cas4; III: cas6, cas10, csm2, csm3, csm4, csm5, csm6, cas1, cas2 (RAMP)]
Makarova et al. NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% of genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: two bar charts (y-axis 0 to 100) showing the distribution of Type I, Type II and Type III systems and of Cas1 presence/absence across groups of archaea and bacteria; the bar values and group labels are not recoverable from the extracted text]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF. Mol Microbiol 2011 Jan; 79(2):484-502
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Columns: Phenomenon | Biological role/function | Phyletic spread | Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor

Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | (a) Yes | (b) Yes | (c) Yes
piRNA | Defense against transposable elements in germline | Animals | (a) Yes | (b) Yes | (c) Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | (a) Yes | (b) Yes | (c) Yes

Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | (a) Yes | (b) No | (c) Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | (a) Yes | (b) No or partially | (c) Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
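The three Lamarckian criteria from the table can be read as a simple decision rule. The sketch below is only an illustration of that reading, not the authors' method: the class name `Phenomenon`, the function `modality` and the fallback label are hypothetical, and the table's graded answers ("Yes/no", "No or partially") are flattened to booleans as a stated simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: bool  # (a) genomic changes caused by environmental factor
    locus_specific: bool         # (b) changes specific to relevant genomic loci
    adaptive_to_cause: bool      # (c) changes provide adaptation to the causative factor


def modality(p: Phenomenon) -> str:
    """Bona fide Lamarckian only if all three criteria hold; quasi if some do."""
    criteria = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(criteria):
        return "bona fide Lamarckian"
    if any(criteria):
        return "quasi-Lamarckian"
    return "none of the criteria met"  # hypothetical label, not in the table


TABLE = [
    Phenomenon("CRISPR/Cas", True, True, True),
    Phenomenon("piRNA", True, True, True),
    Phenomenon("HGT (specific cases)", True, True, True),
    # "Yes/no" in column (c) is simplified to True here (assumption).
    Phenomenon("HGT (general phenomenon)", True, False, True),
    # "No or partially" in column (b) is simplified to False here (assumption).
    Phenomenon("Stress-induced mutagenesis", True, False, True),
]

labels = {p.name: modality(p) for p in TABLE}
```

With these simplifications the rule reproduces the grouping in the table: the first three rows come out as bona fide Lamarckian, the last two as quasi-Lamarckian.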
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al, Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Kristensen Mushegian Dolja Koonin Trends Microbiol 2010
Most of the viromes might not even consist of typical viruses but rather of pseudovirus particles that carry microbial genes (GTAs)
Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions of years ago
Viruses accompany the evolving cellular life throughout its history and might even predate it
Some viruses are comparable to cellular life forms in size and genetic complexity
Mimivirus genome (~1.2 Mbp, ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp, ~500 genes)
The largest, most complex viruses: NCLDV (Nucleo-Cytoplasmic Large DNA viruses of eukaryotes) – this is where the smallpox virus belongs
Raoult et al. Science 2004 Nov 19;306(5700):1344. Suzan-Monti et al. PLoS ONE 2007 Mar 28;2(3)
Iyer, Aravind, Koonin. Common origin of four diverse families of large eukaryotic DNA viruses. J Virol 2001, 75:11720
Iyer et al. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res 2006 Apr;117(1):156
The largest, most complex viruses: the Nucleocytoplasmic Large DNA Viruses (NCLDV)
(this is where the smallpox virus AND the mimivirus belong)
6 families of NCLDV… and counting
Family | number | size, kb | hosts
Poxviridae | 26 | 134-360 | vertebrates, insects
Asfarviridae | 1 | 170 | vertebrates, protists(?)
Iridoviridae | 8 | 103-212 | vertebrates, insects, protists(?)
Ascoviridae | 4 | 119-174 | insects
Phycodnaviridae | 9 | 155-407 | algae, haptophytes, stramenopiles
Mimiviridae | 2 | 1181-1200 | amoebozoa, algae(?)
[Marseille virus] | 1 | 368 | amoebozoa – new family
The case for the monophyly (common origin) of NCLDV
9 universally conserved hallmark genes (vaccinia gene names)
- primase (D5-N)
- helicase (D5-C)
- DNA polymerase (E9)
- packaging ATPase (A32)
- Major capsid protein (D13; non-capsid in poxviruses)
- Thiol-oxidoreductase (E10)
- Helicase (D6, D11)
- S/T protein kinase (F10)
- Transcription factor VLTF2 (A1)
47 genes mapped to the last common ancestor of NCLDV by maximum likelihood – all main functional systems represented
Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009
[Figure: phylogenetic tree of NCLDV (scale bar 0.5; leaf labels and node support values omitted); labeled clades: Phycodnaviridae, Poxviridae, Ascoviridae & Iridoviridae, Mimiviridae, Marseillevirus, Asfarviridae]
Phylogeny of NCLDV based on concatenation of 4 universal genes: primase-helicase, DNA polymerase, packaging ATPase, VLTF2 transcription factor
HOST labels on the tree: Animals + diverse protists; Amoebozoa, algae, animals(?); chlorophytes, haptophytes, stramenopiles; animals, haptophytes; animals
• Divergence of NCLDV most likely antedates divergence of eukaryotic supergroups
• Alternative/addition: extensive horizontal transfer of viruses
Boyer, Yutin et al PNAS 2009; Koonin, Yutin Intervirology 2010
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009;106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus, isolated from amoeba found in aquatic habitats, show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of "Marseille" virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins, and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are "melting pots" of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there
Some NCLDV host their own parasites
La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008
Chimeric origin of the virophage genome
[Figure labels (sources of virophage genes): NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon]
La Scola et al. Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution – eukaryogenesis
[Figure labels: phage scaffold (virus hallmark genes); eukaryotic additions/displacements]
Koonin Yutin Intervirology 2010
[Figure: genome map of poliovirus (7.4 kb); labels: IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An; vertical axis: protein sequence conservation]
Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase, RdRp
Picornaviral 'signature' genes (labels): S3H (2C), VPg, 3C-Pro, RdRp (3DPol)
Some of the smallest viruses (this is where poliovirus belongs)
The Big Bang of picorna-like virus evolution
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al (2004) Virology 320:206
[Figure: genome maps of marine picorna-like viruses: HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb), SssRNAV (9.0 kb); domain labels include Hel3, S-Pro, C-Pro, Pol, VP, VP1-3, sg, An]
Nagasaki et al (2005) Appl Env Microbiol 71:8888
Culley et al (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses - 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus genome map (PolioV, 7.4 kb; IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An) with the hallmark genes (jelly-roll CPs, superfamily 3 helicase, RdRp) and 3C-Pro connected to their likely sources; source labels: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups - the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang – an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies

Class | Genome size
Genetic cycles of RNA viruses
1 Positive-strand RNA | 3-30 kb
2 Double-strand RNA | 4-25 kb
3 Negative-strand RNA | 11-20 kb
Genetic cycles of retroid viruses and retroelements
4 Retroid RNA viruses | 7-12 kb
5 Retroid DNA viruses, elements | 2-10 kb
Genetic cycles of DNA viruses and plasmids
6 ssDNA viruses, plasmids | 2-11 kb
7 dsDNA viruses, plasmids | 5-1200 kb (ALL CELLULAR ORGANISMS: YOU ARE HERE)

[Replication cycle diagrams for each class are not recoverable from the extracted text; diagram labels include +, -, ±, R, T, Tr, E and the key proteins RdRp, CP, RT, RCRE, Pr-Pol]
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically the host of the given virus), present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e. genes without detectable homologs except possibly in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages with only very distant homologs in cellular organisms
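The five gene classes above amount to a small decision procedure. A minimal sketch follows, assuming that homology searches have already been summarized into three hypothetical inputs (`cellular_homolog`, `conserved_in_lineage`, `shared_by_diverse_lineages`); the names, the input encoding and the order of checks are illustrative assumptions, not part of the original classification.

```python
from enum import Enum


class GeneClass(Enum):
    RECENT_ACQUISITION = 1        # class 1: closely related cellular homologs
    OLD_ACQUISITION = 2           # class 2: moderately close cellular homologs
    ORFAN = 3                     # class 3: no detectable homologs
    VIRUS_SPECIFIC_CONSERVED = 4  # class 4: conserved within a virus lineage
    HALLMARK = 5                  # class 5: many lineages, only distant cellular homologs


def classify_gene(cellular_homolog: str,
                  conserved_in_lineage: bool,
                  shared_by_diverse_lineages: bool) -> GeneClass:
    """cellular_homolog is one of 'close', 'moderate', 'distant', 'none' (assumed encoding)."""
    if cellular_homolog not in {"close", "moderate", "distant", "none"}:
        raise ValueError(f"unknown homolog level: {cellular_homolog!r}")
    if cellular_homolog == "close":
        return GeneClass.RECENT_ACQUISITION
    if cellular_homolog == "moderate":
        return GeneClass.OLD_ACQUISITION
    if cellular_homolog == "distant" and shared_by_diverse_lineages:
        return GeneClass.HALLMARK
    if conserved_in_lineage:
        return GeneClass.VIRUS_SPECIFIC_CONSERVED
    return GeneClass.ORFAN


example = classify_gene("distant", conserved_in_lineage=True, shared_by_diverse_lineages=True)
```

Real classification depends on how "close", "moderate" and "distant" are defined in a given analysis; the sketch deliberately leaves those thresholds to the caller.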
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of the five gene classes (recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark) plotted against genome size (log scale, 0-3) for three groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the lsquovirus statersquo
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and Reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive – Santiago Elena, Feb 16 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios: cell degeneration, escaped genes, primordial genetic systems; labels: CELL, SMALL PARASITIC CELL, VERY SMALL VIRUS, CHROMOSOME, PLASMID, mRNA, RNA VIRUS, DNA VIRUS, PRE-CELLULAR LIFE FORMS, VIRUS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: CELLS; replicon fusion yields chromosomes]
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein; predicted nuclease or integrase
7 HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 COG1353 predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase, distantly related to viral RdRp and RT
…and ~15 other less common proteins
Makarova et al BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure labels: CRISPR repeats; TB and TA; Strepto-like; Ecoli-like; Pasteurella-like; Cas1 (COG1518); Cas2 (COG1343); Cas3 (COG1203); Cas4 (COG1468)]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species"
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific MOTIFs (columns I-V in the original alignment; column assignment is not recoverable from the extracted text)
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold/RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
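One way to put a number on "conserved among diverse species" is the fraction of identical bases over the aligned, non-gap columns of two repeats. The sketch below applies that measure to the first two repeats of the alignment (Arcful and Metthe); the function name `aligned_identity` and the choice to ignore gap columns are assumptions made for illustration, not the measure used in the original analysis.

```python
def aligned_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x == y for x, y in pairs) / len(pairs)


# Aligned repeats copied from the alignment (two archaeal species).
arcful = "GTTGAAATC-------AGACCAAAATGGGATTGAAAG-"
metthe = "GTTAAAATC-------AGACCAAAATGGGATTGAAAT-"

identity = aligned_identity(arcful, metthe)  # 28 of 30 ungapped columns match
```

The same function can be run over every pair of rows to see which repeats cluster together, keeping in mind that a short alignment like this gives only a rough picture.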
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA – after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure labels, in approximate reading order: transcription by cellular RNApol with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase+HD-Hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; RAMP; p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage by p-slicer (COG1468, COG4343, COG1857)]

Variant of CASS functioning with polymerase/psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure labels, in approximate reading order: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); annealing to RNA target (plasmid or phage mRNA); primer elongation by CASS RNApol (COG1353); long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; amplified annealing; cycle continues]

New psiRNA generation
[Figure labels, in approximate reading order: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region by integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
[Figure: the three stages of CRISPR/Cas action in the host: (1) Adaptation, (2) Expression & Processing, (3) Interference; labels: invading virus/plasmid; host; CRISPR leader; spacers numbered 5 4 3 2 1, then 6 5 4 3 2 1 after a new spacer is added; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al TIBS 2009
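The adaptation and interference stages, together with the key validation result (a single substitution in the target restores sensitivity), can be mimicked with a toy model. This is a minimal sketch under loud assumptions: the class `CrisprLocus`, its methods, the 30-nt spacer length, exact-match recognition on either strand and the made-up "phage" sequence are all hypothetical simplifications, and expression/processing is not modeled at all.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


class CrisprLocus:
    """Toy CRISPR array; spacers[0] is the leader-proximal (newest) spacer."""

    def __init__(self):
        self.spacers = []

    def acquire(self, invader: str, start: int, length: int = 30) -> str:
        """Adaptation: copy a fragment of the invader in at the leader end."""
        spacer = invader[start:start + length]
        if len(spacer) < length:
            raise ValueError("fragment runs past the end of the invader sequence")
        self.spacers.insert(0, spacer)
        return spacer

    def is_immune(self, invader: str) -> bool:
        """Interference: immune if any spacer matches the invader exactly (either strand)."""
        return any(s in invader or reverse_complement(s) in invader
                   for s in self.spacers)


# Made-up invader: 8-nt flank + 30-nt region to be captured + 8-nt flank.
phage = "GCGCATGC" + "TTTTGCACGTCAGGCTAGCCATGGACTCAG" + "CGATCGGC"
locus = CrisprLocus()
spacer = locus.acquire(phage, start=8)

# Escape mutant: one substitution (C -> G) inside the targeted region.
mutant = phage[:18] + "G" + phage[19:]

immune_to_phage = locus.is_immune(phage)
immune_to_mutant = locus.is_immune(mutant)
```

In this toy model the original phage is recognized and the single-substitution mutant escapes, which mirrors the sequence specificity reported on the validation slide; real systems tolerate or reject mismatches in more nuanced ways that the exact-match rule does not capture.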
The three types of CRISPR/Cas systems and their signature genes
[Figure: gene organization of representative CRISPR/Cas loci grouped into Type I (Helicase cassette; includes Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Loci shown: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5. Gene labels: Cas1 to Cas13. COG and domain labels: 1343; 1517; 1857; 3513; 1421; 3337; Helicase 1203; RecB-f 1468; RAMP 1567, 1583, 1688, 1769; HD-f; RuvC-f; HNH. Other labels: y1724, ygcK, ygcL, BH0338, MTH1090, LA3191, SPy1049, AF0070]
Makarova et al NRM submitted
Experimental data on CRISPR/Cas systems
[Figure: phylogenetic trees with species and gene identifiers as leaf labels, shown next to the gene neighborhoods of CASS subtypes 1-7, 4a and 7a; panels (A), (B), (C); "Polymerase" cassette and "Helicase" cassette marked. Leaf labels and gene diagrams are not recoverable from the extracted text]
Experimentally studied systems marked on the figure:
- Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
- Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
- E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
- Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
- Streptococcus thermophilus: Barrangou R et al. Science 315, 1709-12 (2007)
Figure label: mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III; loci marked: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase/RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5. Genes shown for each type: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)]
Makarova et al NRM submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes
[Figure: charts on a 0-100 scale showing the distribution of Type I, Type II and Type III systems and Cas1 presence/absence; the individual bar labels are not recoverable.]
Makarova et al in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Lamarckian criteria: (1) genomic changes caused by environmental factor; (2) changes are specific to relevant genomic loci; (3) changes provide adaptation to the causative factor

Phenomenon | Biological role/function | Phyletic spread | (1) | (2) | (3)

Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes

Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
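The logic of the table above can be sketched in a few lines of Python. This is an illustrative encoding, not the authors' method: the class and function names are hypothetical, and the rule "bona fide only if all three criteria are an unqualified yes, otherwise quasi-Lamarckian" is an assumption that reproduces the table's grouping for the five rows listed. It is not meant to classify arbitrary phenomena.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion 1, as worded in the table
    locus_specific: str         # criterion 2
    adaptive_to_cause: str      # criterion 3


def modality(p: Phenomenon) -> str:
    """Assumed rule: bona fide only when every criterion is a plain 'yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a.strip().lower() == "yes" for a in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Rows transcribed from the table.
PHENOMENA = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially",
               "yes (but general evolvability enhanced as well)"),
]

classified = {p.name: modality(p) for p in PHENOMENA}
for name, label in classified.items():
    print(f"{name}: {label}")
```

Keeping the answers as strings rather than booleans preserves the table's qualified entries ("Yes/no", "No or partially") instead of forcing them into true or false.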
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Arcful AF1874Pyrhor PH0917Thekod TK0450
Pyraby PAB1689
13433513
3513
1518
RuvC-f
HNH
4aHNH
1 0
Pyrhor PH0162
6
64
6
4
16
17
7
7
7
7
1
2
7
7
7
7
5 7
5 75 7
5 75 7
5 7
6 1
7
Metkan MK1314
Metace MA1931
Metmaz MM3356
Metbar Mbar_A1355
Strthe stu0960Myctub Rv2823cStaepi SERP2461
Themar TM1811Mansuc MS1651
Niteur NE0123Thethe TTP0102
Metthe MTH1082
Metthe MTH326
Metjan MJ1672Pictor PTO0053Thevol TVN0112
Metkan MK1297Thethe TTP0115
Bachal BH0328Synech sll7090
Aquaeo aq 387Theten TTE2639
Themar TM1794Arcful AF1867
Pyrfur PF1129Thefus Tfu1578
Porgin PG1987Sultok ST1979
Sulsol SSO1991Sulsol SSO1729
Sultok ST0012Sulaci Saci 2046Nostoc all1479
1421 RAMP1567
RAMP1583
RAMPMTH323
RAMP1769
XMTH324
ldquoPolymeraserdquo cassette
ldquoHelicaserdquo cassette(A)
(B)
(C)
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches labeled Type I, Type II and Type III. Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; Polymerase/RAMP module), Dvulg (CASS1), Tneap/Hmari (CASS7), Apern (CASS5). Representative gene sets shown: Type I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; Type II: cas1, cas9, cas2, cas4; Type III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP).]
Makarova et al NRM submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II – 9, Type III – 20
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: bar charts (axis 0-100) of the distribution of Type I, Type II and Type III CRISPR/Cas systems and of Cas1 presence/absence across the analyzed genomes]
Makarova et al in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Columns: Phenomenon | Biological role/function | Phyletic spread | Lamarckian criteria: (a) genomic changes caused by environmental factor | (b) changes are specific to relevant genomic loci | (c) changes provide adaptation to the causative factor

Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes

Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
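The three Lamarckian criteria in the table can be read as a simple decision rule. The sketch below is only an illustration of that reading, not something taken from the cited paper: the `Phenomenon` class, the `classify` function and the rule "all three criteria answered with an unqualified yes means bona fide Lamarckian" are assumptions introduced here; the yes/no entries are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    """One table row; the three fields are the Lamarckian criteria (hypothetical names)."""
    name: str
    caused_by_environment: str   # genomic changes caused by environmental factor
    locus_specific: str          # changes are specific to relevant genomic loci
    adaptive_to_cause: str       # changes provide adaptation to the causative factor


def classify(p: Phenomenon) -> str:
    """Assumed rule: bona fide only if every criterion is an unqualified 'yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Entries transcribed from the table (lower-cased).
TABLE = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes"),
]

if __name__ == "__main__":
    for row in TABLE:
        print(f"{row.name}: {classify(row)}")
```

Under this assumed rule the first three rows come out as bona fide Lamarckian and the last two as quasi-Lamarckian, which matches the grouping in the table.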
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
[Figure: mean of sequences with matches to major functional categories, viral metagenomes (y axis) versus microbial metagenomes (x axis), with key. Categories: nucleosides and nucleotides; cell wall and capsule; fatty acids and lipids; membrane transport; stress response; aromatic compound; cell division and cell cycle; nitrogen metabolism; sulphur metabolism; motility and chemotaxis; phosphorus metabolism; potassium metabolism; cell signalling; secondary metabolism; DNA metabolism; carbohydrates; amino acids; virulence; protein metabolism; respiration; photosynthesis; cofactors etc.; RNA metabolism.]
Kristensen Mushegian Dolja Koonin Trends Microbiol 2010
Most of the viromes might not even consist of typical viruses but rather of pseudovirus particles that carry microbial genes (GTAs)
Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions of years ago
Viruses accompany the evolving cellular life throughout its history and might even predate it
Some viruses are comparable to cellular life forms in size and genetic complexity
The Mimivirus genome (~1.2 Mbp, ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp, ~500 genes)
The largest, most complex viruses: NCLDV (Nucleo-Cytoplasmic Large DNA Viruses of eukaryotes) – this is where the smallpox virus belongs
Raoult et al. Science 2004 Nov 19;306(5700):1344; Suzan-Monti et al. PLoS ONE 2007 Mar 28;2(3)
Iyer, Aravind, Koonin. Common origin of four diverse families of large eukaryotic DNA viruses. J Virol 2001;75:11720
Iyer et al. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res 2006 Apr;117(1):156
The largest, most complex viruses: the Nucleocytoplasmic Large DNA Viruses (NCLDV)
(this is where the smallpox virus AND the mimivirus belong)
6 families of NCLDV... and counting
Family | count | size, kb | hosts
- poxviridae | 26 | [134-360] | vertebrates, insects
- asfarviridae | 1 | [170] | vertebrates, protists(?)
- iridoviridae | 8 | [103-212] | vertebrates, insects, protists(?)
- ascoviridae | 4 | [119-174] | insects
- phycodnaviridae | 9 | [155-407] | algae, haptophytes, stramenopiles
- mimiviridae | 2 | [1181-1200] | amoebozoa, algae(?)
- [Marseille virus] | 1 | [368] | amoebozoa – new family
The case for the monophyly (common origin) of NCLDV
9 universally conserved hallmark genes (vaccinia gene names):
- primase (D5-N)
- helicase (D5-C)
- DNA polymerase (E9)
- packaging ATPase (A32)
- Major capsid protein (D13; non-capsid in poxviruses)
- Thiol-oxidoreductase (E10)
- Helicase (D6, D11)
- S/T protein kinase (F10)
- Transcription factor VLTF2 (A1)
47 genes mapped to the last common ancestor of NCLDV by maximum likelihood – all main functional systems represented
Iyer et al. J Virol 2001; Virus Res 2006; Yutin et al. Virol J 2009
[Figure: phylogeny of NCLDV based on concatenation of 4 universal genes: primase-helicase, DNA polymerase, packaging ATPase, VLTF2 transcription factor. Clades: Phycodnaviridae; Poxviridae; Ascoviridae & Iridoviridae; Mimiviridae and Marseillevirus; Asfarviridae. Host labels: animals + diverse protists; Amoebozoa, algae, animals(?); chlorophytes, haptophytes, stramenopiles; animals, haptophytes; animals. Branch support values not reproduced.]
• Divergence of NCLDV most likely antedates divergence of eukaryotic supergroups
• Alternative/addition: extensive horizontal transfer of viruses
Boyer, Yutin et al. PNAS 2009; Koonin, Yutin. Intervirology 2010
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009;106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus, isolated from amoeba found in aquatic habitats, show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of Marseille virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins, and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are “melting pots” of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there
Some NCLDV host their own parasites
La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genome map with genes marked by apparent origin: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon]
La Scola et al. Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution – eukaryogenesis
Phage scaffold (virus hallmark genes)
Eukaryotic additions/displacements
Koonin Yutin Intervirology 2010
[Figure: poliovirus genome (7.4 kb) with IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol and poly(A) tail, plotted against protein sequence conservation. Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase (S3H), RdRp. Picornaviral ‘signature’ genes: RdRp (3D-Pol), 3C-Pro, VPg, 2C (S3H).]
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
[Figure: genome maps of marine picorna-like RNA viruses: HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb), SssRNAV (9.0 kb). Labeled elements: Hel3, Pro (S-Pro, C-Pro), Pol, capsid proteins (VP, VP1-3), An, sg.]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses - 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus genome (PolioV, 7.4 kb; IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol, poly(A)) with the jelly-roll CPs, superfamily 3 helicase, RdRp and 3C-Pro linked to their likely sources: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases.]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a “Big Bang” during eukaryogenesis and antedates the divergence of eukaryotic supergroups - the viruses then “sampled” the hosts
Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang – an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere’s laboratory of genomic strategies

[Figure: replication cycle diagrams for the 7 classes below; step labels in the diagrams include T, Tr, R, RT, RCR, E, with RdRp, RT, CP and Pr-Pol marked as encoded proteins]

Genetic cycles of RNA viruses
Class 1: Positive-strand RNA, 3-30 kb
Class 2: Double-strand RNA, 4-25 kb
Class 3: Negative-strand RNA, 11-20 kb

Genetic cycles of retroid viruses and retroelements
Class 4: Retroid RNA viruses, 7-12 kb
Class 5: Retroid DNA viruses/elements, 2-10 kb

Genetic cycles of DNA viruses and plasmids
Class 6: ssDNA viruses/plasmids, 2-11 kb
Class 7: dsDNA viruses/plasmids, 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE (dsDNA cycle)
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages with only very distant homologs in cellular organisms
Natural history of viral genes a one-page summary of viral comparative genomics
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of five gene classes (recent acquisitions; old acquisitions; virus-specific ORFans; virus-specific conserved; virus hallmark) plotted against genome size (log scale, 0 to 3) for three groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes
Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the ‘virus state’
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and Reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive – Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios side by side: cell degeneration; escaped genes; primordial genetic systems. Diagram labels: CELL, SMALL PARASITIC CELL, VERY SMALL, VIRUS; CHROMOSOME, PLASMID, mRNA, DNA VIRUS, RNA VIRUS; PRE-CELLULAR LIFE FORMS, VIRUS, CELLS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
Replicon fusion yields chromosomes
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
• Viruses and virus-like genetic elements are not “just” pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the “virus world”, or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
The ancient Virus World (VW)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Columns: Family | Subfamily (A) | Phyletic distribution (B) | Comments
1. COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to “RAMP” family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6. COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein; predicted nuclease or integrase
7. HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8. BH0338 | BH0338-like, MTH1090-like | All; mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase, distantly related to viral RdRp and RT
... and ~15 other, less common proteins
Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and adjacent cas genes in representative loci (TB and TA, Strepto-like, Ecoli-like, Pasteurella-like): Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468)]
“The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species”
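Characteristics (i) and (ii) of the quoted definition describe a simple repeat-spacer layout, which can be illustrated with a minimal Python sketch. It assumes the repeat sequence is already known and perfectly conserved within the locus; the function name `split_crispr_array`, the leader, the trailer and both spacers are made up for illustration. The repeat is the Yersinia pestis (Yerpes) repeat from the alignment shown later in the talk, with gaps removed.

```python
from typing import List


def split_crispr_array(locus: str, repeat: str) -> List[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`.

    Simplification: real repeats may vary slightly within a locus, so an
    exact string split is only a toy model of repeat detection.
    """
    parts = locus.split(repeat)
    # parts[0] is the leader-side flank and parts[-1] the trailing flank;
    # everything in between is a spacer.
    return parts[1:-1]


# Repeat taken from the Yerpes row of the repeat alignment (gaps removed).
REPEAT = "GTTCACTGCCGCACAGGCAGCTTAGAAA"

# Hypothetical leader, spacers and trailer, invented for this example.
LEADER = "ATATATAT"
SPACER_1 = "ACGTACGTACGTACGTACGTACGTACGTACGT"
SPACER_2 = "TTGACCATGCAAGGTCTTAACCGGATCGATCA"
TRAILER = "CCCC"

LOCUS = LEADER + REPEAT + SPACER_1 + REPEAT + SPACER_2 + REPEAT + TRAILER

if __name__ == "__main__":
    for i, spacer in enumerate(split_crispr_array(LOCUS, REPEAT), start=1):
        print(f"spacer {i}: {spacer} ({len(spacer)} nt)")
```

Dedicated CRISPR finders search for near-identical repeats rather than exact copies; the exact split is used here only to keep the example short.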
The cas genes are our “repair” system
“Helicase” cassette
“Polymerase” cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Ferredoxin fold / RNA-recognition motif (RRM)
Motifs (columns: specific, I, II, III, IV, V; the assignment of individual motifs to columns is not recoverable from the extracted text):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures, where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA – after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: pathway diagram; labels in reading order]
- Transcription: cellular RNApol; specific transcription factor (COG1517)
- (Protein-guided) RNA folding: polycistronic pre-psiRNA
- RNA processing (fast): p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt
- RNA processing (slow): p-dicer; psiRNA, 25-45 nt
- p-RISC: RAMP; p-slicer (COG1468, COG4343, COG1857)
- Annealing to RNA target: plasmid or phage mRNA
- Target RNA cleavage by p-RISC
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle diagram; labels in reading order]
- psiRNA, 25-45 nt; RAMP binding
- Annealing to RNA target (plasmid or phage mRNA): RAMP-RNA complex (unstable)
- Primer elongation by CASS RNApol (COG1353): amplified annealing, long dsRNA (stable)
- Duplex degradation by p-dicer
- Cycle continues
[Figure: proposed routes of new spacer acquisition; labels in reading order]
- plasmid or phage mRNA; pre-psiRNA, 75-100 nt
- CASS RNApol (COG1353) or other RT: reverse transcription with random copy choice, OR random RNA recombination and reverse transcription
- dsDNA with CRISPRs and target-derived spacers
- Homologous recombination with genomic CRISPR region; integrase (COG1518)
- genomic DNA becomes genomic DNA with new target-derived spacer
- New psiRNA generation
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
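The validation points above (spacers acquired from the phage genome, resistance lost after a single substitution) can be mimicked with a toy Python model. This is a deliberately crude sketch: it treats resistance as an exact string match between a spacer and the phage genome, which is a simplifying assumption and not a description of the experimental system. The function names, the 30-nt spacer length and the genome string are all made up for illustration.

```python
from typing import List


def acquire_spacer(spacers: List[str], phage_genome: str,
                   start: int, length: int = 30) -> List[str]:
    """Adaptation (toy): copy a fragment of the phage genome into the spacer list."""
    return spacers + [phage_genome[start:start + length]]


def is_resistant(spacers: List[str], phage_genome: str) -> bool:
    """Interference (toy): resistant if any spacer matches the genome exactly."""
    return any(spacer in phage_genome for spacer in spacers)


# Hypothetical 68-nt "phage genome", invented for this example.
GENOME = ("ATGGCTAGCTAGGATCCGATTACGCTAGGCTTAACGGATC"
          "GTACCGTTAGCATGCAATCGGATTCAGC")

# Escape mutant: one substitution (T -> A) at position 20, inside the
# region from which the spacer below is taken.
MUTANT = GENOME[:20] + "A" + GENOME[21:]

if __name__ == "__main__":
    naive: List[str] = []
    immunized = acquire_spacer(naive, GENOME, start=10)
    print("naive host resistant:", is_resistant(naive, GENOME))
    print("immunized host resistant:", is_resistant(immunized, GENOME))
    print("immunized host vs SNP mutant:", is_resistant(immunized, MUTANT))
```

In this toy model the naive host is sensitive, the host carrying the acquired spacer is resistant, and the single-substitution mutant escapes, mirroring the sequence specificity noted on the slide.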
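[Figure: the three stages of CRISPR/Cas immunity in a host cell challenged by an invading virus or plasmid: (1) Adaptation, in which a new spacer is added next to the CRISPR leader (spacers 5 4 3 2 1 become 6 5 4 3 2 1); (2) Expression & Processing; (3) Interference. Labeled components: cas operon (cas3, cas2, cas1, cas5), Cascade, Cas3.]
Van der Oost et al. TIBS 2009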
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon organization of the three types of CRISPR/Cas systems. Type I (Helicase cassette; Cascade): Ecoli (CASS2), Ypest (CASS3), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5). Type II (HNH-type): Nmeni (CASS4), CASS4a. Type III (RAMP module or Polymerase cassette): Mtube (CASS6). Genes are labeled by Cas name (Cas1 through Cas13 appear among the labels) and by COG number or family (1343, 1517, 1857, 3337, 3513, Helicase 1203, HD, RecB 1468, RuvC, HNH, RAMP 1567/1583/1688/1769, and others).]
Makarova et al NRM submitted
Experimental data on CRISPR/Cas systems
[Figure: gene neighborhoods of CRISPR/Cas loci in representative archaeal and bacterial genomes. Genes are labeled by COG number (1343, 1518, 1203, 1468, 1688, 1857, 3513, 3574 and others) and by family (Helicase, HD, RecB, RAMP, RuvC, HNH, HTH). Groups shown: “Polymerase” cassette and “Helicase” cassette; panels (A), (B), (C).]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III and by CASS subtype (EcoliCASS2, YpestCASS3, NmeniCASS4, ApernCASS5, MtubeCASS6 with the Polymerase/RAMP module, DvulgCASS1, Tneap/HmariCASS7). Representative operons shown beside the tree, gene order as extracted:
Type I: cse1 cse2 cas7 cas5 cas3 cas1 cas2 cas6e
Type II: cas1 cas9 cas2 cas4
Type III: cas1 csm6 csm5 csm4 csm3 csm2 cas10 cas6 cas2 (RAMP)]
Makarova et al NRM submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: two bar charts (y-axis 0 to 100) with the legend Type I, Type II, Type III, Cas1 presence/absence; bar labels and values are not reproduced.]
Makarova et al in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Phenomenon | Biological role/function | Phyletic spread | Genomic changes caused by environmental factor | Changes are specific to relevant genomic loci | Changes provide adaptation to the causative factor
(the last three columns are the Lamarckian criteria)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
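The three Lamarckian criteria in the table can be read as a simple decision rule. The sketch below is only an illustration under stated assumptions: the names `Phenomenon` and `classify` are hypothetical, "Yes/no" and "No or partially" are both encoded as "partial", and the rule "quasi-Lamarckian when the change is environmentally caused but not all three criteria are met" is an interpretation of the table, not a definition taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # "yes", "no" or "partial"
    locus_specific: str         # "yes", "no" or "partial"
    adaptive_to_cause: str      # "yes", "no" or "partial"


def classify(p: Phenomenon) -> str:
    """Apply the three criteria; the quasi/neither split is an assumption."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    if p.caused_by_environment == "yes":
        return "quasi-Lamarckian"
    return "neither"


# Rows transcribed from the table; "Yes/no" and "No or partially" -> "partial".
TABLE = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "partial"),
    Phenomenon("Stress-induced mutagenesis", "yes", "partial", "yes"),
]

LABELS = {p.name: classify(p) for p in TABLE}
```

With this encoding, the first three rows come out as bona fide Lamarckian and the last two as quasi-Lamarckian, matching the grouping in the table.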
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions of years ago.
Viruses have accompanied evolving cellular life throughout its history and might even predate it.
Some viruses are comparable to cellular life forms in size and genetic complexity
The Mimivirus genome (~1.2 Mbp, ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp, ~500 genes)
The largest, most complex viruses: NCLDV (Nucleo-Cytoplasmic Large DNA Viruses of eukaryotes); this is where the smallpox virus belongs
Raoult et al. Science 2004 Nov 19;306(5700):1344
Suzan-Monti et al. PLoS ONE 2007 Mar 28;2(3)
Iyer, Aravind, Koonin. Common origin of four diverse families of large eukaryotic DNA viruses. J Virol 2001;75:11720
Iyer et al. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res 2006 Apr;117(1):156
The largest, most complex viruses: the Nucleocytoplasmic Large DNA Viruses (NCLDV)
(this is where the smallpox virus AND the mimivirus belong)
6 families of NCLDV... and counting
Family | # | Size, kb | Hosts
Poxviridae | 26 | 134-360 | vertebrates, insects
Asfarviridae | 1 | 170 | vertebrates, protists(?)
Iridoviridae | 8 | 103-212 | vertebrates, insects, protists(?)
Ascoviridae | 4 | 119-174 | insects
Phycodnaviridae | 9 | 155-407 | algae, haptophytes, stramenopiles
Mimiviridae | 2 | 1181-1200 | amoebozoa, algae(?)
[Marseille virus] | 1 | 368 | amoebozoa (new family)
The case for the monophyly (common origin) of NCLDV
9 universally conserved hallmark genes (vaccinia gene names):
- primase (D5-N)
- helicase (D5-C)
- DNA polymerase (E9)
- packaging ATPase (A32)
- major capsid protein (D13, non-capsid in poxviruses)
- thiol-oxidoreductase (E10)
- helicase (D6, D11)
- S/T protein kinase (F10)
- transcription factor VLTF2 (A1)
47 genes mapped to the last common ancestor of NCLDV by maximum likelihood; all main functional systems are represented
Iyer et al. J Virol 2001; Virus Res 2006; Yutin et al. Virol J 2009
Phylogeny of NCLDV based on concatenation of 4 universal genes: primase-helicase, DNA polymerase, packaging ATPase, VLTF2 transcription factor
[Figure: phylogenetic tree of NCLDV with clades labeled Phycodnaviridae, Poxviridae, Ascoviridae & Iridoviridae, Mimiviridae, Marseillevirus and Asfarviridae. Host ranges annotated on the tree: animals + diverse protists; Amoebozoa, algae, animals(?); chlorophytes, haptophytes, stramenopiles; animals, haptophytes; animals. Leaf labels and bootstrap values are not reproduced.]
• Divergence of NCLDV most likely antedates the divergence of eukaryotic supergroups
• Alternative/addition: extensive horizontal transfer of viruses
Boyer, Yutin et al. PNAS 2009; Koonin, Yutin. Intervirology 2010
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009;106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus, isolated from amoeba found in aquatic habitats, show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of Marseille virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins, and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are “melting pots” of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree: intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there
Some NCLDV host their own parasites
La Scola et al. The virophage as a unique parasite of the giant mimivirus. Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genome map with genes marked by apparent origin: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon]
La Scola et al. Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
[Figure labels: phage scaffold (virus hallmark genes); eukaryotic additions/displacements]
Koonin, Yutin. Intervirology 2010
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
[Figure: poliovirus genome (7.4 kb) with the labels VPg, IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol and An. Genes are grouped as viral hallmark genes (jelly-roll CPs, Superfamily 3 helicase S3H, RdRp) and picornaviral ‘signature’ genes (including 3C-Pro and VPg); the diagram also indicates protein sequence conservation.]
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
[Figure: genome maps of marine picorna-like viruses HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb) and SssRNAV (9.0 kb), with genes labeled Hel3, S-Pro/C-Pro, Pol and VP (VP1-3), plus the marks sg and An]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus genome (PolioV, 7.4 kb; jelly-roll CPs, Superfamily 3 helicase, RdRp, 3C-Pro) with the inferred prokaryotic sources of its genes: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases (3C-Pro)]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a “Big Bang” during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then “sampled” the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere’s laboratory of genomic strategies
[Figures: replication-cycle diagrams for seven classes of viruses and related elements. The step symbols (T, Tr, R, RT, RCR, E), strand polarity marks (+, -, ±) and protein labels (RdRp, RT, CP, Pr-Pol) are not reproduced.]
Genetic cycles of RNA viruses
1 Positive-strand RNA: 3-30 kb
2 Double-strand RNA: 4-25 kb
3 Negative-strand RNA: 11-20 kb
Genetic cycles of retroid viruses and retroelements
4 Retroid RNA viruses: 7-12 kb
5 Retroid DNA viruses/elements: 2-10 kb
Genetic cycles of DNA viruses and plasmids
6 ssDNA viruses/plasmids: 2-11 kb
7 dsDNA viruses/plasmids: 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE (marked on the dsDNA cycle)
Natural history of viral genes: a one-page summary of viral comparative genomics
I Genes with readily detectable homologs from cellular life forms
1 Genes with closely related homologs from cellular organisms (typically the host of the given virus), present in a narrow group of viruses
2 Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II Virus-specific genes
3 ORFans, i.e. genes without detectable homologs except, possibly, in closely related viruses
4 Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III Viral hallmark genes
5 Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
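The five gene classes above amount to a lookup on two observations: how close the nearest cellular homolog is, and how widely the gene is spread among viruses. The following is a minimal sketch of that lookup, assuming a coarse, hypothetical encoding of both observations as strings; the function name `classify_viral_gene` and the category labels are illustrative and do not come from the source.

```python
from typing import Optional


def classify_viral_gene(cellular_homologs: str, viral_spread: str) -> Optional[int]:
    """Return the gene class (1-5) from the summary, or None if no class fits.

    cellular_homologs: "close", "moderate", "distant" or "none" (assumed labels)
    viral_spread: "narrow", "lineage" or "many_lineages" (assumed labels)
    """
    if cellular_homologs == "close" and viral_spread == "narrow":
        return 1  # recent acquisition from the host
    if cellular_homologs == "moderate" and viral_spread in ("lineage", "many_lineages"):
        return 2  # ancient acquisition from the host
    if cellular_homologs == "none" and viral_spread == "narrow":
        return 3  # ORFan
    if cellular_homologs == "none" and viral_spread == "lineage":
        return 4  # virus-specific, conserved within a lineage
    if cellular_homologs == "distant" and viral_spread == "many_lineages":
        return 5  # viral hallmark gene
    return None   # combination not covered by the five classes


# Example: a gene found across many diverse virus lineages whose only
# cellular relatives are very distant is a hallmark gene (class 5).
hallmark = classify_viral_gene("distant", "many_lineages")
```

Real assignments rest on sequence searches and phylogenies rather than on two categorical inputs, so this only mirrors the logic of the summary.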
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: x-axis, genome size (log scale, 0 to 3). Gene classes plotted: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark. Genome-size groups, from small to large: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages.]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the ‘virus state’
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive - Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006;1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios for the origin of viruses: cell degeneration, escaped genes, and primordial genetic systems. Diagram labels include cell, small parasitic cell, very small virus, chromosome, plasmid, mRNA, RNA virus, DNA virus and pre-cellular life forms.]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: cells; replicon fusion yields chromosomes]
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006;1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not “just” pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that has retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the “virus world”, or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
No. | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to “RAMP” family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase, distantly related to viral RdRp and RT
... and ~15 other, less common proteins
Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats (TB and TA; Strepto-like; Ecoli-like; Pasteurella-like) with the adjacent cas genes: Cas2 (COG1343), Cas1 (COG1518), Cas4 (COG1468), Cas3 (COG1203)]
“The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species”
The cas genes are our “repair” system
“Helicase” cassette
“Polymerase” cassette
Paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (the slide arranges them under motif columns I-V; the column assignment is lost, so motifs are listed in order of appearance)
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs | ust-lKGhh+hh | hhGtt | hD | lGhttsGh
y1726-like: slhlpEKuVRGT | lRTIDs | YGuVTsGhuh
COG1851: hGphpGpsaFh | hGFGRh
BH0337-like: TpA-h+GIh-uIh | hhLpDV | LGsREhuht
COG1567: hhhpp | ups-lhtAhh | luscoGhGh
COG1769: hhhh+Ph-hhh | ss-hhGhlhsh | hGh | hGtcp+hsthchp
COG1688 (Cas5): hhhhhts | sssshhGhlsh | lGttphh
COGs 1583, 5551: hhhhoPhhl | hGtppshGFGl
YgcH-like: hHphlh | hGu+uhGhGhh
y1727-like: LHphLh | GFsthGLStss
MJ0978-like: hHNH | lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats, visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
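One way to make "conserved among diverse species" concrete is to count identical positions between two rows of the repeat alignment. The sketch below is illustrative only: the helper name `aligned_identity` is hypothetical, the two rows (the Yerpes row and one of the Pasmul rows) are copied from the alignment above, and columns where either sequence has a gap are simply skipped, which is an assumption rather than the scoring used in the source.

```python
def aligned_identity(a: str, b: str, gap: str = "-"):
    """Count matching columns between two rows of the same alignment.

    Columns in which either row has a gap are ignored.
    Returns (matches, compared_columns).
    """
    if len(a) != len(b):
        raise ValueError("rows of one alignment must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    matches = sum(1 for x, y in pairs if x == y)
    return matches, len(pairs)


# Two rows copied from the alignment (Yersinia and Pasteurella repeats).
YERPES = "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--"
PASMUL = "GTTAACTGC--------CGTATAGGCAGCTTAGAAA--"

matches, compared = aligned_identity(YERPES, PASMUL)
identity = matches / compared  # fraction of identical, ungapped columns
```

For this pair the two repeats differ at 3 of 28 ungapped columns, so roughly 89% of positions are identical.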
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Flow diagram; steps as far as they can be recovered from the labels]
1 Transcription by the cellular RNA pol, with a specific transcription factor (COG1517), yields a polycistronic pre-psiRNA
2 (Protein-guided) RNA folding
3 RNA processing (fast) by p-dicer (Helicase + HD-Hydrolase; COG1203, COG2254) yields pre-psiRNA, 75-100 nt
4 RNA processing (slow) by p-dicer yields psiRNA, 25-45 nt
5 psiRNA, RAMP and p-slicer (COG1468, COG4343, COG1857) form p-RISC
6 Annealing of p-RISC to the RNA target (plasmid or phage mRNA)
7 Target RNA cleavage

Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
1 psiRNA (25-45 nt) with RAMP anneals to the RNA target (plasmid or phage mRNA), forming a RAMP-RNA complex (unstable)
2 Primer elongation by the CASS RNA pol (COG1353): amplified annealing, long dsRNA (stable)
3 Duplex degradation by p-dicer; RAMP binding
4 Cycle continues

New psiRNA generation
1 Plasmid or phage mRNA and pre-psiRNA (75-100 nt) are copied by the CASS RNA pol (COG1353) or another RT: reverse transcription with random copy choice, OR random RNA recombination and reverse transcription
2 dsDNA with CRISPRs and target-derived spacers
3 Homologous recombination with the genomic CRISPR region, mediated by the integrase (COG1518)
4 Genomic DNA with new target-derived spacer
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
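The two key observations, acquisition of a phage-derived spacer and loss of resistance after a single substitution, can be mimicked with a toy model. Everything in the sketch below is an assumption made for illustration: the `Host` class, the 8-nt spacer length (real spacers are longer), the made-up phage sequence, and the rule that resistance requires an exact spacer match anywhere in the phage genome.

```python
SPACER_LEN = 8  # assumed toy length; real spacers are longer


class Host:
    """Toy host that stores spacers and resists phages matching one exactly."""

    def __init__(self):
        self.spacers = []

    def acquire_spacer(self, phage_genome: str, start: int) -> str:
        """Adaptation: copy a fragment of the phage genome into the array."""
        spacer = phage_genome[start:start + SPACER_LEN]
        if len(spacer) != SPACER_LEN:
            raise ValueError("spacer window runs past the end of the genome")
        self.spacers.insert(0, spacer)  # newest spacer goes first
        return spacer

    def is_resistant(self, phage_genome: str) -> bool:
        """Interference: resistant only if some spacer matches exactly."""
        return any(spacer in phage_genome for spacer in self.spacers)


def point_mutant(genome: str, position: int) -> str:
    """Return the genome with a single substitution at the given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return genome[:position] + swap[genome[position]] + genome[position + 1:]


phage = "ATGCGTACGTTAGCCGATAGGCTA"   # made-up sequence
host = Host()
naive = host.is_resistant(phage)               # before any spacer is acquired
spacer = host.acquire_spacer(phage, start=5)   # spacer taken from the phage
immune = host.is_resistant(phage)              # after adaptation
escaped = host.is_resistant(point_mutant(phage, 8))  # SNP inside the target
```

Here the naive host is sensitive, the host carrying the spacer is resistant, and the phage with one substitution inside the targeted region escapes, which is the qualitative pattern described in the bullets above.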
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. An invading virus or plasmid enters the host; the spacer array next to the CRISPR leader grows from 5 4 3 2 1 to 6 5 4 3 2 1; the cas operon (cas3, cas2, cas1, cas5) supplies Cascade and Cas3.]
Van der Oost et al. TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: cas operon diagrams for the CASS subtypes EcoliCASS2, YpestCASS3, NmeniCASS4 (and CASS4a), ApernCASS5, MtubeCASS6 (RAMP module), DvulgCASS1, and Tneap and HmariCASS7, grouped into Type I (Helicase cassette; Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Gene boxes carry the unified names Cas1 to Cas13 together with COG numbers or domains, e.g. RuvC-f, HNH, RecB-f (1468), Helicase (1203), HD-f, RAMP (1688, 1583, 1769, 1567). Individual box positions are not reproduced.]
Makarova et al NRM submitted
Experimental data on CRISPR/Cas systems
[Figure: cas gene neighborhoods in representative archaeal and bacterial genomes, grouped by CASS subtype (1, 2, 3, 4, 4a, 5, 6, 7, 7a), with the "Polymerase" and "Helicase" cassettes marked and panels (A), (B), (C). Genes are labeled by locus tag (for example Esccol b2755, Yerpes y1722, Myctub Rv2817c) and by COG number or domain (1343, 1518, 1857, 3513, 3574, Helicase 1203, HD-f, RecB-f 1468, RuvC-f, HNH, HTH, RAMP 1688).]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315:1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: phylogenetic tree of Cas1 with branches labeled Type I, Type II and Type III.
Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6), Dvulg (CASS1), Tneap-Hmari (CASS7), Apern (CASS5), Polymerase-RAMP module.
Operon labels: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP).]
Makarova et al NRM submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II - 9, Type III - 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: two bar charts (vertical axis 0 to 100) showing the distribution of Type I, Type II and Type III systems and of Cas1 presence/absence across the analyzed genomes.]
Makarova et al in preparation
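The Cas1 share quoted on this slide can be checked with one line of arithmetic. The sketch below is only an illustration; the function name `percent` and the constant names are assumptions introduced here, and only the counts 310 and 703 come from the slide.

```python
def percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    return 100.0 * part / total

TOTAL_GENOMES = 703   # selected complete genomes (from the slide)
CAS1_GENOMES = 310    # genomes that encode Cas1 (from the slide)

cas1_share = percent(CAS1_GENOMES, TOTAL_GENOMES)
print(f"Cas1 is present in {cas1_share:.0f}% of the genomes")
```

The rounded value agrees with the 44 given in parentheses on the slide.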
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor

Phenomenon | Biological role/function | Phyletic spread | (a) | (b) | (c)

Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes

Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
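The table's logic can be read as a simple rule: a phenomenon is listed as bona fide Lamarckian when all three criteria are answered "Yes", and as quasi-Lamarckian otherwise. The sketch below encodes that reading. The rule is an interpretation of the table, not a definition stated by the authors, and the class and function names (`Phenomenon`, `modality`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion (a) from the table
    locus_specific: str         # criterion (b)
    adaptive_to_cause: str      # criterion (c)

def modality(p: Phenomenon) -> str:
    """Assumed rule: all three criteria must be a plain 'Yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "Yes" for answer in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"

# Rows transcribed from the table (criteria columns only).
rows = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially", "Yes"),
]
labels = {p.name: modality(p) for p in rows}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Under this reading, criterion (b), locus specificity, is what separates the two groups in every row of the table.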
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al, Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Some viruses are comparable to cellular life forms in size and genetic complexity
Mimivirus genome (~1.2 Mbp, ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp, ~500 genes)
The largest, most complex viruses: NCLDV (Nucleo-Cytoplasmic Large DNA viruses of eukaryotes) - this is where the smallpox virus belongs
Raoult et al. Science 2004 Nov 19;306(5700):1344. Suzan-Monti et al. PLoS ONE 2007 Mar 28;2(3)
Iyer, Aravind, Koonin. Common origin of four diverse families of large eukaryotic DNA viruses. J Virol 2001;75:11720. Iyer et al. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res 2006 Apr;117(1):156
The largest, most complex viruses: the Nucleocytoplasmic Large DNA Viruses (NCLDV)
(this is where the smallpox virus AND the mimivirus belong)
6 families of NCLDV... and counting

family | | size, kb | hosts
Poxviridae | 26 | [134-360] | vertebrates, insects
Asfarviridae | 1 | [170] | vertebrates, protists(?)
Iridoviridae | 8 | [103-212] | vertebrates, insects, protists(?)
Ascoviridae | 4 | [119-174] | insects
Phycodnaviridae | 9 | [155-407] | algae, haptophytes, stramenopiles
Mimiviridae | 2 | [1181-1200] | amoebozoa, algae(?)
[Marseille virus] | 1 | [368] | amoebozoa - new family
The case for the monophyly (common origin) of NCLDV
9 universally conserved hallmark genes (vaccinia gene names):
- primase (D5-N)
- helicase (D5-C)
- DNA polymerase (E9)
- packaging ATPase (A32)
- Major capsid protein (D13, non-capsid in poxviruses)
- Thiol-oxidoreductase (E10)
- Helicase (D6, D11)
- S/T protein kinase (F10)
- Transcription factor VLTF2 (A1)
47 genes mapped to the last common ancestor of NCLDV by maximum likelihood - all main functional systems represented
Iyer et al. J Virol 2001, Virus Res 2006; Yutin et al. Virol J 2009
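"Universally conserved" can be illustrated as a set intersection over per-family gene repertoires. The sketch below uses invented toy data: the family names `family_A` to `family_C` and the genes `gene_x`, `gene_y`, `gene_z` are hypothetical placeholders, not real presence/absence calls. The 47-gene ancestral set on the slide was inferred by maximum likelihood, which this sketch does not attempt.

```python
from functools import reduce

def core_genes(presence: dict[str, set[str]]) -> set[str]:
    """Genes found in every family: the intersection of all gene sets."""
    return reduce(set.intersection, presence.values())

# Hypothetical toy repertoires; only the three shared names echo the slide.
presence = {
    "family_A": {"primase-helicase", "DNA polymerase", "packaging ATPase", "gene_x"},
    "family_B": {"primase-helicase", "DNA polymerase", "packaging ATPase", "gene_y"},
    "family_C": {"primase-helicase", "DNA polymerase", "packaging ATPase", "gene_x", "gene_z"},
}
core = core_genes(presence)
print(sorted(core))
```

A strict intersection is the simplest possible criterion; it drops any gene lost in even one lineage, which is one reason likelihood-based reconstruction recovers a larger ancestral set than the strictly universal one.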
[Figure: phylogenetic tree of NCLDV with support values on the branches. Clades: Phycodnaviridae; Poxviridae; Ascoviridae & Iridoviridae; Mimiviridae; Marseillevirus; Asfarviridae. HOST labels: Animals + diverse protists; Amoebozoa, algae, animals(?); Chlorophytes, haptophytes, stramenopiles; Animals, haptophytes; Animals.]
Phylogeny of NCLDV based on concatenation of 4 universal genes: primase-helicase, DNA polymerase, packaging ATPase, VLTF2 transcription factor
• Divergence of NCLDV most likely antedates divergence of eukaryotic supergroups
• Alternative/addition: extensive horizontal transfer of viruses
Boyer, Yutin et al. PNAS 2009; Koonin, Yutin. Intervirology 2010
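Concatenating universal genes means joining the per-gene alignments end to end for each taxon before building a tree. A minimal sketch of that step follows, assuming each alignment is a dict from taxon name to an aligned string of equal length. The taxon names and sequences are invented, and padding a missing gene with gaps is one common convention rather than something stated on the slide.

```python
def concatenate_alignments(alignments: list[dict[str, str]],
                           taxa: list[str]) -> dict[str, str]:
    """Join per-gene alignments into one supermatrix row per taxon."""
    supermatrix = {}
    for taxon in taxa:
        parts = []
        for gene_alignment in alignments:
            # All rows of one alignment share the same width.
            width = len(next(iter(gene_alignment.values())))
            # A taxon lacking the gene contributes a run of gap characters.
            parts.append(gene_alignment.get(taxon, "-" * width))
        supermatrix[taxon] = "".join(parts)
    return supermatrix

# Hypothetical toy alignments for two genes.
genes = [
    {"virus_A": "MKT-L", "virus_B": "MKTAL", "virus_C": "MRTAL"},
    {"virus_A": "GDD", "virus_B": "GDD"},  # virus_C lacks this gene
]
matrix = concatenate_alignments(genes, ["virus_A", "virus_B", "virus_C"])
for taxon, row in matrix.items():
    print(taxon, row)
```

Every row of the result has the same length, so the supermatrix can be passed to any tree-building program that accepts a single alignment.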
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009;106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitats show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of Marseille virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins, and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are "melting pots" of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there: some NCLDV host their own parasites
La Scola et al. The virophage as a unique parasite of the giant mimivirus. Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genome map with genes marked by likely origin: NCLDV; Mimivirus; Archaeal virus; NCLDV (environmental only); Bacterial transposon]
La Scola et al. Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution - eukaryogenesis
Phage scaffold (virus hallmark genes)
Eukaryotic additions/displacements
Koonin, Yutin. Intervirology 2010
Some of the smallest viruses (this is where poliovirus belongs)
The Big Bang of picorna-like virus evolution
[Figure: poliovirus genome map (7.4 kb): IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An, with a protein sequence conservation track. Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase, RdRp. Picornaviral 'signature' genes: 3C-Pro, 3DPol (RdRp), VPg, 2C (S3H).]
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al (2004) Virology 320:206
[Figure: genome maps of HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb) and SssRNAV (9.0 kb) with genes labeled Hel3, C-Pro, S-Pro, Pol, VP (VP1, VP2, VP3), subgenomic RNAs (sg) and An.]
Nagasaki et al (2005) Appl Env Microbiol 71:8888
Culley et al (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses - 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus (PolioV, 7.4 kb) genome map: IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An; genes marked: jelly-roll CPs, superfamily 3 helicase, RdRp, 3C-Pro. Inferred sources shown: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases.]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups - the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang - an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies

Genetic cycles of RNA viruses
Class 1: Positive-strand RNA, 3-30 kb
Class 2: Double-strand RNA, 4-25 kb
Class 3: Negative-strand RNA, 11-20 kb

Genetic cycles of retroid viruses and retroelements
Class 4: Retroid RNA viruses, 7-12 kb
Class 5: Retroid DNA viruses, elements, 2-10 kb

Genetic cycles of DNA viruses and plasmids
Class 6: ssDNA viruses, plasmids, 2-11 kb
Class 7: dsDNA viruses, plasmids, 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE

[Figure: replication cycle diagram for each class, with nucleic acid forms marked +, - and ±, steps labeled T, Tr, R, RT, RCR and E, and key proteins RdRp, CP, RT and Pr-Pol.]
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e. genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Natural history of viral genes a one-page summary of viral comparative genomics
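The five classes above amount to a decision rule over two observations: how close the cellular homologs are, and how widely the gene is spread among viruses. The sketch below is one possible encoding. The category strings ("close", "moderate", "distant", "none"; "narrow", "lineage", "many_lineages") and all names in the code are assumptions made for illustration, not terms from the slide.

```python
from enum import Enum

class GeneClass(Enum):
    RECENT_ACQUISITION = 1         # class 1
    ANCIENT_ACQUISITION = 2        # class 2
    ORFAN = 3                      # class 3
    VIRUS_SPECIFIC_CONSERVED = 4   # class 4
    HALLMARK = 5                   # class 5

def classify_viral_gene(cellular_homologs: str, viral_spread: str) -> GeneClass:
    """Map two coarse observations onto the five classes (assumed encoding)."""
    if cellular_homologs == "close":
        return GeneClass.RECENT_ACQUISITION
    if cellular_homologs == "moderate":
        return GeneClass.ANCIENT_ACQUISITION
    if cellular_homologs == "distant" and viral_spread == "many_lineages":
        return GeneClass.HALLMARK
    if cellular_homologs == "none":
        if viral_spread == "narrow":
            return GeneClass.ORFAN
        if viral_spread == "lineage":
            return GeneClass.VIRUS_SPECIFIC_CONSERVED
    raise ValueError(f"combination not covered: {cellular_homologs}, {viral_spread}")

print(classify_viral_gene("distant", "many_lineages").name)
print(classify_viral_gene("none", "narrow").name)
```

Combinations the summary does not describe raise an error rather than being forced into a class.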
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of five gene classes (recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark) plotted against genome size (log scale, 0 to 3) for three groups: most RNA viruses, retroelements and RCR replicons; large RNA viruses, adeno- and tailed phages; NCLDV, herpes- and large phages.]
Natural history of viral genes: Viral Hallmark Genes
• Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
• Strong support for monophyly of all viral members of the respective gene families
• Only distant homologs in cellular organisms
• Can be viewed as signatures of the 'virus state'
• Play major roles in genome replication, packaging and assembly

Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and Reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive - Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are:
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios: cell degeneration; escaped genes; primordial genetic systems. Labels: CELL; SMALL PARASITIC CELL; VERY SMALL; CHROMOSOME; PLASMID; mRNA; PRE-CELLULAR LIFE FORMS; RNA VIRUS; DNA VIRUS; VIRUS.]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: CELLS; Replicon fusion yields chromosomes]
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens; they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006. Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether

# | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT

... and ~15 other, less common proteins
Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure labels: CRISPR repeats; TB and TA; Strepto-like; Ecoli-like; Pasteurella-like; Cas2 (COG1343); Cas1 (COG1518); Cas4 (COG1468); Cas3 (COG1203)]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species"
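Characteristics (i) and (ii) in the quotation describe an array of identical repeats separated by unique spacers of similar size. A minimal sketch of how spacers could be read out of such an array is given below, assuming the repeat sequence is already known and perfectly conserved. The leader, repeat and spacer strings are invented for illustration, and real CRISPR finders handle imperfect repeats, which this sketch does not.

```python
def extract_spacers(locus: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    positions = []
    start = locus.find(repeat)
    while start != -1:
        positions.append(start)
        # Continue searching after the repeat copy just found.
        start = locus.find(repeat, start + len(repeat))
    return [locus[left + len(repeat):right]
            for left, right in zip(positions, positions[1:])]

# Hypothetical toy locus: leader, then repeat-spacer units, ending in a repeat.
leader = "ATATATAT"
repeat = "GTTTCAATCC"
toy_spacers = ["AAAACCCCGGGG", "TTGACCATGCAA", "CGCGTATATCGA"]
locus = leader + "".join(repeat + s for s in toy_spacers) + repeat

found = extract_spacers(locus, repeat)
print(found)
print("all spacers unique:", len(set(found)) == len(found))
print("spacer lengths:", sorted({len(s) for s in found}))
```

The two printed checks correspond to the "non-repetitive" and "similar size" parts of characteristic (ii).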
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (motif columns I to V in the original table; motifs listed per family in source order):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs; ust-lKGhh+hh; hhGtt; hD; lGhttsGh
y1726-like: slhlpEKuVRGT; lRTIDs; YGuVTsGhuh
COG1851: hGphpGpsaFh; hGFGRh
BH0337-like: TpA-h+GIh-uIh; hhLpDV; LGsREhuht
COG1567: hhhpp; ups-lhtAhh; luscoGhGh
COG1769: hhhh+Ph-hhh; ss-hhGhlhsh; hGh; hGtcp+hsthchp
COG1688 (Cas5): hhhhhts; sssshhGhlsh; lGttphh
COGs 1583, 5551: hhhhoPhhl; hGtppshGFGl
YgcH-like: hHphlh; hGu+uhGhGhh
y1727-like: LHphLh; GFsthGLStss
MJ0978-like: hHNH; lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable. Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
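Conservation between two aligned repeats can be quantified as the number of identical positions among the columns where neither sequence has a gap. The sketch below applies this to two rows of the alignment shown on this slide (the second Pyrhor and Pyraby repeats). The function name `aligned_identity` and the choice to skip gapped columns are assumptions made for illustration.

```python
def aligned_identity(a: str, b: str) -> tuple[int, int]:
    """Return (matching columns, compared columns) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip every column that contains a gap
        compared += 1
        if x == y:
            matches += 1
    return matches, compared

# Two rows copied from the repeat alignment on the slide.
pyrhor = "GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-"
pyraby = "GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-"

matches, compared = aligned_identity(pyrhor, pyraby)
print(f"{matches} of {compared} ungapped columns are identical")
```

For this pair the two repeats differ at only two of the compared positions.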
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure labels, in source order: transcription by cellular RNApol with a specific transcription factor (COG1517); (protein-guided) RNA folding; polycistronic pre-psiRNA; RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; p-slicer (COG1468, COG4343, COG1857); RAMP; p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage]

Variant of CASS functioning with polymerase/psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure labels, in source order: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues]

New psiRNA generation
[Figure labels, in source order: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
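The sequence specificity described in the key validation points can be illustrated with a toy model in which a cell is resistant only if one of its spacers matches the phage genome exactly. This is a deliberate simplification: the sequences are invented, only the forward strand is checked, and the requirement for cas genes is ignored. The function name `is_resistant` is hypothetical.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: resistance requires an exact spacer match in the phage genome."""
    return any(spacer in phage_genome for spacer in spacers)

# Hypothetical phage genome and a spacer copied from positions 8..23 of it.
phage = "ATGGCTAGCTAGGATCCGATTACAGGCTTAA"
spacer = phage[8:24]

# Escape mutant: a single substitution inside the region matched by the spacer.
position = 15
substitute = "A" if phage[position] != "A" else "C"
mutant = phage[:position] + substitute + phage[position + 1:]

print("wild-type phage blocked:", is_resistant([spacer], phage))
print("SNP mutant blocked:", is_resistant([spacer], mutant))
```

In this model a cell with no spacers, or with spacers that differ from the phage by one position, remains sensitive, mirroring the SNP observation on the slide.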
[Figure: the three stages of CRISPR/Cas function: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader with spacers numbered 5 4 3 2 1 before and 6 5 4 3 2 1 after adaptation; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3.]
Van der Oost et al TIBS 2009
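The three stages in the figure can be sketched as three small functions acting on a list of spacers ordered from the leader outward, so that the newest spacer is first, as in the 6 5 4 3 2 1 numbering. All names, the fixed spacer length of 8 and the sequences are assumptions for illustration; processing by Cascade and cleavage by Cas3 are reduced to an exact string match.

```python
def adapt(array: list[str], invader: str, start: int, length: int = 8) -> list[str]:
    """Stage 1: copy a fragment of the invader and insert it next to the leader."""
    return [invader[start:start + length]] + list(array)

def express(array: list[str]) -> list[str]:
    """Stage 2: each spacer yields one guide (processing details omitted)."""
    return list(array)

def interfere(guides: list[str], invader: str) -> bool:
    """Stage 3: the invader is targeted if any guide matches it exactly."""
    return any(guide in invader for guide in guides)

invader = "TTGACCGGTATCAGGCATTA"   # hypothetical invading sequence
array = ["AAAAAAAA"]               # hypothetical pre-existing spacer

before = interfere(express(array), invader)
new_array = adapt(array, invader, start=4)
after = interfere(express(new_array), invader)

print("spacers after adaptation:", new_array)
print("targeted before adaptation:", before)
print("targeted after adaptation:", after)
```

Because `adapt` returns a new list, the original array is left unchanged, which makes the before/after comparison explicit.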
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPR/Cas systems
[Figure: CRISPR/Cas loci of bacterial and archaeal genomes, each labeled by species abbreviation and locus tag (for example Esccol b2755, Strthe str0658, Staepi SERP2463, Pyrfur PF1129), with subtype numbers 1-7, 4a, and 7a and gene neighborhoods drawn from COG1343, COG1518, COG1203 (Helicase), COG1857, COG3513, COG3574, RecB, HD, HNH, RuvC, HTH, and RAMP families. Panel labels: "Polymerase" cassette, "Helicase" cassette, (A), (B), (C).]
Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al., Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches labeled Type I, Type II, or Type III and by subtype: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6), Polymerase-RAMP module, Dvulg (CASS1), Tneap-Hmari (CASS7), Apern (CASS5). Gene sets shown for each type: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP).]
Makarova et al NRM submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II - 9, Type III - 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: bar charts (axis 0 to 100) with the legend Type I, Type II, Type III, and Cas1 presence/absence.]
Makarova et al in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Phenomenon | Biological role/function | Phyletic spread | Genomic changes caused by environmental factor | Changes are specific to relevant genomic loci | Changes provide adaptation to the causative factor
(the last three columns are the Lamarckian criteria)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
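The three Lamarckian criteria in the table can be written down as a small data structure. The sketch below is illustrative only: the class name, the field names, and the rule "bona fide only if all three answers are an unqualified yes" are assumptions made for this example, chosen to agree with how the table groups its rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion 1: genomic change caused by environmental factor
    locus_specific: str         # criterion 2: change is specific to relevant genomic loci
    adaptive_to_cause: str      # criterion 3: change provides adaptation to the causative factor


def classify(p: Phenomenon) -> str:
    """Assumed rule: all three criteria must be an unqualified 'yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Rows transcribed from the table; qualifiers in parentheses are dropped.
TABLE = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes"),
]

if __name__ == "__main__":
    for row in TABLE:
        print(f"{row.name}: {classify(row)}")
```

In this encoding it is the second criterion, locus specificity, that separates the two groups.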
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L. Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Iyer, Aravind, Koonin, Common origin of four diverse families of large eukaryotic DNA viruses, J Virol 2001, 75:11720
Iyer et al., Evolutionary genomics of nucleo-cytoplasmic large DNA viruses, Virus Res 2006 Apr;117(1):156
The largest, most complex viruses: the Nucleocytoplasmic Large DNA Viruses (NCLDV)
(this is where the smallpox virus AND the mimivirus belong)
6 families of NCLDV... and counting
(columns: family; number; [size, kb]; hosts)
- poxviridae; 26; [134-360]; vertebrates, insects
- asfarviridae; 1; [170]; vertebrates, protists(?)
- iridoviridae; 8; [103-212]; vertebrates, insects, protists(?)
- ascoviridae; 4; [119-174]; insects
- phycodnaviridae; 9; [155-407]; algae, haptophytes, stramenopiles
- mimiviridae; 2; [1181-1200]; amoebozoa, algae(?)
- [Marseille virus]; 1; [368]; amoebozoa - new family
The case for the monophyly (common origin) of NCLDV
9 universally conserved hallmark genes (vaccinia gene names)
- primase (D5-N)
- helicase (D5-C)
- DNA polymerase (E9)
- packaging ATPase (A32)
- Major capsid protein (D13, non-capsid in poxviruses)
- Thiol-oxidoreductase (E10)
- Helicase (D6, D11)
- S/T protein kinase (F10)
- Transcription factor VLTF2 (A1)
47 genes mapped to the last common ancestor of NCLDV by maximum likelihood - all main functional systems represented
Iyer et al., J Virol 2001, Virus Res 2006; Yutin et al., Virol J 2009
Phylogeny of NCLDV based on concatenation of 4 universal genes: primase-helicase, DNA polymerase, packaging ATPase, VLTF2 transcription factor
[Figure: phylogenetic tree with support values on the branches; clades labeled Phycodnaviridae, Poxviridae, Ascoviridae & Iridoviridae, Mimiviridae/Marseillevirus, and Asfarviridae. HOST labels: animals + diverse protists; Amoebozoa, algae, animals(?); chlorophytes, haptophytes, stramenopiles; animals, haptophytes; animals.]
• Divergence of NCLDV most likely antedates divergence of eukaryotic supergroups
• Alternative/addition: extensive horizontal transfer of viruses
Boyer, Yutin et al., PNAS 2009; Koonin, Yutin, Intervirology 2010
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009;106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitats show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of Marseille virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins, and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are "melting pots" of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there
Some NCLDV host their own parasites
La Scola et al., The virophage as a unique parasite of the giant mimivirus, Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genes annotated by the source of their homologs: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon.]
La Scola et al., Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution - eukaryogenesis
Phage scaffold (virus hallmark genes)
Eukaryotic additions/displacements
Koonin, Yutin, Intervirology 2010
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
[Figure: poliovirus genome map (7.4 kb): IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, poly(A). Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase (S3H; 2C), RdRp (3DPol). Picornaviral 'signature' genes: 3C-Pro, VPg. Protein sequence conservation is indicated.]
Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
[Figure: genome maps of marine RNA viruses: HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb), SssRNAV (9.0 kb). Gene labels: S-Pro, C-Pro, Pol, Hel3, VP, VP1-3; An marks the 3' ends; sg marks two sites.]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera, and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses - 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
[Figure: poliovirus (PolioV, 7.4 kb) genome map (IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, poly(A)) with jelly-roll CPs, superfamily 3 helicase, RdRp, and 3C-Pro marked, and the inferred prokaryotic sources: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases.]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups - the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang - an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin, The Biological Big Bang model for the major transitions in evolution, Biol Direct 2007 Aug 20;2:21
Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms: viruses are the biosphere's laboratory of genomic strategies
[Figure: replication cycle diagram for each class below (strand polarities +, -, ±; step labels R, T, Tr, E, RT, RCR; proteins RdRp, CP, RT, RCE, Pr-Pol).]
Genetic cycles of RNA viruses
1 Positive-strand RNA, 3-30 kb
2 Double-strand RNA, 4-25 kb
3 Negative-strand RNA, 11-20 kb
Genetic cycles of retroid viruses and retroelements
4 Retroid RNA viruses, 7-12 kb
5 Retroid DNA viruses, elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids
6 ssDNA viruses/plasmids, 2-11 kb
7 dsDNA viruses/plasmids, 5-1200 kb (ALL CELLULAR ORGANISMS: YOU ARE HERE)
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of five gene classes (recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark) against genome size (log scale, 0 to 3) for three groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages.]
Natural history of viral genes: Viral Hallmark Genes
• Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
• Strong support for monophyly of all viral members of the respective gene families
• Only distant homologs in cellular organisms
• Can be viewed as signatures of the 'virus state'
• Play major roles in genome replication, packaging, and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and Reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive - Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios, Cell degeneration, Escaped genes, and Primordial genetic systems. Diagram labels: CELL, SMALL PARASITIC CELL, VERY SMALL VIRUS, CHROMOSOME, PLASMID, mRNA, VIRUS, PRE-CELLULAR LIFE FORMS, RNA VIRUS, DNA VIRUS.]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: CELLS; Replicon fusion yields chromosomes]
Koonin, Martin, TIG 2005; Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29; Koonin, Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens; they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription, and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes ("virus world" or the virosphere)
• The emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements, and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV, The ancient Virus World and evolution of cells, Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV, The complexity of the virus world, Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV
A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis
Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
# | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD, fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family, possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353 predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
... and ~15 other less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats flanked by cas genes, with Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), and Cas4 (COG1468); locus variants labeled TB and TA, Strepto-like, Ecoli-like, Pasteurella-like.]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species"
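Characteristics (i) and (ii) suggest a simple way to look for a CRISPR-like array in a sequence. The sketch below is a minimal illustration, not a published detection tool: the function name, its parameters, the exact-match requirement for repeats, and the toy sequence are all assumptions made for this example.

```python
from collections import defaultdict


def find_repeat_arrays(seq, repeat_len, min_spacer, max_spacer, min_copies=3):
    """Return (repeat, start_positions, spacers) for every exact direct repeat
    of length repeat_len that occurs at least min_copies times with all
    intervening spacers between min_spacer and max_spacer long."""
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays = []
    for repeat, starts in positions.items():
        if len(starts) < min_copies:
            continue
        # Sequence between the end of one copy and the start of the next.
        spacers = [seq[a + repeat_len:b] for a, b in zip(starts, starts[1:])]
        if all(min_spacer <= len(s) <= max_spacer for s in spacers):
            arrays.append((repeat, starts, spacers))
    return arrays


# Toy locus (hypothetical sequences): leader, then repeat-spacer units.
REPEAT = "GTTTCAATCC"
SPACERS = ["ACGTACGT", "TTGACCAG", "CATGGTCC"]
LOCUS = "TATA" + "".join(REPEAT + s for s in SPACERS) + REPEAT + "GGCA"

if __name__ == "__main__":
    for repeat, starts, spacers in find_repeat_arrays(LOCUS, 10, 6, 12):
        print(repeat, starts, spacers)
```

Real repeats show "no or very little" variation rather than none, so a practical search would tolerate mismatches; exact matching is used here only to keep the sketch short.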
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Specific motifs (figure columns I, II, III, IV, V), listed per family in left-to-right order:
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold/RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable.
Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
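Conservation between two repeats in the alignment can be quantified as identity over the columns where neither row has a gap. The sketch below is a minimal illustration; the function name and the gap-handling convention are assumptions for this example, and the two rows are the Yerpes row and the first Pasmul row copied from the alignment.

```python
def aligned_identity(row_a, row_b, gap="-"):
    """Count identical residues over columns where neither row has a gap.

    Returns (matches, compared_columns). Rows must be the same length,
    i.e. taken from the same alignment.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must come from the same alignment")
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    matches = sum(a == b for a, b in columns)
    return matches, len(columns)


YERPES = "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--"
PASMUL = "GTTAACTGC--------CGTATAGGCAGCTTAGAAA--"

if __name__ == "__main__":
    matches, compared = aligned_identity(YERPES, PASMUL)
    print(f"{matches}/{compared} identical positions")
```

For these two rows the function reports 25 identical positions out of 28 compared columns.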
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements
J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA - after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow of labeled steps. Transcription by cellular RNA pol with a specific transcription factor (COG1517) gives a polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254) gives pre-psiRNA of 75-100 nt; RNA processing (slow) by p-dicer gives psiRNA of 25-45 nt; psiRNA with RAMP and p-slicer (COG1468, COG4343, COG1857) in p-RISC; annealing to the RNA target (plasmid or phage mRNA); target RNA cleavage.]
Variant of CASS functioning with polymerase/psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle of labeled steps. psiRNA (25-45 nt) with RAMP; annealing to the RNA target (plasmid or phage mRNA); RAMP-RNA complex (unstable); primer elongation by CASS RNA pol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues.]
New psiRNA generation
[Figure: labeled steps. Plasmid or phage mRNA and pre-psiRNA (75-100 nt); CASS RNA pol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with the genomic CRISPR region, integrase (COG1518); genomic DNA with new target-derived spacer.]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
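The sequence specificity described above can be mimicked with a toy model in which a host is resistant only if one of its spacers matches the phage genome exactly. This is a deliberately simplified sketch, not the experimental protocol: the function names, the exact-match rule, the placement of the new spacer at the front of the array, and the phage sequence are all assumptions made for illustration.

```python
def acquire_spacer(array, phage_genome, start, length):
    """Adaptation: copy a fragment of the phage genome into the array.

    The new spacer is placed at the front of the list (assumed here to
    stand for the leader-proximal end of the CRISPR array).
    """
    spacer = phage_genome[start:start + length]
    if len(spacer) != length:
        raise ValueError("fragment runs past the end of the genome")
    return [spacer] + list(array)


def is_resistant(array, phage_genome):
    """Interference: resistant if any spacer matches the genome exactly."""
    return any(spacer in phage_genome for spacer in array)


def substitute(genome, pos, base):
    """Return the genome with a single-base substitution at pos."""
    if genome[pos] == base:
        raise ValueError("substitution must change the base")
    return genome[:pos] + base + genome[pos + 1:]


# Hypothetical 60-nt phage sequence used only for this demonstration.
PHAGE = "ATGACCGTTAGCTAGGCTTACGATCGGATCCATGCAAGTCTTGACCGGATTACAGCTAGT"

if __name__ == "__main__":
    naive = []
    immune = acquire_spacer(naive, PHAGE, start=10, length=20)
    new_base = "A" if PHAGE[15] != "A" else "C"
    escape_mutant = substitute(PHAGE, 15, new_base)
    print("naive host resistant:", is_resistant(naive, PHAGE))
    print("host with spacer resistant:", is_resistant(immune, PHAGE))
    print("resistant to SNP mutant:", is_resistant(immune, escape_mutant))
```

A substitution outside the matched region leaves the host resistant in this model, while one inside it restores sensitivity, which is the behavior the first bullet describes.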
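[Figure: three stages of CRISPR/Cas action in the host: (1) Adaptation, (2) Expression & Processing, (3) Interference, each shown with an invading virus or plasmid. The CRISPR array next to the leader carries spacers 5 4 3 2 1 and, after adaptation, 6 5 4 3 2 1. Labeled components: cas operon (cas3, cas2, cas1, cas5), Cascade, Cas3.]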
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPR/Cas systems
[Figure: cas gene neighborhoods in archaeal and bacterial genomes, panels (A), (B) and (C), with the "Helicase" cassette and the "Polymerase" cassette marked. Rows are labeled by species abbreviation and locus tag (e.g. Esccol b2755, Yerpes y1722, Metjan MJ0378); genes are labeled by COG number or family (1343, 1518, 1857, 3513, 3574, Helicase 1203, HD-f, RecB-f 1468, RuvC-f, HNH, HTH, RAMP 1567/1583/1688/1769).]
• Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
• Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
• E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
• Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
• Streptococcus thermophilus: Barrangou R et al., Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches labeled Type I, Type II and Type III. Subtype labels: Ecoli/CASS2, Ypest/CASS3, Nmeni/CASS4, Mtube/CASS6 (polymerase-RAMP module), Dvulg/CASS1, Tneap-Hmari/CASS7, Apern/CASS5.]
Gene sets shown for each type:
I: cse1 cse2 cas7 cas5 cas3 cas1 cas2 cas6e
II: cas1 cas9 cas2 cas4
III: cas1 csm6 csm5 csm4 csm3 csm2 cas10 cas6 cas2 (RAMP)
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II - 9, Type III - 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: two bar charts on a 0-100 scale showing Cas1 presence/absence and the distribution of Type I, Type II and Type III systems across genome groups; the bar values are not recoverable from the extracted text.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor
Phenomenon | Biological role/function | Phyletic spread | (a) | (b) | (c)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Koonin, Wolf, Biol Direct 2009
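The three Lamarckian criteria listed above can be read as a small decision rule. The sketch below is only an illustration of that reading, not the authors' method: the class `Phenomenon`, the function `classify`, and the three-valued answers ("yes", "partial", "no") are hypothetical names introduced here, and the rule that an environmentally induced change failing another criterion counts as quasi-Lamarckian is an assumption.

```python
from dataclasses import dataclass

# Each criterion is answered "yes", "partial" or "no" (an assumed encoding).
@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # genomic changes caused by environmental factor
    locus_specific: str         # changes are specific to relevant genomic loci
    adaptive_to_cause: str      # changes provide adaptation to the causative factor

def classify(p: Phenomenon) -> str:
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    # Assumed reading: induced by the environment, but another criterion
    # is missed or only partly met.
    if p.caused_by_environment == "yes":
        return "quasi-Lamarckian"
    return "not Lamarckian"

TABLE = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "partial"),     # "Yes/no"
    Phenomenon("Stress-induced mutagenesis", "yes", "partial", "yes"),  # "No or partially"
]

for p in TABLE:
    print(f"{p.name}: {classify(p)}")
```

Encoding the slide's "Yes/no" and "No or partially" entries as "partial" is a simplification made for this sketch.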
Stress as a gauge of evolutionary modality
Koonin, Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L. Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
The case for the monophyly (common origin) of NCLDV
9 universally conserved hallmark genes (vaccinia gene names):
- primase (D5-N)
- helicase (D5-C)
- DNA polymerase (E9)
- packaging ATPase (A32)
- Major capsid protein (D13, non-capsid in poxviruses)
- Thiol-oxidoreductase (E10)
- Helicase (D6, D11)
- S/T protein kinase (F10)
- Transcription factor VLTF2 (A1)
47 genes mapped to the last common ancestor of NCLDV by maximum likelihood: all main functional systems represented
Iyer et al., J Virol 2001; Virus Res 2006; Yutin et al., Virol J 2009
Phylogeny of NCLDV based on concatenation of 4 universal genes: primase-helicase, DNA polymerase, packaging ATPase, VLTF2 transcription factor
[Figure: phylogenetic tree with bootstrap support values on the branches. Families: Phycodnaviridae, Poxviridae, Ascoviridae & Iridoviridae, Mimiviridae, Marseillevirus, Asfarviridae. Host labels: Animals + diverse protists; Amoebozoa; Algae, animals(); Chlorophytes, haptophytes, stramenopiles; Animals, haptophytes; Animals.]
• Divergence of NCLDV most likely antedates divergence of eukaryotic supergroups
• Alternative/addition: extensive horizontal transfer of viruses
Boyer, Yutin et al., PNAS 2009; Koonin, Yutin, Intervirology 2010
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009;106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus, isolated from amoeba found in aquatic habitats, show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of Marseille virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins, and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are "melting pots" of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree: intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there: some NCLDV host their own parasites
La Scola et al. The virophage as a unique parasite of the giant mimivirus. Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genome map with genes colored by apparent origin: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon.]
La Scola et al., Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
[Figure labels: phage scaffold (virus hallmark genes); eukaryotic additions/displacements.]
Koonin, Yutin, Intervirology 2010
Some of the smallest viruses (this is where poliovirus belongs)
The Big Bang of picorna-like virus evolution
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
[Figure: poliovirus genome map (7.4 kb). Labels: IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An. Viral hallmark genes: jelly-roll CPs, Superfamily 3 helicase (S3H), RdRp. Picornaviral 'signature' genes are marked (labels include 3C-Pro and VPg). Lower panel: protein sequence conservation.]
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
[Figure: genome maps of HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb) and SssRNAV (9.0 kb). Labels: Hel3, S-Pro, C-Pro, Pol, VP, VP1, VP2, VP3, VP 1-3, An, sg.]
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
[Figure: poliovirus (PolioV) genome map (7.4 kb) with labels IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An; hallmark genes marked: jelly-roll CPs, Superfamily 3 helicase, RdRp. Inferred sources shown: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases. 3C-Pro is labeled.]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies
[Figures: replication-cycle diagrams for each class; the diagram symbols (+, -, ±, T, Tr, R, RT, RCR, E) are not reproduced here.]
Genetic cycles of RNA viruses (key proteins: RdRp, CP)
Class 1: Positive-strand RNA, 3-30 kb
Class 2: Double-strand RNA, 4-25 kb
Class 3: Negative-strand RNA, 11-20 kb
Genetic cycles of retroid viruses and retroelements (key proteins: RT, CP)
Class 4: Retroid RNA viruses, 7-12 kb
Class 5: Retroid DNA viruses, elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids (key protein label: Pr-Pol)
Class 6: ssDNA viruses, plasmids, 2-11 kb
Class 7: dsDNA viruses, plasmids, 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
  1. Genes with closely related homologs from cellular organisms (typically the host of the given virus) present in a narrow group of viruses
  2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
  Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
  3. ORFans, i.e. genes without detectable homologs except possibly in closely related viruses
  4. Virus-specific genes that are conserved within a virus lineage
  Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
  5. Genes shared by many diverse virus lineages with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contribution of each gene class plotted against genome size (log scale, 0 to 3). Gene classes: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark. Virus groups along the axis: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages.]
Natural history of viral genes
Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and Reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive - Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios. Cell degeneration: cell, small parasitic cell, very small virus. Escaped genes: chromosome, plasmid, mRNA, RNA and DNA escaping a cell to become a virus. Primordial genetic systems: pre-cellular life forms giving rise to viruses.]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure label: replicon fusion yields chromosomes; cells.]
Koonin, Martin, TIG 2005; Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29; Koonin, Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
No. | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family, possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
…and ~15 other, less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats with adjacent cas genes. Repeat groups: TB and TA, Strepto-like, Ecoli-like, Pasteurella-like. Genes: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468).]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
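Characteristics (i) and (ii) in the quotation above suggest a simple way to picture a CRISPR array: identical repeats with unique spacers of similar size between them. The sketch below is a toy illustration only; the repeat, spacer, leader and trailer sequences are invented for the example, and the function names `split_crispr_array` and `similar_size` are hypothetical. Real repeat finders must tolerate slight repeat variation, which this sketch does not.

```python
def split_crispr_array(locus: str, repeat: str) -> list[str]:
    """Return the spacers that lie between consecutive exact copies of `repeat`."""
    parts = locus.split(repeat)
    # parts[0] is the leader-side flank and parts[-1] the trailing flank;
    # everything in between sits between two repeats, i.e. is a spacer.
    return parts[1:-1]

def similar_size(spacers: list[str], tolerance: int = 3) -> bool:
    """Check characteristic (ii): spacers differ in length by at most `tolerance` nt."""
    lengths = [len(s) for s in spacers]
    return bool(lengths) and max(lengths) - min(lengths) <= tolerance

# Invented toy sequences (not taken from any genome).
LEADER = "ATATATAT"
REPEAT = "GTTCACTGCC"
SPACER_1 = "ACGTACGTTAGCCATGGACT"
SPACER_2 = "TTGACCGGATAGCTAGGCTT"
TRAILER = "CGCGCG"

locus = LEADER + REPEAT + SPACER_1 + REPEAT + SPACER_2 + REPEAT + TRAILER
spacers = split_crispr_array(locus, REPEAT)
print(spacers)
print(similar_size(spacers))
```

Splitting on an exact repeat string keeps the example short; it relies on the assumption that no spacer contains the repeat sequence.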
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
Paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (motif columns I to V; column assignment is not recoverable from the extracted text):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold, RNA-recognition motif (RRM)
CRISPR show extreme diversity and complex clustering
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram. Labels, in extracted order: transcription by cellular RNA pol with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; RAMP; p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage by p-slicer (COG1468, COG4343, COG1857).]
Variant of CASS functioning with polymerase: psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: flow diagram. Labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); annealing to RNA target (plasmid or phage mRNA); primer elongation by CASS RNA pol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues.]
New psiRNA generation
[Figure: flow diagram. Labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNA pol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer.]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
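The sequence specificity described above (a single substitution restores sensitivity) can be illustrated with a deliberately simplified model in which resistance requires an exact match between a spacer and the phage genome. This is a sketch under that assumption only; the sequences are invented, and `is_resistant` and `point_mutate` are hypothetical helper names, not part of any published tool.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: the host is resistant if any spacer occurs exactly in the phage genome."""
    return any(spacer in phage_genome for spacer in spacers)

def point_mutate(seq: str, pos: int) -> str:
    """Return `seq` with the base at `pos` replaced by a different base (a single SNP)."""
    new_base = "A" if seq[pos] != "A" else "C"
    return seq[:pos] + new_base + seq[pos + 1:]

# Invented toy sequences.
PROTOSPACER = "ACGTTGCAAGCTTAGGCATCGATTACGGCTA"
phage = "TTGACA" + PROTOSPACER + "GGATCC"
host_spacers = [PROTOSPACER]          # spacer acquired from this phage

# One substitution inside the region matched by the spacer.
escape_mutant = point_mutate(phage, len("TTGACA") + 10)

print(is_resistant(host_spacers, phage))          # wild-type phage
print(is_resistant(host_spacers, escape_mutant))  # phage carrying one SNP
```

Exact string matching on one strand is the simplest rule consistent with the validation bullets; it is not meant to describe how the Cas machinery actually recognizes its target.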
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader; spacer array 5 4 3 2 1, shown as 6 5 4 3 2 1 after adaptation; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3.]
Van der Oost et al., TIBS 2009
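The first two stages in the scheme above, adaptation and expression/processing, can be sketched as operations on a list of spacers. This is a toy model, not a description of the molecular mechanism: the class `CrisprArray`, its method names and all sequences are hypothetical, and the model assumes that spacers are non-empty and never contain the repeat. The only feature taken from the figure is that the newest spacer (number 6) appears at the leader end of the array.

```python
from dataclasses import dataclass, field

@dataclass
class CrisprArray:
    repeat: str
    spacers: list = field(default_factory=list)  # index 0 is the leader-proximal spacer

    def adapt(self, fragment: str) -> None:
        """Stage 1 (adaptation): insert a new invader-derived spacer next to the leader."""
        self.spacers.insert(0, fragment)

    def transcript(self) -> str:
        """Stage 2a (expression): one long precursor, repeat-spacer-repeat-..."""
        return self.repeat + "".join(s + self.repeat for s in self.spacers)

    def process(self) -> list:
        """Stage 2b (processing): cut at the repeats, leaving one guide per spacer."""
        return [piece for piece in self.transcript().split(self.repeat) if piece]

# Invented toy sequences.
array = CrisprArray(repeat="GTTCAC", spacers=["AAAAGG", "TTTTCC"])
array.adapt("CATGCA")       # fragment taken from a hypothetical invader
print(array.spacers)        # newest spacer first
print(array.process())      # guides recovered from the precursor transcript
```

Stage 3 (interference) would then amount to looking up each processed guide in the invading sequence.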
The three types of CRISPR/Cas systems and their signature genes
[Figure: representative cas operons for the three types. Genes are labeled by Cas name (Cas1 to Cas11) and by COG number or family (e.g. 1343, 3513, 1857, Helicase 1203, HD-f, RecB-f 1468, RuvC-f, HNH, RAMP 1583/1688, y1724, ygcK, ygcL). Subtypes shown include Ecoli/CASS2, Ypest/CASS3 and Nmeni/CASS4.]
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPRCas systems
1857
1857
1857
RAMP1688
10
Cordip DIP0037Bacfra BF3951
Wolsuc WS1444Camjej Cj1522c
Neimen NMA0630Pasmul PM1126
Strthe str0658Strpyo Spy 0770
Treden TDE0328Staepi SERP2463
Mycgal MGA 0523Mycmob MMOB0320
Porgin PG1982Legpne lpp0161
Wolsuc WS1615Sulsol SSO1450
Aerper APE1240Sulsol SSO1405
Pyraer PAE0200Pyraer PAE0081
Metmaz MM0559Metmaz MM3249
Arcful AF1878Mansuc MS1635
Niteur NE0111Myctub Rv2817c
Nostoc alr0381Synech slr7071Thethe TTHB145
Chrvio CV1229Xanaxo XAC3842
Chltep CT1130Desvul DVUA0134
Metcap MCA0651Azoarc ebA3283
Deigeo_DRAFT_1682Mansuc MS0981
Baccla ABC3592Bachal BH0341
Strpyo Spy 1286Vibvul VVA1544
Nostoc alr1468Synech slr7092
Synech slr7016Geosul GSU0057
Thethe TTHB224Nostoc alr1568
Metkan MK1312
Esccol b2755Saltyp STM2938
Phopro PBPRC0034Geosul GSU1392
Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639
Symthe_STH663Corjei jk0643
Nocfar nfa44220Strave SAV7537
Metcap MCA0930Cordip DIP2214
Chrvio CV1756Zymmob ZMO0680
Pasmul PM0311Erwcar ECA3679
Yerpes y1722Legpne lpl2837Phopro PBPRB1995
Acinet ACIAD2484
Clotet CTC01148Fusnuc FN1177
Aquaeo aq 369Theten TTE2658Themar TM1797
Bacfra BF2544Porgin PG2014
Halmar pNG4053Clotet CTC01463
Metjan MJ0378Pyrhor PH1245
Thekod TK0455Metthe MTH1084
Arcful AF2435Metace MA3670
Nanequ NEQ017Pyrhor PH0173
Pictor PTO0003Pictor PTO0049Thevol TVN0106
1343
1343
1343
1343
1518
1518
1343
1518
RecB-f4343
RecB-f1468
RecB-f1468
RecB-f1468
Helicase12031857 RAMP
1688
RAMP1688
Helicase1203
Helicase1203
RAMPBH0337
RAMPYgcH
RAMPy1727
RAMPy17261857
Helicase1203
HD-f
Helicase1203
HD-f
1343
HD-f
HD-f
HD-f
RuvC-f
AF0070
BH0338
y1724
ygcKygcL
RAMP1583
HTH
AF1870
SPy1049
3574
AF1870
PH0918
HTH 3574
1857 RAMP1688
RecB-f1468
Helicase1203
HD-f
AF1870
MTH1090
AF1873
RAMP1583
3574
AF1870
3574
1343
1
7
2
3
4
5
6
7a
Sulsol SSO1999Sulsol SSO1440
Sultok ST2639Sultok ST0032
Sulsol SSO1402Pyraer PAE0068
Arcful AF1874Pyrhor PH0917Thekod TK0450
Pyraby PAB1689
13433513
3513
1518
RuvC-f
HNH
4aHNH
1 0
Pyrhor PH0162
6
64
6
4
16
17
7
7
7
7
1
2
7
7
7
7
5 7
5 75 7
5 75 7
5 7
6 1
7
Metkan MK1314
Metace MA1931
Metmaz MM3356
Metbar Mbar_A1355
Strthe stu0960Myctub Rv2823cStaepi SERP2461
Themar TM1811Mansuc MS1651
Niteur NE0123Thethe TTP0102
Metthe MTH1082
Metthe MTH326
Metjan MJ1672Pictor PTO0053Thevol TVN0112
Metkan MK1297Thethe TTP0115
Bachal BH0328Synech sll7090
Aquaeo aq 387Theten TTE2639
Themar TM1794Arcful AF1867
Pyrfur PF1129Thefus Tfu1578
Porgin PG1987Sultok ST1979
Sulsol SSO1991Sulsol SSO1729
Sultok ST0012Sulaci Saci 2046Nostoc all1479
1421 RAMP1567
RAMP1583
RAMPMTH323
RAMP1769
XMTH324
ldquoPolymeraserdquo cassette
ldquoHelicaserdquo cassette(A)
(B)
(C)
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315:1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches labeled Type I, Type II and Type III; subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; polymerase-RAMP module), Dvulg (CASS1), Tneap-Hmari (CASS7), Apern (CASS5)]
Genes shown for each type:
Type I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
Type II: cas1, cas9, cas2, cas4
Type III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
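As a quick arithmetic check of the first bullet, the sketch below recomputes the share of genomes that encode Cas1 from the two counts quoted above (310 of 703). The function and variable names are illustrative only.

```python
# Counts quoted in the slide text above.
TOTAL_GENOMES = 703
CAS1_GENOMES = 310


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


cas1_share = percent(CAS1_GENOMES, TOTAL_GENOMES)
print(f"Cas1-positive genomes: {cas1_share:.1f}%")
```

Rounded to a whole number, the result agrees with the 44% given on the slide.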
[Figure: bar charts (scale 0 to 100) showing Type I, Type II and Type III systems and Cas1 presence/absence, with counts per bar]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor
Phenomenon | Biological role/function | Phyletic spread | (a) | (b) | (c)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin, Wolf, Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
[Figure: phylogenetic tree of NCLDV with bootstrap support values; families: Phycodnaviridae, Poxviridae, Ascoviridae & Iridoviridae, Mimiviridae, Marseillevirus, Asfarviridae]
Phylogeny of NCLDV based on concatenation of 4 universal genes: primase-helicase, DNA polymerase, packaging ATPase, VLTF2 transcription factor
HOST labels: Animals + diverse protists; Amoebozoa; Algae, animals(); Chlorophytes, haptophytes, stramenopiles; Animals, haptophytes; Animals
• Divergence of NCLDV most likely antedates divergence of eukaryotic supergroups
• Alternative/addition: extensive horizontal transfer of viruses
Boyer, Yutin et al., PNAS 2009; Koonin, Yutin, Intervirology 2010
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009;106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus, isolated from amoeba found in aquatic habitats, show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of Marseille virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are "melting pots" of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree: intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there. Some NCLDV host their own parasites
La Scola et al. The virophage as a unique parasite of the giant mimivirus. Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genome map with gene origins labeled: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon]
La Scola et al., Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
[Figure labels: phage scaffold (virus hallmark genes); eukaryotic additions/displacements]
Koonin, Yutin, Intervirology 2010
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
[Figure: poliovirus genome map (7.4 kb): VPg, IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An]
Viral hallmark genes: jelly-roll CPs, Superfamily 3 helicase (S3H), RdRp
Picornaviral 'signature' genes: 3C-Pro, 3DPol (RdRp), VPg, 2C (S3H)
[Plot: protein sequence conservation]
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
[Figure: genome maps of marine RNA viruses: HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb), SssRNAV (9.0 kb); gene labels include Hel3, S-Pro, C-Pro, Pol, VP, VP1-3, sg and An]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
[Figure: PolioV genome map (7.4 kb): VPg, IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An; hallmark genes: jelly-roll CPs, Superfamily 3 helicase, RdRp, 3C-Pro; source labels: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies
[Figure: replication cycle diagrams by class (columns: Class, Replication cycle); diagram symbols +, -, ±, R, T, Tr, RT, RCR, E omitted]
Genetic cycles of RNA viruses (gene labels: RdRp, CP)
1. Positive-strand RNA: 3-30 kb
2. Double-strand RNA: 4-25 kb
3. Negative-strand RNA: 11-20 kb
Genetic cycles of retroid viruses and retroelements (gene labels: RT, CP)
4. Retroid RNA viruses: 7-12 kb
5. Retroid DNA viruses/elements: 2-10 kb
Genetic cycles of DNA viruses and plasmids (gene label: Pr-Pol)
6. ssDNA viruses/plasmids: 2-11 kb
7. dsDNA viruses/plasmids: 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
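The five-way classification above can be written down as a small lookup table. The sketch below is only an illustration: the category strings ("close", "moderate", "distant", "none" for cellular homologs; "narrow", "lineage", "diverse" for spread among viruses) are hypothetical labels chosen here, not terms used in the talk, and real assignments rest on sequence analysis rather than on two flags.

```python
from enum import Enum


class GeneClass(Enum):
    RECENT_ACQUISITION = 1        # class 1: close cellular homologs, narrow virus group
    OLD_ACQUISITION = 2           # class 2: moderately close homologs, conserved in lineage(s)
    ORFAN = 3                     # class 3: no detectable homologs
    VIRUS_SPECIFIC_CONSERVED = 4  # class 4: virus-specific, conserved within a lineage
    HALLMARK = 5                  # class 5: many diverse viruses, only distant cellular homologs


# (cellular homologs, spread among viruses) -> gene class; labels are hypothetical.
_RULES = {
    ("close", "narrow"): GeneClass.RECENT_ACQUISITION,
    ("moderate", "lineage"): GeneClass.OLD_ACQUISITION,
    ("none", "narrow"): GeneClass.ORFAN,
    ("none", "lineage"): GeneClass.VIRUS_SPECIFIC_CONSERVED,
    ("distant", "diverse"): GeneClass.HALLMARK,
}


def classify_viral_gene(cellular_homologs: str, viral_spread: str) -> GeneClass:
    """Map the two descriptive labels onto one of the five classes listed above."""
    try:
        return _RULES[(cellular_homologs, viral_spread)]
    except KeyError:
        raise ValueError(
            f"combination not covered by the summary: {cellular_homologs!r}, {viral_spread!r}"
        ) from None


print(classify_viral_gene("distant", "diverse").name)
```

Combinations that the summary does not describe raise an error instead of being forced into a class.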
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of five gene classes (recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark) versus genome size (log scale, 0 to 3) for three groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes
Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive - Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios (cell degeneration, escaped genes, primordial genetic systems); labels: cell, small parasitic cell, very small virus, chromosome, plasmid, virus, mRNA, RNA virus, DNA virus, pre-cellular life forms, cells]
Origin and evolution of virus-like genetic elements in the pre-cellular era
Koonin, Martin, TIG 2005; Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29; Koonin, Ann NY Acad Sci 2009
Replicon fusion yields chromosomes
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that has retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Family | Subfamily | Phyletic distribution | Comments
1. COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family, possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6. COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7. HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8. BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
...and ~15 other less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and associated genes; labels: TB and TA, Strepto-like, Ecoli-like, Pasteurella-like; Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468)]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species"
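Characteristics (i) and (ii) describe a simple layout: identical direct repeats separated by non-repetitive spacers of similar size. A minimal sketch of reading such an array is given below, assuming the repeat sequence is already known and occurs without variation; the sequences and the function name `split_crispr_array` are made up for illustration, and real repeat finders must also cope with unknown and slightly degenerate repeats.

```python
def split_crispr_array(locus: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be non-empty")
    parts = locus.split(repeat)
    # parts[0] is the sequence upstream of the first repeat (e.g. the leader) and
    # parts[-1] is whatever follows the last repeat; only the pieces in between
    # are flanked by a repeat on both sides, i.e. are spacers.
    return parts[1:-1]


# Toy locus with made-up sequences: leader, three repeat+spacer units, final repeat.
REPEAT = "GTTTCAATCC"
LEADER = "ATATATAT"
SPACERS = ["ACGTACGTAA", "TTGACCGGTA", "CCATGGAATT"]
locus = LEADER + "".join(REPEAT + spacer for spacer in SPACERS) + REPEAT

found = split_crispr_array(locus, REPEAT)
print(found)
print({len(spacer) for spacer in found})  # spacer sizes, cf. characteristic (ii)
```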
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (motif columns I to V):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures, where applicable.
Kunin et al. Genome Biology 2007, 8:R61. doi:10.1186/gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
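To put a number on "conserved among diverse species", the sketch below computes the fraction of identical positions between two rows of the alignment above, here the Yerpes repeat and one of the Pasmul repeats. The scoring convention (columns gapped in both rows are skipped, a gap against a base counts as a mismatch) is an assumption made for this illustration, not a method stated in the talk.

```python
def aligned_identity(a: str, b: str) -> float:
    """Fraction of identical columns between two aligned rows of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column gapped in both rows: carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


# Rows copied from the alignment above.
yerpes = "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--"
pasmul = "GTTAACTGC--------CGTATAGGCAGCTTAGAAA--"

identity = aligned_identity(yerpes, pasmul)
print(f"Yerpes vs Pasmul repeat identity: {identity:.2f}")
```

The two repeats differ at 3 of the 28 compared positions.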
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure labels: transcription by cellular RNApol with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-Hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; RAMP; p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage by p-slicer (COG1468, COG4343, COG1857)]
Variant of CASS functioning with polymerase/psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable) on plasmid or phage mRNA; primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; annealing to RNA target; cycle continues]
New psiRNA generation
[Figure labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region by integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
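The sequence specificity reported above (a single substitution restores sensitivity) can be mimicked with a toy exact-match rule. This is a deliberately simplified sketch: the sequences are made up, `is_resistant` is a hypothetical helper, and plain substring matching stands in for the actual recognition mechanism, which the slides do not specify at this level.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: the host resists the phage if any spacer matches it exactly."""
    return any(spacer in phage_genome for spacer in spacers)


# Made-up phage genome and a spacer copied from it (positions 10..29).
phage = "ATGGCTAGCTAGGATCCGATTACAGGCTTAACCGGTTAGC"
spacer = phage[10:30]

# Escape mutant: one substitution (C -> A at position 15) inside the targeted region.
mutant = phage[:15] + "A" + phage[16:]

print(is_resistant([spacer], phage))   # spacer matches the original phage
print(is_resistant([spacer], mutant))  # a single SNP abolishes the match
```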
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader followed by spacers 5 4 3 2 1, then 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
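The three stages named in the figure can be strung together in a toy model. Everything here is an illustrative assumption: the class `ToyCrisprHost`, the fixed spacer length, the made-up virus sequence, and the placement of each new spacer at the leader end of the array (read off the "5 4 3 2 1" to "6 5 4 3 2 1" labels). Processing and interference are reduced to copying spacers and exact matching.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCrisprHost:
    # Index 0 is the leader-proximal (most recently acquired) spacer.
    spacers: list[str] = field(default_factory=list)

    def adapt(self, invader: str, start: int, length: int = 12) -> str:
        """(1) Adaptation: copy a fragment of the invader into the array."""
        if length <= 0:
            raise ValueError("length must be positive")
        spacer = invader[start:start + length]
        if len(spacer) < length:
            raise ValueError("fragment runs past the end of the invader")
        self.spacers.insert(0, spacer)
        return spacer

    def express(self) -> list[str]:
        """(2) Expression & processing: one guide per spacer (toy version)."""
        return list(self.spacers)

    def interferes_with(self, invader: str) -> bool:
        """(3) Interference: some guide matches the invader exactly."""
        return any(guide in invader for guide in self.express())


virus = "ATGCGTACCGGATTACAGCTTGGCAATCGA"  # made-up sequence
host = ToyCrisprHost()
before = host.interferes_with(virus)  # naive host: no matching spacer yet
host.adapt(virus, start=5)
after = host.interferes_with(virus)   # immunized host
print(before, after, host.spacers)
```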
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon diagrams annotated with COG numbers, domain labels (Helicase, HD, RecB, RuvC, HNH, RAMP) and gene names Cas1 to Cas13]
Types: Type I (Helicase cassette; Cascade); Type II (HNH-type); Type III (RAMP module or Polymerase cassette)
Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), CASS4a, Mtube (CASS6; RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5)
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: cas gene neighborhoods in archaeal and bacterial genomes (gene identifiers with COG numbers and domain labels such as RAMP, Helicase, HD, RecB, RuvC, HNH and HTH), grouped into a "Polymerase" cassette and a "Helicase" cassette; panels (A), (B), (C)]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315:1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches labeled Type I, Type II and Type III; subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; polymerase-RAMP module), Dvulg (CASS1), Tneap-Hmari (CASS7), Apern (CASS5)]
Genes shown for each type:
Type I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
Type II: cas1, cas9, cas2, cas4
Type III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: bar charts (scale 0 to 100) showing Type I, Type II and Type III systems and Cas1 presence/absence, with counts per bar]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPRCas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Phenomenon Biological rolefunction
Phyletic spread
Lamarckian criteria
Genomic changes caused by environmental factor
Changes are specific to relevant genomic loci
Changes provide adaptation to the causative factor
Bona fide LamarckianCRISPRCas Defense against
viruses and other mobile elements
Most of the Archaeaand many bacteria
Yes Yes Yes
piRNA Defense against transposable elements in germline
Animals Yes Yes Yes
HGT (specific cases)
Adaptation to new environment stress response resistance
Archaea bacteria unicellular eukaryotes
Yes Yes Yes
Quasi-LamarckianHGT (general phenomenon)
Diverse innovations Archaea bacteria unicellular eukaryotes
Yes No Yesno
Stress-induced mutagenesis
Stress responseresistanceadaptation to new conditions
Ubiquitous Yes No or partially
Yes (but general evolvabilityenhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
bullEvolution of parasites is intrinsic to any replicator system
bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since
bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms
bullPerennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian DoljaOregon State University
Yuri Wolf NCBITatiana Senkevich NIAID NIH
Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille
Didier Raoult et al
Bill MartinUniv Duesseldorf
Kira Makarova NCBI
Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG, Suzan-Monti M, La Scola B, Koonin EV, Raoult D. PNAS 2009;106(51):21848-53
Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms
Giant viruses such as Mimivirus, isolated from amoeba found in aquatic habitats, show biological sophistication comparable to that of simple cellular life forms and seem to evolve by similar mechanisms, including extensive gene duplication and horizontal gene transfer (HGT), possibly in part through a viral parasite, the virophage. We report here the isolation of Marseille virus, a previously uncharacterized giant virus of amoeba. The virions of Marseillevirus encompass a 368-kb genome, a minimum of 49 proteins, and some messenger RNAs. Phylogenetic analysis of core genes indicates that Marseillevirus is the prototype of a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes. The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts, both bacterial and viral. We propose that amoebae are "melting pots" of microbial evolution where diverse forms emerge, including giant viruses with complex gene repertoires of various origins.
The mosaic composition of the Marseille virus genome
Gene content tree: intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there
Some NCLDV host their own parasites
La Scola et al. The virophage as a unique parasite of the giant mimivirus. Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genome map; legend labels: NCLDV; Mimivirus; Archaeal virus; NCLDV (environmental only); Bacterial transposon]
La Scola et al. Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
[Figure labels: Phage scaffold (virus hallmark genes); Eukaryotic additions/displacements]
Koonin, Yutin. Intervirology 2010
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
[Figure: poliovirus genome map (7.4 kb); labels: IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol, An. Legend: viral hallmark genes (jelly-roll CPs, superfamily 3 helicase, RdRp); picornaviral 'signature' genes; protein sequence conservation (labels: 3C-Pro, 3D-Pol/RdRp, VPg, 2C/S3H)]
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
[Figure: genome maps of marine picorna-like viruses: HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb), SssRNAV (9.0 kb); gene labels: VP, VP1-3, S-Pro, C-Pro, Pol, Hel3, sg, An]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus genome map (PolioV, 7.4 kb); labels: IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol, An; jelly-roll CPs, superfamily 3 helicase, RdRp. Labels for likely sources of the genes: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies
Genetic cycles of RNA viruses
1. Positive-strand RNA viruses, 3-30 kb
2. Double-strand RNA viruses, 4-25 kb
3. Negative-strand RNA viruses, 11-20 kb
Genetic cycles of retroid viruses and retroelements
4. Retroid RNA viruses, 7-12 kb
5. Retroid DNA viruses and elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids
6. ssDNA viruses and plasmids, 2-11 kb
7. dsDNA viruses and plasmids, 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE
[Figure: for each class, a replication-cycle diagram (columns: Class, Replication cycle) built from steps labelled T, Tr, R, RT, RCR and E acting on +, - and ± strands; protein labels: RdRp, CP, RT, RCE, Pr-Pol]
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host, but with rapid divergence from the ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
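The five classes on this slide can be read as a small lookup on two properties of a gene: how close its cellular homologs are, and how widely it is spread among viruses. The sketch below encodes that reading in Python. The descriptor values ("close", "narrow" and so on), the `GeneClass` names and the function are hypothetical simplifications introduced here, not a procedure from the talk; combinations the slide does not describe are rejected rather than guessed.

```python
from enum import Enum


class GeneClass(Enum):
    RECENT_ACQUISITION = 1        # class 1 on the slide
    OLD_ACQUISITION = 2           # class 2
    ORFAN = 3                     # class 3
    VIRUS_SPECIFIC_CONSERVED = 4  # class 4
    HALLMARK = 5                  # class 5


# Hypothetical coarse descriptors -> slide class.
# Key: (closeness of cellular homologs, spread among viruses).
RULES = {
    ("close", "narrow"): GeneClass.RECENT_ACQUISITION,
    ("moderate", "lineage"): GeneClass.OLD_ACQUISITION,
    ("none", "narrow"): GeneClass.ORFAN,
    ("none", "lineage"): GeneClass.VIRUS_SPECIFIC_CONSERVED,
    ("distant", "many_lineages"): GeneClass.HALLMARK,
}


def classify_viral_gene(cellular_homologs: str, virus_spread: str) -> GeneClass:
    """Return the slide class for one gene, or raise if the slide is silent."""
    try:
        return RULES[(cellular_homologs, virus_spread)]
    except KeyError:
        raise ValueError("combination not covered by the five classes") from None
```

For example, `classify_viral_gene("distant", "many_lineages")` returns `GeneClass.HALLMARK`, matching the definition of hallmark genes as shared by many diverse virus lineages with only very distant cellular homologs.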
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of five gene classes (recent acquisitions; old acquisitions; virus-specific ORFans; virus-specific conserved; virus hallmark) plotted against genome size (log scale, 0 to 3) for three groups: most RNA viruses, retroelements and RCR replicons; large RNA viruses, adenoviruses and tailed phages; NCLDV, herpesviruses and large phages]
Natural history of viral genes: viral hallmark genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive: Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006;1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios (cell degeneration; escaped genes; primordial genetic systems); diagram labels: CELL, SMALL PARASITIC CELL, VERY SMALL, VIRUS, CHROMOSOME, PLASMID, RNA, mRNA, DNA, PRE-CELLULAR LIFE FORMS, CELLS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure label: Replicon fusion yields chromosomes]
Koonin, Martin. TIG 2005
Koonin, Senkevich, Dolja. Biol Direct 2006;1:29
Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. [...] The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
No. | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
...and ~15 other, less common proteins
Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and adjacent cas genes; locus labels: TB and TA, Strepto-like, Ecoli-like, Pasteurella-like; gene labels: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468)]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species"
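Characteristics (i) and (ii) of the quoted definition describe a simple layout: a leader, then near-identical direct repeats alternating with spacers of similar size. A minimal Python sketch of reading spacers out of such a locus follows. It assumes the repeat sequence is already known and matches exactly; the repeat, leader and spacer sequences are made-up illustrative values, and real repeat finding would need approximate matching.

```python
def split_crispr_locus(locus: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    pieces = locus.split(repeat)
    # pieces[0] is everything before the first repeat (the leader side),
    # pieces[-1] is everything after the last repeat; spacers are in between.
    return pieces[1:-1]


def similar_size(spacers: list[str], tolerance: int = 2) -> bool:
    """Check characteristic (ii): spacers have similar lengths."""
    lengths = [len(s) for s in spacers]
    return not lengths or max(lengths) - min(lengths) <= tolerance


# Hypothetical toy locus: leader, then repeat-spacer units, ending with a repeat.
REPEAT = "GTTTCAATCC"
LEADER = "ATATATTATA"
SPACERS = ["ACGTACGTACGA", "TTGACCATGAAC", "CCATAGGTCAGT"]
LOCUS = LEADER + REPEAT + "".join(spacer + REPEAT for spacer in SPACERS)

recovered = split_crispr_locus(LOCUS, REPEAT)
```

Here `recovered` holds the three spacers in locus order, and `similar_size(recovered)` checks the "similar size" criterion with an arbitrary tolerance of two bases.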
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
Paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Specific motifs (I-V), listed per family in the order shown on the slide:
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold, RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61. doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
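The conservation claim rests on comparing aligned repeat sequences. As an illustration only, the Python sketch below counts identical positions between two rows of the alignment, using the Arcful and Metthe repeats exactly as they appear on the slide. Ignoring gap columns and reporting a raw count are choices made here, not the scoring used in the original analysis.

```python
def aligned_identity(a: str, b: str) -> tuple[int, int]:
    """Return (identical, compared) over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = compared = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap in either row
        compared += 1
        if x == y:
            identical += 1
    return identical, compared


# First two rows of the repeat alignment, copied from the slide.
ARCFUL = "GTTGAAATC-------AGACCAAAATGGGATTGAAAG-"
METTHE = "GTTAAAATC-------AGACCAAAATGGGATTGAAAT-"

identical, compared = aligned_identity(ARCFUL, METTHE)
print(f"{identical}/{compared} aligned positions identical")
```

The two repeats differ at the fourth base and at the last base, so 28 of the 30 compared columns are identical.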
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
[...] The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA
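The observation that spacers derive from phage and plasmid sequences amounts to a sequence search: does a spacer occur in a candidate source, on either strand? A minimal Python sketch of that search follows. Exact matching and the toy phage sequence are assumptions made here for illustration; the published analysis used database similarity searches rather than this function.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(spacer: str, genome: str) -> list[tuple[int, str]]:
    """Return (start, strand) for every exact occurrence of the spacer.

    Positions are 0-based on the given (plus) strand; strand "-" means the
    reverse complement of the spacer matched at that position.
    """
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return hits


# Hypothetical 20-nt "phage genome" used only to exercise the function.
PHAGE = "ATGCCGTACCGATTCAGGCT"
```

With this toy sequence, `find_protospacers("CGTACCGA", PHAGE)` reports a plus-strand match at position 4, `find_protospacers("CCTGAATC", PHAGE)` reports a minus-strand match at position 10, and an unrelated spacer returns an empty list.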
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram with the following labels: transcription (cellular RNA pol; specific transcription factor, COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; p-RISC (RAMP; p-slicer: COG1468, COG4343, COG1857); annealing to RNA target (plasmid or phage mRNA); target RNA cleavage]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle with the following labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); annealing to RNA target (plasmid or phage mRNA); primer elongation by CASS RNA pol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues]
New psiRNA generation
[Figure: two alternative routes (joined by OR) with the following labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNA pol (COG1353) or other RT; reverse transcription with random copy choice; random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
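The first validation point, that a single substitution in the phage restores sensitivity, can be mimicked with a deliberately crude model in which the host is resistant only if one of its spacers matches the phage exactly. The Python sketch below encodes that model. The exact-match rule, the function names and the sequences are illustrative assumptions made here; they are not the experimental system of Barrangou et al.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: resistance requires an exact spacer match in the phage."""
    return any(spacer in phage_genome for spacer in spacers)


def point_mutate(genome: str, position: int, base: str) -> str:
    """Return a copy of the genome with one base substituted (0-based)."""
    return genome[:position] + base + genome[position + 1:]


# Hypothetical sequences: the spacer matches positions 4-11 of the phage.
PHAGE = "ATGCCGTACCGATTCAGGCT"
HOST_SPACERS = ["CGTACCGA"]

# A single A -> G substitution inside the matched region (position 7).
ESCAPE_MUTANT = point_mutate(PHAGE, 7, "G")
```

Under this rule `is_resistant(HOST_SPACERS, PHAGE)` is true, while `is_resistant(HOST_SPACERS, ESCAPE_MUTANT)` is false, which is the qualitative behaviour the slide describes.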
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation; (2) Expression & Processing; (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader; spacer array growing from 5 4 3 2 1 to 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al. TIBS 2009
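The three stages in the figure (adaptation, expression and processing, interference) can be strung together in a toy model. The Python sketch below is only that: a hypothetical `ToyCrisprHost` class written for this note, with exact string matching standing in for target recognition and with the new spacer added at the leader-proximal end, as the spacer numbering in the figure suggests. The repeat, phage and plasmid sequences are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCrisprHost:
    repeat: str
    # spacers[0] is the leader-proximal (most recently acquired) spacer
    spacers: list = field(default_factory=list)

    def adapt(self, invader: str, start: int, length: int = 8) -> None:
        """Stage 1: copy a fragment of the invader in as a new spacer."""
        self.spacers.insert(0, invader[start:start + length])

    def express(self) -> list:
        """Stage 2: one processed guide per spacer (here, the spacer itself)."""
        return list(self.spacers)

    def interfere(self, invader: str) -> bool:
        """Stage 3: the invader is targeted if any guide matches it exactly."""
        return any(guide in invader for guide in self.express())

    def locus(self) -> str:
        """Repeat-spacer array, leader-proximal end first."""
        return self.repeat + "".join(s + self.repeat for s in self.spacers)


if __name__ == "__main__":
    host = ToyCrisprHost(repeat="GTTTCAATCC")
    phage = "ATGCCGTACCGATTCAGGCT"
    print(host.interfere(phage))  # naive host, before adaptation
    host.adapt(phage, start=4)
    print(host.interfere(phage))  # after acquiring a phage-derived spacer
```

Because each new spacer is inserted at index 0, the order of `spacers` records the order of encounters, newest first.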
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon architectures drawn as COG numbers and domain labels (Helicase, HD, RecB, RuvC, HNH, RAMP) for:
Type I (Helicase cassette; Cascade)
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
Subtypes shown: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4, CASS4a), Mtube (CASS6, RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5). Gene names used: Cas1 to Cas13]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure, panels (A)-(C): Cas gene neighbourhoods in archaeal and bacterial genomes. Rows are labelled by genome and gene (for example Esccol b2755, Yerpes y1722, Myctub Rv2817c, Strthe str0658, Pyrhor PH1245); each neighbourhood is drawn as COG numbers and domain labels (RAMP, Helicase, HD, RecB, RuvC, HNH, HTH). Neighbourhoods are numbered 1-7, 4a and 7a and grouped into the "Polymerase" cassette and the "Helicase" cassette]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315:1709-12 (2007)
[Figure label: mRNA targeting]
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labelled Type I, Type II and Type III; subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; Polymerase-RAMP module), Dvulg (CASS1), Tneap-Hmari (CASS7), Apern (CASS5)]
Gene sets shown for each type:
I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
II: cas1, cas9, cas2, cas4
III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)
Makarova et al., NRM, submitted
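The gene sets listed for the three types differ in one gene each: cas3 appears only in the Type I set, cas9 only in Type II and cas10 only in Type III. Assuming these are the signature genes the slide title refers to, a locus can be assigned to a type by lookup. The Python sketch below does this; the dictionary and function are hypothetical names written for this note, and real classification also weighs gene order and Cas1 phylogeny.

```python
# Assumed signature genes, read off the three gene sets on the slide.
SIGNATURE_GENES = {"cas3": "Type I", "cas9": "Type II", "cas10": "Type III"}


def classify_locus(genes: list[str]) -> list[str]:
    """Return the sorted CRISPR/Cas types whose signature gene is present."""
    return sorted({SIGNATURE_GENES[g] for g in genes if g in SIGNATURE_GENES})


# Gene sets as listed on the slide.
TYPE_I = ["cse1", "cse2", "cas7", "cas5", "cas3", "cas1", "cas2", "cas6e"]
TYPE_II = ["cas1", "cas9", "cas2", "cas4"]
TYPE_III = ["cas1", "csm6", "csm5", "csm4", "csm3", "csm2", "cas10", "cas6", "cas2"]
```

A genome carrying systems of two types returns both labels, and a gene list with only cas1 and cas2 returns an empty list, since neither gene distinguishes the types.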
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% of genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
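As a quick arithmetic check of the first figure on this slide, the share of genomes with Cas1 can be recomputed from the two counts given (310 genomes out of 703). The short Python sketch below does only that; it assumes the number in parentheses is a percentage of the 703 genomes, and the helper name is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


TOTAL_GENOMES = 703   # selected complete genomes, from the slide title
CAS1_GENOMES = 310    # genomes with Cas1, from the first bullet

cas1_share = percent(CAS1_GENOMES, TOTAL_GENOMES)
print(f"Cas1 present in {cas1_share:.1f}% of genomes")
```

The result rounds to the 44 quoted on the slide.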
[Figure: two bar charts (vertical axis 0-100) with genome counts above the bars; legend: Type I, Type II, Type III, Cas1 presence/absence]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf. Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (1) genomic changes caused by environmental factor; (2) changes are specific to relevant genomic loci; (3) changes provide adaptation to the causative factor
Phenomenon | Biological role/function | Phyletic spread | (1) | (2) | (3)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Koonin, Wolf. Biol Direct 2009
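Read as data, the table applies one rule: a phenomenon is listed as bona fide Lamarckian when all three criteria are answered with a plain "Yes", and as quasi-Lamarckian otherwise. The Python sketch below encodes the table rows and that rule. The rule is an interpretation of the table made here, and the class and field names are illustrative.

```python
from typing import NamedTuple


class Phenomenon(NamedTuple):
    name: str
    caused_by_environment: str   # criterion (1)
    locus_specific: str          # criterion (2)
    adaptive_to_cause: str       # criterion (3)


# Criterion answers copied from the table (shortened to lower case).
TABLE = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes"),
]


def modality(p: Phenomenon) -> str:
    """Assumed rule: all three criteria must be an unqualified 'yes'."""
    criteria = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "yes" for answer in criteria):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


classification = {p.name: modality(p) for p in TABLE}
```

Applied to the five rows, the rule reproduces the grouping in the table: the first three phenomena come out as bona fide Lamarckian and the last two as quasi-Lamarckian, in both cases because criterion (2), locus specificity, is not fully met.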
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
The mosaic composition of the Marseille virus genome
Gene content tree intervirus gene transfer
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there
Some NCLDV host their own parasites
La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genome map with genes marked by inferred origin: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon]
La Scola et al., Nature 2008; 455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
Phage scaffold (virus hallmark genes)
Eukaryotic additions/displacements
Koonin, Yutin, Intervirology 2010
[Figure: poliovirus genome map (7.4 kb) with labels IRES, VP0, VP1, VP3, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An. Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase (S3H), RdRp. Picornaviral 'signature' genes, as labeled: 3C-Pro, 3DPol (RdRp), VPg, 2C (S3H). Lower track: protein sequence conservation]
Some of the smallest viruses (this is where poliovirus belongs)
The Big Bang of picorna-like virus evolution
Koonin, Wolf, Nagasaki, Dolja, Nat Rev Microbiol 2008 Dec; 6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
[Figure: genome maps of marine picorna-like viruses: HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb), SssRNAV (9.0 kb); labeled elements: Hel3, Pol, S-Pro, 3C-Pro, VP (VP1-VP3), sg, An]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus (PolioV, 7.4 kb) genome map with labels IRES, VP0, VP1, VP3, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, An; signature genes shown: jelly-roll CPs, superfamily 3 helicase, RdRp, 3C-Pro; inferred sources shown: DNA phages, bacterial group II retroelements, bacterial and mitochondrial HtrA family proteases]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
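For readers unfamiliar with position-specific scoring matrices (PSSMs), the following is a minimal sketch of the idea: build log-odds scores per alignment column, then scan a sequence for the best-scoring window. It is not PSI-BLAST and not the procedure used in the cited work; the uniform background, the pseudocount, the toy motif instances and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0):
    """Log2-odds score per column and residue, against a uniform background (assumed)."""
    background = 1.0 / len(ALPHABET)
    total = len(aligned) + pseudocount * len(ALPHABET)
    pssm = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score(pssm, segment):
    """Sum of column scores for a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy ungapped alignment of a three-residue motif (hypothetical instances).
motif_instances = ["GDD", "GDD", "GDN", "SDD"]
pssm = build_pssm(motif_instances)
hit_score, hit_start = best_hit(pssm, "MKTAGDDLLV")
print(hit_start, round(hit_score, 2))
```

PSI-BLAST additionally iterates the search, weights sequences and uses residue-specific background frequencies, none of which is modeled here.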
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin, The Biological Big Bang model for the major transitions in evolution, Biol Direct 2007 Aug 20; 2:21
Koonin, Wolf, Nagasaki, Dolja, Nat Rev Microbiol 2008 Dec; 6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies.
[Figure: replication-cycle diagrams by class; step labels T, Tr, R, E, RT, RCR; strand symbols +, -, ±; proteins RdRp, CP, RT, Pr-Pol]
Genetic cycles of RNA viruses
Class 1: positive-strand RNA, 3-30 kb
Class 2: double-strand RNA, 4-25 kb
Class 3: negative-strand RNA, 11-20 kb
Genetic cycles of retroid viruses and retroelements
Class 4: retroid RNA viruses, 7-12 kb
Class 5: retroid DNA viruses/elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids
Class 6: ssDNA viruses/plasmids, 2-11 kb
Class 7: dsDNA viruses/plasmids, 5-1200 kb (ALL CELLULAR ORGANISMS: YOU ARE HERE)
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically the host of the given virus), present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except possibly in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host, but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: gene-class contributions plotted against genome size (log scale, 0 to 3); gene classes: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark; virus groups along the axis: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis ("extremely counterintuitive", Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios: cell degeneration, escaped genes, primordial genetic systems; labels: cell, small parasitic cell, very small virus; chromosome, plasmid, mRNA, virus; pre-cellular life forms, RNA virus, DNA virus, cells]
Origin and evolution of virus-like genetic elements in the pre-cellular era
Replicon fusion yields chromosomes
Koonin, Martin, TIG 2005
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Koonin, Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR-Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15; 30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether

Family | Subfamily | Phyletic distribution | Comments
1. COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6. COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7. HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8. BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase, distantly related to viral RdRp and RT

...and ~15 other, less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and adjacent cas genes; repeat groups labeled TB and TA, Strepto-like, Ecoli-like, Pasteurella-like; genes shown: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468)]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
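Characteristics (i) and (ii) in the quotation suggest a naive way to spot a CRISPR-like array: look for an exact direct repeat that recurs at roughly regular intervals, separated by non-repetitive spacers. The sketch below does only that, on a toy sequence; the function names, the exact-match requirement and the tiny repeat and spacer lengths are assumptions for illustration, and real detection tools tolerate mismatches and use much longer repeats.

```python
def find_crispr_array(seq, repeat_len, min_spacer, max_spacer, min_repeats=3):
    """Return (repeat, start positions) of the first array of exact direct repeats, or None."""
    for start in range(len(seq) - repeat_len + 1):
        repeat = seq[start:start + repeat_len]
        positions = [start]
        while True:
            last_end = positions[-1] + repeat_len
            # The next copy must start between min_spacer and max_spacer bases downstream.
            nxt = seq.find(repeat, last_end + min_spacer,
                           last_end + max_spacer + repeat_len)
            if nxt == -1:
                break
            positions.append(nxt)
        if len(positions) >= min_repeats:
            return repeat, positions
    return None


def extract_spacers(seq, repeat_len, positions):
    """Spacers are the stretches between consecutive repeat copies."""
    return [seq[a + repeat_len:b] for a, b in zip(positions, positions[1:])]


# Toy locus: leader + (repeat, spacer, repeat, spacer, repeat) + trailing bases.
REPEAT = "GTTTCAAT"
toy_locus = "AAAA" + REPEAT + "ACGTAC" + REPEAT + "TGCATG" + REPEAT + "CC"

found = find_crispr_array(toy_locus, repeat_len=8, min_spacer=4, max_spacer=12)
repeat, positions = found
spacers = extract_spacers(toy_locus, len(repeat), positions)
print(repeat, positions, spacers)
```

The scan is quadratic in the worst case, which is fine for a demonstration but not for whole genomes.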
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (slide columns I to V; motifs listed in order of appearance):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs; ust-lKGhh+hh; hhGtt; hD; lGhttsGh
y1726-like: slhlpEKuVRGT; lRTIDs; YGuVTsGhuh
COG1851: hGphpGpsaFh; hGFGRh
BH0337-like: TpA-h+GIh-uIh; hhLpDV; LGsREhuht
COG1567: hhhpp; ups-lhtAhh; luscoGhGh
COG1769: hhhh+Ph-hhh; ss-hhGhlhsh; hGh; hGtcp+hsthchp
COG1688 (Cas5): hhhhhts; sssshhGhlsh; lGttphh
COGs 1583, 5551: hhhhoPhhl; hGtppshGFGl
YgcH-like: hHphlh; hGu+uhGhGhh
y1727-like: LHphLh; GFsthGLStss
MJ0978-like: hHNH; lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
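The caption above mentions Smith-Waterman similarities between repeats. As a reminder of what that score is, here is a minimal sketch of Smith-Waterman local alignment scoring with a linear gap penalty. The scoring parameters (match 2, mismatch -1, gap -2) and the function name are assumptions; the parameters used by Kunin et al. are not given on the slide.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two repeats from the alignment slide, with gap characters removed.
arcful = "GTTGAAATCAGACCAAAATGGGATTGAAAG"
metthe = "GTTAAAATCAGACCAAAATGGGATTGAAAT"

self_score = smith_waterman_score(arcful, arcful)
pair_score = smith_waterman_score(arcful, metthe)
print(self_score, pair_score)
```

Only two rows of the dynamic-programming matrix are kept, since the score alone is needed; recovering the alignment itself would require the full matrix and a traceback.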
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb; 60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR-Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram. Transcription of the CRISPR locus by cellular RNApol with a specific transcription factor (COG1517) gives polycistronic pre-psiRNA; (protein-guided) RNA folding; fast RNA processing by p-dicer (Helicase + HD-Hydrolase; COG1203, COG2254) gives pre-psiRNA, 75-100 nt; slow RNA processing by p-dicer gives psiRNA, 25-45 nt; psiRNA with RAMP and p-slicer (COG1468, COG4343, COG1857) forms p-RISC; annealing to the RNA target (plasmid or phage mRNA); target RNA cleavage]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle diagram. psiRNA (25-45 nt) with RAMP anneals to the RNA target (plasmid or phage mRNA), forming an unstable RAMP-RNA complex; primer elongation by CASS RNApol (COG1353); amplified annealing gives long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues]
New psiRNA generation
[Figure: two alternative routes (OR) starting from plasmid or phage mRNA and pre-psiRNA (75-100 nt): reverse transcription with random copy choice, or random RNA recombination and reverse transcription, by CASS RNApol (COG1353) or other RT; the resulting dsDNA with CRISPRs and target-derived spacers undergoes homologous recombination with the genomic CRISPR region (integrase, COG1518), turning genomic DNA into genomic DNA with a new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23; 315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
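The logic of these observations (a spacer copied from the phage confers resistance, and a single substitution in the matching phage region restores sensitivity) can be mimicked with a toy model. Everything below is an assumption for illustration: the helper names, the made-up phage sequence, and the rule that resistance requires an exact spacer match on one strand. It says nothing about the molecular mechanism.

```python
def acquire_spacer(spacers, phage_genome, start, length):
    """Return a new spacer list with a phage-derived fragment added at the leader end."""
    new_spacer = phage_genome[start:start + length]
    return [new_spacer] + list(spacers)


def is_resistant(spacers, phage_genome):
    """Toy rule: resistant if any spacer matches the phage genome exactly."""
    return any(spacer in phage_genome for spacer in spacers)


def point_mutant(genome, position, base):
    """Return the genome with a single-base substitution at the given position."""
    if genome[position] == base:
        raise ValueError("substitution must change the base")
    return genome[:position] + base + genome[position + 1:]


phage = "ATGCGTACGTTAGCCTAGGCTAACGT"  # made-up sequence

naive_host = []
immune_host = acquire_spacer(naive_host, phage, start=5, length=10)
escape_phage = point_mutant(phage, position=9, base="A")  # SNP inside the targeted region

print(is_resistant(naive_host, phage))          # no spacers yet
print(is_resistant(immune_host, phage))         # spacer matches the phage
print(is_resistant(immune_host, escape_phage))  # one substitution breaks the match
```

Adding the new spacer at the front of the list mirrors the leader-proximal insertion drawn on the adaptation slide; the order has no effect on the toy resistance rule.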
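[Figure: the three stages of CRISPR-Cas action. (1) Adaptation: a fragment of the invading virus or plasmid is added as a new spacer next to the CRISPR leader (spacers 5 4 3 2 1 become 6 5 4 3 2 1). (2) Expression & Processing: cas operon (cas3, cas2, cas1, cas5 labeled), Cascade. (3) Interference: the invading virus or plasmid is targeted in the host; Cascade, Cas3]
Van der Oost et al., TIBS 2009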
The three types of CRISPR-Cas systems and their signature genes
[Figure: cas operon organization for Type I (Helicase cassette; Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Representative systems: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4, CASS4a), Mtube (CASS6, RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5). Genes are labeled Cas1 to Cas13 with COG numbers (1343, 1518, 1203, 1468, 1857, 1688, 1583, 1567, 1769, 3513, 1421, 1517, 3337) and domain labels (Helicase, HD, RecB, RuvC, HNH, RAMP); other labels: y1724, ygcK, ygcL, BH0338, MTH1090, LA3191, SPy1049, AF0070]
Makarova et al., NRM, submitted
Experimental data on CRISPR-Cas systems
[Figure: phylogenetic trees of Cas proteins from archaeal and bacterial genomes, with leaves labeled by species abbreviation and gene identifier (for example Esccol b2755, Myctub Rv2817c, Sulsol SSO1450, Strthe str0658), aligned with the cas operon organization of each lineage (COG numbers; RAMP, RecB, Helicase, HD, RuvC, HNH and HTH domain labels; CASS groups 1 to 7, 4a and 7a); panels (A), (B), (C); "Polymerase" cassette and "Helicase" cassette marked]
Systems with experimental data:
Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10; 329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19; 322(5909):1843-5
E. coli: Brouns SJ et al., Science 2008 Aug 15; 321(5891):960-4
Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25; 139(5):945-56
Streptococcus thermophilus: Barrangou R et al., Science 315, 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR-Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II or Type III and by representative system: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6, Polymerase/RAMP module), Dvulg (CASS1), Tneap/Hmari (CASS7), Apern (CASS5). Example operons, genes as labeled: Type I: cas3, cse1, cse2, cas7, cas5, cas6e, cas1, cas2; Type II: cas9, cas1, cas2, cas4; Type III: cas6, cas10, csm2, csm3, csm4, csm5, csm6, cas1, cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR-Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20) genomes
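The first figure on this slide can be checked directly: 310 genomes with Cas1 out of 703 surveyed is about 44%. The short sketch below does only this arithmetic; the percent helper is a name introduced here, and the other figures are not recomputed because the slide does not state what they are measured against.

```python
def percent(part, total):
    """Share of part in total, in percent."""
    return 100.0 * part / total


genomes_surveyed = 703   # from the slide title
genomes_with_cas1 = 310  # from the first bullet

cas1_share = percent(genomes_with_cas1, genomes_surveyed)
print(f"Cas1 present in {cas1_share:.1f}% of genomes")
```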
[Figure: bar charts (scale 0 to 100) showing the distribution of Type I, Type II and Type III systems and Cas1 presence/absence across the analyzed groups of genomes]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR-Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan; 79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPRCas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Phenomenon Biological rolefunction
Phyletic spread
Lamarckian criteria
Genomic changes caused by environmental factor
Changes are specific to relevant genomic loci
Changes provide adaptation to the causative factor
Bona fide LamarckianCRISPRCas Defense against
viruses and other mobile elements
Most of the Archaeaand many bacteria
Yes Yes Yes
piRNA Defense against transposable elements in germline
Animals Yes Yes Yes
HGT (specific cases)
Adaptation to new environment stress response resistance
Archaea bacteria unicellular eukaryotes
Yes Yes Yes
Quasi-LamarckianHGT (general phenomenon)
Diverse innovations Archaea bacteria unicellular eukaryotes
Yes No Yesno
Stress-induced mutagenesis
Stress responseresistanceadaptation to new conditions
Ubiquitous Yes No or partially
Yes (but general evolvabilityenhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
bullEvolution of parasites is intrinsic to any replicator system
bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since
bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms
bullPerennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian DoljaOregon State University
Yuri Wolf NCBITatiana Senkevich NIAID NIH
Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille
Didier Raoult et al
Bill MartinUniv Duesseldorf
Kira Makarova NCBI
Gene content tree intervirus gene transfer
Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts
There are really weird creatures out thereSome NCLDV host their own parasites
La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008
Chimeric origin of the virophage genome
NCLDV
Mimivirus
Archaeal virus
NCLDVenvironmental only
Bacterialtransposon
La Scola et al Nature 2008 455100-4
Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the
the second melting pot of virus evolution ndasheukaryogenesis
Phage scaffold(virus hallmark genes)
Eukaryotic additionsdisplacements
Koonin Yutin Intervirology 2010
Poliovirus74 kb
VP0 3DPol2C 3CPro Angg
IRESVP1VP3 2APro 2B
3A 3B
Jelly-roll CPsSuperfamily 3
helicase RdRp
Viral hallmark genes
3C-Pro3DPol
RdRp3CPro g
VPg2C
S3H
Picornaviral lsquosignaturersquo genes
Protein sequence conservation
Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution
Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39
Picorna-likeviruses possessdiverse arraysof hallmark andunique genes
Picorna-likeviral superfamily
Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily
Lang et al (2004) Virology 320206
HcRNAV44 kb
VP
HaRNAV86 kb
S-Pro Pol
VP1VP3VP2PolHel3
RsRNAV89 kb AnPolHel3 VP 1-3
SssRNAV90 kb An
PolHel3 VP 1-3C-Pro
sg sg
An
Nagasaki et al (2005) Appl Env Microbiol 718888
Culley et al (2006) Science 3121795
The amended Picornavirus-like
superfamily includes 14 recognized viral families
4 floating genera and15 unclassified positive-
strand anddouble-strand RNA
viruses that infect hosts from 4 of the 5 eukaryotic
supergroups-6 distinct clades
from RdRp phylogeny-diverse genome layouts
View from the viral side
6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from thehost side
4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)
PolioV74 kb
VP0 3DPol2C 3CPro Angg
IRESVP1VP3 2APro 2B
3A 3B
Jelly-roll CPsSuperfamily 3
helicase RdRp
DNA phages Bacterial group II
retroelements
Bacterial and mitochondrialHtrA family proteases
Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches
3C-Pro
Hallmark genes of picorna-like viruses have distinct prokaryoticorigins
Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts
Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution
bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions
Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221
Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39
Genetic cycles of RNA viruses
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies.
[Figure: replication cycle diagrams by class. The arrows and step labels (T, Tr, R, RT, RCR, E) are not recoverable from the extraction. Protein labels shown: RdRp, CP, RT, RCE, Pr-Pol.]
Class 1. Positive-strand RNA, 3-30 kb
Class 2. Double-strand RNA, 4-25 kb
Class 3. Negative-strand RNA, 11-20 kb
Genetic cycles of retroid viruses and retroelements
Class 4. Retroid RNA viruses, 7-12 kb
Class 5. Retroid DNA viruses and elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids
Class 6. ssDNA viruses and plasmids, 2-11 kb
Class 7. dsDNA viruses and plasmids, 5-1200 kb
Label: ALL CELLULAR ORGANISMS, YOU ARE HERE
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically the host of the given virus), present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e. genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host, but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
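The five gene classes above can be read as a small decision procedure. The sketch below is only an illustration of that reading: the `GeneProfile` type, its two string fields and the order of the checks are assumptions introduced here, not part of the original analysis, and borderline combinations that the slide does not describe are assigned to the nearest class.

```python
from dataclasses import dataclass


@dataclass
class GeneProfile:
    # Hypothetical summary of a gene's homology search results.
    cellular_homologs: str  # "close", "moderate", "distant" or "none"
    viral_spread: str       # "narrow", "lineage" or "diverse"


def classify_viral_gene(gene: GeneProfile) -> int:
    """Return the class number (1-5) from the summary above."""
    if gene.cellular_homologs == "close":
        return 1  # recent acquisition from the host
    if gene.cellular_homologs == "moderate":
        return 2  # ancient acquisition from the host
    if gene.cellular_homologs == "distant" and gene.viral_spread == "diverse":
        return 5  # viral hallmark gene
    if gene.viral_spread == "narrow":
        return 3  # ORFan
    return 4      # virus-specific gene conserved within a lineage


print(classify_viral_gene(GeneProfile("distant", "diverse")))
```

The order of the checks matters: homology to cellular genes is tested first, so a gene is only considered virus-specific or a hallmark gene once close and moderate cellular homologs have been ruled out.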
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contribution of each gene class against genome size (log scale, 0 to 3). Gene classes: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark. Virus groups along the axis: most RNA viruses, retroelements and RCR replicons; large RNA viruses, adenoviruses and tailed phages; NCLDV, herpesviruses and large phages.]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis ("extremely counterintuitive": Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006;1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios side by side. Cell degeneration: CELL, SMALL PARASITIC CELL, VERY SMALL VIRUS. Escaped genes: CHROMOSOME, PLASMID, mRNA, VIRUS. Primordial genetic systems: PRE-CELLULAR LIFE FORMS, RNA VIRUS, DNA VIRUS, CELLS.]
Origin and evolution of virus-like genetic elements in the pre-cellular era
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006;1:29; Koonin. Ann NY Acad Sci 2009
Replicon fusion yields chromosomes
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR-Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. [...] The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether. Fields per entry: family; subfamilies (A); phyletic distribution (B); comments.
1. COG1518. Subfamilies: COG1518 (cas1). Distribution: all. Comments: putative novel nuclease/integrase; mostly α-helical protein.
2. COG1343. Subfamilies: COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like. Distribution: all. Comments: small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins.
3. COG1203. Subfamilies: COG1203 (cas3). Distribution: all. Comments: DNA helicase; most proteins have fusion to HD nuclease.
4. RecB-like nuclease. Subfamilies: COG1468 (cas4), COG4343. Distribution: all. Comments: RecB-like nuclease; contains three-cysteine C-terminal cluster.
5. RAMP (Repair-associated mysterious protein). Subfamilies: COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like. Distribution: all. Comments: belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9).
6. COG1857. Subfamilies: COG1857, COG3649, YgcJ-like, y1725-like. Distribution: all. Comments: α/β protein; predicted nuclease or integrase.
7. HD-like nuclease. Subfamilies: COG1203 (N-terminus), COG2254. Distribution: all. Comments: HD-like nuclease.
8. BH0338. Subfamilies: BH0338-like, MTH1090-like. Distribution: all, mostly archaea and FIRM. Comments: large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref)).
9. COG1353, predicted polymerase. Subfamilies: COG1353. Distribution: most archaea, some bacteria. Comments: predicted palm-domain polymerase distantly related to viral RdRp and RT.
...and ~15 other, less common proteins
Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats with adjacent cas genes. Locus variants labeled TB and TA, Strepto-like, Ecoli-like, Pasteurella-like. Gene labels: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468).]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats, of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
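Characteristics (i) and (ii) in the quotation above describe a simple string layout: identical direct repeats separated by spacers of similar size. A minimal sketch of that layout follows, assuming the repeat sequence is already known and perfectly conserved within the locus; the function names and the toy sequences are hypothetical and chosen only for illustration.

```python
def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers found between consecutive copies of `repeat`.

    Assumes exact, non-overlapping repeats. Sequence before the first
    repeat (leader side) and after the last repeat is discarded.
    """
    parts = array.split(repeat)
    return parts[1:-1]


def spacers_have_similar_size(spacers: list[str], tolerance: int = 3) -> bool:
    """Check characteristic (ii): spacer lengths differ by at most `tolerance`."""
    lengths = [len(s) for s in spacers]
    return bool(lengths) and max(lengths) - min(lengths) <= tolerance


repeat = "GTTTCAATCC"  # toy repeat, not taken from a real locus
locus = "AAAA" + repeat + "ACGTACGTAC" + repeat + "TTGACCATGA" + repeat
spacers = split_crispr_array(locus, repeat)
print(spacers, spacers_have_similar_size(spacers))
```

Real loci show "no or very little" repeat variation rather than none, so an exact `split` is the simplest possible reading; tolerating mismatches would need an approximate matcher.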
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
Paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific MOTIFs (columns I to V in the original slide; the column assignment of each motif is not recoverable from the extraction, so motifs are listed in reading order):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
CRISPR show extreme diversity and complex clustering
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
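The caption above relies on Smith-Waterman similarities between repeat sequences. As an illustration of what such a score measures, here is a minimal sketch of Smith-Waterman local alignment scoring with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) are assumptions made for this example; the parameters used in the cited study are not given in the slide.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# The shared 3' end of many repeats scores as a strong local match.
print(smith_waterman_score("GATTGAAAG", "TTGAAA"))
```

Because the score is local, two repeats that share only a conserved segment still receive a high score, which is one reason local alignment suits short, partially conserved sequences.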
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
[...] The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR-Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram. Labels in reading order:
- Transcription: cellular RNA pol, specific transcription factor (COG1517), polycistronic pre-psiRNA
- (Protein-guided) RNA folding
- RNA processing (fast): p-dicer (Helicase + HD-hydrolase; COG1203, COG2254), pre-psiRNA 75-100 nt
- RNA processing (slow): p-dicer, psiRNA 25-45 nt
- p-RISC: RAMP, p-slicer (COG1468, COG4343, COG1857)
- Annealing to RNA target: plasmid or phage mRNA
- Target RNA cleavage]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle diagram. Labels: psiRNA 25-45 nt; RAMP binding; RAMP-RNA complex (unstable); annealing to RNA target (plasmid or phage mRNA); primer elongation by CASS RNA pol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; cycle continues.]
New psiRNA generation
[Figure: flow diagram. Labels: plasmid or phage mRNA; pre-psiRNA 75-100 nt; CASS RNA pol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer.]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
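The sequence specificity described above (a single substitution reverts the cell to sensitivity) can be illustrated with a toy model in which a cell is resistant only if one of its spacers matches the phage genome exactly. This is a deliberately simplified sketch: the exact-match rule, the function names and the short sequences are assumptions made for illustration, not a description of the experimental system.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: resistance requires an exact spacer match in the phage genome."""
    return any(spacer in phage_genome for spacer in spacers)


def point_mutant(genome: str, position: int, base: str) -> str:
    """Return a copy of `genome` with one base substituted at `position`."""
    return genome[:position] + base + genome[position + 1:]


phage = "ATGCCGTTAGGCTAACGGATTCAGG"   # toy phage genome
spacer = phage[5:17]                  # spacer acquired from the phage
escape = point_mutant(phage, 10, "A") # single substitution inside the target

print(is_resistant([spacer], phage), is_resistant([spacer], escape))
```

In this model the spacer protects against the original phage but not against the point mutant, which mirrors the first bullet of the key validation.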
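[Figure: the three stages of CRISPR-Cas action, drawn between an invading virus or plasmid and the host: (1) Adaptation, (2) Expression & Processing, (3) Interference. In the adaptation panel, the CRISPR array next to the leader goes from spacers 5 4 3 2 1 to 6 5 4 3 2 1. Other labels: cas operon (cas3, cas2, cas1, cas5), Cascade, Cas3.]
Van der Oost et al. TIBS 2009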
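The adaptation stage in the figure, where the spacer row 5 4 3 2 1 becomes 6 5 4 3 2 1 beside the leader, can be sketched as a list update. The sketch assumes that index 0 of the list is the leader-proximal position and that the new spacer is a fixed-length fragment copied from the invader; the function name, the fragment length and the sequences are hypothetical.

```python
def acquire_spacer(spacers: list[str], invader_genome: str,
                   start: int, length: int = 8) -> list[str]:
    """Return a new spacer list with an invader fragment at the leader end.

    Index 0 is treated as the leader-proximal position, so older
    spacers keep their relative order and shift away from the leader.
    """
    new_spacer = invader_genome[start:start + length]
    if len(new_spacer) < length:
        raise ValueError("fragment runs past the end of the invader genome")
    return [new_spacer] + spacers


array = ["s5", "s4", "s3", "s2", "s1"]  # placeholder names for older spacers
invader = "ATGCCGTTAGGCTAACGGATTCAGG"
updated = acquire_spacer(array, invader, start=4)
print(updated)
```

Keeping the newest spacer at one end means the array doubles as a chronological record of past encounters, with the oldest spacer furthest from the leader.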
The three types of CRISPR-Cas systems and their signature genes
[Figure: operon diagrams for the three types; gene order is not recoverable from the extraction.
Types: Type I (Helicase cassette; includes Cascade), Type II (HNH-type), Type III (RAMP module or Polymerase cassette).
Subtype labels: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5.
Gene labels: Cas1 through Cas13. Domain and COG labels include Helicase (COG1203), HD-f, RecB-f (COG1468), RuvC-f, HNH, RAMP (COG1688, COG1583, COG1567, COG1769), COG1343, COG3513, COG1857, COG1421, COG1517, COG3337, BH0338/MTH1090/LA3191, SPy1049, AF0070, y1724, ygcK/ygcL.]
Makarova et al. NRM, submitted
Experimental data on CRISPR-Cas systems
[Figure, panels (A) to (C): cas gene neighborhoods across archaeal and bacterial genomes, grouped into CASS subtypes 1 to 7 (plus 4a and 7a), with the "Helicase" cassette and the "Polymerase" cassette marked. Rows are labeled by species and locus tag (for example Esccol b2755, Strthe str0658, Myctub Rv2817c, Pyrfur PF1129). Gene boxes are labeled by COG number or family: COG1343, COG1518, Helicase (COG1203), HD-f, RecB-f (COG1468, COG4343), COG1857, RAMP (COG1688, COG1583, COG1567, COG1769, BH0337, YgcH, y1726, y1727, MTH323), RuvC-f, HNH, HTH, COG3513, COG3574, COG1421, AF0070, AF1870, AF1873, BH0338, MTH1090, MTH324, PH0918, SPy1049, y1724, ygcK/ygcL. The full row list and the gene order are not recoverable from the extraction.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315, 1709-12 (2007)
Label: mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR-Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III and with the subtype labels Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase-RAMP module), Dvulg CASS1, Tneap-Hmari CASS7, Apern CASS5. Representative operons as labeled in the figure (gene order approximate): I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e. II: cas9, cas1, cas2, cas4. III: cas6, cas10, csm2, csm3, csm4, csm5, csm6, cas1, cas2 (RAMP).]
Makarova et al. NRM, submitted
CRISPR-Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% of genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: two bar charts (scale 0 to 100) of Type I, Type II, Type III and Cas1 presence/absence across groups of genomes; the group labels and bar values are not recoverable from the extraction.]
Makarova et al. in preparation
Modular evolution of the 3 types of CRISPR-Cas
Makarova et al in preparation
Back to repair functions
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol Microbiol 2011 Jan;79(2):484-502
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR-Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Fields per entry: phenomenon; biological role/function; phyletic spread; Lamarckian criteria.
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor.
Bona fide Lamarckian
• CRISPR-Cas. Role: defense against viruses and other mobile elements. Spread: most of the Archaea and many bacteria. Criteria: (a) yes, (b) yes, (c) yes.
• piRNA. Role: defense against transposable elements in germline. Spread: animals. Criteria: (a) yes, (b) yes, (c) yes.
• HGT (specific cases). Role: adaptation to new environment, stress response, resistance. Spread: Archaea, bacteria, unicellular eukaryotes. Criteria: (a) yes, (b) yes, (c) yes.
Quasi-Lamarckian
• HGT (general phenomenon). Role: diverse innovations. Spread: Archaea, bacteria, unicellular eukaryotes. Criteria: (a) yes, (b) no, (c) yes/no.
• Stress-induced mutagenesis. Role: stress response, resistance, adaptation to new conditions. Spread: ubiquitous. Criteria: (a) yes, (b) no or partially, (c) yes (but general evolvability enhanced as well).
Koonin, Wolf. Biol Direct 2009
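The grouping in the table above follows from the three criteria: a phenomenon is listed as bona fide Lamarckian when all three answers are an unqualified yes, and as quasi-Lamarckian when the environmental trigger is present but one of the other answers is qualified. The sketch below encodes that reading. The function name, the lowercase answer strings and the "not Lamarckian" fallback are assumptions added for illustration; the fallback does not appear in the table.

```python
def lamarckian_modality(caused_by_environment: str,
                        locus_specific: str,
                        adaptive_to_cause: str) -> str:
    """Classify a phenomenon from its answers to criteria (a), (b), (c)."""
    answers = (caused_by_environment, locus_specific, adaptive_to_cause)
    if all(answer == "yes" for answer in answers):
        return "bona fide Lamarckian"
    if caused_by_environment == "yes":
        return "quasi-Lamarckian"
    return "not Lamarckian"  # assumed fallback, not a row in the table


phenomena = {
    "CRISPR-Cas": ("yes", "yes", "yes"),
    "piRNA": ("yes", "yes", "yes"),
    "HGT (general phenomenon)": ("yes", "no", "yes/no"),
    "Stress-induced mutagenesis": ("yes", "no or partially", "yes"),
}
for name, answers in phenomena.items():
    print(name, "->", lamarckian_modality(*answers))
```

Only criterion (b), locus specificity, separates the two groups in the table, which is why qualified answers such as "no or partially" are treated the same as a plain no.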
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Amoeba as a melting pot for HGT between viruses and bacterial endosymbionts
There are really weird creatures out there: some NCLDV host their own parasites
La Scola et al. The virophage as a unique parasite of the giant mimivirus. Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genes by inferred source. Labels: NCLDV; Mimivirus; archaeal virus; NCLDV, environmental only; bacterial transposon.]
La Scola et al. Nature 2008;455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
[Figure labels: phage scaffold (virus hallmark genes); eukaryotic additions/displacements.]
Koonin, Yutin. Intervirology 2010
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
[Figure: poliovirus genome map, 7.4 kb. Labels: IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol, An. Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase (S3H, 2C), RdRp (3D-Pol). Picornaviral 'signature' genes: 3C-Pro, VPg. Track: protein sequence conservation.]
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
[Figure: genome maps of marine picorna-like viruses. HcRNAV, 4.4 kb; HaRNAV, 8.6 kb; RsRNAV, 8.9 kb; SssRNAV, 9.0 kb. Gene labels: Hel3, C-Pro, S-Pro, Pol, VP, VP1, VP2, VP3, VP 1-3, An, sg.]
Lang et al. (2004) Virology 320:206
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from thehost side
4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)
PolioV74 kb
VP0 3DPol2C 3CPro Angg
IRESVP1VP3 2APro 2B
3A 3B
Jelly-roll CPsSuperfamily 3
helicase RdRp
DNA phages Bacterial group II
retroelements
Bacterial and mitochondrialHtrA family proteases
Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches
3C-Pro
Hallmark genes of picorna-like viruses have distinct prokaryoticorigins
Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts
Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution
bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions
Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221
Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39
+
1 Positive-strandRNA
3-30 kb - +R R ET
T+R
Class Replication cycle
2 Double-strandRNA
4-25 kb plusmnTr R E
T
+ plusmn
3 Negative-strandRNA
11-20 kbR R E
RdRp CPT
+-Tr -
+
Genetic cycles of RNA viruses
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the
biospherersquos laboratory of genomic strategies
RdRp CP
RdRp CP
+4 Retroid
RNA viruses 7-12 kb +RT Tr E
T+Tr
plusmn
5 Retroid DNAviruses
elements2-10 kb +Tr
T+Tr
plusmnE RT
plusmn
RT CP
RT
Class Replication cycle
Genetic cycles of retroid viruses and retroelements
Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb
7 dsDNA virusesplasmids
5-1200 kb
+RCR RCR E
+Tr
plusmn +
TRCE
Tr
plusmn
+T
plusmnR E
Pr-Pol ALL CELLULARORGANISMS
YOU ARE HERE
I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms
(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several
lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host
II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly
in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes
III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant
homologs in cellular organisms
Natural history of viral genes a one-page summary of viral comparative genomics
Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size
Genome size (log) 0 1 2 3
Recent acquisitions
Old acquisitions
Virus-specific ORFans
Virus-specific conserved
Virus hallmark
Most RNAvirusesretroelementsRCR replicons
Large RNA virusesadenotailedphages
NCLDVherpeslargephages
Natural history of viral genesViral Hallmark Genes
Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the lsquovirus statersquo
Play major roles in genome replication packagingand assembly
4 Rolling circle replicationinitiation endonuclease
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
Protein products of viral hallmark genes
3 RNA-dependent RNA polymeraseand Reverse transcriptase
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)
The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin Senkevich Dolja Biol Direct 2006 129
Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms
-synergy with the diversity of genomic strategies
A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells
Cell degeneration Escaped genes Pr imordial geneticsystems
CELL
SMALLPARASITICCELL
VERYSMALLVIRUS
CHROMOSOME
PLASMID
VIRUS
RNA
VIRUS
PRE-CELLULAR LIFE FORMS
DNA
VIRUS
mRNA
VIRUS
Competing concepts of the or igin of viruses
CELLS
Origin and evolution of virus-like genetic
elementsin the pre-cellular
era
Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009
Replicon fusionyields chromosomes
bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages
The ancient Virus World (VW)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family, possibly RNA-binding protein; structurally related to duplicated ferredoxin fold (PDB 1wj9)
6 COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 COG1353 predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
... and ~15 other less common proteins
Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and adjacent cas genes. Locus labels: TB and TA, Strepto-like, Ecoli-like, Pasteurella-like. Gene labels: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468).]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
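The repeat-spacer layout described in points (i) and (ii) can be illustrated with a minimal Python sketch. It assumes the repeat sequence is already known and occurs as exact copies; the function name `extract_spacers` and every sequence in the toy locus are made up for illustration and are not taken from any real genome.

```python
def extract_spacers(locus: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    positions = []
    start = locus.find(repeat)
    while start != -1:
        positions.append(start)
        # Continue searching after the copy just found (copies do not overlap).
        start = locus.find(repeat, start + len(repeat))
    # A spacer is whatever sits between the end of one repeat and the next repeat.
    return [locus[left + len(repeat):right]
            for left, right in zip(positions, positions[1:])]


# Toy locus (hypothetical sequences): leader, then repeat/spacer units, then a final repeat.
REPEAT = "GTTTCAATCC"
TOY_SPACERS = ["ACGTACGTAAGG", "TTGACCTGAAAC", "CCGATAGGTCTA"]
LEADER = "ATATATTTAA"
locus = LEADER + "".join(REPEAT + spacer for spacer in TOY_SPACERS) + REPEAT

for number, spacer in enumerate(extract_spacers(locus, REPEAT), start=1):
    print(number, spacer, len(spacer))
```

Real repeats show "no or very little" variation rather than none, so an exact string search like this one is only a starting point; dedicated CRISPR finders tolerate mismatches.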
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Motifs (column headings in the original: specific, I, II, III, IV, V; the column assignment of individual motifs is not recoverable):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs  ust-lKGhh+hh  hhGtt  hD  lGhttsGh
y1726-like: slhlpEKuVRGT  lRTIDs  YGuVTsGhuh
COG1851: hGphpGpsaFh  hGFGRh
BH0337-like: TpA-h+GIh-uIh  hhLpDV  LGsREhuht
COG1567: hhhpp  ups-lhtAhh  luscoGhGh
COG1769: hhhh+Ph-hhh  ss-hhGhlhsh  hGh  hGtcp+hsthchp
COG1688 (Cas5): hhhhhts  sssshhGhlsh  lGttphh
COGs 1583, 5551: hhhhoPhhl  hGtppshGFGl
YgcH-like: hHphlh  hGu+uhGhGhh
y1727-like: LHphLh  GFsthGLStss
MJ0978-like: hHNH  lG+tsuhGhGol
Ferredoxin fold, RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
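One simple way to quantify how conserved two aligned repeats are is to count identical positions. The Python sketch below does this for the second Pyrhor row and the second Pyraby row of the alignment. The function name `aligned_identity` and the choice to skip every column where either row has a gap are assumptions made for illustration, not the method used in the cited work.

```python
def aligned_identity(row_a: str, row_b: str, gap: str = "-") -> tuple[int, int]:
    """Count (identical, compared) columns of two rows from the same alignment.

    Columns in which either row carries a gap are skipped entirely.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must come from the same alignment")
    matches = compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b  # True counts as 1
    return matches, compared


# Two rows copied from the repeat alignment.
PYRHOR = "GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-"
PYRABY = "GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-"

matches, compared = aligned_identity(PYRHOR, PYRABY)
print(f"{matches}/{compared} identical aligned positions")
```

Dividing the first number by the second gives a fractional identity. Comparing an archaeal row with a bacterial row in the same way gives a rough feel for the cross-domain conservation noted on the slide.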
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure labels: transcription by cellular RNApol with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; RAMP; p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage by p-slicer (COG1468, COG4343, COG1857).]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues.]
[Figure (untitled in the source), labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice; random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer; OR; new psiRNA generation.]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
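The sequence specificity reported above can be mimicked with a deliberately crude Python model: a cell counts as resistant only if one of its spacers matches the phage genome exactly, so a single substitution in the matched region restores sensitivity. The function names, the 12-nt spacer length and the toy phage genome are all hypothetical. The model also ignores strand, the Cas proteins and everything else about the real mechanism.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: resistance requires an exact spacer match somewhere in the phage genome."""
    return any(spacer in phage_genome for spacer in spacers)


def acquire_spacer(spacers: list[str], phage_genome: str, start: int, length: int = 12) -> list[str]:
    """Toy adaptation step: copy a phage fragment into the array as a new spacer."""
    return [phage_genome[start:start + length]] + spacers


def point_mutant(genome: str, position: int, base: str) -> str:
    """Return the genome with a single substitution at `position`."""
    if genome[position] == base:
        raise ValueError("substitution must change the base")
    return genome[:position] + base + genome[position + 1:]


# Hypothetical 50-nt phage genome and a naive host with no spacers.
PHAGE = "ATGGCTAACCGTTAGGCATCGATTACGCGTATAGCAAGCTTGGCTTAACG"
naive_host: list[str] = []
immune_host = acquire_spacer(naive_host, PHAGE, start=10)
escape_phage = point_mutant(PHAGE, position=15, base="T")  # SNP inside the matched region

print("naive host vs phage:      ", is_resistant(naive_host, PHAGE))
print("immune host vs phage:     ", is_resistant(immune_host, PHAGE))
print("immune host vs SNP mutant:", is_resistant(immune_host, escape_phage))
```

Because `acquire_spacer` can sample any start position, repeated calls would give spacers scattered over the phage genome, in line with the last observation on the slide.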
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader; array elements numbered 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3.]
Van der Oost et al. TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon diagrams grouped by type. Type I (Helicase cassette; includes Cascade): Ecoli CASS2, Ypest CASS3, Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5. Type II (HNH-type): Nmeni CASS4, CASS4a. Type III (RAMP module or Polymerase cassette): Mtube CASS6 RAMP module. Genes are labeled Cas1 to Cas13 and by COG number or family (1343, 1517, 1857, 3513, Helicase 1203, HD-f, RecB-f 1468, RuvC-f, HNH, RAMP 1583/1688/1567/1769, and others); the assignment of subtypes to types follows the label order in the extracted text and the per-gene layout is not recoverable.]
Makarova et al. NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: cas gene neighborhoods, panels (A), (B) and (C), showing the "Helicase" cassette and the "Polymerase" cassette. Genes are labeled by COG number or family (1343, 1518, 1857, 3513, 3574, Helicase 1203, HD-f, RecB-f 1468/4343, RuvC-f, HNH, HTH, RAMP 1688/1583/1567/1769) next to locus identifiers from archaeal and bacterial genomes; the individual locus labels are not recoverable as a layout.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 2007;315:1709-12
[Figure label: mRNA targeting]
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III. Subtype labels: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase/RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5. Genes shown per type, in extraction order: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP).]
Makarova et al. NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II: 9, Type III: 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: two charts with a 0-100 axis. Legend: Type I, Type II, Type III; Cas1 presence/absence. The plotted values are not recoverable from the extracted text.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al in preparation
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol Microbiol 2011 Jan;79(2):484-502
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Phenomenon | Biological role/function | Phyletic spread | Lamarckian criteria: genomic changes caused by environmental factor | changes are specific to relevant genomic loci | changes provide adaptation to the causative factor
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
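The grouping in the table can be read as a simple rule: a phenomenon is listed as bona fide Lamarckian when all three criteria are an unqualified "Yes", and as quasi-Lamarckian otherwise. The Python sketch below encodes that reading. The rule itself, the class name `Phenomenon` and the lower-cased answer strings are assumptions made for illustration and are not a definition given by the authors; the long qualified "Yes" for stress-induced mutagenesis is shortened to "yes".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str   # criterion 1
    locus_specific: str          # criterion 2
    adaptive_to_cause: str       # criterion 3


def modality(p: Phenomenon) -> str:
    """Assumed reading of the table: all three criteria must be a plain 'yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "yes" for answer in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Rows transcribed from the table.
TABLE = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes"),
]

for row in TABLE:
    print(f"{row.name}: {modality(row)}")
```

The function is only meaningful for rows of this table: a process that fails all three criteria would also fall into the second branch, although the slide would not call it quasi-Lamarckian.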
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
There are really weird creatures out there: some NCLDV host their own parasites
La Scola et al. The virophage as a unique parasite of the giant mimivirus. Nature 2008
Chimeric origin of the virophage genome
[Figure: virophage genes labeled by likely origin: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon]
La Scola et al. Nature 2008, 455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
[Figure labels: phage scaffold (virus hallmark genes); eukaryotic additions/displacements]
Koonin, Yutin. Intervirology 2010
[Figure: poliovirus genome map, 7.4 kb. Labels: VPg; IRES; VP0, VP3, VP1; 2A-Pro, 2B, 2C; 3A, 3B, 3C-Pro, 3D-Pol; An. Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase, RdRp. Picornaviral 'signature' genes: RdRp (3D-Pol), 3C-Pro, VPg, S3H (2C). Track: protein sequence conservation.]
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
[Figure: genome maps of marine picorna-like viruses. HcRNAV, 4.4 kb; HaRNAV, 8.6 kb; RsRNAV, 8.9 kb; SssRNAV, 9.0 kb. Gene labels: S-Pro, C-Pro, Hel3, Pol, VP, VP1, VP2, VP3, VP 1-3; sg; An.]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus (PolioV) genome map, 7.4 kb, with likely origins of the signature genes. Genome labels: VPg; IRES; VP0, VP3, VP1; 2A-Pro, 2B, 2C; 3A, 3B, 3C-Pro, 3D-Pol; An; jelly-roll CPs; superfamily 3 helicase; RdRp. Origin labels: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases (3C-Pro).]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups
- the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms: viruses are the biosphere's laboratory of genomic strategies
[Figure: replication-cycle diagrams for each class of genetic element; the diagram symbols (+, -, ±, R, T, Tr, E, RT, RCR) are not recoverable as a layout. Hallmark protein labels in the diagrams: RdRp, CP, RT, RCR E, Pr-Pol.]
Class | Genome size
Genetic cycles of RNA viruses
1 Positive-strand RNA | 3-30 kb
2 Double-strand RNA | 4-25 kb
3 Negative-strand RNA | 11-20 kb
Genetic cycles of retroid viruses and retroelements
4 Retroid RNA viruses | 7-12 kb
5 Retroid DNA viruses, elements | 2-10 kb
Genetic cycles of DNA viruses and plasmids
6 ssDNA viruses, plasmids | 2-11 kb
7 dsDNA viruses, plasmids | 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
  1. Genes with closely related homologs from cellular organisms (typically the host of the given virus), present in a narrow group of viruses
  2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
  Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
  3. ORFans, i.e. genes without detectable homologs except, possibly, in closely related viruses
  4. Virus-specific genes that are conserved within a virus lineage
  Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
  5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
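The five classes of viral genes can be expressed as a small decision function over two observations: how close the best cellular homolog is, and how widely the gene is spread among viruses. The Python sketch below is one possible encoding. The category strings ("close", "moderate", "distant", "none"; "narrow", "lineage", "diverse") and the function name are labels invented here, and a real classification would rest on sequence searches and phylogenies rather than two keywords.

```python
def classify_viral_gene(cellular_homolog: str, viral_spread: str) -> str:
    """Map two coarse observations onto the five classes of viral genes.

    cellular_homolog: "close", "moderate", "distant" or "none"
    viral_spread:     "narrow", "lineage" or "diverse"
    """
    if cellular_homolog == "close":
        return "1: recent acquisition from host"
    if cellular_homolog == "moderate":
        return "2: ancient acquisition from host"
    if cellular_homolog == "none" and viral_spread == "narrow":
        return "3: virus-specific ORFan"
    if cellular_homolog == "none" and viral_spread == "lineage":
        return "4: virus-specific conserved gene"
    if cellular_homolog == "distant" and viral_spread == "diverse":
        return "5: viral hallmark gene"
    return "unclassified"  # combinations the one-page summary does not cover


# Hypothetical examples of use.
print(classify_viral_gene("distant", "diverse"))
print(classify_viral_gene("none", "narrow"))
print(classify_viral_gene("close", "narrow"))
```

Returning "unclassified" for uncovered combinations, such as a distant cellular homolog in a single narrow virus group, keeps the sketch from forcing a gene into a class the summary does not define.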
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: gene-class contributions plotted against genome size (log scale, 0 to 3). Gene classes: recent acquisitions; old acquisitions; virus-specific ORFans; virus-specific conserved; virus hallmark. Virus groups along the axis: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages.]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis ("extremely counterintuitive": Santiago Elena, Feb 16, 2011)
The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin Senkevich Dolja Biol Direct 2006 129
Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms
-synergy with the diversity of genomic strategies
A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells
Cell degeneration Escaped genes Pr imordial geneticsystems
CELL
SMALLPARASITICCELL
VERYSMALLVIRUS
CHROMOSOME
PLASMID
VIRUS
RNA
VIRUS
PRE-CELLULAR LIFE FORMS
DNA
VIRUS
mRNA
VIRUS
Competing concepts of the or igin of viruses
CELLS
Origin and evolution of virus-like genetic
elementsin the pre-cellular
era
Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009
Replicon fusionyields chromosomes
bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages
The ancient Virus World (VW)
Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009
Evolution of antivirus defense systems
bull CRISPRCas system of adaptive immunity in prokaryotes
bull A case for Lamarckian evolutionbull The perennial virus-host arms race
CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system an update with unification of many diverse families
~25 families altogetherFamily SubfamilyA Phyletic
distributionBComments
1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein
2 COG1343 COG1343 (cas2) COG3512ygbF-like
MTH324-likey1723_N-like
All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins
3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease
4 RecB-like nuclease
COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster
5 RAMP Repair-associated mysterious
protein
COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551
BH0337-likeMJ0978-likeYgcH-like
y1726-like y1727-like
All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)
6 COG1857 COG1857 COG3649 YgcJ-like y1725-like
All αβ protein predicted nuclease or integrase
7 HD-like nuclease
COG1203 (N-terminus) COG2254 All HD-like nuclease
8 BH0338 BH0338-likeMTH1090-like
All mostlyarchaea and FIRM
Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)
9 COG1353 predicted
polymerase
COG1353 Most archaea some bacteria
Predicted palm-domain polymerase distantly related to viral RdRp and RT
hellipand ~15 other less common proteins Makarova et al BD 2006
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Motifs by family (columns: specific, I, II, III, IV, V):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable. Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
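The caption refers to Smith-Waterman similarities between repeats. As a rough illustration of what such a similarity score is, here is a minimal Python sketch of the Smith-Waterman local alignment score. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions made for this example and are not the parameters used by Kunin et al.; the MCL clustering step is not shown.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions, not taken from the source.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Two archaeal repeats from the alignment in this deck, gaps removed.
    arcful = "GTTGAAATCAGACCAAAATGGGATTGAAAG"
    metthe = "GTTAAAATCAGACCAAAATGGGATTGAAAT"
    print(smith_waterman_score(arcful, metthe))
```

A pairwise score of this kind, computed for every pair of repeats, gives the edge weights of the similarity graph that a layout or clustering program then works on.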
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
Figure labels: transcription by cellular RNA polymerase with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (helicase + HD hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; RAMP; p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage by p-slicer (COG1468, COG4343, COG1857)
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
Figure labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation by CASS RNA polymerase (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues
New psiRNA generation
Figure labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNA polymerase (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
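The sequence specificity noted under "Key validation" can be mimicked with a toy Python model in which a host counts as resistant only if one of its spacers matches the phage genome exactly. This is an illustrative assumption rather than the actual recognition mechanism: it ignores strand, the Cas proteins, and any tolerance for mismatches. All names and sequences are hypothetical.

```python
def find_protospacer(spacer: str, genome: str) -> int:
    """Position of an exact copy of `spacer` in `genome`, or -1 if absent."""
    return genome.find(spacer)


def is_resistant(spacers: list[str], genome: str) -> bool:
    """Toy rule: resistance requires at least one exact spacer match."""
    return any(find_protospacer(s, genome) >= 0 for s in spacers)


def substitute(genome: str, pos: int, base: str) -> str:
    """Return a copy of `genome` with a single-base substitution at `pos`."""
    return genome[:pos] + base + genome[pos + 1:]


if __name__ == "__main__":
    # Hypothetical 50-nt "phage genome"; the host spacer is taken from it.
    phage = "ATGGCGTACCTTAGGCTAACGTTAGCCGATCGGATTACCGTAAGCTTGCA"
    spacer = phage[10:40]
    mutant = substitute(phage, 25, "A")  # one substitution inside the matched region
    print(is_resistant([spacer], phage))   # True
    print(is_resistant([spacer], mutant))  # False
```

The exact-match rule is the strictest possible choice; it is used here only because it reproduces the reported behaviour that a single substitution restores sensitivity.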
Figure: stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus or plasmid; host; leader; CRISPR array with spacers numbered 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
Figure: operon diagrams for Type I (Helicase cassette; Cascade), Type II (HNH-type), and Type III (RAMP module or Polymerase cassette). Systems shown: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5. Gene labels: Cas1 through Cas13. Domain and family labels: Helicase (COG1203), HD, RuvC, HNH, RecB (COG1468), RAMP (COG1688, COG1583, COG1567, COG1769), COG1343, COG1857, COG3513, COG1517, COG1421, COG3337, y1724, ygcK, ygcL, BH0338, MTH1090, LA3191, SPy1049, AF0070
Makarova et al., NRM submitted
Experimental data on CRISPR/Cas systems
Figure: CRISPR/Cas gene neighborhoods in archaeal and bacterial genomes, labeled with species abbreviations and locus tags (for example Esccol b2755, Strthe str0658, Myctub Rv2817c, Pyrfur PF1129, Sulsol SSO1450) and with domain labels (Helicase COG1203, HD, RecB COG1468, RAMP, RuvC, HNH, HTH, COG1343, COG1518, COG1857, COG3513, COG3574); the "Polymerase" and "Helicase" cassettes are marked; panels (A), (B), (C)
Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al., Science 315:1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
Figure: Cas1 tree with branches labeled Type I, Type II, and Type III and with the earlier designations Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase/RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5
Gene labels shown for each type:
I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
II: cas1, cas9, cas2, cas4
III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)
Makarova et al., NRM submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20) genomes
Figure: bar charts (scale 0-100) of the distribution of Type I, Type II, Type III, and Cas1 presence/absence across the genomes
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks, and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC, and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Columns: Phenomenon; Biological role/function; Phyletic spread; Lamarckian criteria: (1) genomic changes caused by environmental factor, (2) changes are specific to relevant genomic loci, (3) changes provide adaptation to the causative factor
Bona fide Lamarckian
CRISPR/Cas. Role: defense against viruses and other mobile elements. Spread: most of the Archaea and many bacteria. Criteria (1)/(2)/(3): Yes / Yes / Yes
piRNA. Role: defense against transposable elements in germline. Spread: animals. Criteria (1)/(2)/(3): Yes / Yes / Yes
HGT (specific cases). Role: adaptation to new environment, stress response, resistance. Spread: archaea, bacteria, unicellular eukaryotes. Criteria (1)/(2)/(3): Yes / Yes / Yes
Quasi-Lamarckian
HGT (general phenomenon). Role: diverse innovations. Spread: archaea, bacteria, unicellular eukaryotes. Criteria (1)/(2)/(3): Yes / No / Yes/no
Stress-induced mutagenesis. Role: stress response/resistance/adaptation to new conditions. Spread: ubiquitous. Criteria (1)/(2)/(3): Yes / No or partially / Yes (but general evolvability enhanced as well)
Koonin, Wolf, Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Chimeric origin of the virophage genome
Figure: sources of virophage genes: NCLDV; Mimivirus; archaeal virus; NCLDV (environmental only); bacterial transposon
La Scola et al., Nature 2008; 455:100-4
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
Phage scaffold (virus hallmark genes)
Eukaryotic additions/displacements
Koonin, Yutin, Intervirology 2010
Figure: Poliovirus genome map (7.4 kb). Labels: IRES; VP0; VP3; VP1; 2A-Pro; 2B; 2C; 3A; 3B; 3C-Pro; 3D-Pol; An
Viral hallmark genes: jelly-roll CPs; Superfamily 3 helicase; RdRp
Picornaviral 'signature' genes: 3C-Pro; 3D-Pol (RdRp); VPg; 2C (S3H)
Protein sequence conservation
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
Koonin, Wolf, Nagasaki, Dolja, Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
Figure: genome maps of HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb), and SssRNAV (9.0 kb), with gene labels S-Pro, C-Pro, Hel3, Pol, VP, VP1-3, sg, An
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
The amended picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera, and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
Figure: PolioV genome map (7.4 kb) with hallmark genes (jelly-roll CPs, Superfamily 3 helicase, RdRp, 3C-Pro) and their likely sources: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja, Nat Rev Microbiol 2008 Dec;6(12):925-39
Genetic cycles of RNA viruses
Class 1: positive-strand RNA, 3-30 kb
Class 2: double-strand RNA, 4-25 kb
Class 3: negative-strand RNA, 11-20 kb
Proteins shown: RdRp, CP
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms: viruses are the biosphere's laboratory of genomic strategies
Genetic cycles of retroid viruses and retroelements
Class 4: retroid RNA viruses, 7-12 kb
Class 5: retroid DNA viruses/elements, 2-10 kb
Proteins shown: RT, CP
Genetic cycles of DNA viruses and plasmids
Class 6: ssDNA viruses, plasmids, 2-11 kb
Class 7: dsDNA viruses, plasmids, 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE
Figure: for each class, the replication cycle is drawn with strand symbols (+, -, ±) and step labels (R, RT, RCR, T, Tr, E, Pr-Pol)
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus), present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
Figure: contributions of five gene classes (recent acquisitions; old acquisitions; virus-specific ORFans; virus-specific conserved; virus hallmark) against genome size (log scale, 0 to 3) for three groups: most RNA viruses, retroelements, and RCR replicons; large RNA viruses, adeno, and tailed phages; NCLDV, herpes, and large phages
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging, and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive: Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
Figure: three scenarios: cell degeneration; escaped genes; primordial genetic systems. Labels: cell, small parasitic cell, very small virus; chromosome, plasmid, mRNA, RNA, DNA, virus; pre-cellular life forms
Origin and evolution of virus-like genetic elements in the pre-cellular era
Figure labels: cells; replicon fusion yields chromosomes
Koonin, Martin, TIG 2005; Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29; Koonin, Ann NY Acad Sci 2009
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription, and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements, and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
The ancient Virus World (VW)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Columns: Family; Subfamily; Phyletic distribution; Comments
1. COG1518. Subfamilies: COG1518 (cas1). Phyletic distribution: All. Comments: Putative novel nuclease/integrase. Mostly α-helical protein.
2. COG1343. Subfamilies: COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like. Phyletic distribution: All. Comments: Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins.
3. COG1203. Subfamilies: COG1203 (cas3). Phyletic distribution: All. Comments: DNA helicase. Most proteins have fusion to HD nuclease.
4. RecB-like nuclease. Subfamilies: COG1468 (cas4), COG4343. Phyletic distribution: All. Comments: RecB-like nuclease. Contains three-cysteine C-terminal cluster.
5. RAMP (Repair-associated mysterious protein). Subfamilies: COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like. Phyletic distribution: All. Comments: Belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB 1wj9).
6. COG1857. Subfamilies: COG1857, COG3649, YgcJ-like, y1725-like. Phyletic distribution: All. Comments: α/β protein, predicted nuclease or integrase.
7. HD-like nuclease. Subfamilies: COG1203 (N-terminus), COG2254. Phyletic distribution: All. Comments: HD-like nuclease.
8. BH0338. Subfamilies: BH0338-like, MTH1090-like. Phyletic distribution: All, mostly archaea and FIRM. Comments: Large Zn-finger-containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref)).
9. COG1353, predicted polymerase. Subfamilies: COG1353. Phyletic distribution: Most archaea, some bacteria. Comments: Predicted palm-domain polymerase distantly related to viral RdRp and RT.
...and ~15 other less common proteins (Makarova et al., BD 2006)
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins
extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol
Ferredoxin foldRNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
New psiRNA generation
[Figure: flow diagram. Plasmid or phage mRNA and pre-psiRNA (75-100 nt) undergo random RNA recombination and reverse transcription with random copy choice by the CASS RNA polymerase (COG1353) or another RT, giving dsDNA with CRISPRs and target-derived spacers; homologous recombination with the genomic CRISPR region, OR the integrase (COG1518), converts genomic DNA into genomic DNA with a new target-derived spacer.]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
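The sequence specificity described above can be illustrated with a toy model. This is a minimal sketch, assuming that resistance requires an exact match between a spacer and the phage genome; the sequences and the function names `is_resistant` and `point_mutant` are made up for illustration and are not taken from Barrangou et al.

```python
def is_resistant(spacers, phage_genome):
    """Toy rule: the cell is resistant if any spacer matches the phage exactly."""
    return any(spacer in phage_genome for spacer in spacers)


def point_mutant(genome, position, base):
    """Return a copy of the genome with a single substitution at `position`."""
    return genome[:position] + base + genome[position + 1:]


phage = "ATGGCTAGCTAGGATCCGATTACGCTAGCAAT"  # made-up phage sequence
spacer = phage[10:22]                        # spacer derived from the phage
cell_spacers = [spacer]

# A phage carrying one substitution inside the region matched by the spacer
escape_phage = point_mutant(phage, 15, "A")

print(is_resistant(cell_spacers, phage))         # True
print(is_resistant(cell_spacers, escape_phage))  # False
```

Exact substring matching is the simplest possible stand-in for "spacer-phage sequence similarity"; it is chosen here only because it reproduces the observation that a single substitution restores sensitivity.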
[Figure: the three stages of CRISPR/Cas action in a host cell challenged by an invading virus or plasmid: (1) adaptation, in which the spacer array next to the CRISPR leader grows from 5 4 3 2 1 to 6 5 4 3 2 1; (2) expression & processing; (3) interference. Other labels: cas operon (cas3, cas2, cas1, cas5), Cascade, Cas3.]
Van der Oost et al., TIBS 2009
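The three stages named in the figure can be sketched as a small data structure. This is an illustrative toy, not the mechanism from Van der Oost et al.: the class `CrisprLocus`, its method names, the sequences, and the assumption that a new spacer is added at the leader-proximal end (suggested by the numbering 5 4 3 2 1 becoming 6 5 4 3 2 1) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprLocus:
    repeat: str
    spacers: List[str] = field(default_factory=list)  # index 0 = leader-proximal

    def adapt(self, invader: str, start: int, length: int) -> str:
        """(1) Adaptation: copy a fragment of the invader in as a new spacer."""
        new_spacer = invader[start:start + length]
        self.spacers.insert(0, new_spacer)
        return new_spacer

    def express(self) -> List[str]:
        """(2) Expression & processing: one repeat-spacer unit per guide."""
        return [self.repeat + spacer for spacer in self.spacers]

    def interferes_with(self, invader: str) -> bool:
        """(3) Interference: an exact spacer match marks the invader."""
        return any(spacer in invader for spacer in self.spacers)


locus = CrisprLocus(repeat="GTTTCAATC", spacers=["AAAACCCC", "GGGGTTTT"])
virus = "TTGACCATGCATTAGCCGATAGGCT"  # made-up invader sequence

before = locus.interferes_with(virus)   # no matching spacer yet
locus.adapt(virus, start=5, length=8)   # acquire a spacer from the invader
after = locus.interferes_with(virus)
print(before, after, locus.spacers)
```

Keeping the spacers in an ordered list makes the locus double as a chronological record of past encounters, which is the property the Lamarckian argument later in the talk relies on.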
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon diagrams for the three types, with genes labeled Cas1 to Cas13 and by COG number: Type I (helicase cassette; includes Cascade), Type II (HNH-type), Type III (RAMP module or polymerase cassette). Representative subtypes: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5.]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: operon diagrams and gene labels (organism abbreviations with locus tags, COG numbers, CASS subtypes 1 to 7a, "Polymerase" and "Helicase" cassettes, panels A to C); the layout is not recoverable from the extracted text.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III and with subtypes Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (polymerase/RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5. Example operons: I: cse1 cse2 cas7 cas5 cas3 cas1 cas2 cas6e; II: cas1 cas9 cas2 cas4; III: cas1 csm6 csm5 csm4 csm3 csm2 cas10 cas6 cas2 (RAMP).]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II - 9, Type III - 20
• Two or three systems of different types are present in 128 (20) genomes
[Bar charts: two panels with a 0-100 scale showing Type I, Type II, Type III and Cas1 presence/absence; the bar values are not recoverable from the extracted text.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor

Phenomenon | Biological role/function | Phyletic spread | (a) | (b) | (c)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)

Koonin, Wolf, Biol Direct 2009
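The grouping in the table above can be written down as a simple rule. This is a sketch of one reading of the table, assuming that a phenomenon counts as bona fide Lamarckian only when all three criteria are an unqualified "yes"; the dictionary `PHENOMENA` and the function `classify` are illustrative names, and the rule itself is an inference from the table rather than a statement from Koonin and Wolf.

```python
# Answers to the three criteria, in the order:
# (a) caused by environment, (b) locus-specific, (c) adaptive to the cause
PHENOMENA = {
    "CRISPR/Cas": ("yes", "yes", "yes"),
    "piRNA": ("yes", "yes", "yes"),
    "HGT (specific cases)": ("yes", "yes", "yes"),
    "HGT (general phenomenon)": ("yes", "no", "yes/no"),
    "Stress-induced mutagenesis": ("yes", "no or partially", "yes"),
}


def classify(answers):
    """Bona fide only if every criterion is an unqualified 'yes' (assumed rule)."""
    if all(answer == "yes" for answer in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


for name, answers in PHENOMENA.items():
    print(f"{name}: {classify(answers)}")
```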
Stress as a gauge of evolutionary modality
Koonin, Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Hypothesis: origin of the NCLDV (and other viruses of eukaryotes) in the second melting pot of virus evolution, eukaryogenesis
Phage scaffold (virus hallmark genes)
Eukaryotic additions/displacements
Koonin, Yutin, Intervirology 2010
[Figure: genome map of poliovirus (7.4 kb) with labels VPg, IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol, A(n). Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase (S3H), RdRp. Picornaviral 'signature' genes: VPg, 2C (S3H), 3C-Pro, 3D-Pol (RdRp). Additional track: protein sequence conservation.]
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
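[Figure: genome maps of marine picorna-like viruses: HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb) and SssRNAV (9.0 kb), with genes labeled Hel3, S-Pro or C-Pro, Pol, VP (VP1-VP3), subgenomic RNAs (sg) and poly(A) tails (An).]
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795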
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
[Figure: genome map of PolioV (7.4 kb) with the genes for jelly-roll CPs, superfamily 3 helicase, RdRp and 3C-Pro marked; origin labels: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases.]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Genetic cycles of RNA viruses
Class | Genome size
1. Positive-strand RNA | 3-30 kb
2. Double-strand RNA | 4-25 kb
3. Negative-strand RNA | 11-20 kb
Genetic cycles of retroid viruses and retroelements
4. Retroid RNA viruses | 7-12 kb
5. Retroid DNA viruses and elements | 2-10 kb
Genetic cycles of DNA viruses and plasmids
6. ssDNA viruses/plasmids | 2-11 kb
7. dsDNA viruses/plasmids | 5-1200 kb (ALL CELLULAR ORGANISMS: YOU ARE HERE)
[Figure: a replication-cycle diagram for each class (steps labeled R, T, Tr, RT, RCR, E; key proteins RdRp, RT, RCE, Pr-Pol, CP); the diagrams are not recoverable from the extracted text.]
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host, but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of five gene classes (recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark) against genome size (log scale, 0 to 3) for three groups: most RNA viruses, retroelements and RCR replicons; large RNA viruses, adenoviruses and tailed phages; NCLDV, herpesviruses and large phages.]
Natural history of viral genes: viral hallmark genes
• Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
• Strong support for monophyly of all viral members of the respective gene families
• Only distant homologs in cellular organisms
• Can be viewed as signatures of the 'virus state'
• Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis ("extremely counterintuitive", Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
• Cell degeneration
• Escaped genes
• Primordial genetic systems
[Figure labels: cell, small parasitic cell, very small virus; chromosome, plasmid, mRNA, virus; pre-cellular life forms, RNA, DNA, virus.]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: cells; replicon fusion yields chromosomes.]
Koonin, Martin, TIG 2005; Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29; Koonin, Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that has retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. [...] The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether

No. | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein; predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase, distantly related to viral RdRp and RT

...and ~15 other, less common proteins. Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats with adjacent cas genes in several locus types (TB and TA, Strepto-like, Ecoli-like, Pasteurella-like); genes shown: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468).]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
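Characteristics (i) and (ii) above translate directly into a way of reading a locus. The following is a minimal sketch, assuming that the repeat is already known and occurs as exact copies; the function `split_array` and the leader, spacer and trailer sequences are made up for illustration, while the repeat is the Yerpes row of the repeat alignment shown in this talk with gap characters removed.

```python
def split_array(locus, repeat):
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    parts = locus.split(repeat)
    # parts[0] is the leader-side flank and parts[-1] the trailing flank
    return parts[1:-1]


repeat = "GTTCACTGCCGCACAGGCAGCTTAGAAA"              # Yerpes repeat, gaps removed
leader = "AAATTTACCATA"                               # made-up flank
spacer_1 = "ACCTGATTAGACCATTAGACCAATGACCATAG"         # made-up spacer
spacer_2 = "TTACCAGATAACCAGATTACAGATACCAATTC"         # made-up spacer
trailer = "CCATTTAA"                                  # made-up flank

locus = leader + repeat + spacer_1 + repeat + spacer_2 + repeat + trailer
spacers = split_array(locus, repeat)

print(spacers)
print([len(s) for s in spacers])  # spacers between repeats are of similar size
```

Real repeats show "no or very little" variation rather than none, so a practical tool would need approximate matching; exact splitting is used here only to keep the sketch short.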
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Motifs (specific; I, II, III, IV, V), listed per family as extracted; the column assignment is not recoverable:
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs; ust-lKGhh+hh; hhGtt; hD; lGhttsGh
y1726-like: slhlpEKuVRGT; lRTIDs; YGuVTsGhuh
COG1851: hGphpGpsaFh; hGFGRh
BH0337-like: TpA-h+GIh-uIh; hhLpDV; LGsREhuht
COG1567: hhhpp; ups-lhtAhh; luscoGhGh
COG1769: hhhh+Ph-hhh; ss-hhGhlhsh; hGh; hGtcp+hsthchp
COG1688 (Cas5): hhhhhts; sssshhGhlsh; lGttphh
COGs 1583, 5551: hhhhoPhhl; hGtppshGFGl
YgcH-like: hHphlh; hGu+uhGhGhh
y1727-like: LHphLh; GFsthGLStss
MJ0978-like: hHNH; lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
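The Smith-Waterman similarities mentioned in the caption are local alignment scores. Below is a minimal sketch of the score computation with a linear gap penalty; the scoring parameters (match 2, mismatch -1, gap -2) are assumptions chosen for illustration and are not the settings used by Kunin et al. The three repeats are rows of the repeat alignment shown in this talk (the second Pyrhor row, the second Pyraby row and Nanequ) with gap characters removed.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


pyrhor = "GTTTCCGTAGAACTAAATAGTGTGGAAAG"
pyraby = "GTTTCCGTAGAACTTAGTAGTGTGGAAAG"
nanequ = "CTTTCAATATTTCTAATATATTAGAAAC"

print(smith_waterman_score(pyrhor, pyraby))  # two closely related archaeal repeats
print(smith_waterman_score(pyrhor, nanequ))  # a more distant repeat scores lower
```

Pairwise scores of this kind form the edge weights of a similarity graph, which a clustering method such as MCL can then partition into repeat families.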
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
plasmid or phage mRNA
pre-psiRNA75-100 nt
CASS RNApol(COG1353)Or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase(COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPRCas systems
1857
1857
1857
RAMP1688
10
Cordip DIP0037Bacfra BF3951
Wolsuc WS1444Camjej Cj1522c
Neimen NMA0630Pasmul PM1126
[Figure: cas gene neighborhoods for representative archaeal and bacterial loci, panels (A), (B) and (C), with the “polymerase” cassette and the “helicase” cassette marked; gene labels include the helicase (COG1203), RecB family (COG1468, COG4343), HD, RuvC and HNH families and the RAMPs]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315:1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III; subtype labels Ecoli/CASS2, Ypest/CASS3, Nmeni/CASS4, Mtube/CASS6 (polymerase-RAMP module), Dvulg/CASS1, Tneap-Hmari/CASS7, Apern/CASS5]
Genes shown for each type:
I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
II: cas1, cas9, cas2, cas4
III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II - 9, Type III - 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: bar charts on a 0-100 scale showing Type I, Type II and Type III systems and Cas1 presence/absence]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (1) genomic changes caused by environmental factor; (2) changes are specific to relevant genomic loci; (3) changes provide adaptation to the causative factor
Phenomenon | Biological role/function | Phyletic spread | (1) | (2) | (3)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Koonin, Wolf, Biol Direct 2009
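The three Lamarckian criteria in the table can be read as a simple decision rule. The sketch below is only an illustration of that reading, not anything taken from the cited paper: the class `Phenomenon`, the function `modality`, and the rule "all three criteria are an unqualified Yes" are assumptions made here, and the "not Lamarckian" branch is a hypothetical fallback that no row of the table uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion (1), as worded in the table
    locus_specific: str         # criterion (2)
    adaptive_to_cause: str      # criterion (3)


def modality(p: Phenomenon) -> str:
    """Assumed rule: bona fide only if every criterion is a plain 'Yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "Yes" for a in answers):
        return "bona fide Lamarckian"
    if p.caused_by_environment == "Yes":
        return "quasi-Lamarckian"
    return "not Lamarckian"  # hypothetical fallback, not a row of the table


# Rows transcribed from the table above.
TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

for row in TABLE:
    print(f"{row.name}: {modality(row)}")
```

Under this rule the first three rows come out as bona fide Lamarckian and the last two as quasi-Lamarckian, which matches the grouping in the table.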
Stress as a gauge of evolutionary modality
Koonin, Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L. Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Poliovirus, 7.4 kb
[Figure: genome map with IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol and poly(A) tail (An)]
Viral hallmark genes: jelly-roll CPs, superfamily 3 helicase, RdRp
Picornaviral ‘signature’ genes: RdRp (3D-Pol), 3C-Pro, VPg, S3H (2C)
[Axis label: protein sequence conservation]
Some of the smallest viruses (this is where poliovirus belongs): the Big Bang of picorna-like virus evolution
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Picorna-like viruses possess diverse arrays of hallmark and unique genes
Picorna-like viral superfamily
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
[Figure: genome maps of HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb) and SssRNAV (9.0 kb), showing S-Pro, C-Pro, Hel3, Pol and VP (VP1-3) domains, subgenomic RNAs (sg) and poly(A) tails (An)]
The amended picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses - 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
PolioV, 7.4 kb
[Figure: genome map with IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol and poly(A) tail (An)]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Jelly-roll CPs, superfamily 3 helicase: DNA phages
RdRp: bacterial group II retroelements
3C-Pro: bacterial and mitochondrial HtrA family proteases
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a “Big Bang” during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then “sampled” the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere’s laboratory of genomic strategies
Genetic cycles of RNA viruses
Class | Genome size
1 Positive-strand RNA | 3-30 kb
2 Double-strand RNA | 4-25 kb
3 Negative-strand RNA | 11-20 kb
Genetic cycles of retroid viruses and retroelements
4 Retroid RNA viruses | 7-12 kb
5 Retroid DNA viruses, elements | 2-10 kb
Genetic cycles of DNA viruses and plasmids
6 ssDNA viruses, plasmids | 2-11 kb
7 dsDNA viruses, plasmids | 5-1200 kb (ALL CELLULAR ORGANISMS: YOU ARE HERE)
[Figure: replication cycle of each class, drawn with + and - strands and steps labeled R, T, Tr, E, RT and RCR; protein labels RdRp, CP, RT, RCE, Pr-Pol]
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages with only very distant homologs in cellular organisms
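The five gene classes of this summary can be written down as a small lookup on two properties: how close the nearest cellular homolog is, and how widely the gene is spread among viruses. The sketch below is a rough illustration under assumptions made here, not the author's procedure: the category strings (`"close"`, `"moderate"`, `"distant"`, `"none"`; `"narrow"`, `"lineage"`, `"many_lineages"`) and the function `classify` are hypothetical, and real classification rests on sequence searches rather than on hand-assigned labels.

```python
from enum import Enum


class GeneClass(Enum):
    RECENT_ACQUISITION = 1        # class 1 on the slide
    OLD_ACQUISITION = 2           # class 2
    ORFAN = 3                     # class 3
    VIRUS_SPECIFIC_CONSERVED = 4  # class 4
    HALLMARK = 5                  # class 5


HOMOLOG_LEVELS = {"close", "moderate", "distant", "none"}
SPREAD_LEVELS = {"narrow", "lineage", "many_lineages"}


def classify(cellular_homolog: str, spread: str) -> GeneClass:
    """Map two hand-assigned labels (assumed categories) to a gene class."""
    if cellular_homolog not in HOMOLOG_LEVELS or spread not in SPREAD_LEVELS:
        raise ValueError("unknown category label")
    if cellular_homolog == "close":
        return GeneClass.RECENT_ACQUISITION
    if cellular_homolog == "moderate":
        return GeneClass.OLD_ACQUISITION
    if cellular_homolog == "distant" and spread == "many_lineages":
        return GeneClass.HALLMARK
    if spread == "narrow":
        return GeneClass.ORFAN
    return GeneClass.VIRUS_SPECIFIC_CONSERVED


print(classify("close", "narrow").name)
print(classify("distant", "many_lineages").name)
```

The order of the checks is a design choice of this sketch: closeness of the cellular homolog is tested first, because classes 1 and 2 on the slide are defined by it.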
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: x axis, genome size (log), 0 to 3; series: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark; genome-size groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: viral hallmark genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the ‘virus state’
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive - Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios (cell degeneration, escaped genes, primordial genetic systems); labels: CELL, SMALL PARASITIC CELL, VERY SMALL, VIRUS, CHROMOSOME, PLASMID, RNA, DNA, mRNA, PRE-CELLULAR LIFE FORMS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: CELLS; replicon fusion yields chromosomes]
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not “just” pathogens; they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that has retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the “virus world”, or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to “RAMP” family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein; predicted nuclease or integrase
7 HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 BH0338 | BH0338-like, MTH1090-like | All; mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
…and ~15 other, less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats (TB and TA; Strepto-like, Ecoli-like, Pasteurella-like) with the associated genes Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203) and Cas4 (COG1468)]
“The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species”
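Characteristics (i) and (ii) of the quoted definition, identical direct repeats separated by spacers of similar size, are enough to sketch a naive detector. The code below is a toy under stated assumptions, not a published CRISPR finder: the function names and all default parameters (`repeat_len`, `min_spacer`, `max_spacer`, `min_repeats`) are hypothetical and scaled down for a short demo string, and it requires exact repeat matches, which real loci do not always show.

```python
def find_crispr_like_arrays(seq, repeat_len=8, min_spacer=4,
                            max_spacer=12, min_repeats=3):
    """Return [(repeat, [start positions])] for exact direct repeats that
    recur with a spacer of min_spacer..max_spacer bases between copies."""
    arrays = []
    i = 0
    while i + repeat_len <= len(seq):
        repeat = seq[i:i + repeat_len]
        starts = [i]
        pos = i + repeat_len  # first base after the current repeat copy
        while True:
            # The next copy must start between pos+min_spacer and pos+max_spacer.
            hit = seq.find(repeat, pos + min_spacer,
                           pos + max_spacer + repeat_len)
            if hit == -1:
                break
            starts.append(hit)
            pos = hit + repeat_len
        if len(starts) >= min_repeats:
            arrays.append((repeat, starts))
            i = starts[-1] + repeat_len  # continue after the array
        else:
            i += 1
    return arrays


def spacers_of(seq, repeat, starts):
    """Extract the sequences lying between consecutive repeat copies."""
    k = len(repeat)
    return [seq[a + k:b] for a, b in zip(starts, starts[1:])]


# Toy locus: three copies of one repeat with two distinct 6-base spacers.
toy = "AAAA" + "GTTTCAAT" + "ACGTAC" + "GTTTCAAT" + "TTGGCC" + "GTTTCAAT" + "CCCC"
arrays = find_crispr_like_arrays(toy)
for repeat, starts in arrays:
    print(repeat, starts, spacers_of(toy, repeat, starts))
```

Criteria (iii) to (v) of the definition (leader sequence, absence of long ORFs, presence of cas genes) are not checked here; they would need annotation data rather than the raw sequence alone.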
The cas genes are our “repair” system
“Helicase” cassette
“Polymerase” cassette
A paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Specific motifs (I to V), listed per family in the order shown:
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable. Kunin et al. Genome Biology 2007, 8:R61. doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
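The conservation claim can be checked row by row on the repeat alignment above by counting identical positions in columns where neither sequence has a gap. The sketch below is a minimal illustration; the function name `aligned_identity` is hypothetical, and skipping every gapped column is one assumed convention among several for scoring an alignment.

```python
def aligned_identity(a: str, b: str, gap: str = "-"):
    """Count (matches, compared) over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    matches = compared = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # ignore columns with a gap in either row
        compared += 1
        matches += (x == y)
    return matches, compared


# Two rows copied from the alignment (Arcful and Metthe).
arcful = "GTTGAAATC-------AGACCAAAATGGGATTGAAAG-"
metthe = "GTTAAAATC-------AGACCAAAATGGGATTGAAAT-"

matches, compared = aligned_identity(arcful, metthe)
print(f"{matches}/{compared} identical positions "
      f"({100 * matches / compared:.0f}%)")
```

The same function can be applied to any other pair of rows of equal length; rows of different length (for example a row missing its trailing gap character) would need padding first.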
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
1. Transcription by cellular RNA pol with a specific transcription factor (COG1517), yielding polycistronic pre-psiRNA
2. (Protein-guided) RNA folding
3. RNA processing (fast) by p-dicer (helicase + HD-hydrolase; COG1203, COG2254), yielding pre-psiRNA, 75-100 nt
4. RNA processing (slow) by p-dicer, yielding psiRNA, 25-45 nt
5. p-RISC: psiRNA with RAMP and p-slicer (COG1468, COG4343, COG1857)
6. Annealing to RNA target (plasmid or phage mRNA)
7. Target RNA cleavage
Variant of CASS functioning with polymerase: psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
1. psiRNA (25-45 nt) with RAMP: RAMP-RNA complex (unstable)
2. Annealing to RNA target (plasmid or phage mRNA)
3. Primer elongation by CASS RNA pol (COG1353): long dsRNA (stable), amplified annealing
4. Duplex degradation by p-dicer
5. RAMP binding; cycle continues
New psiRNA generation
1. Plasmid or phage mRNA and pre-psiRNA (75-100 nt) with CASS RNA pol (COG1353) or other RT
2. Reverse transcription with random copy choice, or random RNA recombination and reverse transcription
3. dsDNA with CRISPRs and target-derived spacers
4. Homologous recombination with genomic CRISPR region; integrase (COG1518)
5. Genomic DNA becomes genomic DNA with new target-derived spacer
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
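The sequence specificity described under "Key validation", where a single substitution reverts the cell to sensitivity, can be mimicked with an exact-match toy model. This is an illustration under loud assumptions rather than a model of the real mechanism: the function `is_resistant`, the made-up `phage` sequence and the requirement of a perfect match on either strand are all hypothetical simplifications.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_resistant(spacers, phage_genome: str) -> bool:
    """Toy rule: resistant if any spacer matches the genome exactly
    on either strand."""
    rc = reverse_complement(phage_genome)
    return any(s in phage_genome or s in rc for s in spacers)


# Made-up phage sequence; the spacer is a 20-base fragment copied from it.
phage = "ATGCCGTTAGCATCGGATTACGCTAGCTAAGGCTA"
spacer = phage[8:28]

# Phage mutant with one substitution (G to A) inside the spacer target.
mutant = phage[:15] + "A" + phage[16:]

print(is_resistant([spacer], phage))   # spacer matches the original phage
print(is_resistant([spacer], mutant))  # single substitution breaks the match
```

An exact-match rule is the strictest possible choice; it is used here only because it reproduces the reported behaviour that one SNP in the target restores sensitivity.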
(1) Adaptation
(2) Expression & Processing
(3) Interference
[Figure: invading virus or plasmid and host at each stage; CRISPR array with leader and spacers numbered 5 4 3 2 1, then 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
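The three stages can be strung together in a few lines. The sketch below is a toy under assumptions made here, not the mechanism from the cited review: the class `CrisprLocus` and its methods are hypothetical, the spacer numbering in the figure (5 4 3 2 1, then 6 5 4 3 2 1) is read as "the newest spacer sits next to the leader", and interference is reduced to an exact string match between a guide and the invader.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprLocus:
    # spacers[0] is the leader-proximal, most recently acquired spacer.
    spacers: List[str] = field(default_factory=list)

    def adapt(self, invader: str, start: int, length: int = 8) -> str:
        """(1) Adaptation: copy a fragment of the invader in as a new spacer."""
        spacer = invader[start:start + length]
        self.spacers.insert(0, spacer)  # new spacer goes next to the leader
        return spacer

    def express(self) -> List[str]:
        """(2) Expression & processing: one RNA guide per spacer (T -> U)."""
        return [s.replace("T", "U") for s in self.spacers]

    def interferes_with(self, invader: str) -> bool:
        """(3) Interference: toy rule, an exact guide match marks the invader."""
        target = invader.replace("T", "U")
        return any(guide in target for guide in self.express())


locus = CrisprLocus(spacers=["AAAACCCC"])  # one pre-existing, unrelated spacer
virus = "GGATCCTTAGGCATCG"                 # made-up invader sequence

print(locus.interferes_with(virus))        # no matching spacer yet
locus.adapt(virus, start=4)
print(locus.spacers)
print(locus.interferes_with(virus))        # matching spacer now present
```

Keeping the newest spacer at index 0 makes the list order mirror the figure's numbering, so the array also works as a record of past encounters, oldest last.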
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon diagrams for Type I (helicase cassette; Cascade), Type II (HNH-type) and Type III (RAMP module or polymerase cassette); gene labels Cas1 to Cas13; subtype labels Ecoli/CASS2, Ypest/CASS3, Nmeni/CASS4, CASS4a, Mtube/CASS6 (RAMP module), Dvulg/CASS1, Tneap and Hmari/CASS7, Apern/CASS5]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
Marine eukaryotic plankton carries a wealth of positive-strand RNA viruses; nearly all belong to the picorna-like superfamily
Lang et al. (2004) Virology 320:206
Nagasaki et al. (2005) Appl Env Microbiol 71:8888
Culley et al. (2006) Science 312:1795
[Figure: genome maps of HcRNAV (4.4 kb), HaRNAV (8.6 kb), RsRNAV (8.9 kb) and SssRNAV (9.0 kb) showing the Hel3, protease (S-Pro, C-Pro), Pol and VP (VP1-VP3) genes, subgenomic (sg) RNAs and poly(A) tails (An)]
The amended Picornavirus-like superfamily includes 14 recognized viral families, 4 floating genera and 15 unclassified positive-strand and double-strand RNA viruses that infect hosts from 4 of the 5 eukaryotic supergroups
- 6 distinct clades from RdRp phylogeny
- diverse genome layouts
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
[Figure: genome map of poliovirus (PolioV, 7.4 kb): IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, poly(A) tail]
Signature genes: jelly-roll CPs, superfamily 3 helicase, RdRp, 3C-Pro
Inferred sources: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
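The PSSM searches mentioned above rest on a simple idea: an alignment of known family members is turned into a position-specific scoring matrix, and candidate sequences are scored against it. PSI-BLAST repeats this build-and-search step iteratively. The sketch below is only an illustration of that idea, not the procedure used in the study: the toy alignment, the uniform background frequencies, the pseudocount and all function names are assumptions introduced here, and the alignment is assumed to be gap-free.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy alignment of a six-residue motif (not real viral data).
TOY_ALIGNMENT = ["GKSGDD", "GRSGDD", "GKTGDD", "GKSGDE"]


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum the column scores of a segment that is as long as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring window in a sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max((score(pssm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


if __name__ == "__main__":
    matrix = build_pssm(TOY_ALIGNMENT)
    print(best_hit(matrix, "MAAAGKSGDDLLL"))
```

Real searches use background frequencies estimated from sequence databases and sequence weighting; the uniform background here only keeps the sketch short.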
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies
[Figures: replication cycle diagrams for each class, with steps labelled T, Tr, R, RT, RCR and E and signature proteins RdRp, RT, CP and Pr-Pol]
Genetic cycles of RNA viruses
Class 1: Positive-strand RNA viruses, 3-30 kb
Class 2: Double-strand RNA viruses, 4-25 kb
Class 3: Negative-strand RNA viruses, 11-20 kb
Genetic cycles of retroid viruses and retroelements
Class 4: Retroid RNA viruses, 7-12 kb
Class 5: Retroid DNA viruses and elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids
Class 6: ssDNA viruses and plasmids, 2-11 kb
Class 7: dsDNA viruses and plasmids, 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
  1. Genes with closely related homologs from cellular organisms (typically the host of the given virus), present in a narrow group of viruses
  2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
  Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
  3. ORFans, i.e. genes without detectable homologs except, possibly, in closely related viruses
  4. Virus-specific genes that are conserved within a virus lineage
  Origin: acquisition from host, but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
  5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
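The five gene classes in this summary can be read as a small decision table keyed on two observations: how close the nearest cellular homologs are, and how widely the gene is spread among viruses. The sketch below encodes that reading. The category labels ("close", "narrow" and so on) and the function name are hypothetical conveniences introduced here; the slides define the classes in prose only, and real assignments depend on sequence analysis rather than on two discrete labels.

```python
from typing import Optional

# Short class names as used in the genome-composition figure of the talk.
CLASS_NAMES = {
    1: "recent acquisition",
    2: "old acquisition",
    3: "virus-specific ORFan",
    4: "virus-specific conserved",
    5: "virus hallmark",
}

# (closeness of cellular homologs, spread among viruses) -> class number.
# "narrow"  : a narrow group of closely related viruses
# "lineage" : one or several virus lineages
# "diverse" : many diverse virus lineages
RULES = {
    ("close", "narrow"): 1,
    ("moderate", "lineage"): 2,
    ("none", "narrow"): 3,
    ("none", "lineage"): 4,
    ("distant", "diverse"): 5,
}


def classify_viral_gene(cellular_homologs: str, viral_spread: str) -> Optional[int]:
    """Return the class number (1-5), or None if the scheme has no matching class."""
    return RULES.get((cellular_homologs, viral_spread))


if __name__ == "__main__":
    number = classify_viral_gene("distant", "diverse")
    print(number, CLASS_NAMES[number])
```

Combinations that the summary does not describe return None rather than being forced into a class.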
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved genes and virus hallmark genes plotted against genome size (log scale, 0-3) for three groups: most RNA viruses, retroelements and RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive - Santiago Elena, Feb 16, 2011)
The hallmark genes and, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios, "Cell degeneration", "Escaped genes" and "Primordial genetic systems"; labels: cell, small parasitic cell, very small virus; chromosome, plasmid, mRNA, virus; pre-cellular life forms, RNA virus, DNA virus, cells]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure label: replicon fusion yields chromosomes]
Koonin, Martin. TIG 2005
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Columns: Family; Subfamily; Phyletic distribution; Comments

1. COG1518
   Subfamily: COG1518 (cas1)
   Phyletic distribution: All
   Comments: Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343
   Subfamily: COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like
   Phyletic distribution: All
   Comments: Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203
   Subfamily: COG1203 (cas3)
   Phyletic distribution: All
   Comments: DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease
   Subfamily: COG1468 (cas4), COG4343
   Phyletic distribution: All
   Comments: RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein)
   Subfamily: COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like
   Phyletic distribution: All
   Comments: Belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6. COG1857
   Subfamily: COG1857, COG3649, YgcJ-like, y1725-like
   Phyletic distribution: All
   Comments: α/β protein; predicted nuclease or integrase
7. HD-like nuclease
   Subfamily: COG1203 (N-terminus), COG2254
   Phyletic distribution: All
   Comments: HD-like nuclease
8. BH0338
   Subfamily: BH0338-like, MTH1090-like
   Phyletic distribution: All, mostly archaea and FIRM
   Comments: Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353, predicted polymerase
   Subfamily: COG1353
   Phyletic distribution: Most archaea, some bacteria
   Comments: Predicted palm-domain polymerase, distantly related to viral RdRp and RT

...and ~15 other, less common proteins
Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and the adjacent cas genes, Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203) and Cas4 (COG1468), in loci labelled TB and TA, Strepto-like, Ecoli-like and Pasteurella-like]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
Paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (motif columns I II III IV V; family labels followed by their motif strings):
COGs 1336, 1367
  hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
COGs 1604, 1337, 1332
y1726-like
  slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851
  hGphpGpsaFh hGFGRh
BH0337-like
  TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567
  hhhpp ups-lhtAhh luscoGhGh
COG1769
  hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5)
  hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551
  hhhhoPhhl hGtppshGFGl
YgcH-like
  hHphlh hGu+uhGhGhh
y1727-like
  LHphLh GFsthGLStss
MJ0978-like
  hHNH lG+tsuhGhGol
Fold: ferredoxin fold / RNA-recognition motif (RRM)
CRISPR show extreme diversity and complex clustering
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
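The conservation claim can be made concrete by computing the fraction of identical positions between two aligned repeats. The sketch below does this for the Yerpes repeat and the first Pasmul repeat from the alignment above. The function name and the choice to skip any column where either sequence has a gap are assumptions made for this illustration; other identity definitions count gaps differently.

```python
def aligned_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


# Two rows of the repeat alignment shown in the slide.
YERPES = "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--"
PASMUL = "GTTAACTGC--------CGTATAGGCAGCTTAGAAA--"

if __name__ == "__main__":
    print(f"{aligned_identity(YERPES, PASMUL):.3f}")
```

These two repeats differ at 3 of their 28 ungapped columns, so repeats from Yersinia and Pasteurella are nearly identical.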
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure, steps as labelled:
Transcription by cellular RNApol with a specific transcription factor (COG1517) -> polycistronic pre-psiRNA
(Protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-Hydrolase; COG1203, COG2254) -> pre-psiRNA, 75-100 nt
RNA processing (slow) by p-dicer -> psiRNA, 25-45 nt
psiRNA with RAMP and p-slicer (COG1468, COG4343, COG1857) -> p-RISC
Annealing of p-RISC to the RNA target (plasmid or phage mRNA) -> target RNA cleavage]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure, steps as labelled:
psiRNA (25-45 nt) bound by RAMP anneals to the RNA target (plasmid or phage mRNA) -> RAMP-RNA complex (unstable)
Primer elongation by CASS RNApol (COG1353); amplified annealing -> long dsRNA (stable)
Duplex degradation by p-dicer; RAMP binding; cycle continues]
New psiRNA generation
[Figure, steps as labelled:
Plasmid or phage mRNA and pre-psiRNA (75-100 nt) -> reverse transcription with random copy choice, OR random RNA recombination and reverse transcription, by CASS RNApol (COG1353) or other RT -> dsDNA with CRISPRs and target-derived spacers
Homologous recombination with the genomic CRISPR region, by integrase (COG1518): genomic DNA -> genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
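The logic of these observations can be mimicked with a toy model: a locus stores spacers copied from a phage genome, resistance requires an exact spacer match, and a single substitution in the targeted region restores sensitivity. Everything in the sketch below is hypothetical and introduced only for illustration: the 60-nt "phage genome", the 30-nt spacer length, the class and function names, the forward-strand-only matching and the placement of new spacers at the front of the list.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical 60-nt "phage genome" used only for illustration.
PHAGE = "ATGGCTAGCTAGGATCCGATTACGCTAGGCTTAACGGATCGATCGGCTAAGCTTAGGCCA"


@dataclass
class CrisprLocus:
    spacers: List[str] = field(default_factory=list)

    def acquire_spacer(self, genome: str, start: int, length: int = 30) -> str:
        """Copy a fragment of the invader genome into the locus (adaptation)."""
        spacer = genome[start:start + length]
        if len(spacer) < length:
            raise ValueError("fragment runs past the end of the genome")
        self.spacers.insert(0, spacer)  # newest spacer first (assumption)
        return spacer

    def is_resistant(self, genome: str) -> bool:
        """Resistance in this toy model requires an exact spacer match."""
        return any(spacer in genome for spacer in self.spacers)


def point_mutation(genome: str, position: int, new_base: str) -> str:
    """Return a copy of the genome with one base substituted."""
    if genome[position] == new_base:
        raise ValueError("new base must differ from the original base")
    return genome[:position] + new_base + genome[position + 1:]


if __name__ == "__main__":
    locus = CrisprLocus()
    print("before challenge:", locus.is_resistant(PHAGE))
    locus.acquire_spacer(PHAGE, start=10)
    print("after spacer acquisition:", locus.is_resistant(PHAGE))
    escape_mutant = point_mutation(PHAGE, 25, "G")
    print("against SNP escape mutant:", locus.is_resistant(escape_mutant))
```

Exact string matching is the simplest stand-in for sequence-specific targeting; it ignores the opposite strand and any tolerance for mismatches.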
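[Figure: the three stages of CRISPR/Cas action in the host. (1) Adaptation: a fragment of the invading virus or plasmid is added as a new spacer next to the CRISPR leader (spacers 5 4 3 2 1 become 6 5 4 3 2 1). (2) Expression & Processing: the cas operon (labelled cas3, cas2, cas1, cas5) and the CRISPR locus are expressed. (3) Interference: Cascade and Cas3 act against the invading virus or plasmid.]
Van der Oost et al. TIBS 2009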
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon organization of representative systems, with genes labelled by Cas name (Cas1 to Cas13), COG number and domain family (Helicase, HD, RecB, RuvC, HNH, RAMP)]
Type I (Helicase cassette; Cascade): Ecoli (CASS2), Ypest (CASS3), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5)
Type II (HNH-type): Nmeni (CASS4), CASS4a
Type III (RAMP module or Polymerase cassette): Mtube (CASS6), RAMP module
Makarova et al. NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: phylogenetic trees (panels A-C) with leaves labelled by species and locus tag, shown next to the organization of the adjacent cas operons; the "Helicase" and "Polymerase" cassettes and subtype numbers (1-7, 4a, 7a) are marked, and the systems with experimental data are indicated]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56 (mRNA targeting)
Streptococcus thermophilus: Barrangou R et al. Science 2007;315:1709-12
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labelled Type I, Type II and Type III and with the earlier subtype names Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; polymerase/RAMP module), Dvulg (CASS1), Tneap/Hmari (CASS7) and Apern (CASS5)]
Genes of the representative operons shown:
Type I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
Type II: cas9, cas1, cas2, cas4
Type III: cas6, cas10, csm2, csm3, csm4, csm5, csm6, cas1, cas2 (RAMP)
Makarova et al. NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: bar charts (scale 0-100) of Cas1 presence/absence and of the distribution of Type I, Type II and Type III systems]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol Microbiol 2011 Jan;79(2):484-502
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf. Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf. Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria:
(a) Genomic changes caused by environmental factor
(b) Changes are specific to relevant genomic loci
(c) Changes provide adaptation to the causative factor

Bona fide Lamarckian
CRISPR/Cas
  Biological role/function: defense against viruses and other mobile elements
  Phyletic spread: most of the Archaea and many bacteria
  Criteria (a), (b), (c): Yes, Yes, Yes
piRNA
  Biological role/function: defense against transposable elements in germline
  Phyletic spread: animals
  Criteria (a), (b), (c): Yes, Yes, Yes
HGT (specific cases)
  Biological role/function: adaptation to new environment, stress response, resistance
  Phyletic spread: Archaea, bacteria, unicellular eukaryotes
  Criteria (a), (b), (c): Yes, Yes, Yes

Quasi-Lamarckian
HGT (general phenomenon)
  Biological role/function: diverse innovations
  Phyletic spread: Archaea, bacteria, unicellular eukaryotes
  Criteria (a), (b), (c): Yes, No, Yes/no
Stress-induced mutagenesis
  Biological role/function: stress response/resistance/adaptation to new conditions
  Phyletic spread: ubiquitous
  Criteria (a), (b), (c): Yes, No or partially, Yes (but general evolvability enhanced as well)
Koonin, Wolf. Biol Direct 2009
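The three criteria work as a simple decision rule over the rows of the table. The sketch below encodes the table entries and sorts them into the two groups. The rule used for the quasi-Lamarckian group (change caused by the environment, but at least one other criterion not fully met) is an assumption inferred from the table, not a definition given in the slides, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion (a)
    locus_specific: str         # criterion (b)
    adaptive_to_cause: str      # criterion (c)


# Answers transcribed from the table in the slide.
TABLE = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes"),
]


def modality(p: Phenomenon) -> str:
    """Classify a phenomenon by the three Lamarckian criteria."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "yes" for answer in answers):
        return "bona fide Lamarckian"
    if p.caused_by_environment == "yes":
        # Assumed rule: environmentally induced, but not fully specific or adaptive.
        return "quasi-Lamarckian"
    return "neither"


if __name__ == "__main__":
    for phenomenon in TABLE:
        print(f"{phenomenon.name}: {modality(phenomenon)}")
```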
Stress as a gauge of evolutionary modality
Koonin, Wolf. Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
The amended Picornavirus-like
superfamily includes 14 recognized viral families
4 floating genera and15 unclassified positive-
strand anddouble-strand RNA
viruses that infect hosts from 4 of the 5 eukaryotic
supergroups-6 distinct clades
from RdRp phylogeny-diverse genome layouts
View from the viral side
6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from thehost side
4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)
PolioV74 kb
VP0 3DPol2C 3CPro Angg
IRESVP1VP3 2APro 2B
3A 3B
Jelly-roll CPsSuperfamily 3
helicase RdRp
DNA phages Bacterial group II
retroelements
Bacterial and mitochondrialHtrA family proteases
Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches
3C-Pro
Hallmark genes of picorna-like viruses have distinct prokaryoticorigins
Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts
Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution
bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions
Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221
Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39
+
1 Positive-strandRNA
3-30 kb - +R R ET
T+R
Class Replication cycle
2 Double-strandRNA
4-25 kb plusmnTr R E
T
+ plusmn
3 Negative-strandRNA
11-20 kbR R E
RdRp CPT
+-Tr -
+
Genetic cycles of RNA viruses
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the
biospherersquos laboratory of genomic strategies
RdRp CP
RdRp CP
+4 Retroid
RNA viruses 7-12 kb +RT Tr E
T+Tr
plusmn
5 Retroid DNAviruses
elements2-10 kb +Tr
T+Tr
plusmnE RT
plusmn
RT CP
RT
Class Replication cycle
Genetic cycles of retroid viruses and retroelements
Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb
7 dsDNA virusesplasmids
5-1200 kb
+RCR RCR E
+Tr
plusmn +
TRCE
Tr
plusmn
+T
plusmnR E
Pr-Pol ALL CELLULARORGANISMS
YOU ARE HERE
I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms
(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several
lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host
II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly
in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes
III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant
homologs in cellular organisms
Natural history of viral genes a one-page summary of viral comparative genomics
Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size
Genome size (log) 0 1 2 3
Recent acquisitions
Old acquisitions
Virus-specific ORFans
Virus-specific conserved
Virus hallmark
Most RNAvirusesretroelementsRCR replicons
Large RNA virusesadenotailedphages
NCLDVherpeslargephages
Natural history of viral genesViral Hallmark Genes
Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the lsquovirus statersquo
Play major roles in genome replication packagingand assembly
4 Rolling circle replicationinitiation endonuclease
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
Protein products of viral hallmark genes
3 RNA-dependent RNA polymeraseand Reverse transcriptase
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis ("extremely counterintuitive": Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios for the origin of viruses: cell degeneration (cell, small parasitic cell, very small virus), escaped genes (chromosome, plasmid, mRNA, virus), and primordial genetic systems (pre-cellular life forms, RNA, DNA, virus)]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: cells; replicon fusion yields chromosomes]
Koonin, Martin, TIG 2005; Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29; Koonin, Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that has retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world", or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006. Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009.
Evolution of antivirus defense systems
• CRISPR-Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96.
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether

1. COG1518
   Subfamilies: COG1518 (cas1)
   Phyletic distribution: all
   Comments: putative novel nuclease/integrase; mostly α-helical protein
2. COG1343
   Subfamilies: COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like
   Phyletic distribution: all
   Comments: small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203
   Subfamilies: COG1203 (cas3)
   Phyletic distribution: all
   Comments: DNA helicase; most proteins have fusion to HD nuclease
4. RecB-like nuclease
   Subfamilies: COG1468 (cas4), COG4343
   Phyletic distribution: all
   Comments: RecB-like nuclease; contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein)
   Subfamilies: COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like
   Phyletic distribution: all
   Comments: belong to "RAMP" family; possibly RNA-binding protein structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6. COG1857
   Subfamilies: COG1857, COG3649, YgcJ-like, y1725-like
   Phyletic distribution: all
   Comments: α/β protein; predicted nuclease or integrase
7. HD-like nuclease
   Subfamilies: COG1203 (N-terminus), COG2254
   Phyletic distribution: all
   Comments: HD-like nuclease
8. BH0338
   Subfamilies: BH0338-like, MTH1090-like
   Phyletic distribution: all, mostly archaea and FIRM
   Comments: large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353, predicted polymerase
   Subfamilies: COG1353
   Phyletic distribution: most archaea, some bacteria
   Comments: predicted palm-domain polymerase, distantly related to viral RdRp and RT

…and ~15 other less common proteins. Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats with adjacent cas genes (Cas1, COG1518; Cas2, COG1343; Cas3, COG1203; Cas4, COG1468) in four locus types: TB and TA, Strepto-like, Ecoli-like, Pasteurella-like]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3, or cas4 genes in CRISPR-containing species."
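The repeat-spacer layout in points (i) and (ii) can be illustrated with a minimal Python sketch. It assumes the repeat sequence is already known and occurs as exact copies; the function name, the toy locus and the spacer sequences are hypothetical and are not taken from any genome discussed in the talk.

```python
def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers found between consecutive exact copies of `repeat`."""
    parts = array.split(repeat)
    # parts[0] is the leader-side flank and parts[-1] the trailing flank;
    # every piece in between sits between two repeats, i.e. it is a spacer.
    return [p for p in parts[1:-1] if p]


# Hypothetical toy locus: flank + (repeat + spacer) x 3 + final repeat + flank
REPEAT = "GTTTCAATC"
SPACERS = ["ACGTACGTAC", "TTGACCATGG", "CCATAGGCTA"]
locus = "AAAATTTT" + "".join(REPEAT + s for s in SPACERS) + REPEAT + "GGGG"

found = split_crispr_array(locus, REPEAT)
print(found)                    # the recovered spacers, in locus order
print({len(s) for s in found})  # spacer lengths; similar sizes are expected
```

Real loci tolerate some repeat variation, so an exact split is only a simplification of how repeats would be located in practice.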
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity, specific motifs
Motifs by family (columns I to V in the original table):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs  ust-lKGhh+hh  hhGtt  hD  lGhttsGh
y1726-like: slhlpEKuVRGT  lRTIDs  YGuVTsGhuh
COG1851: hGphpGpsaFh  hGFGRh
BH0337-like: TpA-h+GIh-uIh  hhLpDV  LGsREhuht
COG1567: hhhpp  ups-lhtAhh  luscoGhGh
COG1769: hhhh+Ph-hhh  ss-hhGhlhsh  hGh  hGtcp+hsthchp
COG1688 (Cas5): hhhhhts  sssshhGhlsh  lGttphh
COGs 1583, 5551: hhhhoPhhl  hGtppshGFGl
YgcH-like: hHphlh  hGu+uhGhGhh
y1727-like: LHphLh  GFsthGLStss
MJ0978-like: hHNH  lG+tsuhGhGol
Ferredoxin fold; RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable. Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
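One simple way to compare rows of the repeat alignment above is to score pairwise identity over the columns where neither row has a gap. The Python sketch below does this for three rows copied from the alignment. The scoring rule and the function name are illustrative assumptions, not the method used to build the slide.

```python
def aligned_identity(a: str, b: str) -> float:
    """Fraction of identical bases over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# Three rows copied from the alignment (two archaea, one bacterium)
rows = {
    "Arcful": "GTTGAAATC-------AGACCAAAATGGGATTGAAAG-",
    "Metthe": "GTTAAAATC-------AGACCAAAATGGGATTGAAAT-",
    "Yerpes": "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--",
}

names = list(rows)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = aligned_identity(rows[first], rows[second])
        print(f"{first} vs {second}: {score:.2f}")
```

Ignoring gapped columns keeps the score from being dominated by the variable-length middle region of the repeats; other gap treatments would give different numbers.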
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82.
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids. … The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR-Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure, steps as labeled:
1. Transcription of the CRISPR locus by the cellular RNA polymerase with a specific transcription factor (COG1517), yielding a polycistronic pre-psiRNA; (protein-guided) RNA folding
2. Fast RNA processing by p-dicer (helicase + HD hydrolase; COG1203, COG2254) into pre-psiRNA of 75-100 nt
3. Slow RNA processing by p-dicer into psiRNA of 25-45 nt
4. p-RISC: psiRNA with RAMP and p-slicer (COG1468, COG4343, COG1857)
5. Annealing of p-RISC to the RNA target (plasmid or phage mRNA) and target RNA cleavage]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure, steps as labeled: psiRNA (25-45 nt) bound to RAMP forms an unstable RAMP-RNA complex; annealing to the RNA target (plasmid or phage mRNA); primer elongation by the CASS RNA polymerase (COG1353), giving amplified annealing and a long, stable dsRNA; duplex degradation by p-dicer; RAMP binding; the cycle continues]
New psiRNA generation
[Figure, steps as labeled: plasmid or phage mRNA and pre-psiRNA (75-100 nt); reverse transcription with random copy choice (random RNA recombination and reverse transcription) by the CASS RNA polymerase (COG1353) or another RT; dsDNA with CRISPRs and target-derived spacers; homologous recombination with the genomic CRISPR region OR integrase (COG1518); genomic DNA with a new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12.
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
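The sequence specificity in the first point can be mimicked with a toy Python model. It assumes, purely for illustration, that resistance requires an exact match between a spacer and the phage genome; the sequences and function names are hypothetical, and the real mechanism involves the Cas proteins rather than string matching.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: the host resists a phage if any spacer matches it exactly."""
    return any(spacer in phage_genome for spacer in spacers)


def substitute(seq: str, pos: int, base: str) -> str:
    """Return a copy of `seq` with a single base replaced at `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical phage genome and a 20-nt spacer acquired from it
phage = "ATGCGTACCGTTAGCTAGGCTTAACGGATCCATG"
spacer = phage[8:28]

# Escape mutant: one substitution inside the region the spacer matches
new_base = "A" if phage[15] != "A" else "C"
mutant = substitute(phage, 15, new_base)

print(is_resistant([spacer], phage))   # host with the spacer vs wild-type phage
print(is_resistant([spacer], mutant))  # same host vs the single-substitution mutant
```

Under this rule, one mismatch is enough to lose the match, which is the behavior the validation point describes.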
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
[Figure: the three stages of CRISPR-Cas function: (1) adaptation, (2) expression and processing, (3) interference. Labels: invading virus or plasmid; host; CRISPR leader with numbered spacers (5 4 3 2 1, and 6 5 4 3 2 1); cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
The three types of CRISPR-Cas systems and their signature genes
[Figure: operon organization of the CRISPR-Cas systems, with genes labeled by Cas name (Cas1 to Cas13), COG number and domain family (Helicase, HD-f, RecB-f, RuvC-f, HNH, RAMP).
Types: Type I (Helicase cassette; Cascade), Type II (HNH-type), Type III (RAMP module or Polymerase cassette).
Subtypes shown: EcoliCASS2, YpestCASS3, NmeniCASS4, CASS4a, MtubeCASS6, RAMP module, DvulgCASS1, Tneap and HmariCASS7, ApernCASS5]
Makarova et al., NRM, submitted
Experimental data on CRISPR-Cas systems
[Figure: phylogenetic trees of Cas proteins from numerous archaeal and bacterial genomes with the organization of the associated cas operons ("Helicase" and "Polymerase" cassettes; panels A to C), annotated with the experimentally studied systems and the label "mRNA targeting"]
Experimentally studied systems:
Streptococcus thermophilus: Barrangou R et al., Science 2007;315:1709-12
E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
Phylogeny of Cas1 and the 3 types of CRISPR-Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III and by subtype (EcoliCASS2, YpestCASS3, NmeniCASS4, MtubeCASS6 with Polymerase-RAMP module, DvulgCASS1, Tneap-HmariCASS7, ApernCASS5). Genes shown in the representative operons: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR-Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% of genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
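As a quick arithmetic check of the first count above, the share of genomes with Cas1 follows directly from the two numbers on the slide (310 of 703). The short Python sketch below uses a hypothetical helper name and covers only that figure.

```python
def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


total_genomes = 703  # selected complete genomes (from the slide)
cas1_genomes = 310   # genomes with Cas1 (from the slide)

share = percent(cas1_genomes, total_genomes)
print(f"Cas1 present in {share:.1f}% of genomes")
```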
[Figure: two bar charts (scale 0 to 100) of the distribution of Type I, Type II and Type III systems and of Cas1 presence/absence across the analyzed genomes]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR-Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR-Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor

Bona fide Lamarckian
CRISPR-Cas
   Biological role/function: defense against viruses and other mobile elements
   Phyletic spread: most of the Archaea and many bacteria
   Criteria: (a) Yes; (b) Yes; (c) Yes
piRNA
   Biological role/function: defense against transposable elements in germline
   Phyletic spread: animals
   Criteria: (a) Yes; (b) Yes; (c) Yes
HGT (specific cases)
   Biological role/function: adaptation to new environment, stress response, resistance
   Phyletic spread: Archaea, bacteria, unicellular eukaryotes
   Criteria: (a) Yes; (b) Yes; (c) Yes

Quasi-Lamarckian
HGT (general phenomenon)
   Biological role/function: diverse innovations
   Phyletic spread: Archaea, bacteria, unicellular eukaryotes
   Criteria: (a) Yes; (b) No; (c) Yes/no
Stress-induced mutagenesis
   Biological role/function: stress response/resistance/adaptation to new conditions
   Phyletic spread: ubiquitous
   Criteria: (a) Yes; (b) No or partially; (c) Yes (but general evolvability enhanced as well)

Koonin, Wolf, Biol Direct 2009
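The table above can be read as a simple rule: a phenomenon counts as bona fide Lamarckian when all three criteria are answered with an unqualified "Yes", and as quasi-Lamarckian otherwise. The Python sketch below encodes the table rows under that reading. The rule, the class name and the field names are assumptions for illustration, not definitions taken from the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion (a)
    locus_specific: str         # criterion (b)
    adaptive_to_cause: str      # criterion (c)


def classify(p: Phenomenon) -> str:
    """Assumed rule: bona fide only if all three answers are a plain 'Yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "Yes" for answer in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Rows as given in the table
TABLE = [
    Phenomenon("CRISPR-Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

for phenomenon in TABLE:
    print(f"{phenomenon.name}: {classify(phenomenon)}")
```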
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
View from the viral side
6 major clades of picorna-like viruses; 5 infect eukaryotes from different supergroups
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side
4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: poliovirus genome (PolioV, 7.4 kb) with IRES, VP0, VP3, VP1, 2A-Pro, 2B, 2C, 3A, 3B, 3C-Pro, 3D-Pol and A(n). Signature genes: jelly-roll CPs, superfamily 3 helicase, RdRp, 3C-Pro. Sources shown: DNA phages; bacterial group II retroelements; bacterial and mitochondrial HtrA family proteases]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Genetic cycles of RNA viruses
1. Positive-strand RNA, 3-30 kb
2. Double-strand RNA, 4-25 kb
3. Negative-strand RNA, 11-20 kb
Genetic cycles of retroid viruses and retroelements
4. Retroid RNA viruses, 7-12 kb
5. Retroid DNA viruses and elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids
6. ssDNA viruses and plasmids, 2-11 kb
7. dsDNA viruses and plasmids, 5-1200 kb
[Figure: replication cycle diagram for each class, with steps labeled R, T, Tr, E, RT and RCR, strand polarities (+, -, ±) and key proteins (RdRp, RT, CP, Pr-Pol); one cycle is marked "ALL CELLULAR ORGANISMS: YOU ARE HERE"]
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies
I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms
(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several
lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host
II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly
in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes
III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant
homologs in cellular organisms
Natural history of viral genes a one-page summary of viral comparative genomics
Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size
Genome size (log) 0 1 2 3
Recent acquisitions
Old acquisitions
Virus-specific ORFans
Virus-specific conserved
Virus hallmark
Most RNAvirusesretroelementsRCR replicons
Large RNA virusesadenotailedphages
NCLDVherpeslargephages
Natural history of viral genesViral Hallmark Genes
Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the lsquovirus statersquo
Play major roles in genome replication packagingand assembly
4 Rolling circle replicationinitiation endonuclease
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
Protein products of viral hallmark genes
3 RNA-dependent RNA polymeraseand Reverse transcriptase
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)
The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin Senkevich Dolja Biol Direct 2006 129
Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms
-synergy with the diversity of genomic strategies
A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells
Cell degeneration Escaped genes Pr imordial geneticsystems
CELL
SMALLPARASITICCELL
VERYSMALLVIRUS
CHROMOSOME
PLASMID
VIRUS
RNA
VIRUS
PRE-CELLULAR LIFE FORMS
DNA
VIRUS
mRNA
VIRUS
Competing concepts of the or igin of viruses
CELLS
Origin and evolution of virus-like genetic
elementsin the pre-cellular
era
Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009
Replicon fusionyields chromosomes
bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages
The ancient Virus World (VW)
Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009
Evolution of antivirus defense systems
bull CRISPRCas system of adaptive immunity in prokaryotes
bull A case for Lamarckian evolutionbull The perennial virus-host arms race
CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system an update with unification of many diverse families
~25 families altogetherFamily SubfamilyA Phyletic
distributionBComments
1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein
2 COG1343 COG1343 (cas2) COG3512ygbF-like
MTH324-likey1723_N-like
All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins
3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease
4 RecB-like nuclease
COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster
5 RAMP Repair-associated mysterious
protein
COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551
BH0337-likeMJ0978-likeYgcH-like
y1726-like y1727-like
All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)
6 COG1857 COG1857 COG3649 YgcJ-like y1725-like
All αβ protein predicted nuclease or integrase
7 HD-like nuclease
COG1203 (N-terminus) COG2254 All HD-like nuclease
8 BH0338 BH0338-likeMTH1090-like
All mostlyarchaea and FIRM
Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)
9 COG1353 predicted
polymerase
COG1353 Most archaea some bacteria
Predicted palm-domain polymerase distantly related to viral RdRp and RT
hellipand ~15 other less common proteins Makarova et al BD 2006
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Specific motifs (columns I-V), consensus strings listed per family in order of appearance:
COGs 1336, 1367, 1604, 1337, 1332 | hhshhGs | ust-lKGhh+hh | hhGtt | hD | lGhttsGh
y1726-like | slhlpEKuVRGT | lRTIDs | YGuVTsGhuh
COG1851 | hGphpGpsaFh | hGFGRh
BH0337-like | TpA-h+GIh-uIh | hhLpDV | LGsREhuht
COG1567 | hhhpp | ups-lhtAhh | luscoGhGh
COG1769 | hhhh+Ph-hhh | ss-hhGhlhsh | hGh | hGtcp+hsthchp
COG1688 (Cas5) | hhhhhts | sssshhGhlsh | lGttphh
COGs 1583, 5551 | hhhhoPhhl | hGtppshGFGl
YgcH-like | hHphlh | hGu+uhGhGhh
y1727-like | LHphLh | GFsthGLStss
MJ0978-like | hHNH | lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats, visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable. Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
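The caption above says that repeats were compared by Smith-Waterman similarity before clustering. As a reminder of what that score is, here is a minimal sketch of the standard Smith-Waterman local alignment score with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for illustration; the parameters actually used by Kunin et al. are not given in these slides.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Pairwise scores of this kind form the edge weights of the similarity graph; the MCL clustering step itself is not sketched here.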
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
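The conservation claim can be made concrete by computing percent identity between two rows of the repeat alignment above. The sketch below is illustrative only: it assumes the rows are already aligned to equal length with "-" as the gap character, and it skips any column in which either row has a gap. The function name `aligned_identity` is hypothetical.

```python
def aligned_identity(row_a, row_b):
    """Fraction of identical residues over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# Two rows copied from the alignment (Yerpes and the first Pasmul row).
yerpes = "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--"
pasmul = "GTTAACTGC--------CGTATAGGCAGCTTAGAAA--"
identity = aligned_identity(yerpes, pasmul)
```

Other normalizations (for example, dividing by the full alignment length) give different numbers, so the choice of denominator should be stated whenever identities are compared.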
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram. Labels: transcription by cellular RNApol with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; p-RISC with RAMP and p-slicer (COG1468, COG4343, COG1857); annealing to RNA target (plasmid or phage mRNA); target RNA cleavage]
Variant of CASS functioning with polymerase: psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle diagram. Labels: psiRNA, 25-45 nt; RAMP; annealing to RNA target (plasmid or phage mRNA); RAMP-RNA complex (unstable); primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues]
New psiRNA generation
[Figure: flow diagram. Labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
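The logic of these observations (a new spacer copied from the phage confers resistance, and a single substitution in the matching region restores sensitivity) can be captured in a deliberately crude toy model. Everything here is an assumption made for illustration: resistance is reduced to an exact substring match, strand, Cas proteins, and flanking motifs are ignored, and the function names `acquire_spacer` and `is_resistant` are hypothetical.

```python
def acquire_spacer(spacers, phage_genome, start, length=30):
    """Return a new spacer list with a phage-derived fragment added at the leader end."""
    return [phage_genome[start:start + length]] + list(spacers)


def is_resistant(spacers, phage_genome):
    """Toy rule: the host resists a phage if any spacer matches its genome exactly."""
    return any(spacer in phage_genome for spacer in spacers)


# Hypothetical 50-nt "phage genome" used only for demonstration.
phage = "A" * 10 + "ACGTTGCAAGCTTGGATCCGTACCATGGCA" + "T" * 10
host = acquire_spacer([], phage, start=10)

# A point mutant of the phage inside the region matched by the spacer.
escape_mutant = phage[:20] + "T" + phage[21:]
```

In this model `is_resistant(host, phage)` holds while `is_resistant(host, escape_mutant)` does not, mirroring the reported reversion to sensitivity after a single substitution.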
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; leader; CRISPR with spacers numbered 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3. Van der Oost et al. TIBS 2009]
The three types of CRISPR/Cas systems and their signature genes
[Figure: cas operon organization by subtype: Ecoli/CASS2, Ypest/CASS3, Nmeni/CASS4, CASS4a, Mtube/CASS6 (RAMP module), Dvulg/CASS1, Tneap and Hmari/CASS7, Apern/CASS5; grouped as Type I (Helicase cassette; Cascade), Type II (HNH-type), and Type III (RAMP module or Polymerase cassette). Gene labels: Cas1 through Cas13; 1343; 1517; 1421; 1857; 3337; 3513; Helicase 1203; HD-f; RuvC-f; HNH; RecB-f 1468; RAMP 1688, 1583, 1567, 1769; y1724; ygcK; ygcL; BH0338; MTH1090; LA3191; SPy1049; AF0070]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure, panels (A)-(C): cas gene neighborhoods across archaeal and bacterial genomes, identified by locus tags (for example Cordip DIP0037, Esccol b2755, Sulsol SSO1450, Metkan MK1312), with subtype numbers 1-7, 4a and 7a; gene labels 1343, 1518, 3513, 3574, 1857, RecB-f 1468/4343, Helicase 1203, HD-f, RuvC-f, HNH, HTH, RAMP families (1688, 1583, 1567, 1769, BH0337, YgcH, y1726, y1727, MTH323), AF0070, AF1870, AF1873, BH0338, MTH1090, PH0918, SPy1049, y1724, ygcK, ygcL, XMTH324; "Helicase" cassette and "Polymerase" cassette]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II, and Type III and subtypes Ecoli/CASS2, Ypest/CASS3, Nmeni/CASS4, Mtube/CASS6 (Polymerase-RAMP module), Dvulg/CASS1, Tneap-Hmari/CASS7, Apern/CASS5. Genes shown for type I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; for type II: cas1, cas9, cas2, cas4; for type III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: bar charts (axis 0-100) showing Cas1 presence/absence and the distribution of Type I, Type II, and Type III systems]
Makarova et al in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol Microbiol 2011 Jan;79(2):484-502
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Phenomenon | Biological role/function | Phyletic spread | Lamarckian criteria: genomic changes caused by environmental factor | changes are specific to relevant genomic loci | changes provide adaptation to the causative factor
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
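The way the three criteria sort phenomena into the two groups of the table can be written as a small decision rule. This is a sketch of the table's logic only, not a method from Koonin and Wolf: the function name `classify`, the string encoding of the answers, and the "not Lamarckian" fallback for cases the table does not list are all assumptions.

```python
def classify(caused_by_environment, locus_specific, adaptive_to_cause):
    """Classify a phenomenon from its answers to the three Lamarckian criteria.

    Answers are strings as in the table ("Yes", "No", "Yes/no", "No or partially").
    """
    answers = (caused_by_environment, locus_specific, adaptive_to_cause)
    if all(answer == "Yes" for answer in answers):
        return "bona fide Lamarckian"
    if caused_by_environment == "Yes":
        # Environmentally induced, but at least one further criterion is not fully met.
        return "quasi-Lamarckian"
    return "not Lamarckian"  # assumption: this case does not appear in the table


crispr_cas = classify("Yes", "Yes", "Yes")
stress_mutagenesis = classify("Yes", "No or partially", "Yes")
```

With the table's entries, CRISPR/Cas, piRNA, and specific cases of HGT meet all three criteria, while general HGT and stress-induced mutagenesis fall into the quasi-Lamarckian group.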
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L. Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
The 5 supergroups of eukaryotes and their picorna-like viruses
Complementary view from the host side: 4 of the 5 eukaryotic supergroups host diverse picorna-like viruses (no one checked the 5th)
[Figure: PolioV genome map, 7.4 kb. Labels: IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol, Angg]
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches
[Figure labels: signature genes: jelly-roll CPs, Superfamily 3 helicase, RdRp, 3C-Pro; sources: DNA phages, bacterial group II retroelements, bacterial and mitochondrial HtrA family proteases]
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang: an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies
Genetic cycles of RNA viruses
1 Positive-strand RNA, 3-30 kb
2 Double-strand RNA, 4-25 kb
3 Negative-strand RNA, 11-20 kb
Genetic cycles of retroid viruses and retroelements
4 Retroid RNA viruses, 7-12 kb
5 Retroid DNA viruses and elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids
6 ssDNA viruses/plasmids, 2-11 kb
7 dsDNA viruses/plasmids, 5-1200 kb
ALL CELLULAR ORGANISMS: YOU ARE HERE
[Figure: columns "Class" and "Replication cycle"; cycle step labels T, Tr, R, RT, RCR, E on +, - and ± strands; protein labels RdRp, CP, RT, RCE, Pr-Pol]
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
  1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
  2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
  Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
  3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
  4. Virus-specific genes that are conserved within a virus lineage
  Origin: acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
  5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved genes, and virus hallmark genes versus genome size (log scale, 0-3) for three groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging, and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive: Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses: cell degeneration; escaped genes; primordial genetic systems
[Figure labels: CELL, SMALL PARASITIC CELL, VERY SMALL, VIRUS; CHROMOSOME, PLASMID, VIRUS; PRE-CELLULAR LIFE FORMS, RNA, DNA, mRNA, VIRUS, CELLS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
Replicon fusion yields chromosomes
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription, and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements, and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006. Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
No. | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase; mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase; most proteins have a fusion to an HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease; contains a three-cysteine C-terminal cluster
5 | RAMP ("repair-associated mysterious protein") | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to the "RAMP" family; possibly RNA-binding proteins structurally related to the duplicated ferredoxin fold (PDB 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein; predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All; mostly archaea and FIRM | Large Zn-finger-containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
…and ~15 other less common proteins (Makarova et al., BD 2006)
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins
extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol
Ferredoxin foldRNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
plasmid or phage mRNA
pre-psiRNA75-100 nt
CASS RNApol(COG1353)Or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase(COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPRs
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes.
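The sequence specificity reported in this validation can be illustrated with a toy model. The sketch below assumes, as a simplification that is not stated on the slide, that immunity requires an exact match between a spacer and some stretch of the phage genome on either strand. The function names and all sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_resistant(spacers, phage_genome):
    """Toy rule: the host is resistant if any spacer matches the phage exactly.

    Assumption for illustration only: an exact match on either strand is
    required, so a single substitution in the target abolishes resistance.
    """
    return any(
        spacer in phage_genome or reverse_complement(spacer) in phage_genome
        for spacer in spacers
    )


# Hypothetical phage genome and a 30-nt spacer acquired from it.
phage = "TTGACCGTAAGCTTGGCACTGAATCGGTACCATGGCTAGCTAGGATCCGATTACAGG"
spacer = phage[10:40]

# The same phage with one substitution (SNP) inside the targeted region.
pos = 25
escape_base = "A" if phage[pos] != "A" else "C"
escape_phage = phage[:pos] + escape_base + phage[pos + 1:]

print(is_resistant([spacer], phage))         # wild-type phage
print(is_resistant([spacer], escape_phage))  # SNP escape mutant
```

In this toy model the wild-type phage is recognized and the single-substitution mutant is not, which mirrors the "SNP reverts to sensitivity" observation. Real systems may tolerate or reject mismatches depending on their position, which this sketch does not attempt to model.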
[Figure: the stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus or plasmid; host; CRISPR leader; spacer array 5 4 3 2 1, extended to 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3.]
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: cas operon architectures grouped as Type I (Helicase cassette; includes Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Subtypes shown: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4, CASS4a), Mtube (CASS6, RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5). Genes are labeled Cas1 to Cas13 and annotated with COG numbers and domains: COG1343, COG1203 (helicase), HD family, RecB family (COG1468), COG1857, COG3513, COG1517, COG1421, COG3337, RuvC family, HNH, RAMPs (COG1688, COG1583, COG1567, COG1769), and the families BH0338/MTH1090/LA3191, AF0070, SPy1049, y1724, ygcK, ygcL.]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: tree whose leaves are labeled with species abbreviations and locus tags (for example Esccol b2755, Strthe str0658, Myctub Rv2817c, Pyrfur PF1129), shown next to the cas operon architectures of CASS subtypes 1 to 7 (with 4a and 7a). Genes are annotated with COG numbers and domains (COG1343, COG1518, COG1203 helicase, HD family, RecB family COG1468/COG4343, COG1857, COG3513, COG3574, RuvC family, HNH, HTH, RAMPs). Panels (A), (B), (C); the "Helicase" and "Polymerase" cassettes are marked; one label reads "mRNA targeting". Systems with experimental data are labeled with the references below.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 2007;315:1709-12
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III and subtypes Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; polymerase-RAMP module), Dvulg (CASS1), Tneap-Hmari (CASS7), Apern (CASS5). Representative gene sets: Type I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; Type II: cas1, cas9, cas2, cas4; Type III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMPs marked).]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II – 9, Type III – 20
• Two or three systems of different types are present in 128 (20) genomes
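As a quick arithmetic check of the counts above, the sketch below recomputes the fractions from the numbers given on the slide (703 genomes, 310 with Cas1, 128 with two or three system types). The variable and function names are illustrative. Note that 128 is about 18% of all 703 genomes, so the slide's "20" may be rounded or may use a different denominator; the source does not say which.

```python
total_genomes = 703
with_cas1 = 310
with_multiple_types = 128


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


cas1_pct = percent(with_cas1, total_genomes)
multi_pct_of_all = percent(with_multiple_types, total_genomes)
multi_pct_of_cas1 = percent(with_multiple_types, with_cas1)

print(cas1_pct, multi_pct_of_all, multi_pct_of_cas1)
```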
[Figure: bar charts on a 0-100 scale showing Type I, Type II and Type III systems and Cas1 presence/absence across the analyzed genomes; the individual bar values are not recoverable from the extracted text.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf. Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf. Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor

Phenomenon | Biological role/function | Phyletic spread | (a) | (b) | (c)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)

Koonin, Wolf. Biol Direct 2009
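The three Lamarckian criteria in this table can be read as a simple decision rule. The sketch below is one hypothetical encoding: a phenomenon counts as bona fide Lamarckian only if all three criteria are a plain "Yes", and as quasi-Lamarckian if the environmental trigger is present but another criterion is not fully met. The class, field and function names are assumptions for illustration; the row values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # genomic changes caused by environmental factor
    locus_specific: str         # changes are specific to relevant genomic loci
    adaptive_to_cause: str      # changes provide adaptation to the causative factor


def classify(p):
    """Hypothetical rule: all three 'Yes' is bona fide; trigger only is quasi."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "Yes" for answer in answers):
        return "bona fide Lamarckian"
    if p.caused_by_environment == "Yes":
        return "quasi-Lamarckian"
    return "not Lamarckian"


TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon(
        "Stress-induced mutagenesis",
        "Yes",
        "No or partially",
        "Yes (but general evolvability enhanced as well)",
    ),
]

classification = {p.name: classify(p) for p in TABLE}
for name, label in classification.items():
    print(f"{name}: {label}")
```

Under this rule the grouping reproduces the two sections of the table; the rule itself is a reading of the table, not a definition given in the source.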
Stress as a gauge of evolutionary modality
Koonin, Wolf. Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L. Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
[Figure: genome map of poliovirus (PolioV, 7.4 kb) with labels IRES, VP0, VP3, VP1, 2APro, 2B, 2C, 3A, 3B, 3CPro, 3DPol; the hallmark proteins (jelly-roll CPs, superfamily 3 helicase, 3C-Pro, RdRp) are linked to DNA phages, bacterial group II retroelements, and bacterial and mitochondrial HtrA family proteases.]
Likely origins of the signature genes of the picorna-like superfamily were inferred using signature PSSMs and PSI-BLAST searches.
Radiation of major viral clades occurred in a "Big Bang" during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then "sampled" the hosts.
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang – an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin. The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere's laboratory of genomic strategies.

Genetic cycles of RNA viruses
Class | Genome size
1. Positive-strand RNA | 3-30 kb
2. Double-strand RNA | 4-25 kb
3. Negative-strand RNA | 11-20 kb

Genetic cycles of retroid viruses and retroelements
Class | Genome size
4. Retroid RNA viruses | 7-12 kb
5. Retroid DNA viruses, elements | 2-10 kb

Genetic cycles of DNA viruses and plasmids
Class | Genome size
6. ssDNA viruses, plasmids | 2-11 kb
7. dsDNA viruses, plasmids | 5-1200 kb

[Figure: the "Replication cycle" column of each panel is a diagram built from the symbols +, -, ±, T, Tr, R, E, RT and RCR, with the proteins RdRp and CP (RNA viruses), RT and CP (retroid elements) and Pr-Pol marked; the diagrams are not recoverable from the extracted text. The position of all cellular organisms is marked "YOU ARE HERE".]
Natural history of viral genes: a one-page summary of viral comparative genomics

I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus), present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host

II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host, but with rapid divergence from ancestor once within viral genomes

III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
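The five gene classes in this summary amount to a decision tree over two properties: how close the nearest cellular homolog is, and how widely the gene is spread among viruses. The sketch below is a hypothetical encoding. The category labels ("close", "moderate", "distant", "none"; "narrow", "lineage", "many_lineages") are assumptions chosen to mirror the wording of the list; the source gives no quantitative thresholds.

```python
GROUPS = {
    1: "I. Detectable cellular homologs (recent acquisition)",
    2: "I. Detectable cellular homologs (ancient acquisition)",
    3: "II. Virus-specific (ORFan)",
    4: "II. Virus-specific (conserved within a lineage)",
    5: "III. Viral hallmark gene",
}


def classify_viral_gene(cellular_homolog, viral_spread):
    """Return the class number (1-5) for a viral gene.

    cellular_homolog: "close" | "moderate" | "distant" | "none"
    viral_spread:     "narrow" | "lineage" | "many_lineages"
    Both vocabularies are illustrative assumptions, not terms from the source.
    """
    if cellular_homolog == "close":
        return 1  # closely related host homolog, narrow group of viruses
    if cellular_homolog == "moderate":
        return 2  # conserved in one or several lineages, moderately close homologs
    if viral_spread == "many_lineages":
        return 5  # shared by many diverse lineages, at best distant cellular homologs
    if viral_spread == "lineage":
        return 4  # virus-specific but conserved within a lineage
    return 3      # ORFan: no homologs except possibly in closely related viruses


for homolog, spread in [("close", "narrow"), ("none", "narrow"), ("distant", "many_lineages")]:
    cls = classify_viral_gene(homolog, spread)
    print(homolog, spread, "->", cls, GROUPS[cls])
```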
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: x-axis: genome size (log scale, 0 to 3). Series: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark. Genome groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages.]
Natural history of viral genes: viral hallmark genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly

Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive – Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios. Cell degeneration: cell, small parasitic cell, very small virus. Escaped genes: chromosome, plasmid and mRNA giving rise to viruses. Primordial genetic systems: pre-cellular life forms giving rise to RNA and DNA viruses.]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure: scheme leading from pre-cellular virus-like genetic elements to cells; one label reads "Replicon fusion yields chromosomes".]
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT: GTAs, transducing phages
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether

No. | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family, possibly RNA-binding protein; structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT

…and ~15 other, less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and neighboring cas genes. Repeat classes: TB and TA, Strepto-like, Ecoli-like, Pasteurella-like. Genes: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468).]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
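Characteristics (i) and (ii) of this definition, identical short direct repeats separated by spacers of similar size, can be turned into a naive scan. The sketch below is an illustration under stated assumptions, not a published CRISPR finder: it requires the repeat length to be known in advance, demands exact repeat identity (real repeats may show slight variation), and uses arbitrary spacer-size bounds. All names and the toy locus are hypothetical.

```python
from collections import defaultdict


def find_repeat_arrays(seq, repeat_len, min_spacer=20, max_spacer=60, min_repeats=3):
    """Find exact direct repeats of length repeat_len separated by bounded spacers.

    Returns a list of (repeat_sequence, start_positions) tuples.
    """
    # Index every window of length repeat_len by its sequence.
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays = []
    for repeat, starts in positions.items():
        run = [starts[0]]
        for start in starts[1:]:
            gap = start - (run[-1] + repeat_len)  # spacer length between copies
            if min_spacer <= gap <= max_spacer:
                run.append(start)
            else:
                if len(run) >= min_repeats:
                    arrays.append((repeat, run))
                run = [start]
        if len(run) >= min_repeats:
            arrays.append((repeat, run))
    return arrays


def extract_spacers(seq, repeat_len, starts):
    """Return the sequences lying between consecutive repeat copies."""
    return [seq[a + repeat_len:b] for a, b in zip(starts, starts[1:])]


# Toy locus (hypothetical): leader, then repeat-spacer-repeat-spacer-repeat.
REPEAT = "GTTCACTGCCGCACAGGCAGCTTAGAAA"
SPACER_1 = "ATGGCTAGCTAGGATCCGATTACAGGCATTCA"
SPACER_2 = "TTGACCGTAAGCTTGGCACTGAATCGGTACCA"
LEADER = "ATATTAAGCGTTTACCTTAATAGCG"
locus = LEADER + REPEAT + SPACER_1 + REPEAT + SPACER_2 + REPEAT + "TTT"

arrays = find_repeat_arrays(locus, len(REPEAT))
for repeat, starts in arrays:
    print(repeat, starts, extract_spacers(locus, len(repeat), starts))
```

A practical tool would also tolerate mismatches in the repeats and infer the repeat length, which this sketch deliberately leaves out.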
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (motif columns I to V; the assignment of each motif to a column is not recoverable from the extracted text):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs, ust-lKGhh+hh, hhGtt, hD, lGhttsGh
y1726-like: slhlpEKuVRGT, lRTIDs, YGuVTsGhuh
COG1851: hGphpGpsaFh, hGFGRh
BH0337-like: TpA-h+GIh-uIh, hhLpDV, LGsREhuht
COG1567: hhhpp, ups-lhtAhh, luscoGhGh
COG1769: hhhh+Ph-hhh, ss-hhGhlhsh, hGh, hGtcp+hsthchp
COG1688 (Cas5): hhhhhts, sssshhGhlsh, lGttphh
COGs 1583, 5551: hhhhoPhhl, hGtppshGFGl
YgcH-like: hHphlh, hGu+uhGhGhh
y1727-like: LHphLh, GFsthGLStss
MJ0978-like: hHNH, lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
CRISPRs show extreme diversity and complex clustering
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
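The conservation visible in the repeat alignment can be quantified with a simple per-column identity count. The sketch below compares the rows labeled Yerpes and Pasmul (the first Pasmul row), copied from the alignment; the function name is hypothetical, and skipping columns that contain a gap is an assumption of this sketch rather than a method given in the source.

```python
def aligned_identity(row_a, row_b):
    """Count identical columns between two rows of an alignment.

    Columns with a gap ('-') in either row are skipped.
    Returns (matches, compared_columns).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = compared = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches, compared


yerpes = "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--"
pasmul = "GTTAACTGC--------CGTATAGGCAGCTTAGAAA--"

matches, compared = aligned_identity(yerpes, pasmul)
print(f"{matches}/{compared} identical positions ({100 * matches / compared:.0f}%)")
```

For these two rows the count is 25 identical positions out of 28 compared, i.e. three substitutions across the whole repeat.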
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA – after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
plasmid or phage mRNA
pre-psiRNA75-100 nt
CASS RNApol(COG1353)Or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase(COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPRCas systems
1857
1857
1857
RAMP1688
10
Cordip DIP0037Bacfra BF3951
Wolsuc WS1444Camjej Cj1522c
Neimen NMA0630Pasmul PM1126
Strthe str0658Strpyo Spy 0770
Treden TDE0328Staepi SERP2463
Mycgal MGA 0523Mycmob MMOB0320
Porgin PG1982Legpne lpp0161
Wolsuc WS1615Sulsol SSO1450
Aerper APE1240Sulsol SSO1405
Pyraer PAE0200Pyraer PAE0081
Metmaz MM0559Metmaz MM3249
Arcful AF1878Mansuc MS1635
Niteur NE0111Myctub Rv2817c
Nostoc alr0381Synech slr7071Thethe TTHB145
Chrvio CV1229Xanaxo XAC3842
Chltep CT1130Desvul DVUA0134
Metcap MCA0651Azoarc ebA3283
Deigeo_DRAFT_1682Mansuc MS0981
Baccla ABC3592Bachal BH0341
Strpyo Spy 1286Vibvul VVA1544
Nostoc alr1468Synech slr7092
Synech slr7016Geosul GSU0057
Thethe TTHB224Nostoc alr1568
Metkan MK1312
Esccol b2755Saltyp STM2938
Phopro PBPRC0034Geosul GSU1392
Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639
Symthe_STH663Corjei jk0643
Nocfar nfa44220Strave SAV7537
Metcap MCA0930Cordip DIP2214
Chrvio CV1756Zymmob ZMO0680
Pasmul PM0311Erwcar ECA3679
Yerpes y1722Legpne lpl2837Phopro PBPRB1995
Acinet ACIAD2484
Clotet CTC01148Fusnuc FN1177
Aquaeo aq 369Theten TTE2658Themar TM1797
Bacfra BF2544Porgin PG2014
Halmar pNG4053Clotet CTC01463
Metjan MJ0378Pyrhor PH1245
Thekod TK0455Metthe MTH1084
Arcful AF2435Metace MA3670
Nanequ NEQ017Pyrhor PH0173
Pictor PTO0003Pictor PTO0049Thevol TVN0106
1343
1343
1343
1343
1518
1518
1343
1518
RecB-f4343
RecB-f1468
RecB-f1468
RecB-f1468
Helicase12031857 RAMP
1688
RAMP1688
Helicase1203
Helicase1203
RAMPBH0337
RAMPYgcH
RAMPy1727
RAMPy17261857
Helicase1203
HD-f
Helicase1203
HD-f
1343
HD-f
HD-f
HD-f
RuvC-f
AF0070
BH0338
y1724
ygcKygcL
RAMP1583
HTH
AF1870
SPy1049
3574
AF1870
PH0918
HTH 3574
1857 RAMP1688
RecB-f1468
Helicase1203
HD-f
AF1870
MTH1090
AF1873
RAMP1583
3574
AF1870
3574
1343
1
7
2
3
4
5
6
7a
Sulsol SSO1999Sulsol SSO1440
Sultok ST2639Sultok ST0032
Sulsol SSO1402Pyraer PAE0068
Arcful AF1874Pyrhor PH0917Thekod TK0450
Pyraby PAB1689
13433513
3513
1518
RuvC-f
HNH
4aHNH
1 0
Pyrhor PH0162
6
64
6
4
16
17
7
7
7
7
1
2
7
7
7
7
5 7
5 75 7
5 75 7
5 7
6 1
7
Metkan MK1314
Metace MA1931
Metmaz MM3356
Metbar Mbar_A1355
Strthe stu0960Myctub Rv2823cStaepi SERP2461
Themar TM1811Mansuc MS1651
Niteur NE0123Thethe TTP0102
Metthe MTH1082
Metthe MTH326
Metjan MJ1672Pictor PTO0053Thevol TVN0112
Metkan MK1297Thethe TTP0115
Bachal BH0328Synech sll7090
Aquaeo aq 387Theten TTE2639
Themar TM1794Arcful AF1867
Pyrfur PF1129Thefus Tfu1578
Porgin PG1987Sultok ST1979
Sulsol SSO1991Sulsol SSO1729
Sultok ST0012Sulaci Saci 2046Nostoc all1479
1421 RAMP1567
RAMP1583
RAMPMTH323
RAMP1769
XMTH324
ldquoPolymeraserdquo cassette
ldquoHelicaserdquo cassette(A)
(B)
(C)
Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8
Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5
Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5
E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4
Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56
Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPRCas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
EcoliCASS2Type IType I
Type I
Type I
Type I
Type III
Type III
Type III
Type III
Type III
Type III
Type II
YpestCASS3
NmeniCASS4
PolymeraseRAMPmoduleMtubeCASS6
PolRAMP
DvulgCASS1
TneapHmariCASS7
ApernCASS5
cas1cas9 cas2 cas4
cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI
II
IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2
RAMP
Makarova et al NRM submitted
CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes
54
256
1537
26
516
2
9
7 56107
3
10
13
383
210
46
216
8
1
7 70211
10
1
0102030405060708090
100
1623
84 6
17 7
2322
0
9
0
0
151
2
1
21
17
20 1
0
1533 28 7 14
0
9 7 40
117 210
0102030405060708090
100
Type I Type II Type IIICas1presenceabsence
Makarova et al in preparation
Modular evolution of the 3 types of CRISPRCas
Makarova et al in preparation
Mol Microbiol 2011 Jan79(2)484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Columns: Phenomenon | Biological role/function | Phyletic spread | Lamarckian criteria: (1) genomic changes caused by environmental factor | (2) changes are specific to relevant genomic loci | (3) changes provide adaptation to the causative factor

Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes

Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
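The three Lamarckian criteria in the table can be read as a simple decision rule. The following is a minimal sketch, assuming that a phenomenon counts as bona fide Lamarckian only when all three criteria are an unqualified "Yes" and as quasi-Lamarckian otherwise; the names `Phenomenon` and `classify` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion 1
    locus_specific: str         # criterion 2
    adaptive_to_cause: str      # criterion 3


def classify(p: Phenomenon) -> str:
    """Assumed rule: all three criteria must be a plain 'Yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "Yes" for answer in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Rows transcribed from the table above.
TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

for row in TABLE:
    print(f"{row.name}: {classify(row)}")
```

Under this rule the first three rows come out as bona fide Lamarckian and the last two as quasi-Lamarckian, which matches the grouping on the slide.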
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Hallmark genes of picorna-like viruses have distinct prokaryotic origins
Radiation of major viral clades occurred in a “Big Bang” during eukaryogenesis and antedates the divergence of eukaryotic supergroups; the viruses then “sampled” the hosts
Conclusions on the picorna and NCLDV stories: Big Bangs of virus evolution
• Phylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroups
• Big Bang – an explosive early phase of viral evolution
• Most likely, the same pattern holds for other major groups of viruses, as illustrated by the evolutionary study of NCLDV, a completely different group of viruses
• The Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution and could be a manifestation of a general model of major evolutionary transitions
Koonin EV. The Biological Big Bang model for the major transitions in evolution. Biol Direct. 2007 Aug 20;2:21
Koonin, Wolf, Nagasaki, Dolja. Nat Rev Microbiol. 2008 Dec;6(12):925-39
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms. Viruses are the biosphere’s laboratory of genomic strategies.

[Figure: genetic cycles of RNA viruses; columns: Class, Replication cycle; protein labels: RdRp, CP]
1. Positive-strand RNA viruses, 3-30 kb
2. Double-strand RNA viruses, 4-25 kb
3. Negative-strand RNA viruses, 11-20 kb

[Figure: genetic cycles of retroid viruses and retroelements; columns: Class, Replication cycle; protein labels: RT, CP]
4. Retroid RNA viruses, 7-12 kb
5. Retroid DNA viruses and elements, 2-10 kb

[Figure: genetic cycles of DNA viruses and plasmids; protein labels: RCR E, Pr-Pol]
6. ssDNA viruses and plasmids, 2-11 kb
7. dsDNA viruses and plasmids, 5-1200 kb
ALL CELLULAR ORGANISMS: “YOU ARE HERE”
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus) present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host, but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: x-axis, genome size (log), 0 to 3. Gene classes: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark. Virus groups along the axis: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: viral hallmark genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the ‘virus state’
Play major roles in genome replication, packaging and assembly

Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive – Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are:
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
Cell degeneration; escaped genes; primordial genetic systems
[Figure labels: CELL, SMALL PARASITIC CELL, VERY SMALL VIRUS; CHROMOSOME, PLASMID, VIRUS; PRE-CELLULAR LIFE FORMS, RNA VIRUS, DNA VIRUS; mRNA, VIRUS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: CELLS; replicon fusion yields chromosomes]
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not “just” pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the “virus world”, or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res. 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Columns: Family | Subfamily (A) | Phyletic distribution (B) | Comments
1. COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to “RAMP” family, possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6. COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7. HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8. BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
...and ~15 other less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and adjacent cas genes. Panels: TB and TA, Strepto-like, Ecoli-like, Pasteurella-like. Genes: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468)]
“The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species.”
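Characteristics (i) and (ii) of the quoted definition can be illustrated on a toy locus. The sketch below is only an illustration under strong assumptions: the repeat sequence is known in advance and is perfectly conserved, and the leader, repeat and spacer sequences are invented for the example. The function names `extract_spacers` and `spacers_similar_size` are hypothetical.

```python
def extract_spacers(locus: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    parts = locus.split(repeat)
    # parts[0] is the flank before the first repeat (leader side) and
    # parts[-1] is the flank after the last repeat; neither is a spacer.
    return parts[1:-1]


def spacers_similar_size(spacers: list[str], tolerance: int = 3) -> bool:
    """Check that spacer lengths differ by at most `tolerance` nucleotides."""
    lengths = [len(s) for s in spacers]
    return bool(lengths) and max(lengths) - min(lengths) <= tolerance


# Toy data, invented for illustration only.
LEADER = "AAAA"
REPEAT = "GTTCAC"
SPACERS = ["TTTAAACCC", "CCCGGGAAA"]
locus = LEADER + REPEAT + REPEAT.join(SPACERS) + REPEAT + "TT"

found = extract_spacers(locus, REPEAT)
print(found, spacers_similar_size(found))
```

Real CRISPR detection has to tolerate slight repeat variation and unknown repeat sequences, which this exact-match split does not attempt.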
The cas genes are our “repair” system
“Helicase” cassette
“Polymerase” cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
MOTIFs (columns: specific, I, II, III, IV, V)
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs  ust-lKGhh+hh  hhGtt  hD  lGhttsGh
y1726-like: slhlpEKuVRGT  lRTIDs  YGuVTsGhuh
COG1851: hGphpGpsaFh  hGFGRh
BH0337-like: TpA-h+GIh-uIh  hhLpDV  LGsREhuht
COG1567: hhhpp  ups-lhtAhh  luscoGhGh
COG1769: hhhh+Ph-hhh  ss-hhGhlhsh  hGh  hGtcp+hsthchp
COG1688 (Cas5): hhhhhts  sssshhGhlsh  lGttphh
COGs 1583, 5551: hhhhoPhhl  hGtppshGFGl
YgcH-like: hHphlh  hGu+uhGhGhh
y1727-like: LHphLh  GFsthGLStss
MJ0978-like: hHNH  lG+tsuhGhGol
Ferredoxin fold, RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures, where applicable. Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
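The caption refers to Smith-Waterman similarities between repeat sequences. As a reminder of what such a score measures, here is a minimal sketch of the Smith-Waterman local alignment score with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for illustration and are not the parameters used by Kunin et al.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two repeats taken from the alignment on the following slide.
arcful = "GTTGAAATCAGACCAAAATGGGATTGAAAG"
metthe = "GTTAAAATCAGACCAAAATGGGATTGAAAT"
print(smith_waterman_score(arcful, metthe))
```

Only two rows are kept in memory because the score, not the alignment itself, is needed for a similarity graph of this kind.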
CRISPR show extreme diversity and complex clustering
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol. 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
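The recognition step of this hypothesis (a psiRNA annealing to a phage or plasmid mRNA) comes down to base complementarity. The sketch below is a toy illustration, assuming perfect Watson-Crick pairing over the full psiRNA length and made-up sequences; `reverse_complement` and `target_site` are hypothetical helper names, and no protein component is modelled.

```python
# RNA base-pairing table (Watson-Crick pairs only; G-U wobble is ignored).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the RNA strand that would pair with `rna`, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def target_site(psirna: str, mrna: str) -> int:
    """Index in `mrna` where `psirna` can anneal perfectly, or -1 if none."""
    return mrna.find(reverse_complement(psirna))


# Toy sequences, invented for illustration only.
mrna = "GGAUGCCUAAGC"
psirna = reverse_complement("AUGCCU")  # designed to pair with mrna[2:8]

print(psirna, target_site(psirna, mrna))
```

A psiRNA with no complementary stretch in the mRNA returns -1, which in this toy picture corresponds to no silencing.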
The basic scheme of CASS functioning
[Figure: flow diagram. Labels, in reading order:]
- Transcription: cellular RNApol; specific transcription factor (COG1517); polycistronic pre-psiRNA
- (Protein-guided) RNA folding
- RNA processing (fast): p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt
- RNA processing (slow): p-dicer; psiRNA, 25-45 nt
- p-RISC: RAMP; p-slicer (COG1468, COG4343, COG1857)
- Annealing to RNA target: plasmid or phage mRNA
- Target RNA cleavage
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cyclic diagram. Labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation, CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); p-dicer, duplex degradation; RAMP binding; cycle continues]
New psiRNA generation
[Figure: flow diagram with two alternative routes joined by “OR”. Labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice; random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
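The sequence specificity reported in the key validation (a single substitution in the phage reverts the host to sensitivity) can be mimicked with an exact-match rule. This is a deliberately simplified sketch: it assumes that resistance requires a perfect match between a spacer and the phage genome on the given strand, and it ignores the cas genes entirely. The sequences and the names `is_resistant` and `substitute` are invented for illustration.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: the host resists a phage if any spacer matches it exactly."""
    return any(spacer in phage_genome for spacer in spacers)


def substitute(seq: str, pos: int, base: str) -> str:
    """Return `seq` with the nucleotide at `pos` replaced by `base` (a SNP)."""
    return seq[:pos] + base + seq[pos + 1:]


# Toy phage genome and a spacer acquired from positions 5..16 of it.
phage = "ATGCGTACGTTAGCCGATAGGCTA"
spacer = phage[5:17]

# A phage mutant carrying one substitution inside the region matched by the spacer.
mutant = substitute(phage, 10, "C")

print(is_resistant([spacer], phage))   # host with the spacer vs. wild-type phage
print(is_resistant([spacer], mutant))  # same host vs. the SNP mutant
```

A host without any matching spacer, represented here by an empty spacer list, is sensitive to both phages under this rule.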
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader; spacers numbered 5 4 3 2 1 before and 6 5 4 3 2 1 after acquisition; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: gene organization of representative CRISPR/Cas loci, grouped as Type I (Helicase cassette; Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Loci shown: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), CASS4a, Mtube (CASS6, RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5). Gene labels: Cas1 to Cas13; COG numbers 1203, 1343, 1421, 1468, 1517, 1567, 1583, 1688, 1769, 1857, 3337, 3513; domain labels Helicase, HD-f, RuvC-f, HNH, RecB-f, RAMP; gene names y1724, ygcK, ygcL, BH0338, MTH1090, LA3191, SPy1049, AF0070]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure, panels (A)-(C): cas loci from archaeal and bacterial genomes, labelled by species and locus tag (for example Esccol b2755, Myctub Rv2817c, Sulsol SSO1450), with the “Helicase” cassette and the “Polymerase” cassette marked. Gene labels include COG numbers 1203, 1343, 1421, 1468, 1518, 1567, 1583, 1688, 1769, 1857, 3513, 3574, 4343; domain labels Helicase, HD-f, RecB-f, RuvC-f, HNH, HTH, RAMP; gene names AF0070, AF1870, AF1873, BH0337, BH0338, MTH323, MTH324, MTH1090, PH0918, SPy1049, YgcH, y1724, y1726, y1727, ygcK, ygcL]
Pseudomonas aeruginosa: Haurwitz RE et al., Science. 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science. 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al., Science. 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al., Cell. 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al., Science 315, 1709-12 (2007)
[Figure label: mRNA targeting]
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches labelled Type I, Type II and Type III. Loci marked: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; Polymerase/RAMP module), Dvulg (CASS1), Tneap/Hmari (CASS7), Apern (CASS5). Genes shown for each type: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution
bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions
Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221
Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39
+
1 Positive-strandRNA
3-30 kb - +R R ET
T+R
Class Replication cycle
2 Double-strandRNA
4-25 kb plusmnTr R E
T
+ plusmn
3 Negative-strandRNA
11-20 kbR R E
RdRp CPT
+-Tr -
+
Genetic cycles of RNA viruses
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the
biospherersquos laboratory of genomic strategies
RdRp CP
RdRp CP
+4 Retroid
RNA viruses 7-12 kb +RT Tr E
T+Tr
plusmn
5 Retroid DNAviruses
elements2-10 kb +Tr
T+Tr
plusmnE RT
plusmn
RT CP
RT
Class Replication cycle
Genetic cycles of retroid viruses and retroelements
Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb
7 dsDNA virusesplasmids
5-1200 kb
+RCR RCR E
+Tr
plusmn +
TRCE
Tr
plusmn
+T
plusmnR E
Pr-Pol ALL CELLULARORGANISMS
YOU ARE HERE
I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms
(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several
lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host
II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly
in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes
III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant
homologs in cellular organisms
Natural history of viral genes a one-page summary of viral comparative genomics
Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size
Genome size (log) 0 1 2 3
Recent acquisitions
Old acquisitions
Virus-specific ORFans
Virus-specific conserved
Virus hallmark
Most RNAvirusesretroelementsRCR replicons
Large RNA virusesadenotailedphages
NCLDVherpeslargephages
Natural history of viral genesViral Hallmark Genes
Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the lsquovirus statersquo
Play major roles in genome replication packagingand assembly
4 Rolling circle replicationinitiation endonuclease
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
Protein products of viral hallmark genes
3 RNA-dependent RNA polymeraseand Reverse transcriptase
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)
The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin Senkevich Dolja Biol Direct 2006 129
Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms
-synergy with the diversity of genomic strategies
A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells
Cell degeneration Escaped genes Pr imordial geneticsystems
CELL
SMALLPARASITICCELL
VERYSMALLVIRUS
CHROMOSOME
PLASMID
VIRUS
RNA
VIRUS
PRE-CELLULAR LIFE FORMS
DNA
VIRUS
mRNA
VIRUS
Competing concepts of the or igin of viruses
CELLS
Origin and evolution of virus-like genetic
elementsin the pre-cellular
era
Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009
Replicon fusionyields chromosomes
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription, and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that has retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes (the "virus world" or the virosphere)
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements, and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
The ancient Virus World (VW)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006. Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR-Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. [...] The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether. For each family: subfamilies; phyletic distribution; comments.
1. COG1518. Subfamilies: COG1518 (cas1). Distribution: all. Comments: putative novel nuclease/integrase; mostly α-helical protein.
2. COG1343. Subfamilies: COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like. Distribution: all. Comments: small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins.
3. COG1203. Subfamilies: COG1203 (cas3). Distribution: all. Comments: DNA helicase; most proteins have fusion to HD nuclease.
4. RecB-like nuclease. Subfamilies: COG1468 (cas4), COG4343. Distribution: all. Comments: RecB-like nuclease; contains three-cysteine C-terminal cluster.
5. RAMP (repair-associated mysterious protein). Subfamilies: COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like. Distribution: all. Comments: belong to the "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB 1wj9).
6. COG1857. Subfamilies: COG1857, COG3649, YgcJ-like, y1725-like. Distribution: all. Comments: α/β protein; predicted nuclease or integrase.
7. HD-like nuclease. Subfamilies: COG1203 (N-terminus), COG2254. Distribution: all. Comments: HD-like nuclease.
8. BH0338. Subfamilies: BH0338-like, MTH1090-like. Distribution: all, mostly archaea and Firmicutes. Comments: large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref)).
9. COG1353, predicted polymerase. Subfamilies: COG1353. Distribution: most archaea, some bacteria. Comments: predicted palm-domain polymerase, distantly related to viral RdRp and RT.
...and ~15 other, less common proteins. Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats with adjacent cas genes. Locus variants: TB and TA, Strepto-like, Ecoli-like, Pasteurella-like. Genes: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468).]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3, or cas4 genes in CRISPR-containing species."
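The repeat-spacer organization in points (i) and (ii) can be made concrete with a small sketch. It assumes the repeat sequence is already known and perfectly conserved within the locus, which is an idealization. The function name `split_array` and all sequences are hypothetical toy data, not taken from the talk or from any CRISPR-finding tool.

```python
def split_array(locus: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`.

    Assumes exact, non-overlapping repeat copies; real repeats may
    carry a few substitutions, which this sketch does not handle.
    """
    parts = locus.split(repeat)
    # parts[0] is the flank before the first repeat and parts[-1] the
    # flank after the last one; everything in between is a spacer.
    return parts[1:-1]


# Toy locus: flank + (repeat + spacer) x 3 + repeat + flank
REPEAT = "GTTTCAATCCACGCACC"  # made-up repeat
SPACERS = [
    "AAACCCGGGTTTAAACCCGGGTTTACGTACGTAC",
    "TTGACCATGGCATTAGGCATCAGGACTTACGATC",
    "CGATCGGATTACAGGCTTAACGGTACCATGGTCA",
]
locus = "ATATAT" + "".join(REPEAT + s for s in SPACERS) + REPEAT + "GCGC"

spacers = split_array(locus, REPEAT)
print(len(spacers), sorted({len(s) for s in spacers}))
```

Splitting on a known repeat is the simplest possible approach; finding repeats de novo in a genome is a harder problem that this sketch does not attempt.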
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (motifs I to V; one family per line, motifs in the order shown on the slide):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
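To make "conserved" concrete, one simple measure is the fraction of identical positions between two rows of a repeat alignment, ignoring gap columns. The sketch below applies it to the rows labeled Themar and Nanequ in the alignment shown on this slide. The function `aligned_identity` is a hypothetical helper, and identity over ungapped columns is only one of several possible measures (the clustering cited earlier used Smith-Waterman similarities instead).

```python
def aligned_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# Two rows copied from the slide's alignment; gaps are written as
# "-" * n so the column count is explicit.
themar = "GTTTCAATA" + "-" * 7 + "CTTCCTTAGAGGTATGGAAAC" + "-"
nanequ = "CTTTCAATA" + "-" * 9 + "TTTCTAATATATTAGAAAC" + "-"

print(f"identity over ungapped columns: {aligned_identity(themar, nanequ):.2f}")
```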
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
[...] The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR-Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• when expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure. Labels, in order: transcription by cellular RNApol with a specific transcription factor (COG1517); (protein-guided) RNA folding; polycistronic pre-psiRNA; fast RNA processing by p-dicer (helicase + HD hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; slow RNA processing by p-dicer; psiRNA, 25-45 nt; RAMP and p-slicer (COG1468, COG4343, COG1857) in p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage.]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure. Labels, in order: psiRNA, 25-45 nt; RAMP; annealing to RNA target (plasmid or phage mRNA); RAMP-RNA complex (unstable); primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues.]
[Figure, untitled. Labels, in order: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice; random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer; or new psiRNA generation.]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
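The sequence specificity reported here, where a single substitution restores sensitivity, can be illustrated with a toy model. The sketch assumes that resistance requires an exact match between a spacer and the phage genome; the functions `is_resistant` and `substitute` and all sequences are hypothetical, and real systems have further requirements that are not modeled.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: the host is resistant if any spacer matches the phage exactly."""
    return any(spacer in phage_genome for spacer in spacers)


def substitute(seq: str, pos: int, base: str) -> str:
    """Return `seq` with a single substitution at 0-based position `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


# Made-up phage genome and a 30-nt spacer acquired from it.
phage = "ATGGCTAGCTAGGATCCGATTACAGGCTTAACGGTACCATGGTCAAGCTTGA"
spacer = phage[15:45]
host_spacers = [spacer]

# Escape mutant: one substitution inside the region the spacer targets.
escape = substitute(phage, 20, "A" if phage[20] != "A" else "C")

print("wild-type phage blocked:", is_resistant(host_spacers, phage))
print("SNP mutant blocked:", is_resistant(host_spacers, escape))
```

In this model the host regains resistance only by acquiring a new spacer from the mutant phage, which is one way to picture the arms race discussed in the talk.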
[Figure: the three stages of CRISPR-Cas action: (1) adaptation, (2) expression and processing, (3) interference. Labels: invading virus or plasmid; host; leader; CRISPR array with spacers 5 4 3 2 1, then 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3.]
Van der Oost et al. TIBS 2009
The three types of CRISPR-Cas systems and their signature genes
[Figure: operon diagrams for
Type I (Helicase cassette), with Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
Subtype labels: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5.
Gene labels: Cas1 to Cas13; COG numbers 1343, 1857, 3513, 1421, 1517, 3337; Helicase (1203); HD-f; RecB-f (1468); RuvC-f; HNH; RAMP (1688, 1583, 1567, 1769); BH0338, MTH1090, LA3191; AF0070; SPy1049; y1724; ygcK, ygcL.]
Makarova et al. NRM, submitted
Experimental data on CRISPR-Cas systems
[Figure: tree of cas loci from archaeal and bacterial genomes, panels (A) to (C), with leaves labeled by species and locus tag (for example Esccol b2755, Sulsol SSO1450, Myctub Rv2817c, Strthe str0658, Pyrhor PH1245) and annotated with gene neighborhoods: COG1343, COG1518, COG1857, COG3513, COG3574, RecB-f (1468, 4343), Helicase (1203), HD-f, RuvC-f, HNH, HTH, RAMP families (1688, 1583, 1567, 1769, BH0337, YgcH, y1726, y1727, MTH323), BH0338, MTH1090, AF0070, AF1870, AF1873, PH0918, SPy1049, y1724, ygcK, ygcL. Groups are numbered 1 to 7, 4a, 7a and assigned to the "Helicase" cassette or the "Polymerase" cassette.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315, 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR-Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II, and Type III. Subtype labels: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (polymerase/RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5. Genes shown per type, as listed on the slide: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2, with RAMP label.]
Makarova et al. NRM, submitted
CRISPR-Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: two bar charts on a 0-100 scale showing, per group of genomes, Cas1 presence/absence and the occurrence of Type I, Type II, and Type III systems.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR-Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks, and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC, and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR-Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
For each phenomenon: biological role/function; phyletic spread; Lamarckian criteria: (a) genomic changes caused by environmental factor, (b) changes are specific to relevant genomic loci, (c) changes provide adaptation to the causative factor.
Bona fide Lamarckian
• CRISPR-Cas. Role: defense against viruses and other mobile elements. Spread: most of the Archaea and many bacteria. Criteria: (a) yes, (b) yes, (c) yes.
• piRNA. Role: defense against transposable elements in germline. Spread: animals. Criteria: (a) yes, (b) yes, (c) yes.
• HGT (specific cases). Role: adaptation to new environment, stress response, resistance. Spread: Archaea, bacteria, unicellular eukaryotes. Criteria: (a) yes, (b) yes, (c) yes.
Quasi-Lamarckian
• HGT (general phenomenon). Role: diverse innovations. Spread: Archaea, bacteria, unicellular eukaryotes. Criteria: (a) yes, (b) no, (c) yes/no.
• Stress-induced mutagenesis. Role: stress response/resistance/adaptation to new conditions. Spread: ubiquitous. Criteria: (a) yes, (b) no or partially, (c) yes (but general evolvability enhanced as well).
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
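The three Lamarckian criteria work as a simple decision rule: a phenomenon counts as bona fide Lamarckian only when all three hold, and as quasi-Lamarckian when the change is environmentally caused but the other criteria hold only in part. A minimal sketch of that rule follows. The class, field names, and the "partial" value are hypothetical encodings of slide entries such as "Yes/no" and "No or partially", and the third outcome ("not Lamarckian") is an assumption of this sketch rather than a category from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    env_caused: str      # genomic changes caused by an environmental factor
    locus_specific: str  # changes specific to relevant genomic loci
    adaptive: str        # changes provide adaptation to the causative factor


def classify(p: Phenomenon) -> str:
    """Apply the three criteria; each value is "yes", "no", or "partial"."""
    answers = (p.env_caused, p.locus_specific, p.adaptive)
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    if p.env_caused == "yes":
        return "quasi-Lamarckian"
    # Assumption of this sketch: without environmental causation the
    # phenomenon is not treated as Lamarckian at all.
    return "not Lamarckian"


TABLE = [
    Phenomenon("CRISPR-Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "partial"),
    Phenomenon("Stress-induced mutagenesis", "yes", "partial", "yes"),
]

for p in TABLE:
    print(f"{p.name}: {classify(p)}")
```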
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L. Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms: viruses are the biosphere's laboratory of genomic strategies
Genetic cycles of RNA viruses
1. Positive-strand RNA viruses, 3-30 kb
2. Double-strand RNA viruses, 4-25 kb
3. Negative-strand RNA viruses, 11-20 kb
Genetic cycles of retroid viruses and retroelements
4. Retroid RNA viruses, 7-12 kb
5. Retroid DNA viruses and elements, 2-10 kb
Genetic cycles of DNA viruses and plasmids
6. ssDNA viruses and plasmids, 2-11 kb
7. dsDNA viruses and plasmids, 5-1200 kb (all cellular organisms: "you are here")
[Figure: replication cycle diagram for each class. Labels: +, -, ±, T, Tr, R, E, RT, RCR, RdRp, CP, Pr-Pol.]
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
  1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus), present in a narrow group of viruses
  2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
  Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
  3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
  4. Virus-specific genes that are conserved within a virus lineage
  Acquisition from host, but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
  5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: x-axis, genome size (log scale, 0 to 3). Gene classes: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark. Genome-size groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages.]
Natural history of viral genes: viral hallmark genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the "virus state"
Play major roles in genome replication, packaging, and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis ("extremely counterintuitive": Santiago Elena, Feb 16, 2011)
The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin Senkevich Dolja Biol Direct 2006 129
Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms
-synergy with the diversity of genomic strategies
A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells
Cell degeneration Escaped genes Pr imordial geneticsystems
CELL
SMALLPARASITICCELL
VERYSMALLVIRUS
CHROMOSOME
PLASMID
VIRUS
RNA
VIRUS
PRE-CELLULAR LIFE FORMS
DNA
VIRUS
mRNA
VIRUS
Competing concepts of the or igin of viruses
CELLS
Origin and evolution of virus-like genetic
elementsin the pre-cellular
era
Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009
Replicon fusionyields chromosomes
bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages
The ancient Virus World (VW)
Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009
Evolution of antivirus defense systems
bull CRISPRCas system of adaptive immunity in prokaryotes
bull A case for Lamarckian evolutionbull The perennial virus-host arms race
CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system an update with unification of many diverse families
~25 families altogetherFamily SubfamilyA Phyletic
distributionBComments
1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein
2 COG1343 COG1343 (cas2) COG3512ygbF-like
MTH324-likey1723_N-like
All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins
3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease
4 RecB-like nuclease
COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster
5 RAMP Repair-associated mysterious
protein
COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551
BH0337-likeMJ0978-likeYgcH-like
y1726-like y1727-like
All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)
6 COG1857 COG1857 COG3649 YgcJ-like y1725-like
All αβ protein predicted nuclease or integrase
7 HD-like nuclease
COG1203 (N-terminus) COG2254 All HD-like nuclease
8 BH0338 BH0338-likeMTH1090-like
All mostlyarchaea and FIRM
Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)
9 COG1353 predicted
polymerase
COG1353 Most archaea some bacteria
Predicted palm-domain polymerase distantly related to viral RdRp and RT
hellipand ~15 other less common proteins Makarova et al BD 2006
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins
extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol
Ferredoxin foldRNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure labels: transcription by cellular RNApol with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; RAMP; p-RISC; plasmid or phage mRNA; annealing to RNA target; p-slicer (COG1468, COG4343, COG1857); target RNA cleavage]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues]
New psiRNA generation
[Figure labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
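The sequence specificity reported above can be pictured with a toy model in which a cell is resistant only if one of its spacers matches the phage genome exactly. This is a deliberately simplified sketch, not the experimental procedure: the function `is_resistant`, the exact-match rule, and the made-up sequences are all assumptions for illustration.

```python
def is_resistant(spacers, phage_genome):
    """Toy rule: resistance requires at least one spacer to match the phage exactly."""
    return any(spacer in phage_genome for spacer in spacers)


# Made-up phage sequence; the spacer is a 20-nt fragment copied from it.
phage = "ATGGCTAGCTAGGATCCGATTACAGGCTTAACG"
spacer = phage[10:30]

# Escape mutant: a single substitution inside the region matched by the spacer.
position = 15
new_base = "A" if phage[position] != "A" else "C"
mutant = phage[:position] + new_base + phage[position + 1:]

print(is_resistant([spacer], phage))   # spacer matches the wild-type phage
print(is_resistant([spacer], mutant))  # one mismatch abolishes the match
print(is_resistant([], phage))         # no spacer, no resistance
```

In this toy, removing the spacer or mutating a single base of the target both switch the phenotype back to sensitive, mirroring the two observations on the slide.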
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader; spacers 5 4 3 2 1, becoming 6 5 4 3 2 1 after adaptation; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al. TIBS 2009
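The adaptation stage in the figure, where an array with spacers 5 4 3 2 1 becomes 6 5 4 3 2 1, can be sketched as a list operation. The function name `acquire_spacer`, the fixed 20-nt spacer length, and the sequences are hypothetical; the only point taken from the figure is that the newest spacer sits next to the leader.

```python
def acquire_spacer(array, invader, start, length=20):
    """Return a new array with a spacer copied from `invader` added at the leader end."""
    spacer = invader[start:start + length]
    if len(spacer) < length:
        raise ValueError("protospacer window runs past the end of the invader sequence")
    # Index 0 is treated as the leader-proximal position (assumption of this sketch).
    return [spacer] + list(array)


# Existing array, newest spacer first, as in the figure (5 4 3 2 1).
array = ["spacer5", "spacer4", "spacer3", "spacer2", "spacer1"]
invader = "TTGACCGTAGGCTAACGTTAGCCATGCAAGT"  # made-up invader sequence

updated = acquire_spacer(array, invader, start=5)
print(updated[0])    # the newly acquired spacer ("6" in the figure)
print(len(updated))  # one spacer longer than before
```

Because new spacers are added at one end, the order of spacers in such an array doubles as a chronological record of past encounters.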
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon organization of the three CRISPR/Cas types. Type I (Helicase cassette; Cascade); Type II (HNH-type); Type III (RAMP module or Polymerase cassette). Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), CASS4a, Mtube (CASS6, RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5). Gene labels: Cas1 through Cas13. Domain and family labels: Helicase (1203), HD-f, RuvC-f, HNH, RecB-f (1468), RAMP (1688, 1583, 1567, 1769), 1343, 3513, 1857, 1421, 3337, 1517, BH0338, MTH1090, LA3191, SPy1049, AF0070, y1724, ygcK, ygcL]
Makarova et al. NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure residue: tree leaf labels (species abbreviation plus locus tag, e.g. Esccol b2755, Strthe str0658, Myctub Rv2817c, Pyrfur PF1129) with the adjacent gene neighborhoods annotated by COG number (1343, 1518, 1203, 1468, 1857, 1688, 1583, 3513, 3574 and others), HD-f, RuvC-f, HNH, HTH and RAMP labels, group numbers 1 to 7a, the "Polymerase" cassette and the "Helicase" cassette; panels (A), (B), (C)]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III. Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6, Polymerase/RAMP module), Dvulg (CASS1), Tneap/Hmari (CASS7), Apern (CASS5). Representative operons, gene labels in the order extracted:
I: cse1 cse2 cas7 cas5 cas3 cas1 cas2 cas6e
II: cas1 cas9 cas2 cas4
III: cas1 csm6 csm5 csm4 csm3 csm2 cas10 cas6 cas2 (RAMP)]
Makarova et al. NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II: 9, Type III: 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: two bar charts on a 0 to 100 scale showing Type I, Type II and Type III systems and Cas1 presence/absence across groups of genomes; the individual bar values are not recoverable from the extracted text]
Makarova et al. in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al in preparation
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF.
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair.
Mol Microbiol. 2011 Jan;79(2):484-502
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (1) genomic changes caused by environmental factor; (2) changes are specific to relevant genomic loci; (3) changes provide adaptation to the causative factor

Phenomenon | Biological role/function | Phyletic spread | (1) | (2) | (3)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)

Koonin, Wolf. Biol Direct 2009
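One way to read the table is as a three-criterion rule: a phenomenon counts as bona fide Lamarckian only when all three criteria are met, and as quasi-Lamarckian when the change is environmentally induced but one of the other criteria is not fully met. The sketch below encodes that reading; the function `classify`, the "yes"/"no"/"partial" encoding, and the fallback label are assumptions rather than definitions from Koonin and Wolf.

```python
def classify(caused_by_environment, locus_specific, adaptive):
    """Classify a phenomenon from three criterion values: 'yes', 'no' or 'partial'."""
    answers = (caused_by_environment, locus_specific, adaptive)
    if any(a not in ("yes", "no", "partial") for a in answers):
        raise ValueError("criterion values must be 'yes', 'no' or 'partial'")
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    if caused_by_environment == "yes":
        return "quasi-Lamarckian"
    return "not Lamarckian"  # fallback label, not taken from the table


# Rows transcribed from the table ('Yes/no' and 'No or partially' encoded as 'partial').
phenomena = {
    "CRISPR/Cas": ("yes", "yes", "yes"),
    "piRNA": ("yes", "yes", "yes"),
    "HGT (specific cases)": ("yes", "yes", "yes"),
    "HGT (general phenomenon)": ("yes", "no", "partial"),
    "Stress-induced mutagenesis": ("yes", "partial", "yes"),
}

for name, criteria in phenomena.items():
    print(f"{name}: {classify(*criteria)}")
```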
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Genetic cycles of retroid viruses and retroelements
[Figure: columns "Class" and "Replication cycle". Class 4: retroid RNA viruses, 7-12 kb. Class 5: retroid DNA viruses and elements, 2-10 kb. Cycle-step labels: RT, Tr, T, E, CP; strand symbols +, ±]
Genetic cycles of DNA viruses and plasmids
[Figure: Class 6: ssDNA viruses and plasmids, 2-11 kb. Class 7: dsDNA viruses and plasmids, 5-1200 kb, marked "ALL CELLULAR ORGANISMS: YOU ARE HERE". Cycle-step labels: RCR, R, Tr, T, E, Pr-Pol; strand symbols +, ±]
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically, the host of the given virus), present in a narrow group of viruses
2. Genes that are conserved within a virus lineage or even several lineages and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host but with rapid divergence from ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: contributions of gene classes plotted against genome size (log scale, ticks 0 to 3). Gene classes: recent acquisitions; old acquisitions; virus-specific ORFans; virus-specific conserved; virus hallmark. Virus groups: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging, and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and Reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive: Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least, viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios: Cell degeneration; Escaped genes; Primordial genetic systems. Labels: CELL, SMALL PARASITIC CELL, VERY SMALL VIRUS; CHROMOSOME, PLASMID, VIRUS; mRNA, VIRUS; PRE-CELLULAR LIFE FORMS, RNA VIRUS, DNA VIRUS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: CELLS; Replicon fusion yields chromosomes]
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription, and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world", or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements, and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al. Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res. 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether

No. Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family, possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT

…and ~15 other less common proteins
Makarova et al. BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats with adjacent cas genes. Labels: CRISPR repeats; TB and TA; Strepto-like; Ecoli-like; Pasteurella-like; Cas1 (COG1518); Cas2 (COG1343); Cas3 (COG1203); Cas4 (COG1468)]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species"
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Ferredoxin fold, RNA-recognition motif (RRM)
Family-specific motifs (slide columns I, II, III, IV, V; motifs listed in the order extracted):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
CRISPRs show extreme diversity and complex clustering
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable.
Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
Animals Yes Yes Yes
HGT (specific cases)
Adaptation to new environment stress response resistance
Archaea bacteria unicellular eukaryotes
Yes Yes Yes
Quasi-LamarckianHGT (general phenomenon)
Diverse innovations Archaea bacteria unicellular eukaryotes
Yes No Yesno
Stress-induced mutagenesis
Stress responseresistanceadaptation to new conditions
Ubiquitous Yes No or partially
Yes (but general evolvabilityenhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
bullEvolution of parasites is intrinsic to any replicator system
bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since
bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms
bullPerennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian DoljaOregon State University
Yuri Wolf NCBITatiana Senkevich NIAID NIH
Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille
Didier Raoult et al
Bill MartinUniv Duesseldorf
Kira Makarova NCBI
Natural history of viral genes: a one-page summary of viral comparative genomics
I. Genes with readily detectable homologs from cellular life forms
1. Genes with closely related homologs from cellular organisms (typically the host of the given virus), present in a narrow group of viruses
2. Genes that are conserved within a virus lineage, or even several lineages, and have moderately close cellular homologs
Origin: relatively recent (1) or ancient (2) acquisition from host
II. Virus-specific genes
3. ORFans, i.e., genes without detectable homologs except, possibly, in closely related viruses
4. Virus-specific genes that are conserved within a virus lineage
Acquisition from host, but with rapid divergence from the ancestor once within viral genomes
III. Viral hallmark genes
5. Genes shared by many diverse virus lineages, with only very distant homologs in cellular organisms
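The five gene classes in this summary can be read as a small decision rule. The sketch below is only an illustration of that reading: the two coarse scales (`CellularHomology`, `ViralSpread`) and the function `classify` are hypothetical names introduced here, and real comparative genomics would rely on sequence searches and phylogenies rather than two hand-assigned labels.

```python
from dataclasses import dataclass
from enum import Enum


class CellularHomology(Enum):
    """Best homolog found in cellular organisms (hypothetical coarse scale)."""
    CLOSE = "close"
    MODERATE = "moderate"
    DISTANT_OR_NONE = "distant or none"


class ViralSpread(Enum):
    """How widely the gene occurs among viruses (hypothetical coarse scale)."""
    CLOSE_RELATIVES_ONLY = "closely related viruses only"
    ONE_OR_FEW_LINEAGES = "conserved within one or a few lineages"
    MANY_DIVERSE_LINEAGES = "many diverse lineages"


@dataclass(frozen=True)
class GeneEvidence:
    homology: CellularHomology
    spread: ViralSpread


def classify(gene: GeneEvidence) -> int:
    """Return the class number (1 to 5) used in the summary."""
    if gene.homology is CellularHomology.CLOSE:
        return 1  # recent acquisition from the host
    if gene.homology is CellularHomology.MODERATE:
        return 2  # ancient acquisition from the host
    # From here on: only very distant cellular homologs, or none at all.
    if gene.spread is ViralSpread.MANY_DIVERSE_LINEAGES:
        return 5  # viral hallmark gene
    if gene.spread is ViralSpread.ONE_OR_FEW_LINEAGES:
        return 4  # virus-specific conserved gene
    return 3      # ORFan
```

The order of the checks is an assumption: cellular homology is tested first, so a gene with a close cellular homolog is always treated as a recent acquisition regardless of how widely it is spread among viruses.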
Contributions of different classes of viral genes to the genomes of different classes of viruses: strong dependence on genome size
[Figure: x-axis, genome size (log), 0 to 3; gene classes: recent acquisitions, old acquisitions, virus-specific ORFans, virus-specific conserved, virus hallmark; virus groups along the axis: most RNA viruses, retroelements, RCR replicons; large RNA viruses, adeno, tailed phages; NCLDV, herpes, large phages]
Natural history of viral genes: Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the ‘virus state’
Play major roles in genome replication, packaging, and assembly
Protein products of viral hallmark genes
1. Jelly-roll capsid protein
2. Superfamily 3 helicase
3. RNA-dependent RNA polymerase and reverse transcriptase
4. Rolling circle replication initiation endonuclease
5. Viral archaeo-eukaryotic DNA primase
6. UL9-like superfamily 2 helicase
7. Packaging ATPase of the FtsK family
8. ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive – Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are:
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios, labeled Cell degeneration, Escaped genes, and Primordial genetic systems; diagram labels: cell, small parasitic cell, very small virus; chromosome, plasmid, mRNA, virus; pre-cellular life forms, RNA, DNA, virus]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: cells; replicon fusion yields chromosomes]
Koonin, Martin, TIG 2005; Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29; Koonin, Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not “just” pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription, and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that has retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the “virus world”, or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements, and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res. 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
No. | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to “RAMP” family, possibly RNA-binding protein structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
…and ~15 other, less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure labels: CRISPR repeats; TB and TA; Strepto-like; Ecoli-like; Pasteurella-like; Cas1 (COG1518); Cas2 (COG1343); Cas3 (COG1203); Cas4 (COG1468)]
“The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species”
The cas genes are our “repair” system
[Figure labels: “Helicase” cassette; “Polymerase” cassette]
Paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
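The structural characteristics of CRISPR loci quoted on this slide (near-identical direct repeats separated by non-repetitive spacers of similar size) suggest a simple way to pull spacers out of an array once the repeat is known. The following is a minimal sketch under strong assumptions: the repeat is exactly conserved, and all sequences (`REPEAT`, `SPACERS`, `LEADER`) are made-up toy data, not taken from any genome.

```python
from typing import List


def extract_spacers(locus: str, repeat: str) -> List[str]:
    """Split a CRISPR array into its spacers, given the direct repeat."""
    parts = locus.split(repeat)
    # parts[0] is the sequence before the first repeat (leader side) and
    # parts[-1] the sequence after the last repeat; only the inner parts
    # lie between two repeats and are therefore spacers.
    return [p for p in parts[1:-1] if p]


# Toy data (hypothetical sequences, for illustration only).
REPEAT = "GTTTCAATCCACGC"
SPACERS = [
    "ATGCCGTAAGCTTAGGCATCGA",
    "TTGACCGGATACGTAGCTAACC",
    "CCATGGTTAAGCGCTATTGCAG",
]
LEADER = "AATATAAATTTATAAAT"
LOCUS = LEADER + REPEAT + REPEAT.join(SPACERS) + REPEAT + "ACGT"

found = extract_spacers(LOCUS, REPEAT)
lengths = [len(s) for s in found]  # spacers are expected to be of similar size
```

Exact string matching is the main simplification here; since repeats may show a little sequence variation within a locus, a practical tool would need approximate matching.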
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Specific motifs (columns I, II, III, IV, V on the slide), listed per family in slide order:
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs; ust-lKGhh+hh; hhGtt; hD; lGhttsGh
y1726-like: slhlpEKuVRGT; lRTIDs; YGuVTsGhuh
COG1851: hGphpGpsaFh; hGFGRh
BH0337-like: TpA-h+GIh-uIh; hhLpDV; LGsREhuht
COG1567: hhhpp; ups-lhtAhh; luscoGhGh
COG1769: hhhh+Ph-hhh; ss-hhGhlhsh; hGh; hGtcp+hsthchp
COG1688 (Cas5): hhhhhts; sssshhGhlsh; lGttphh
COGs 1583, 5551: hhhhoPhhl; hGtppshGFGl
YgcH-like: hHphlh; hGu+uhGhGhh
y1727-like: LHphLh; GFsthGLStss
MJ0978-like: hHNH; lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
CRISPR show extreme diversity and complex clustering
The sequence similarity space of CRISPR repeats, visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable.
Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
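The degree of repeat conservation in the alignment on this slide can be quantified as pairwise identity over aligned, ungapped columns. The sketch below applies that measure to the first two aligned repeats (Arcful and Metthe, copied from the alignment); the function name `aligned_identity` and the choice to skip every column with a gap are assumptions made for illustration, not the method used to build the slide.

```python
def aligned_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


# Two aligned CRISPR repeats taken from the alignment on this slide.
ARCFUL = "GTTGAAATC-------AGACCAAAATGGGATTGAAAG-"
METTHE = "GTTAAAATC-------AGACCAAAATGGGATTGAAAT-"

identity = aligned_identity(ARCFUL, METTHE)
```

These two archaeal repeats differ at only two of their thirty aligned positions, which is the kind of cross-species conservation the slide refers to.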
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol. 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis
CRISPR/Cas:
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA – after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram; labels in slide order: transcription by cellular RNA polymerase with a specific transcription factor (COG1517); (protein-guided) RNA folding; polycistronic pre-psiRNA; p-dicer (helicase + HD hydrolase; COG1203, COG2254); RNA processing (fast); pre-psiRNA, 75-100 nt; RNA processing (slow); p-dicer; psiRNA, 25-45 nt; p-slicer (COG1468, COG4343, COG1857); RAMP; p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage]
Variant of CASS functioning with polymerase: psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle diagram; labels in slide order: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation by CASS RNA polymerase (COG1353); amplified annealing; long dsRNA (stable); p-dicer; duplex degradation; RAMP binding; cycle continues]
New psiRNA generation
[Figure: flow diagram; labels in slide order: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNA polymerase (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
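The sequence specificity reported in the key validation (a phage-derived spacer confers resistance, and a single substitution in the target reverts to sensitivity) can be mimicked with a toy exact-match model. This is only a sketch under stated assumptions: `acquire_spacer`, `is_resistant`, `point_mutation`, the 30-nt spacer length, and the 70-nt `PHAGE` sequence are all hypothetical, and exact substring matching stands in for the real, protein-mediated interference step.

```python
from typing import List

SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}  # arbitrary base substitution


def acquire_spacer(spacers: List[str], phage: str, start: int, length: int = 30) -> List[str]:
    """Return a new array with a phage-derived fragment added as the newest spacer."""
    return [phage[start:start + length]] + list(spacers)


def is_resistant(spacers: List[str], phage: str) -> bool:
    """Toy rule: resistant if any non-empty spacer matches the phage genome exactly."""
    return any(bool(s) and s in phage for s in spacers)


def point_mutation(genome: str, pos: int) -> str:
    """Substitute a single base at position pos."""
    return genome[:pos] + SWAP[genome[pos]] + genome[pos + 1:]


# Hypothetical 70-nt "phage genome": a 30-nt target flanked by homopolymers.
PHAGE = "A" * 20 + "GATTACAGATTACCAGGTTCAGCTTGACCA" + "C" * 20

array = acquire_spacer([], PHAGE, start=20)   # adaptation after phage challenge
escape_mutant = point_mutation(PHAGE, 30)     # one SNP inside the targeted region
```

In this model `is_resistant(array, PHAGE)` holds while `is_resistant(array, escape_mutant)` does not, mirroring the observation that one substitution is enough for the phage to escape.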
[Figure: three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference; labels: invading virus/plasmid, host, CRISPR, leader, spacers numbered 5 4 3 2 1 and 6 5 4 3 2 1, cas operon (cas3, cas2, cas1, cas5), Cascade, Cas3]
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon architectures for Type I (Helicase cassette; Cascade), Type II (HNH-type), and Type III (RAMP module or Polymerase cassette); subtypes shown: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5; genes labeled Cas1 to Cas13 with COG numbers (1343, 1203, 1468, 1517, 1583, 1688, 1769, 1857, 3513, 3337, 1421, 1567) and domain families (helicase, HD, RuvC, HNH, RecB, RAMP)]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: cas gene neighborhoods in archaeal and bacterial genomes (species and locus tags, COG numbers, CASS subtypes 1 to 7a), with the “Polymerase” and “Helicase” cassettes in panels (A) to (C)]
Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al., Science 315, 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II, and Type III; subtypes shown: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase/RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5]
Gene labels per type:
I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
II: cas1, cas9, cas2, cas4
III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II – 9, Type III – 20
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: bar charts on a 0 to 100 scale showing Type I, Type II, Type III, and Cas1 presence/absence]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol Microbiol. 2011 Jan;79(2):484-502
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks, and 5′-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC, and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor
Phenomenon | Biological role/function | Phyletic spread | (a) | (b) | (c)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Koonin, Wolf, Biol Direct 2009
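The split between bona fide and quasi-Lamarckian phenomena in this table follows from the three Lamarckian criteria, and that can be made explicit in a few lines. The sketch below encodes the table entries as given on the slide; the class `Phenomenon` and the rule in `category` (all three criteria must be an unqualified "Yes") are an assumed reading of the table, introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str   # genomic changes caused by environmental factor
    locus_specific: str          # changes are specific to relevant genomic loci
    adaptive_to_cause: str       # changes provide adaptation to the causative factor

    def category(self) -> str:
        """Assumed rule: bona fide only if every criterion is a plain 'Yes'."""
        answers = (self.caused_by_environment, self.locus_specific, self.adaptive_to_cause)
        if all(a == "Yes" for a in answers):
            return "bona fide Lamarckian"
        return "quasi-Lamarckian"


# Entries copied from the table on this slide.
TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

categories = {p.name: p.category() for p in TABLE}
```

Under this rule the first three rows come out as bona fide Lamarckian and the last two as quasi-Lamarckian, matching the grouping on the slide.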
Stress as a gauge of evolutionary modality
Koonin, Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size
Genome size (log) 0 1 2 3
Recent acquisitions
Old acquisitions
Virus-specific ORFans
Virus-specific conserved
Virus hallmark
Most RNAvirusesretroelementsRCR replicons
Large RNA virusesadenotailedphages
NCLDVherpeslargephages
Natural history of viral genesViral Hallmark Genes
Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the lsquovirus statersquo
Play major roles in genome replication packagingand assembly
4 Rolling circle replicationinitiation endonuclease
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
Protein products of viral hallmark genes
3 RNA-dependent RNA polymeraseand Reverse transcriptase
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)
The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin Senkevich Dolja Biol Direct 2006 129
Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms
-synergy with the diversity of genomic strategies
A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells
Cell degeneration Escaped genes Pr imordial geneticsystems
CELL
SMALLPARASITICCELL
VERYSMALLVIRUS
CHROMOSOME
PLASMID
VIRUS
RNA
VIRUS
PRE-CELLULAR LIFE FORMS
DNA
VIRUS
mRNA
VIRUS
Competing concepts of the or igin of viruses
CELLS
Origin and evolution of virus-like genetic
elementsin the pre-cellular
era
Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009
Replicon fusionyields chromosomes
bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages
The ancient Virus World (VW)
Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009
Evolution of antivirus defense systems
bull CRISPRCas system of adaptive immunity in prokaryotes
bull A case for Lamarckian evolutionbull The perennial virus-host arms race
CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Family | Subfamily | Phyletic distribution | Comments
1. COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family, possibly RNA-binding protein structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6. COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein, predicted nuclease or integrase
7. HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8. BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353 predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
…and ~15 other less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats with adjacent cas genes Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203) and Cas4 (COG1468); locus variants labeled TB and TA, Strepto-like, Ecoli-like, Pasteurella-like]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
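Characteristics (i) and (ii) above, identical direct repeats separated by non-repetitive spacers of similar size, can be made concrete with a minimal Python sketch. It assumes exact repeat copies and a known repeat sequence, which is a simplification; the leader, repeat and spacer sequences are toy values chosen for illustration.

```python
def extract_spacers(locus: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    parts = locus.split(repeat)
    # parts[0] is whatever precedes the first repeat (leader side) and
    # parts[-1] whatever follows the last one; only the pieces in between
    # are flanked by repeats on both sides.
    return parts[1:-1]


# Hypothetical toy locus: leader, then repeat-spacer units, then a final repeat
REPEAT = "GTTCACTGCCGCACAGGCAGCTTAGAAA"
SPACERS = ["ACGT" * 8, "TTGACA" * 5 + "GG", "CAGT" * 8]
LEADER = "ATATATATAT"
locus = LEADER + "".join(REPEAT + s for s in SPACERS) + REPEAT

spacers = extract_spacers(locus, REPEAT)
print(len(spacers), sorted({len(s) for s in spacers}))
```

Real loci tolerate small variations between repeat copies, so a practical tool would need approximate matching rather than `str.split`.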
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins
extreme sequence diversity
MOTIFs: specific, I, II, III, IV, V (consensus strings per family, in the order extracted)
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs  ust-lKGhh+hh  hhGtt  hD  lGhttsGh
y1726-like: slhlpEKuVRGT  lRTIDs  YGuVTsGhuh
COG1851: hGphpGpsaFh  hGFGRh
BH0337-like: TpA-h+GIh-uIh  hhLpDV  LGsREhuht
COG1567: hhhpp  ups-lhtAhh  luscoGhGh
COG1769: hhhh+Ph-hhh  ss-hhGhlhsh  hGh  hGtcp+hsthchp
COG1688 (Cas5): hhhhhts  sssshhGhlsh  lGttphh
COGs 1583, 5551: hhhhoPhhl  hGtppshGFGl
YgcH-like: hHphlh  hGu+uhGhGhh
y1727-like: LHphLh  GFsthGLStss
MJ0978-like: hHNH  lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures, where applicable.
Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
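Conservation between two aligned repeats can be quantified as the fraction of identical bases over the columns where neither row has a gap. The Python sketch below is a plain identity count, not the Smith-Waterman or MCL procedure mentioned in the figure caption; the two rows are the Yerpes row and the first Pasmul row of the alignment, and the function name and gap convention are assumptions made for the example.

```python
def aligned_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Fraction of identical bases over columns that are ungapped in both rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a == b for a, b in pairs) / len(pairs)


# Two rows copied from the repeat alignment
yerpes = "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--"
pasmul = "GTTAACTGC--------CGTATAGGCAGCTTAGAAA--"

identity = aligned_identity(yerpes, pasmul)
print(f"{identity:.2f}")
```

The same function can be applied to any pair of rows, for example an archaeal and a bacterial repeat, to check how far the similarity extends across the two domains.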
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82.
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram. Labels: transcription by cellular RNApol with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; p-slicer (COG1468, COG4343, COG1857); RAMP; p-RISC; annealing to RNA target; plasmid or phage mRNA; target RNA cleavage]
Variant of CASS functioning with polymerase: psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: flow diagram. Labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues]
New psiRNA generation
[Figure: flow diagram. Labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice; random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region OR integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12.
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPRs
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
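The sequence specificity reported above, where a single substitution in the target restores sensitivity, can be sketched in Python as an exact-match test. This is a deliberately simplified model: it assumes resistance requires a verbatim copy of a spacer in the phage genome, checks one strand only, and ignores the cas gene requirements. The phage sequence and function names are hypothetical.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Resistant if any spacer occurs verbatim in the phage genome."""
    return any(spacer in phage_genome for spacer in spacers)


def substitute(seq: str, pos: int) -> str:
    """Return `seq` with the base at `pos` replaced by a different base."""
    new_base = "A" if seq[pos] != "A" else "C"
    return seq[:pos] + new_base + seq[pos + 1:]


# Hypothetical toy phage genome and a spacer copied from it
phage = "ATGCCGTTAGCATCGGATCCATTGACCGTAAGCTTGCA"
spacer = phage[10:30]
escape_mutant = substitute(phage, 20)  # one substitution inside the targeted region

print(is_resistant([spacer], phage))          # True
print(is_resistant([spacer], escape_mutant))  # False
```

A substitution outside the region matched by the spacer would leave the toy host resistant, which is why the position of the mutation matters in this model.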
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader followed by spacers 5 4 3 2 1, then 6 5 4 3 2 1 after adaptation; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
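The three stages named in the figure (adaptation, expression and processing, interference) can be sketched as a small Python model. It is a toy abstraction under stated assumptions: new spacers are added next to the leader, as the spacer numbering in the figure suggests; each guide is taken to be one spacer; and interference is an exact substring match. The class, method names and sequences are hypothetical and do not model Cascade, Cas3 or any real protein.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprLocus:
    repeat: str
    spacers: list[str] = field(default_factory=list)  # leader-proximal spacer first

    def adapt(self, invader: str, start: int = 0, length: int = 8) -> str:
        """Stage 1: copy a fragment of the invader and insert it next to the leader."""
        spacer = invader[start:start + length]
        if len(spacer) < length:
            raise ValueError("invader too short for the requested fragment")
        self.spacers.insert(0, spacer)
        return spacer

    def express(self) -> list[str]:
        """Stage 2: one guide per spacer (processing of the long transcript is abstracted away)."""
        return list(self.spacers)

    def interferes_with(self, invader: str) -> bool:
        """Stage 3: any guide matching the invader exactly blocks it."""
        return any(guide in invader for guide in self.express())


# Hypothetical toy locus with two older spacers and a new invader
locus = CrisprLocus(repeat="GTTCACTGC", spacers=["AAAAAAAA", "CCCCCCCC"])
invader = "TTGACCGTAAGCTTGCAATC"

before = locus.interferes_with(invader)
new_spacer = locus.adapt(invader, start=4)
after = locus.interferes_with(invader)
print(before, after, locus.spacers[0] == new_spacer)
```

Keeping the newest spacer at index 0 means the list reads as a chronological record of encounters, most recent first.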
The three types of CRISPR/Cas systems and their signature genes
[Figure: gene-neighborhood diagrams for Type I (Helicase cassette; Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6, RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5), CASS4a. Gene labels: Cas1 to Cas13. COG and domain labels: COG1343, COG1857, COG3513, COG1517, COG1421, COG3337, helicase (COG1203), HD-f, RecB-f (COG1468), RuvC-f, HNH, RAMP (COG1688, COG1583, COG1567, COG1769). Other labels: y1724, ygcK, ygcL, BH0338, MTH1090, LA3191, SPy1049, AF0070]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: genes of the CRISPR/Cas systems listed by species abbreviation and locus tag (for example Esccol b2755, Myctub Rv2817c, Strthe str0658, Sulsol SSO1450), with the corresponding cas gene neighborhoods numbered 1 to 7 (plus 4a and 7a) and grouped in panels (A), (B), (C), including the "Helicase" cassette and the "Polymerase" cassette. Gene labels: COG1343, COG1518, COG1857, COG1421, COG3513, COG3574, helicase (COG1203), HD-f, RecB-f (COG1468, COG4343), RuvC-f, HNH, HTH, RAMP (COG1688, COG1583, COG1567, COG1769, BH0337, YgcH, y1726, y1727, MTH323), AF0070, AF1870, AF1873, BH0338, MTH1090, MTH324, PH0918, SPy1049, y1724, ygcK, ygcL]
Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al., Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches labeled Type I, Type II and Type III. Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; Polymerase-RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5). Gene labels shown for type I: cas3, cse1, cse2, cas7, cas5, cas6e, cas1, cas2; for type II: cas9, cas1, cas2, cas4; for type III: cas6, cas10, csm2, csm3, csm4, csm5, csm6, cas1, cas2; RAMP]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II: 9, Type III: 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: bar charts on a 0-100 scale: Cas1 presence/absence; Type I, Type II, Type III]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol Microbiol 2011 Jan;79(2):484-502.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (1) genomic changes caused by environmental factor; (2) changes are specific to relevant genomic loci; (3) changes provide adaptation to the causative factor
Phenomenon | Biological role/function | Phyletic spread | (1) | (2) | (3)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Koonin, Wolf, Biol Direct 2009
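The way the three Lamarckian criteria sort the listed phenomena can be written out as a short Python sketch. The decision rule is an assumption read off the table, not a rule stated by the authors: a phenomenon is labeled bona fide Lamarckian only when all three criteria are an unqualified "yes", and quasi-Lamarckian when the environmental trigger is present but another criterion is negative or partial. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion 1
    locus_specific: str         # criterion 2
    adaptive_to_cause: str      # criterion 3


def classify(p: Phenomenon) -> str:
    """Assumed rule: all three 'yes' is bona fide; environment-driven otherwise is quasi."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    if p.caused_by_environment == "yes":
        return "quasi-Lamarckian"
    return "not covered by the table"


# Criterion values transcribed from the table, lower-cased
phenomena = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes (qualified)"),
]

labels = {p.name: classify(p) for p in phenomena}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Treating any qualified answer as "not a plain yes" is what separates the two groups here; a finer scheme could score partial answers instead.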
Stress as a gauge of evolutionary modality
Koonin, Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Natural history of viral genes
Viral Hallmark Genes
Shared by many diverse groups of viruses, from the smallest RNA viruses to the giant DNA viruses
Strong support for monophyly of all viral members of the respective gene families
Only distant homologs in cellular organisms
Can be viewed as signatures of the 'virus state'
Play major roles in genome replication, packaging and assembly
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis (extremely counterintuitive: Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios: cell degeneration, escaped genes, primordial genetic systems. Labels: CELL; SMALL PARASITIC CELL; VERY SMALL; CHROMOSOME; PLASMID; mRNA; RNA; DNA; PRE-CELLULAR LIFE FORMS; VIRUS]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure labels: CELLS; replicon fusion yields chromosomes]
Koonin, Martin, TIG 2005
Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29
Koonin, Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world", or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006.
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009.
Evolution of antivirus defense systems
bull CRISPRCas system of adaptive immunity in prokaryotes
bull A case for Lamarckian evolutionbull The perennial virus-host arms race
CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system an update with unification of many diverse families
~25 families altogetherFamily SubfamilyA Phyletic
distributionBComments
1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein
2 COG1343 COG1343 (cas2) COG3512ygbF-like
MTH324-likey1723_N-like
All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins
3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease
4 RecB-like nuclease
COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster
5 RAMP Repair-associated mysterious
protein
COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551
BH0337-likeMJ0978-likeYgcH-like
y1726-like y1727-like
All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)
6 COG1857 COG1857 COG3649 YgcJ-like y1725-like
All αβ protein predicted nuclease or integrase
7 HD-like nuclease
COG1203 (N-terminus) COG2254 All HD-like nuclease
8 BH0338 BH0338-likeMTH1090-like
All mostlyarchaea and FIRM
Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)
9 COG1353 predicted
polymerase
COG1353 Most archaea some bacteria
Predicted palm-domain polymerase distantly related to viral RdRp and RT
hellipand ~15 other less common proteins Makarova et al BD 2006
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins
extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol
Ferredoxin foldRNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
plasmid or phage mRNA
pre-psiRNA75-100 nt
CASS RNApol(COG1353)Or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase(COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPR/Cas systems
[Figure: tree of cas loci from archaeal and bacterial genomes with the associated operon architectures (types 1 to 7, 7a and 4a), genes labelled by COG number and domain; panels (A), (B), (C); "Helicase" cassette and "Polymerase" cassette.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315:1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labelled Type I, Type II and Type III and by subtype: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6, Polymerase/RAMP module), Dvulg (CASS1), Tneap/Hmari (CASS7), Apern (CASS5).
Representative gene sets shown:
I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
II: cas1, cas9, cas2, cas4
III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II: 9, Type III: 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: two bar charts on a 0-100 scale showing Cas1 presence/absence and the distribution of Type I, Type II and Type III systems across the genome set.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Mol Microbiol 2011 Jan;79(2):484-502
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf. Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf. Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor
Phenomenon | Biological role/function | Phyletic spread | (a) | (b) | (c)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Koonin, Wolf. Biol Direct 2009
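The three Lamarckian criteria in the table can be read as a simple decision rule. The sketch below is one hedged way to write that rule down; the names `Criteria` and `modality` are hypothetical, and collapsing "Yes/no" and "No or partially" to False is a simplifying assumption rather than something the table states.

```python
from typing import NamedTuple


class Criteria(NamedTuple):
    caused_by_environment: bool   # genomic change caused by an environmental factor
    locus_specific: bool          # change is specific to the relevant genomic loci
    adaptive_to_cause: bool       # change provides adaptation to the causative factor


def modality(c: Criteria) -> str:
    """Classify a phenomenon by how many of the three criteria it meets."""
    if all(c):
        return "bona fide Lamarckian"
    if c.caused_by_environment:
        # assumption: environmentally induced but not fully specific or adaptive
        return "quasi-Lamarckian"
    return "not Lamarckian"


crispr_cas = Criteria(True, True, True)
stress_induced_mutagenesis = Criteria(True, False, True)
```

With these encodings, `modality(crispr_cas)` gives the bona fide class and `modality(stress_induced_mutagenesis)` gives the quasi-Lamarckian class, matching the grouping in the table.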
Stress as a gauge of evolutionary modality
Koonin, Wolf. Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Protein products of viral hallmark genes
1 Jelly-roll capsid protein
2 Superfamily 3 helicase
3 RNA-dependent RNA polymerase and reverse transcriptase
4 Rolling circle replication initiation endonuclease
5 Viral archaeo-eukaryotic DNA primase
6 UL9-like superfamily 2 helicase
7 Packaging ATPase of the FtsK family
8 ATPase subunit of terminase
The primordial gene pool hypothesis ("extremely counterintuitive", Santiago Elena, Feb 16, 2011)
The hallmark genes AND, by implication, the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29
Viral hallmark genes are
• present in a huge diversity of viruses and other selfish elements
• represented only by remote homologs in cellular life forms
- synergy with the diversity of genomic strategies
A crucial corollary: if viruses come directly from a primordial gene pool, then the origin of viruses is inextricably linked to the origin of cells
Competing concepts of the origin of viruses
[Figure: three scenarios, Cell degeneration, Escaped genes and Primordial genetic systems. Labels: cell, small parasitic cell, very small virus; chromosome, plasmid, mRNA, virus; pre-cellular life forms, RNA, DNA, virus.]
Origin and evolution of virus-like genetic elements in the pre-cellular era
[Figure. Labels: cells; replicon fusion yields chromosomes.]
Koonin, Martin. TIG 2005; Koonin, Senkevich, Dolja. Biol Direct 2006, 1:29; Koonin. Ann NY Acad Sci 2009
The ancient Virus World (VW)
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes: the "virus world" or the virosphere
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. [...] The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
No. | Family | Subfamily (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase. Mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase. Most proteins have fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease. Contains three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to "RAMP" family; possibly RNA-binding protein structurally related to duplicated ferredoxin fold (PDB: 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein; predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All; mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353 (predicted polymerase) | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT
…and ~15 other, less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and adjacent cas genes in several locus types (TB and TA, Strepto-like, Ecoli-like, Pasteurella-like). Genes shown: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468).]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
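Characteristics (i) and (ii) suggest a simple way to pull spacers out of a locus once the repeat is known. The sketch below assumes exact, identical repeat copies and uses hypothetical names (`extract_spacers`, `similar_size`, `tolerance`); real loci show some repeat variation, so this is an illustration rather than a CRISPR finder.

```python
def extract_spacers(locus: str, repeat: str) -> list:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    parts = locus.split(repeat)
    # parts[0] is the leader-side flank and parts[-1] the trailing flank;
    # everything in between is bounded by a repeat on both sides.
    return parts[1:-1]


def similar_size(spacers: list, tolerance: int = 3) -> bool:
    """Check criterion (ii): spacers differ in length by at most `tolerance` nt."""
    if not spacers:
        return False
    lengths = [len(s) for s in spacers]
    return max(lengths) - min(lengths) <= tolerance
```

For a locus built as leader + repeat + spacer1 + repeat + spacer2 + repeat + tail, `extract_spacers` returns the two spacers in order; a sequence without at least two repeat copies yields an empty list.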
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
Paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (columns I to V in the original table; motifs listed in order of occurrence)
COGs 1336, 1367, 1604, 1337, 1332 | hhshhGs | ust-lKGhh+hh | hhGtt | hD | lGhttsGh
y1726-like | slhlpEKuVRGT | lRTIDs | YGuVTsGhuh
COG1851 | hGphpGpsaFh | hGFGRh
BH0337-like | TpA-h+GIh-uIh | hhLpDV | LGsREhuht
COG1567 | hhhpp | ups-lhtAhh | luscoGhGh
COG1769 | hhhh+Ph-hhh | ss-hhGhlhsh | hGh | hGtcp+hsthchp
COG1688 (Cas5) | hhhhhts | sssshhGhlsh | lGttphh
COGs 1583, 5551 | hhhhoPhhl | hGtppshGFGl
YgcH-like | hHphlh | hGu+uhGhGhh
y1727-like | LHphLh | GFsthGLStss
MJ0978-like | hHNH | lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
CRISPRs show extreme diversity and complex clustering
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable.
Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
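Many of the aligned repeats begin with GTTT and end with AAAC, which is the "palindromic" part of the CRISPR name: the two ends are reverse complements of each other. A minimal sketch for measuring that terminal inverted repeat follows; the function name `terminal_stem_length` is hypothetical, alignment gaps are simply stripped, and only perfect Watson-Crick pairing at the very ends is counted, so internal stems are not detected.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def terminal_stem_length(repeat: str) -> int:
    """Count how many bases at the 5' end pair with the 3' end, working inwards."""
    seq = repeat.replace("-", "").upper()  # drop alignment gap characters
    n = 0
    while n < len(seq) // 2 and seq[n] == seq[-1 - n].translate(COMPLEMENT):
        n += 1
    return n


# Themar repeat as it appears in the alignment, gaps included
themar = "GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-"
```

For the Themar repeat the ends GTTTC and GAAAC pair over five positions, so `terminal_stem_length(themar)` returns 5.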
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
[...] The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure. Steps shown:
1. Transcription by cellular RNA pol with a specific transcription factor (COG1517), yielding polycistronic pre-psiRNA
2. (Protein-guided) RNA folding
3. RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254), yielding pre-psiRNA, 75-100 nt
4. RNA processing (slow) by p-dicer, yielding psiRNA, 25-45 nt
5. p-RISC: psiRNA with RAMP and p-slicer (COG1468, COG4343, COG1857)
6. Annealing to RNA target (plasmid or phage mRNA)
7. Target RNA cleavage]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure. Steps shown:
1. psiRNA (25-45 nt) binds RAMP, giving a RAMP-RNA complex (unstable)
2. Annealing to RNA target (plasmid or phage mRNA)
3. Primer elongation by CASS RNA pol (COG1353), giving long dsRNA (stable)
4. Duplex degradation by p-dicer
5. RAMP binding and amplified annealing
6. Cycle continues]
New psiRNA generation
[Figure. Steps shown:
1. Plasmid or phage mRNA and pre-psiRNA (75-100 nt) serve as templates for CASS RNA pol (COG1353) or other RT
2. Reverse transcription with random copy choice, OR random RNA recombination and reverse transcription
3. dsDNA with CRISPRs and target-derived spacers
4. Homologous recombination with the genomic CRISPR region, mediated by integrase (COG1518)
5. Genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)
The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool
Koonin Senkevich Dolja Biol Direct 2006 129
Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms
-synergy with the diversity of genomic strategies
A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells
Cell degeneration Escaped genes Pr imordial geneticsystems
CELL
SMALLPARASITICCELL
VERYSMALLVIRUS
CHROMOSOME
PLASMID
VIRUS
RNA
VIRUS
PRE-CELLULAR LIFE FORMS
DNA
VIRUS
mRNA
VIRUS
Competing concepts of the or igin of viruses
CELLS
Origin and evolution of virus-like genetic
elementsin the pre-cellular
era
Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009
Replicon fusionyields chromosomes
bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages
The ancient Virus World (VW)
Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether

No. | Family | Subfamilies (A) | Phyletic distribution (B) | Comments
1 | COG1518 | COG1518 (cas1) | All | Putative novel nuclease/integrase; mostly α-helical protein
2 | COG1343 | COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like | All | Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3 | COG1203 | COG1203 (cas3) | All | DNA helicase; most proteins have a fusion to HD nuclease
4 | RecB-like nuclease | COG1468 (cas4), COG4343 | All | RecB-like nuclease; contains a three-cysteine C-terminal cluster
5 | RAMP (Repair-associated mysterious protein) | COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like | All | Belong to the "RAMP" family; possibly RNA-binding protein, structurally related to the duplicated ferredoxin fold (PDB 1wj9)
6 | COG1857 | COG1857, COG3649, YgcJ-like, y1725-like | All | α/β protein; predicted nuclease or integrase
7 | HD-like nuclease | COG1203 (N-terminus), COG2254 | All | HD-like nuclease
8 | BH0338 | BH0338-like, MTH1090-like | All, mostly archaea and FIRM | Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9 | COG1353, predicted polymerase | COG1353 | Most archaea, some bacteria | Predicted palm-domain polymerase distantly related to viral RdRp and RT

...and ~15 other, less common proteins. Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure labels: CRISPR repeats; TB and TA; Strepto-like; E. coli-like; Pasteurella-like; Cas2 (COG1343); Cas1 (COG1518); Cas4 (COG1468); Cas3 (COG1203)]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species"
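Characteristics (i) and (ii) of the quoted definition can be illustrated with a minimal Python sketch. It assumes the repeat is already known and perfectly conserved within the locus; the function name `split_crispr_array`, the leader string, and the two spacers are hypothetical, made up for illustration. The repeat is the Yerpes row of the repeat alignment in this deck with gaps removed. Real CRISPR finders must discover the repeat themselves and tolerate mismatches, which this sketch does not attempt.

```python
def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be non-empty")
    parts = array.split(repeat)
    # parts[0] is the sequence upstream of the first repeat (leader side) and
    # parts[-1] is whatever follows the last repeat; only the pieces in
    # between are flanked by repeats on both sides, i.e. are spacers.
    return parts[1:-1]


# Repeat taken from the Yerpes row of the alignment (gaps removed).
REPEAT = "GTTCACTGCCGCACAGGCAGCTTAGAAA"
# Hypothetical spacers and leader, for illustration only.
SPACERS = [
    "ACGTACGTACGTACGTACGTACGTACGTACGT",
    "TTGACCTGAAGGCTTAACGGATCCATGCAAGT",
]
LEADER = "ATATAT"

locus = LEADER + REPEAT + SPACERS[0] + REPEAT + SPACERS[1] + REPEAT
spacers = split_crispr_array(locus, REPEAT)
print(len(spacers), [len(s) for s in spacers])
```

Splitting on an exact repeat is the simplest possible choice; it is adequate here only because the toy locus was built from identical repeat copies.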
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (I-V):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable. Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
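The Smith-Waterman similarities mentioned in the figure caption are local alignment scores. As a rough sketch of how such a score is computed for a pair of repeats, the following Python function fills the dynamic-programming table row by row and returns the best local score. The scoring values (match +2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration; the slide does not state which parameters were used in the published analysis.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Two repeats from the alignment in this deck, with gap characters removed.
arcful = "GTTGAAATCAGACCAAAATGGGATTGAAAG"
metthe = "GTTAAAATCAGACCAAAATGGGATTGAAAT"
print(smith_waterman_score(arcful, metthe))
```

Only the score is returned, not the alignment itself, which is enough for building a similarity graph of the kind the caption describes.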
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram. Labels: transcription; cellular RNApol; specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast); p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow); p-dicer; psiRNA, 25-45 nt; RAMP; p-RISC; annealing to RNA target; plasmid or phage mRNA; target RNA cleavage; p-slicer (COG1468, COG4343, COG1857)]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle diagram. Labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; RAMP binding; cycle continues]
New psiRNA generation
[Figure: flow diagram. Labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice; random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer; OR]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
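The sequence specificity in the first bullet can be sketched in Python as an exact-match test between spacers and the phage genome. This is a deliberately simplified model, not the experimental procedure: the function names, the toy spacer, and the toy genomes are hypothetical, and the rule "exact match on either strand means resistance" is an assumption made only to show how a single substitution can switch the outcome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """True if any spacer matches the phage genome exactly on either strand."""
    return any(
        s in phage_genome or reverse_complement(s) in phage_genome
        for s in spacers
    )


# Hypothetical 30-nt spacer and a toy phage genome that contains it.
spacer = "ACGTTGCAAGGCTTAACCGGATCCATGCAT"
wild_type = "TTTT" + spacer + "GGGG"

# Introduce one substitution inside the matching region of the phage genome.
pos = wild_type.index(spacer) + 8
new_base = "T" if wild_type[pos] != "T" else "A"
mutant = wild_type[:pos] + new_base + wild_type[pos + 1:]

print(is_resistant([spacer], wild_type))  # spacer matches the wild-type phage
print(is_resistant([spacer], mutant))     # the single substitution breaks the match
```

In this toy model the phage escapes with one point mutation, which mirrors the bullet's statement that a single substitution reverts the host to sensitivity.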
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation; (2) Expression & Processing; (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader; spacer arrays numbered 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: gene organization of representative loci (Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 RAMP module, Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5), grouped into Type I (Helicase cassette; Cascade), Type II (HNH-type), and Type III (RAMP module or Polymerase cassette). Genes are labeled by Cas name (Cas1 through Cas13) and by COG number or domain family (e.g., 1343, Helicase 1203, HD-f, RecB-f 1468, RuvC-f, HNH, RAMP 1688)]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: CRISPR/Cas loci from archaeal and bacterial genomes, labeled by species abbreviation and gene identifier (e.g., Esccol b2755, Strthe str0658, Pyrfur PF1129), with gene content given by COG number or domain family; groups numbered 1 through 7a; panels (A), (B), (C); "Polymerase" cassette and "Helicase" cassette]
Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al., Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II, and Type III and with the subtypes Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase/RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, and Apern CASS5. Genes shown for each type: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: bar charts, vertical axis 0 to 100, showing Cas1 presence/absence and the occurrence of Type I, Type II, and Type III systems]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks, and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC, and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin, Wolf, Biol Direct 2009

Lamarckian criteria: (1) genomic changes caused by environmental factor; (2) changes are specific to relevant genomic loci; (3) changes provide adaptation to the causative factor

Phenomenon | Biological role/function | Phyletic spread | (1) | (2) | (3)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
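The way the three Lamarckian criteria sort the phenomena into two groups can be written out as a small Python sketch. The rule used here, "bona fide only if all three criteria are an unqualified Yes", is a reading of the table rather than a definition quoted from the source, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str   # criterion 1, as entered in the table
    locus_specific: str          # criterion 2
    adaptive_to_cause: str       # criterion 3


def classify(p: Phenomenon) -> str:
    """Assumed rule: all three criteria must be an unqualified 'Yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "Yes" for a in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Entries copied from the table.
TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

for p in TABLE:
    print(f"{p.name}: {classify(p)}")
```

Under this reading, the deciding criterion in the table is locus specificity: both quasi-Lamarckian rows fail it while satisfying the first criterion.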
Stress as a gauge of evolutionary modality
Koonin, Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Cell degeneration Escaped genes Pr imordial geneticsystems
CELL
SMALLPARASITICCELL
VERYSMALLVIRUS
CHROMOSOME
PLASMID
VIRUS
RNA
VIRUS
PRE-CELLULAR LIFE FORMS
DNA
VIRUS
mRNA
VIRUS
Competing concepts of the or igin of viruses
CELLS
Origin and evolution of virus-like genetic
elementsin the pre-cellular
era
Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009
Replicon fusionyields chromosomes
bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages
The ancient Virus World (VW)
Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009
Evolution of antivirus defense systems
bull CRISPRCas system of adaptive immunity in prokaryotes
bull A case for Lamarckian evolutionbull The perennial virus-host arms race
CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system an update with unification of many diverse families
~25 families altogetherFamily SubfamilyA Phyletic
distributionBComments
1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein
2 COG1343 COG1343 (cas2) COG3512ygbF-like
MTH324-likey1723_N-like
All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins
3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease
4 RecB-like nuclease
COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster
5 RAMP Repair-associated mysterious
protein
COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551
BH0337-likeMJ0978-likeYgcH-like
y1726-like y1727-like
All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)
6 COG1857 COG1857 COG3649 YgcJ-like y1725-like
All αβ protein predicted nuclease or integrase
7 HD-like nuclease
COG1203 (N-terminus) COG2254 All HD-like nuclease
8 BH0338 BH0338-likeMTH1090-like
All mostlyarchaea and FIRM
Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)
9 COG1353 predicted
polymerase
COG1353 Most archaea some bacteria
Predicted palm-domain polymerase distantly related to viral RdRp and RT
hellipand ~15 other less common proteins Makarova et al BD 2006
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins
extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol
Ferredoxin foldRNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
plasmid or phage mRNA
pre-psiRNA75-100 nt
CASS RNApol(COG1353)Or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase(COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPR-Cas systems
[Figure: cas gene neighborhoods for representative archaeal and bacterial genomes, labeled with species and locus tags (e.g. Esccol b2755, Strthe str0658, Myctub Rv2817c, Pyrhor PH0162, Sulsol SSO1450), subtype numbers 1 to 7 (plus 4a and 7a), COG numbers and domain families (Helicase 1203, HD, RecB 1468 and 4343, RuvC, HNH, HTH, RAMPs 1583, 1688, 1769), and panels (A), (B), (C) marked "Helicase" cassette and "Polymerase" cassette.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science. 2010 Sep 10;329(5997):1355-8.
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science. 2008 Dec 19;322(5909):1843-5.
E. coli: Brouns SJ et al. Science. 2008 Aug 15;321(5891):960-4.
Pyrococcus furiosus: Hale CR et al. Cell. 2009 Nov 25;139(5):945-56.
Streptococcus thermophilus: Barrangou R et al. Science. 2007;315:1709-12.
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR-Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches marked Type I, Type II and Type III and with the CASS subtypes (Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 polymerase/RAMP module, Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5). Gene sets shown for each type: I (cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e); II (cas1, cas9, cas2, cas4); III (cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2; RAMP).]
Makarova et al NRM submitted
CRISPR-Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20) genomes
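As a quick arithmetic check of these counts, the short Python sketch below recomputes the shares, assuming that the parenthesized values on the slide are percentages (the percent signs appear to have been lost in extraction). The denominator behind the value of 20 quoted for multi-type genomes is not stated, so two candidate denominators are printed and neither is asserted. The helper name `percent` is hypothetical.

```python
def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, expressed as a percentage."""
    return 100.0 * part / total


TOTAL_GENOMES = 703    # complete genomes surveyed (from the slide)
WITH_CAS1 = 310        # genomes encoding Cas1 (from the slide)
MULTIPLE_TYPES = 128   # genomes with two or three system types (from the slide)

print(f"Cas1-positive:  {percent(WITH_CAS1, TOTAL_GENOMES):.1f}% of all genomes")
print(f"Multiple types: {percent(MULTIPLE_TYPES, TOTAL_GENOMES):.1f}% of all genomes")
print(f"Multiple types: {percent(MULTIPLE_TYPES, WITH_CAS1):.1f}% of Cas1-positive genomes")
```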
[Figure: bar charts (scale 0 to 100) of Cas1 presence/absence and of Type I, Type II and Type III systems across the surveyed genomes; the per-bar counts are not recoverable from the extracted text.]
Makarova et al in preparation
Modular evolution of the 3 types of CRISPR-Cas
Makarova et al in preparation
Mol Microbiol. 2011 Jan;79(2):484-502.
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR-Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Columns: Phenomenon; Biological role/function; Phyletic spread; Lamarckian criteria: (1) genomic changes caused by environmental factor, (2) changes are specific to relevant genomic loci, (3) changes provide adaptation to the causative factor

Bona fide Lamarckian
CRISPR-Cas; defense against viruses and other mobile elements; most of the Archaea and many bacteria; (1) Yes, (2) Yes, (3) Yes
piRNA; defense against transposable elements in germline; animals; (1) Yes, (2) Yes, (3) Yes
HGT (specific cases); adaptation to new environment, stress response, resistance; Archaea, bacteria, unicellular eukaryotes; (1) Yes, (2) Yes, (3) Yes

Quasi-Lamarckian
HGT (general phenomenon); diverse innovations; Archaea, bacteria, unicellular eukaryotes; (1) Yes, (2) No, (3) Yes/no
Stress-induced mutagenesis; stress response/resistance/adaptation to new conditions; ubiquitous; (1) Yes, (2) No or partially, (3) Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Origin and evolution of virus-like genetic elements in the pre-cellular era
Koonin, Martin, TIG 2005; Koonin, Senkevich, Dolja, Biol Direct 2006, 1:29; Koonin, Ann NY Acad Sci 2009
Replicon fusion yields chromosomes
• Viruses and virus-like genetic elements are not "just" pathogens: they are dominant entities in the biosphere
• Emergence of virus-like parasites is inevitable in any replicating system
• In the pre-cellular epoch, the genetic elements that later became viral and cellular genomes comprised a single pool in which they mixed, matched, and evolved new, increasingly complex gene ensembles
• Different replication strategies, including RNA replication, reverse transcription and DNA replication, evolved already in the primordial genetic pool
• With the emergence of prokaryotic cells, a distinct pool of viral genes formed that has retained its identity ever since, as evidenced by the extant distribution of viral hallmark genes (the "virus world", or the virosphere)
• The emergence of the eukaryotic cell was a second melting pot of virus evolution, from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses, retroelements, and the evolving eukaryotic host
• Viruses make essential contributions to the evolution of the genomes of cellular life forms, in particular as vehicles of HGT (GTAs, transducing phages)
The ancient Virus World (VW)
Koonin EV, Senkevich TG, Dolja VV. The ancient Virus World and evolution of cells. Biol Direct 2006.
Koonin EV, Wolf YI, Nagasaki K, Dolja VV. The complexity of the virus world. Nat Rev Microbiol 2009.
Evolution of antivirus defense systems
• CRISPR-Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res. 2002 Jan 15;30(2):482-96.
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. ... The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Columns: Family; Subfamily (A); Phyletic distribution (B); Comments
1. COG1518; COG1518 (cas1); All; Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343; COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like; All; Small protein related to VapD, fused to helicase (COG1203) in y1723-like proteins
3. COG1203; COG1203 (cas3); All; DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease; COG1468 (cas4), COG4343; All; RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein); COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like; All; Belong to "RAMP" family, possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB 1wj9)
6. COG1857; COG1857, COG3649, YgcJ-like, y1725-like; All; α/β protein, predicted nuclease or integrase
7. HD-like nuclease; COG1203 (N-terminus), COG2254; All; HD-like nuclease
8. BH0338; BH0338-like, MTH1090-like; All, mostly archaea and FIRM; Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353 predicted polymerase; COG1353; Most archaea, some bacteria; Predicted palm-domain polymerase distantly related to viral RdRp and RT
... and ~15 other less common proteins. Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeat groups (TB and TA, Strepto-like, Ecoli-like, Pasteurella-like) and the associated genes Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468).]
"The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
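Criteria (i) and (ii) of this definition lend themselves to a simple computational check. The Python sketch below is a minimal illustration, not an established CRISPR finder: it assumes exact repeats of a known length and looks for runs of copies separated by unique spacers within a given size range. The function name, the parameters and the toy sequences are all hypothetical; criteria (iii) to (v), namely the leader, the absence of long ORFs and the cas genes, are not checked.

```python
from collections import defaultdict


def find_repeat_arrays(seq, repeat_len, min_copies=3, min_spacer=15, max_spacer=50):
    """Report exact direct repeats that alternate with unique spacers.

    Returns a list of (repeat, start_positions, spacers) tuples.
    """
    # Index every window of length repeat_len by its start positions.
    starts_by_kmer = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        starts_by_kmer[seq[i:i + repeat_len]].append(i)

    arrays = []
    for kmer, starts in starts_by_kmer.items():
        if len(starts) < min_copies:
            continue
        # Longest run of copies whose gaps fall within the spacer size range.
        best, run = [], [starts[0]]
        for pos in starts[1:]:
            gap = pos - (run[-1] + repeat_len)
            if min_spacer <= gap <= max_spacer:
                run.append(pos)
            else:
                best = max(best, run, key=len)
                run = [pos]
        best = max(best, run, key=len)
        if len(best) < min_copies:
            continue
        spacers = [seq[a + repeat_len:b] for a, b in zip(best, best[1:])]
        if len(set(spacers)) == len(spacers):  # spacers must be non-repetitive
            arrays.append((kmer, best, spacers))
    return arrays


# Toy locus (hypothetical): leader, then four repeat copies with three spacers.
repeat = "GTTCACTGCCGC"
s1, s2, s3 = "ACCTGATTGGCATTAGGATC", "CTAGTTACGGAATCTCAGTG", "TGGACCATTCGAGTTAACCT"
seq = "ATGGCTAAGTCA" + repeat + s1 + repeat + s2 + repeat + s3 + repeat + "GATTACAGGA"

arrays = find_repeat_arrays(seq, repeat_len=len(repeat))
for kmer, starts, spacers in arrays:
    print(kmer, starts, spacers)
```

Because the repeat length must be supplied and repeats must be identical, a practical tool would need to tolerate the "very little sequence variation" allowed by criterion (i) and to infer the repeat length itself.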
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
Paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (columns I, II, III, IV, V on the slide), listed in order of appearance for each family:
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs; ust-lKGhh+hh; hhGtt; hD; lGhttsGh
y1726-like: slhlpEKuVRGT; lRTIDs; YGuVTsGhuh
COG1851: hGphpGpsaFh; hGFGRh
BH0337-like: TpA-h+GIh-uIh; hhLpDV; LGsREhuht
COG1567: hhhpp; ups-lhtAhh; luscoGhGh
COG1769: hhhh+Ph-hhh; ss-hhGhlhsh; hGh; hGtcp+hsthchp
COG1688 (Cas5): hhhhhts; sssshhGhlsh; lGttphh
COGs 1583, 5551: hhhhoPhhl; hGtppshGFGl
YgcH-like: hHphlh; hGu+uhGhGhh
y1727-like: LHphLh; GFsthGLStss
MJ0978-like: hHNH; lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats, visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable. Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
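The degree of conservation between two repeats can be quantified directly from aligned rows. The sketch below is a minimal Python illustration that computes percent identity over the columns where neither row has a gap; the function name is hypothetical, and the two input rows are the Yerpes and first Pasmul rows copied from the alignment above. Identity over ungapped columns is only one of several possible conventions.

```python
def aligned_identity(a: str, b: str) -> float:
    """Fraction of identical bases over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return matches / len(columns)


# Two rows copied from the repeat alignment.
yerpes = "GTTCACTGC--------CGCACAGGCAGCTTAGAAA--"
pasmul = "GTTAACTGC--------CGTATAGGCAGCTTAGAAA--"

print(f"identity over ungapped columns: {aligned_identity(yerpes, pasmul):.1%}")
```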
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol. 2005 Feb;60(2):174-82.
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis
CRISPR-Cas:
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram. Transcription by cellular RNA polymerase with a specific transcription factor (COG1517) yields a polycistronic pre-psiRNA; (protein-guided) RNA folding; fast RNA processing by p-dicer (helicase + HD hydrolase; COG1203, COG2254) yields pre-psiRNA of 75-100 nt; slow RNA processing by p-dicer yields psiRNA of 25-45 nt; psiRNA, RAMP and p-slicer (COG1468, COG4343, COG1857) form p-RISC; annealing to the RNA target (plasmid or phage mRNA); target RNA cleavage.]
Variant of CASS functioning with polymerase: psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle diagram. psiRNA (25-45 nt) bound by RAMP anneals to the RNA target (plasmid or phage mRNA), forming an unstable RAMP-RNA complex; primer elongation by the CASS RNA polymerase (COG1353) gives amplified annealing and a stable long dsRNA; duplex degradation by p-dicer; RAMP binding; the cycle continues.]
New psiRNA generation
[Figure: flow diagram. Plasmid or phage mRNA and pre-psiRNA (75-100 nt) are copied by the CASS RNA polymerase (COG1353) or another RT; reverse transcription with random copy choice, OR random RNA recombination and reverse transcription, yields dsDNA with CRISPRs and target-derived spacers; homologous recombination with the genomic CRISPR region, involving the integrase (COG1518), converts genomic DNA into genomic DNA with a new target-derived spacer.]
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPRCas systems
1857
1857
1857
RAMP1688
10
Cordip DIP0037Bacfra BF3951
Wolsuc WS1444Camjej Cj1522c
Neimen NMA0630Pasmul PM1126
Strthe str0658Strpyo Spy 0770
Treden TDE0328Staepi SERP2463
Mycgal MGA 0523Mycmob MMOB0320
Porgin PG1982Legpne lpp0161
Wolsuc WS1615Sulsol SSO1450
Aerper APE1240Sulsol SSO1405
Pyraer PAE0200Pyraer PAE0081
Metmaz MM0559Metmaz MM3249
Arcful AF1878Mansuc MS1635
Niteur NE0111Myctub Rv2817c
Nostoc alr0381Synech slr7071Thethe TTHB145
Chrvio CV1229Xanaxo XAC3842
Chltep CT1130Desvul DVUA0134
Metcap MCA0651Azoarc ebA3283
Deigeo_DRAFT_1682Mansuc MS0981
Baccla ABC3592Bachal BH0341
Strpyo Spy 1286Vibvul VVA1544
Nostoc alr1468Synech slr7092
Synech slr7016Geosul GSU0057
Thethe TTHB224Nostoc alr1568
Metkan MK1312
Esccol b2755Saltyp STM2938
Phopro PBPRC0034Geosul GSU1392
Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639
Symthe_STH663Corjei jk0643
Nocfar nfa44220Strave SAV7537
Metcap MCA0930Cordip DIP2214
Chrvio CV1756Zymmob ZMO0680
Pasmul PM0311Erwcar ECA3679
Yerpes y1722Legpne lpl2837Phopro PBPRB1995
Acinet ACIAD2484
Clotet CTC01148Fusnuc FN1177
Aquaeo aq 369Theten TTE2658Themar TM1797
Bacfra BF2544Porgin PG2014
Halmar pNG4053Clotet CTC01463
Metjan MJ0378Pyrhor PH1245
Thekod TK0455Metthe MTH1084
Arcful AF2435Metace MA3670
Nanequ NEQ017Pyrhor PH0173
Pictor PTO0003Pictor PTO0049Thevol TVN0106
1343
1343
1343
1343
1518
1518
1343
1518
RecB-f4343
RecB-f1468
RecB-f1468
RecB-f1468
Helicase12031857 RAMP
1688
RAMP1688
Helicase1203
Helicase1203
RAMPBH0337
RAMPYgcH
RAMPy1727
RAMPy17261857
Helicase1203
HD-f
Helicase1203
HD-f
1343
HD-f
HD-f
HD-f
RuvC-f
AF0070
BH0338
y1724
ygcKygcL
RAMP1583
HTH
AF1870
SPy1049
3574
AF1870
PH0918
HTH 3574
1857 RAMP1688
RecB-f1468
Helicase1203
HD-f
AF1870
MTH1090
AF1873
RAMP1583
3574
AF1870
3574
1343
1
7
2
3
4
5
6
7a
Sulsol SSO1999Sulsol SSO1440
Sultok ST2639Sultok ST0032
Sulsol SSO1402Pyraer PAE0068
Arcful AF1874Pyrhor PH0917Thekod TK0450
Pyraby PAB1689
13433513
3513
1518
RuvC-f
HNH
4aHNH
1 0
Pyrhor PH0162
6
64
6
4
16
17
7
7
7
7
1
2
7
7
7
7
5 7
5 75 7
5 75 7
5 7
6 1
7
Metkan MK1314
Metace MA1931
Metmaz MM3356
Metbar Mbar_A1355
Strthe stu0960Myctub Rv2823cStaepi SERP2461
Themar TM1811Mansuc MS1651
Niteur NE0123Thethe TTP0102
Metthe MTH1082
Metthe MTH326
Metjan MJ1672Pictor PTO0053Thevol TVN0112
Metkan MK1297Thethe TTP0115
Bachal BH0328Synech sll7090
Aquaeo aq 387Theten TTE2639
Themar TM1794Arcful AF1867
Pyrfur PF1129Thefus Tfu1578
Porgin PG1987Sultok ST1979
Sulsol SSO1991Sulsol SSO1729
Sultok ST0012Sulaci Saci 2046Nostoc all1479
1421 RAMP1567
RAMP1583
RAMPMTH323
RAMP1769
XMTH324
ldquoPolymeraserdquo cassette
ldquoHelicaserdquo cassette(A)
(B)
(C)
Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8
Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5
Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5
E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4
Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56
Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPRCas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
EcoliCASS2Type IType I
Type I
Type I
Type I
Type III
Type III
Type III
Type III
Type III
Type III
Type II
YpestCASS3
NmeniCASS4
PolymeraseRAMPmoduleMtubeCASS6
PolRAMP
DvulgCASS1
TneapHmariCASS7
ApernCASS5
cas1cas9 cas2 cas4
cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI
II
IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2
RAMP
Makarova et al NRM submitted
CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes
54
256
1537
26
516
2
9
7 56107
3
10
13
383
210
46
216
8
1
7 70211
10
1
0102030405060708090
100
1623
84 6
17 7
2322
0
9
0
0
151
2
1
21
17
20 1
0
1533 28 7 14
0
9 7 40
117 210
0102030405060708090
100
Type I Type II Type IIICas1presenceabsence
Makarova et al in preparation
Modular evolution of the 3 types of CRISPRCas
Makarova et al in preparation
Mol Microbiol 2011 Jan79(2)484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPRCas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Phenomenon Biological rolefunction
Phyletic spread
Lamarckian criteria
Genomic changes caused by environmental factor
Changes are specific to relevant genomic loci
Changes provide adaptation to the causative factor
Bona fide LamarckianCRISPRCas Defense against
viruses and other mobile elements
Most of the Archaeaand many bacteria
Yes Yes Yes
piRNA Defense against transposable elements in germline
Animals Yes Yes Yes
HGT (specific cases)
Adaptation to new environment stress response resistance
Archaea bacteria unicellular eukaryotes
Yes Yes Yes
Quasi-LamarckianHGT (general phenomenon)
Diverse innovations Archaea bacteria unicellular eukaryotes
Yes No Yesno
Stress-induced mutagenesis
Stress responseresistanceadaptation to new conditions
Ubiquitous Yes No or partially
Yes (but general evolvabilityenhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
bullEvolution of parasites is intrinsic to any replicator system
bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since
bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms
bullPerennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian DoljaOregon State University
Yuri Wolf NCBITatiana Senkevich NIAID NIH
Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille
Didier Raoult et al
Bill MartinUniv Duesseldorf
Kira Makarova NCBI
bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages
The ancient Virus World (VW)
Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009
Evolution of antivirus defense systems
bull CRISPRCas system of adaptive immunity in prokaryotes
bull A case for Lamarckian evolutionbull The perennial virus-host arms race
CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)
Sorek et al Nature Rev Microbiol 2008
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system: an update with unification of many diverse families (~25 families altogether)
Columns: Family; Subfamily; Phyletic distribution; Comments
1. COG1518; COG1518 (cas1); All; Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343; COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like; All; Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203; COG1203 (cas3); All; DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease; COG1468 (cas4), COG4343; All; RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein); COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like; All; Belong to the “RAMP” family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB 1wj9)
6. COG1857; COG1857, COG3649, YgcJ-like, y1725-like; All; α/β protein; predicted nuclease or integrase
7. HD-like nuclease; COG1203 (N-terminus), COG2254; All; HD-like nuclease
8. BH0338; BH0338-like, MTH1090-like; All, mostly archaea and FIRM; Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353, predicted polymerase; COG1353; Most archaea, some bacteria; Predicted palm-domain polymerase distantly related to viral RdRp and RT
...and ~15 other less common proteins. Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats and neighboring cas genes. Locus labels: TB and TA; Strepto-like; Ecoli-like; Pasteurella-like. Gene labels: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468)]
“The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats, of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species.”
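Criteria (i) and (ii) of the quoted definition can be illustrated with a minimal sketch. It assumes exact repeats of a known length, which real CRISPR detection tools do not require; the function name `find_repeat_arrays`, its parameters and the toy locus are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def find_repeat_arrays(seq, repeat_len, min_copies=3, size_tolerance=2):
    """Find exact direct repeats of length repeat_len that are separated by
    unique spacers of similar size (toy version of criteria (i) and (ii))."""
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays = []
    for repeat, starts in positions.items():
        if len(starts) < min_copies:
            continue
        # Sequence between the end of one repeat copy and the start of the next.
        spacers = [seq[a + repeat_len:b] for a, b in zip(starts, starts[1:])]
        lengths = [len(s) for s in spacers]
        if min(lengths) == 0:
            continue  # adjacent or overlapping copies, not a repeat-spacer array
        similar_size = max(lengths) - min(lengths) <= size_tolerance
        non_repetitive = len(set(spacers)) == len(spacers)
        if similar_size and non_repetitive:
            arrays.append((repeat, starts, spacers))
    return arrays


# Hypothetical toy locus: leader, then four repeat copies with three spacers.
REPEAT = "GTTTCAAT"
SPACERS = ["ACGGATCTAC", "TTGACCATGA", "CCATTGGCAA"]
TOY_LOCUS = "TTAGC" + REPEAT + "".join(s + REPEAT for s in SPACERS) + "GGATC"

for repeat, starts, spacers in find_repeat_arrays(TOY_LOCUS, len(REPEAT)):
    print(repeat, starts, spacers)
```

Criteria (iii) to (v), the leader, the absence of long open reading frames and the neighboring cas genes, need genome annotation and are outside this sketch.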
The cas genes are our “repair” system
“Helicase” cassette
“Polymerase” cassette
A paragon of prokaryotic genome plasticity:
• extensive gene shuffling
• evidence of multiple horizontal transfers
• widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins
Extreme sequence diversity
MOTIFs: specific I II III IV V
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable. Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
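The Smith-Waterman similarity mentioned in the caption can be sketched as a local alignment score. This is a minimal version with a linear gap penalty; the scoring values (match 2, mismatch -1, gap -2) are assumptions for illustration and are not the parameters used by Kunin et al. The three repeat strings are copied from the alignment shown in this talk, with alignment gaps removed.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def ungap(aligned):
    """Remove alignment gap characters from a repeat sequence."""
    return aligned.replace("-", "")


ARCFUL = ungap("GTTGAAATC-------AGACCAAAATGGGATTGAAAG-")
METTHE = ungap("GTTAAAATC-------AGACCAAAATGGGATTGAAAT-")
SULACI = ungap("GTTTTAGTT-------------TCTTGTCGTTATTAC-")

print("Arcful vs Metthe:", smith_waterman_score(ARCFUL, METTHE))
print("Arcful vs Sulaci:", smith_waterman_score(ARCFUL, SULACI))
```

Pairwise scores of this kind can serve as edge weights in a similarity graph, which is the kind of input a graph clustering method such as MCL operates on.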
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol. 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
... The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
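The observation that spacers derive from foreign genetic elements rests on sequence matching, which can be sketched as an exact search of each spacer against candidate source sequences on both strands. This is only an illustration: the function names and the toy sequences in the usage note are hypothetical, and Mojica et al. relied on database similarity searches rather than exact string matching.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def spacer_origins(spacers, elements):
    """Map each spacer to its exact matches in the given genetic elements.

    elements: dict of element name -> sequence.
    Each hit is (element name, strand, 0-based position on the forward strand).
    """
    hits = {}
    for spacer in spacers:
        found = []
        for name, genome in elements.items():
            for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
                pos = genome.find(query)
                while pos != -1:
                    found.append((name, strand, pos))
                    pos = genome.find(query, pos + 1)
        hits[spacer] = found
    return hits
```

For example, with `elements = {"phage_X": "ATGACCGTTAGCAGGCTTAACGGA"}`, the spacer `"CGTTAGCAGG"` is reported on the `+` strand of `phage_X`, while a spacer with no match maps to an empty list.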
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure labels: transcription; cellular RNApol; specific transcription factor (COG1517); (protein-guided) RNA folding; polycistronic pre-psiRNA; RNA processing (fast); p-dicer (Helicase + HD-Hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow); p-dicer; psiRNA, 25-45 nt; p-slicer (COG1468, COG4343, COG1857); RAMP; p-RISC; annealing to RNA target; plasmid or phage mRNA; target RNA cleavage]
Variant of CASS functioning with polymerase: psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation, CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); p-dicer, duplex degradation; RAMP binding; cycle continues]
New psiRNA generation
[Figure labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
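The sequence specificity described above can be illustrated with a toy model in which a cell is resistant only if one of its spacers matches the phage genome exactly. This is a deliberate simplification and an assumption of the sketch: it ignores strand, the Cas proteins and everything else about the real mechanism, and the function names and sequences are hypothetical.

```python
def acquire_spacer(spacers, phage_genome, start, length):
    """Adaptation step: copy a phage fragment into the spacer array.
    Placing the new spacer first (leader-proximal) is an assumption here."""
    return [phage_genome[start:start + length]] + list(spacers)


def is_resistant(spacers, phage_genome):
    """Toy interference rule: any exact spacer match confers resistance."""
    return any(spacer in phage_genome for spacer in spacers)


def point_mutant(genome, position, base):
    """Return the genome with a single substitution at position."""
    return genome[:position] + base + genome[position + 1:]


phage = "ATGACCGTTAGCAGGCTTAACGGA"          # hypothetical phage sequence
naive_cell = []                              # no spacers yet
immune_cell = acquire_spacer(naive_cell, phage, start=5, length=10)

print(is_resistant(naive_cell, phage))                            # naive cell
print(is_resistant(immune_cell, phage))                           # after adaptation
print(is_resistant(immune_cell, point_mutant(phage, 9, "C")))     # SNP inside the target
print(is_resistant(immune_cell, point_mutant(phage, 20, "T")))    # SNP outside the target
```

In this model a substitution inside the matched region restores sensitivity, while a substitution elsewhere in the phage genome does not, which mirrors the first bullet above.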
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR with leader and spacers 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3. Van der Oost et al., TIBS 2009]
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon diagrams for Type I (Helicase cassette; Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), CASS4a, Mtube (CASS6, RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7), Apern (CASS5). Gene labels: Cas1 to Cas13, with COG numbers and domains including Helicase (COG1203), HD-f, RecB-f (COG1468), RuvC-f, HNH, RAMP (COG1688, COG1583, COG1567, COG1769), COG1343, COG1857, COG3513, COG1517]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure, panels (A) to (C): cas gene neighborhoods of individual loci, labelled by species abbreviation and locus tag (for example Esccol b2755, Myctub Rv2817c, Strthe str0658, Pyrfur PF1129), with genes marked by COG number and domain (Helicase COG1203, HD-f, RecB-f COG1468, RuvC-f, HNH, RAMP COG1688 and COG1583, COG1343, COG1518, COG1857) and grouped into the “Helicase” and “Polymerase” cassettes]
Pseudomonas aeruginosa: Haurwitz RE et al. Science. 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science. 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science. 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell. 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315, 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labelled Type I, Type II and Type III. Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; Polymerase/RAMP module), Dvulg (CASS1), Tneap/Hmari (CASS7), Apern (CASS5). Gene labels by type: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% of genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: two charts on a 0-100 scale showing Type I, Type II and Type III systems and Cas1 presence/absence, annotated with genome counts]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol. 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf. Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf. Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Columns: Phenomenon; Biological role/function; Phyletic spread; Lamarckian criteria: (1) genomic changes caused by environmental factor, (2) changes are specific to relevant genomic loci, (3) changes provide adaptation to the causative factor
Bona fide Lamarckian
CRISPR/Cas; Defense against viruses and other mobile elements; Most of the Archaea and many bacteria; (1) Yes, (2) Yes, (3) Yes
piRNA; Defense against transposable elements in germline; Animals; (1) Yes, (2) Yes, (3) Yes
HGT (specific cases); Adaptation to new environment, stress response, resistance; Archaea, bacteria, unicellular eukaryotes; (1) Yes, (2) Yes, (3) Yes
Quasi-Lamarckian
HGT (general phenomenon); Diverse innovations; Archaea, bacteria, unicellular eukaryotes; (1) Yes, (2) No, (3) Yes/no
Stress-induced mutagenesis; Stress response/resistance/adaptation to new conditions; Ubiquitous; (1) Yes, (2) No or partially, (3) Yes (but general evolvability enhanced as well)
Koonin, Wolf. Biol Direct 2009
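The table can be read as a simple decision rule, sketched below. The rule itself (a phenomenon counts as bona fide Lamarckian only when all three criteria are an unqualified "Yes") is an interpretation of the table rather than a definition given on the slide, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str   # criterion (1)
    locus_specific: str          # criterion (2)
    adaptive_to_cause: str       # criterion (3)


def classify(p):
    """Assumed rule: all three criteria 'Yes' -> bona fide, else quasi."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "Yes" for answer in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Criterion values transcribed from the table above.
TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

for p in TABLE:
    print(f"{p.name}: {classify(p)}")
```

Under this reading, the rule reproduces the grouping shown in the table: the first three rows come out as bona fide Lamarckian and the last two as quasi-Lamarckian.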
Stress as a gauge of evolutionary modality
Koonin, Wolf. Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Evolution of antivirus defense systems
• CRISPR/Cas system of adaptive immunity in prokaryotes
• A case for Lamarckian evolution
• The perennial virus-host arms race
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system an update with unification of many diverse families
~25 families altogetherFamily SubfamilyA Phyletic
distributionBComments
1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein
2 COG1343 COG1343 (cas2) COG3512ygbF-like
MTH324-likey1723_N-like
All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins
3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease
4 RecB-like nuclease
COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster
5 RAMP Repair-associated mysterious
protein
COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551
BH0337-likeMJ0978-likeYgcH-like
y1726-like y1727-like
All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)
6 COG1857 COG1857 COG3649 YgcJ-like y1725-like
All αβ protein predicted nuclease or integrase
7 HD-like nuclease
COG1203 (N-terminus) COG2254 All HD-like nuclease
8 BH0338 BH0338-likeMTH1090-like
All mostlyarchaea and FIRM
Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)
9 COG1353 predicted
polymerase
COG1353 Most archaea some bacteria
Predicted palm-domain polymerase distantly related to viral RdRp and RT
hellipand ~15 other less common proteins Makarova et al BD 2006
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins
extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol
Ferredoxin foldRNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
plasmid or phage mRNA
pre-psiRNA75-100 nt
CASS RNApol(COG1353)Or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase(COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPRCas systems
1857
1857
1857
RAMP1688
10
Cordip DIP0037Bacfra BF3951
Wolsuc WS1444Camjej Cj1522c
Neimen NMA0630Pasmul PM1126
Strthe str0658Strpyo Spy 0770
Treden TDE0328Staepi SERP2463
Mycgal MGA 0523Mycmob MMOB0320
Porgin PG1982Legpne lpp0161
Wolsuc WS1615Sulsol SSO1450
Aerper APE1240Sulsol SSO1405
Pyraer PAE0200Pyraer PAE0081
Metmaz MM0559Metmaz MM3249
Arcful AF1878Mansuc MS1635
Niteur NE0111Myctub Rv2817c
Nostoc alr0381Synech slr7071Thethe TTHB145
Chrvio CV1229Xanaxo XAC3842
Chltep CT1130Desvul DVUA0134
Metcap MCA0651Azoarc ebA3283
Deigeo_DRAFT_1682Mansuc MS0981
Baccla ABC3592Bachal BH0341
Strpyo Spy 1286Vibvul VVA1544
Nostoc alr1468Synech slr7092
Synech slr7016Geosul GSU0057
Thethe TTHB224Nostoc alr1568
Metkan MK1312
Esccol b2755Saltyp STM2938
Phopro PBPRC0034Geosul GSU1392
Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639
Symthe_STH663Corjei jk0643
Nocfar nfa44220Strave SAV7537
Metcap MCA0930Cordip DIP2214
Chrvio CV1756Zymmob ZMO0680
Pasmul PM0311Erwcar ECA3679
Yerpes y1722Legpne lpl2837Phopro PBPRB1995
Acinet ACIAD2484
Clotet CTC01148Fusnuc FN1177
Aquaeo aq 369Theten TTE2658Themar TM1797
Bacfra BF2544Porgin PG2014
Halmar pNG4053Clotet CTC01463
Metjan MJ0378Pyrhor PH1245
Thekod TK0455Metthe MTH1084
Arcful AF2435Metace MA3670
Nanequ NEQ017Pyrhor PH0173
Pictor PTO0003Pictor PTO0049Thevol TVN0106
1343
1343
1343
1343
1518
1518
1343
1518
RecB-f4343
RecB-f1468
RecB-f1468
RecB-f1468
Helicase12031857 RAMP
1688
RAMP1688
Helicase1203
Helicase1203
RAMPBH0337
RAMPYgcH
RAMPy1727
RAMPy17261857
Helicase1203
HD-f
Helicase1203
HD-f
1343
HD-f
HD-f
HD-f
RuvC-f
AF0070
BH0338
y1724
ygcKygcL
RAMP1583
HTH
AF1870
SPy1049
3574
AF1870
PH0918
HTH 3574
1857 RAMP1688
RecB-f1468
Helicase1203
HD-f
AF1870
MTH1090
AF1873
RAMP1583
3574
AF1870
3574
1343
1
7
2
3
4
5
6
7a
Sulsol SSO1999Sulsol SSO1440
Sultok ST2639Sultok ST0032
Sulsol SSO1402Pyraer PAE0068
Arcful AF1874Pyrhor PH0917Thekod TK0450
Pyraby PAB1689
13433513
3513
1518
RuvC-f
HNH
4aHNH
1 0
Pyrhor PH0162
6
64
6
4
16
17
7
7
7
7
1
2
7
7
7
7
5 7
5 75 7
5 75 7
5 7
6 1
7
Metkan MK1314
Metace MA1931
Metmaz MM3356
Metbar Mbar_A1355
Strthe stu0960Myctub Rv2823cStaepi SERP2461
Themar TM1811Mansuc MS1651
Niteur NE0123Thethe TTP0102
Metthe MTH1082
Metthe MTH326
Metjan MJ1672Pictor PTO0053Thevol TVN0112
Metkan MK1297Thethe TTP0115
Bachal BH0328Synech sll7090
Aquaeo aq 387Theten TTE2639
Themar TM1794Arcful AF1867
Pyrfur PF1129Thefus Tfu1578
Porgin PG1987Sultok ST1979
Sulsol SSO1991Sulsol SSO1729
Sultok ST0012Sulaci Saci 2046Nostoc all1479
1421 RAMP1567
RAMP1583
RAMPMTH323
RAMP1769
XMTH324
ldquoPolymeraserdquo cassette
ldquoHelicaserdquo cassette(A)
(B)
(C)
Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8
Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5
Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5
E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4
Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56
Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPRCas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
EcoliCASS2Type IType I
Type I
Type I
Type I
Type III
Type III
Type III
Type III
Type III
Type III
Type II
YpestCASS3
NmeniCASS4
PolymeraseRAMPmoduleMtubeCASS6
PolRAMP
DvulgCASS1
TneapHmariCASS7
ApernCASS5
cas1cas9 cas2 cas4
cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI
II
IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2
RAMP
Makarova et al NRM submitted
CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes
[Figure: bar charts (vertical axis 0 to 100) showing Cas1 presence/absence and the distribution of Type I, Type II and Type III systems]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena

Lamarckian criteria:
(1) Genomic changes caused by environmental factor
(2) Changes are specific to relevant genomic loci
(3) Changes provide adaptation to the causative factor

Bona fide Lamarckian
• CRISPR/Cas. Biological role/function: defense against viruses and other mobile elements. Phyletic spread: most of the Archaea and many bacteria. Criteria (1), (2), (3): Yes, Yes, Yes
• piRNA. Biological role/function: defense against transposable elements in germline. Phyletic spread: animals. Criteria (1), (2), (3): Yes, Yes, Yes
• HGT (specific cases). Biological role/function: adaptation to new environment, stress response, resistance. Phyletic spread: Archaea, bacteria, unicellular eukaryotes. Criteria (1), (2), (3): Yes, Yes, Yes

Quasi-Lamarckian
• HGT (general phenomenon). Biological role/function: diverse innovations. Phyletic spread: Archaea, bacteria, unicellular eukaryotes. Criteria (1), (2), (3): Yes, No, Yes/no
• Stress-induced mutagenesis. Biological role/function: stress response/resistance/adaptation to new conditions. Phyletic spread: ubiquitous. Criteria (1), (2), (3): Yes, No or partially, Yes (but general evolvability enhanced as well)

Koonin, Wolf, Biol Direct 2009
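The three criteria in the table above can be read as a simple decision rule. The following is a minimal sketch of that reading, not the authors' formal definition: it assumes a phenomenon counts as bona fide Lamarckian only when all three criteria are an unqualified "yes", and anything else is quasi-Lamarckian. The class and function names (`Phenomenon`, `classify`) are made up for illustration; the row values are transcribed from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion (1)
    locus_specific: str         # criterion (2)
    adaptive_to_cause: str      # criterion (3)


def classify(p: Phenomenon) -> str:
    """Assumed rule: all three criteria must be a plain 'yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "yes" for answer in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Rows as given on the slide (lower-cased).
TABLE = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes"),
]

CLASSES = {p.name: classify(p) for p in TABLE}

for name, label in CLASSES.items():
    print(f"{name}: {label}")
```

Under this rule the grouping reproduces the two headings of the slide: the first three rows come out as bona fide Lamarckian and the last two as quasi-Lamarckian.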
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
CRISPR repeats and Cas genes
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Cas: CRISPR-associated (genes)
Sorek et al., Nature Rev Microbiol 2008
Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 2002 Jan 15;30(2):482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. … The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified.
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Each entry lists: family; subfamilies (A); phyletic distribution (B); comments

1. COG1518. Subfamilies: COG1518 (cas1). Distribution: all. Putative novel nuclease/integrase. Mostly α-helical protein.
2. COG1343. Subfamilies: COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like. Distribution: all. Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins.
3. COG1203. Subfamilies: COG1203 (cas3). Distribution: all. DNA helicase. Most proteins have fusion to HD nuclease.
4. RecB-like nuclease. Subfamilies: COG1468 (cas4), COG4343. Distribution: all. RecB-like nuclease. Contains three-cysteine C-terminal cluster.
5. RAMP (Repair-associated mysterious protein). Subfamilies: COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like. Distribution: all. Belong to “RAMP” family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB: 1wj9).
6. COG1857. Subfamilies: COG1857, COG3649, YgcJ-like, y1725-like. Distribution: all. α/β protein; predicted nuclease or integrase.
7. HD-like nuclease. Subfamilies: COG1203 (N-terminus), COG2254. Distribution: all. HD-like nuclease.
8. BH0338. Subfamilies: BH0338-like, MTH1090-like. Distribution: all, mostly archaea and FIRM. Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref)).
9. COG1353, predicted polymerase. Subfamilies: COG1353. Distribution: most archaea, some bacteria. Predicted palm-domain polymerase, distantly related to viral RdRp and RT.

…and ~15 other less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure: CRISPR repeats with adjacent cas genes in several locus groups (TB and TA, Strepto-like, Ecoli-like, Pasteurella-like); genes shown: Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468)]
“The common structural characteristics of CRISPR loci are: (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species.”
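Characteristics (i) and (ii) of the quoted definition describe an array of identical repeats separated by unique spacers of similar size. A minimal sketch of that layout follows, assuming the repeat is already known and occurs as exact copies; real CRISPR finders must also discover the repeat and tolerate variation. The repeat is the Yerpes sequence from the repeat alignment in this deck with gaps removed; the leader and both spacers are made-up toy sequences, and `split_crispr_array` is a hypothetical helper name.

```python
def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    positions = []
    start = array.find(repeat)
    while start != -1:
        positions.append(start)
        # Continue searching after the end of the current repeat copy.
        start = array.find(repeat, start + len(repeat))
    return [
        array[left + len(repeat):right]
        for left, right in zip(positions, positions[1:])
    ]


REPEAT = "GTTCACTGCCGCACAGGCAGCTTAGAAA"       # Yerpes repeat, gaps removed
LEADER = "ATATATAT"                           # toy leader (made up)
SPACER_1 = "ACGTACGTTGCAAGCTTGACCATGGATCCAAT"  # toy spacer (made up)
SPACER_2 = "TTGACCGGTAACGTTAGCATCGGATCCGTAAC"  # toy spacer (made up)

ARRAY = LEADER + REPEAT + SPACER_1 + REPEAT + SPACER_2 + REPEAT

SPACERS = split_crispr_array(ARRAY, REPEAT)
for index, spacer in enumerate(SPACERS, start=1):
    print(index, len(spacer), spacer)
```

The printed lengths make the "similar size" property of the spacers easy to check on real arrays.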
The cas genes are our “repair” system
“Helicase” cassette
“Polymerase” cassette
Paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
MOTIFs (specific; I, II, III, IV, V), by family:
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats, visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures, where applicable.
Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
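The figure caption states that repeats are connected by Smith-Waterman similarities. As a rough illustration of what such a score is, here is a minimal Smith-Waterman local alignment score with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for the example and are not the parameters used by Kunin et al.; the two sequences are the Yerpes repeat and the first Pasmul repeat from the alignment in this deck, with gap characters removed.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


YERPES = "GTTCACTGCCGCACAGGCAGCTTAGAAA"
PASMUL = "GTTAACTGCCGTATAGGCAGCTTAGAAA"

SELF_SCORE = smith_waterman_score(YERPES, YERPES)
PAIR_SCORE = smith_waterman_score(YERPES, PASMUL)
print("self:", SELF_SCORE, "pair:", PAIR_SCORE)
```

A pairwise score close to the self score indicates closely related repeats; a score matrix over all repeats is the kind of input a clustering method such as MCL would take.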
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
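The observation that spacers derive from preexisting sequences rests on finding each spacer in the genome of a phage or plasmid. A minimal sketch of such a lookup follows, assuming exact matches and searching both strands; published analyses rely on similarity search tools rather than exact string matching. All sequences and the names `find_spacer_origin`, `PROTOSPACER` and `ELEMENTS` are made up for illustration.

```python
def reverse_complement(seq: str) -> str:
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_spacer_origin(spacer: str, elements: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (element name, strand, position) for every exact match of the spacer."""
    hits = []
    for name, genome in elements.items():
        for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
            position = genome.find(query)
            if position != -1:
                hits.append((name, strand, position))
    return hits


PROTOSPACER = "ACGTTGCAAGCTTGACCATGGATC"  # toy sequence (made up)

ELEMENTS = {
    # The phage carries the sequence on the forward strand.
    "phage": "TTGACA" + PROTOSPACER + "GGCTAA",
    # The plasmid carries it on the reverse strand.
    "plasmid": "CCCC" + reverse_complement(PROTOSPACER) + "AAAA",
    # The host chromosome fragment has no match.
    "host": "GGGGGGGGTTTTTTTTGGGGGGGG",
}

HITS = find_spacer_origin(PROTOSPACER, ELEMENTS)
for hit in HITS:
    print(hit)
```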
Hypothesis
CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure: flow diagram. Labels: transcription by cellular RNApol with a specific transcription factor (COG1517); polycistronic pre-psiRNA; (protein-guided) RNA folding; RNA processing (fast) by p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); pre-psiRNA, 75-100 nt; RNA processing (slow) by p-dicer; psiRNA, 25-45 nt; RAMP; p-RISC; annealing to RNA target (plasmid or phage mRNA); target RNA cleavage by p-slicer (COG1468, COG4343, COG1857)]

Variant of CASS functioning with polymerase/psiRNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure: cycle diagram. Labels: psiRNA, 25-45 nt; RAMP binding; RAMP-RNA complex (unstable); annealing to RNA target (plasmid or phage mRNA); primer elongation by CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); duplex degradation by p-dicer; cycle continues]

New psiRNA generation
[Figure: flow diagram. Labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice, OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
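The first validation point, that a single substitution in the phage reverts the host to sensitivity, can be illustrated with a toy model. This sketch assumes resistance requires an exact, same-strand match between a spacer and the phage genome; it ignores the cas gene requirement and everything else about the real mechanism. All sequences and the names `is_resistant` and `substitute` are made up for illustration.

```python
def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule: the host is resistant if any spacer matches the phage exactly."""
    return any(spacer in phage_genome for spacer in spacers)


def substitute(genome: str, position: int, base: str) -> str:
    """Return a copy of the genome with one base replaced (a SNP)."""
    if genome[position] == base:
        raise ValueError("substitution must change the base")
    return genome[:position] + base + genome[position + 1:]


PROTOSPACER = "ACGTTGCAAGCTTGACCATGGATC"  # toy sequence (made up)
PHAGE = "TTGACA" + PROTOSPACER + "GGCTAA"
HOST_SPACERS = [PROTOSPACER]               # spacer acquired from this phage

# SNP inside the region matched by the spacer (offset 6 + 10, C to T).
ESCAPE_MUTANT = substitute(PHAGE, 16, "T")
# SNP outside the matched region (position 0, T to A).
NEUTRAL_MUTANT = substitute(PHAGE, 0, "A")

print("wild type:", is_resistant(HOST_SPACERS, PHAGE))
print("SNP in target:", is_resistant(HOST_SPACERS, ESCAPE_MUTANT))
print("SNP elsewhere:", is_resistant(HOST_SPACERS, NEUTRAL_MUTANT))
```

In this model only the substitution inside the matched region lets the phage escape, which mirrors the sequence specificity reported on the slide.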
[Figure: the three stages of CRISPR/Cas action in a host challenged by an invading virus or plasmid: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: CRISPR leader; spacer array 5 4 3 2 1 becoming 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: gene organization of representative loci, grouped as Type I (Helicase cassette), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Labeled subtypes: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), CASS4a, Apern (CASS5), Mtube (CASS6, RAMP module), Dvulg (CASS1), Tneap and Hmari (CASS7). Gene labels: Cas1 to Cas13 and Cascade, with COG numbers (1203, 1343, 1421, 1468, 1517, 1567, 1583, 1688, 1769, 1857, 3337, 3513) and domain tags (Helicase, HD-f, RecB-f, RuvC-f, HNH, RAMP)]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: gene organization of cas loci across archaeal and bacterial genomes (locus tags such as Esccol b2755, Strthe str0658, Pyrhor PH1245), with genes labeled by COG number and domain (1343, 1518, Helicase 1203, HD-f, RecB-f 1468, RAMP 1688, RAMP 1583, 1857, RuvC-f, HTH, 3574) and subtype labels 1 to 7 and 7a]
Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96
During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified
Protein components of the system an update with unification of many diverse families
~25 families altogetherFamily SubfamilyA Phyletic
distributionBComments
1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein
2 COG1343 COG1343 (cas2) COG3512ygbF-like
MTH324-likey1723_N-like
All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins
3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease
4 RecB-like nuclease
COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster
5 RAMP Repair-associated mysterious
protein
COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551
BH0337-likeMJ0978-likeYgcH-like
y1726-like y1727-like
All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)
6 COG1857 COG1857 COG3649 YgcJ-like y1725-like
All αβ protein predicted nuclease or integrase
7 HD-like nuclease
COG1203 (N-terminus) COG2254 All HD-like nuclease
8 BH0338 BH0338-likeMTH1090-like
All mostlyarchaea and FIRM
Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)
9 COG1353 predicted
polymerase
COG1353 Most archaea some bacteria
Predicted palm-domain polymerase distantly related to viral RdRp and RT
hellipand ~15 other less common proteins Makarova et al BD 2006
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins
extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol
Ferredoxin foldRNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
plasmid or phage mRNA
pre-psiRNA75-100 nt
CASS RNApol(COG1353)Or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase(COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
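The sequence specificity described above can be illustrated with a toy model. This is a minimal sketch, not the mechanism used by the cell: it assumes that a spacer protects only when it matches the phage genome exactly on either strand, and the sequences, `is_resistant`, and `point_mutant` are hypothetical names introduced for illustration only.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def is_resistant(spacers: list[str], phage_genome: str) -> bool:
    """Toy rule (assumption): resistance needs an exact spacer match on either strand."""
    rc = reverse_complement(phage_genome)
    return any(s in phage_genome or s in rc for s in spacers)


def point_mutant(genome: str, pos: int) -> str:
    """Return a copy of `genome` with a single substitution at `pos`."""
    new_base = "C" if genome[pos] != "C" else "A"
    return genome[:pos] + new_base + genome[pos + 1:]


# Hypothetical 40-nt "phage genome" and a 20-nt spacer derived from it.
phage = "ATGGCGTACCTGAAGTCTTGACCGATTACGGCTAAGTCCA"
spacer = phage[10:30]
escape_mutant = point_mutant(phage, 20)  # one substitution inside the target

print(is_resistant([spacer], phage))          # True: spacer matches the phage
print(is_resistant([spacer], escape_mutant))  # False: a single SNP restores sensitivity
```

Exact matching is the simplest possible rule; it is chosen here only because it reproduces the single-substitution effect listed above.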
[Figure: the three stages of CRISPR/Cas action in the host: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader; spacer array 5 4 3 2 1 becoming 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon diagrams with genes labeled Cas1 through Cas13 and annotated by COG number (e.g. 1343, 1203, 1421, 1468, 1517, 1567, 1583, 1688, 1769, 1857, 3337, 3513), by domain (Helicase, HD, RecB-family, RuvC-family, HNH, RAMP), or by locus tag (e.g. BH0338, MTH1090, LA3191, AF0070, SPy1049, ygcK, ygcL, y1724). The gene-by-gene layout is not recoverable from the extraction.]
• Type I (Helicase cassette); Cascade
• Type II (HNH-type)
• Type III (RAMP module or Polymerase cassette)
Subtype labels shown: Ecoli/CASS2, Ypest/CASS3, Nmeni/CASS4, CASS4a, Mtube/CASS6 (RAMP module), Dvulg/CASS1, Tneap and Hmari/CASS7, Apern/CASS5
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure residue: species/locus-tag leaf labels (e.g. Esccol b2755, Strthe str0658, Myctub Rv2817c, Pyrhor PH0162), COG and domain annotations of cas operon genes (1343, 1518, 1857, RecB-family 1468, Helicase 1203, HD, RuvC, HNH, HTH, RAMP), CASS subtype numbers 1-7, 4a, and 7a, and panels (A), (B), (C) with the "Polymerase" cassette and "Helicase" cassette. The layout is not recoverable from the extraction.]
• Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
• Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
• E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
• Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
• Streptococcus thermophilus: Barrangou R et al., Science 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II, and Type III and with the subtype labels Ecoli/CASS2, Ypest/CASS3, Nmeni/CASS4, Mtube/CASS6 (Polymerase/RAMP module), Dvulg/CASS1, Tneap-Hmari/CASS7, Apern/CASS5. Representative operons per type; gene order is not reliably recoverable from the extraction.]
• Type I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
• Type II: cas1, cas9, cas2, cas4
• Type III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% of genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: two bar charts on a 0-100 scale showing Type I, Type II, Type III, and Cas1 presence/absence across groups of genomes; the group labels and bar values are not recoverable from the extraction.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin, Wolf, Biol Direct 2009
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor

Bona fide Lamarckian
• CRISPR/Cas
  Biological role/function: defense against viruses and other mobile elements
  Phyletic spread: most of the Archaea and many bacteria
  Criteria (a) / (b) / (c): Yes / Yes / Yes
• piRNA
  Biological role/function: defense against transposable elements in germline
  Phyletic spread: animals
  Criteria (a) / (b) / (c): Yes / Yes / Yes
• HGT (specific cases)
  Biological role/function: adaptation to new environment, stress response, resistance
  Phyletic spread: Archaea, bacteria, unicellular eukaryotes
  Criteria (a) / (b) / (c): Yes / Yes / Yes

Quasi-Lamarckian
• HGT (general phenomenon)
  Biological role/function: diverse innovations
  Phyletic spread: Archaea, bacteria, unicellular eukaryotes
  Criteria (a) / (b) / (c): Yes / No / Yes/no
• Stress-induced mutagenesis
  Biological role/function: stress response/resistance/adaptation to new conditions
  Phyletic spread: ubiquitous
  Criteria (a) / (b) / (c): Yes / No or partially / Yes (but general evolvability enhanced as well)
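The grouping in the table above can be restated as a simple rule. This is a sketch under an assumption that is read off the table rather than stated on the slide: a phenomenon is treated as bona fide Lamarckian only when all three criteria are an unqualified "Yes", and as quasi-Lamarckian otherwise. The `Phenomenon` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str   # criterion (a)
    locus_specific: str          # criterion (b)
    adaptive_to_cause: str       # criterion (c)

    def category(self) -> str:
        """Assumed rule: all three criteria must be exactly 'Yes'."""
        answers = (self.caused_by_environment,
                   self.locus_specific,
                   self.adaptive_to_cause)
        if all(answer == "Yes" for answer in answers):
            return "bona fide Lamarckian"
        return "quasi-Lamarckian"


# Rows transcribed from the table (Koonin, Wolf, Biol Direct 2009).
TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

for row in TABLE:
    print(f"{row.name}: {row.category()}")
```

The rule reproduces the two groups of the table; it is not a definition taken from the cited paper.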
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Protein components of the system: an update with unification of many diverse families
~25 families altogether
Columns: Family; Subfamily; Phyletic distribution; Comments

1. COG1518
   Subfamily: COG1518 (cas1)
   Phyletic distribution: All
   Comments: Putative novel nuclease/integrase. Mostly α-helical protein
2. COG1343
   Subfamily: COG1343 (cas2), COG3512, ygbF-like, MTH324-like, y1723_N-like
   Phyletic distribution: All
   Comments: Small protein related to VapD; fused to helicase (COG1203) in y1723-like proteins
3. COG1203
   Subfamily: COG1203 (cas3)
   Phyletic distribution: All
   Comments: DNA helicase. Most proteins have fusion to HD nuclease
4. RecB-like nuclease
   Subfamily: COG1468 (cas4), COG4343
   Phyletic distribution: All
   Comments: RecB-like nuclease. Contains three-cysteine C-terminal cluster
5. RAMP (Repair-associated mysterious protein)
   Subfamily: COG1688 (cas5), COG1769, COG1583, COG1567, COG1336, COG1367, COG1604, COG1337, COG1332, COG5551, BH0337-like, MJ0978-like, YgcH-like, y1726-like, y1727-like
   Phyletic distribution: All
   Comments: Belong to "RAMP" family; possibly RNA-binding protein, structurally related to duplicated ferredoxin fold (PDB 1wj9)
6. COG1857
   Subfamily: COG1857, COG3649, YgcJ-like, y1725-like
   Phyletic distribution: All
   Comments: α/β protein, predicted nuclease or integrase
7. HD-like nuclease
   Subfamily: COG1203 (N-terminus), COG2254
   Phyletic distribution: All
   Comments: HD-like nuclease
8. BH0338
   Subfamily: BH0338-like, MTH1090-like
   Phyletic distribution: All; mostly archaea and FIRM
   Comments: Large Zn-finger containing proteins, probably nucleases (nuclease activity was shown for MTH1090 (ref))
9. COG1353, predicted polymerase
   Subfamily: COG1353
   Phyletic distribution: Most archaea, some bacteria
   Comments: Predicted palm-domain polymerase, distantly related to viral RdRp and RT

…and ~15 other less common proteins
Makarova et al., BD 2006
CRISPR: clustered regularly interspaced short palindromic repeats
[Figure labels: CRISPR repeats (TB and TA, Strepto-like, Ecoli-like, Pasteurella-like) and the associated genes Cas1 (COG1518), Cas2 (COG1343), Cas3 (COG1203), Cas4 (COG1468)]
"The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species."
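Characteristics (i) and (ii) of the quoted definition can be sketched as a check on a candidate array. This is a simplified illustration, not a CRISPR finder: it assumes the repeat is already known and occurs as exact copies, and the function names, thresholds, leader, and spacer sequences are hypothetical. The repeat string is the Yerpes repeat from the repeat alignment in these slides with gap characters removed.

```python
def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    parts = array.split(repeat)
    # parts[0] is the flank before the first repeat, parts[-1] the flank after the last.
    return parts[1:-1]


def looks_like_crispr(array: str, repeat: str,
                      min_repeats: int = 3, max_length_spread: int = 5) -> bool:
    """Check (i) several identical repeats and (ii) distinct, similar-sized spacers."""
    if array.count(repeat) < min_repeats:
        return False
    spacers = split_crispr_array(array, repeat)
    lengths = [len(s) for s in spacers]
    non_repetitive = len(set(spacers)) == len(spacers)
    similar_size = max(lengths) - min(lengths) <= max_length_spread
    return non_repetitive and similar_size


repeat = "GTTCACTGCCGCACAGGCAGCTTAGAAA"
spacer_1 = "ACGTACGTTAGCTAGGATCCATGCAAGTCTGA"   # hypothetical spacer
spacer_2 = "TTGACCGATAGGCTAACGTTCAGGATCCTAGA"   # hypothetical spacer
array = "ATATATAT" + repeat + spacer_1 + repeat + spacer_2 + repeat + "GGCC"

print(split_crispr_array(array, repeat))
print(looks_like_crispr(array, repeat))
```

Real repeats show "no or very little" variation rather than none, so a practical tool would tolerate mismatches between repeat copies; exact matching is used here only to keep the sketch short.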
The cas genes are our "repair" system
"Helicase" cassette
"Polymerase" cassette
A paragon of prokaryotic genome plasticity:
• extensive gene shuffling
• evidence of multiple horizontal transfers
• widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Family-specific motifs (the slide arranges them in columns I-V; the column of each motif is not recoverable from the extraction, so motifs are listed in the order extracted):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs  ust-lKGhh+hh  hhGtt  hD  lGhttsGh
y1726-like: slhlpEKuVRGT  lRTIDs  YGuVTsGhuh
COG1851: hGphpGpsaFh  hGFGRh
BH0337-like: TpA-h+GIh-uIh  hhLpDV  LGsREhuht
COG1567: hhhpp  ups-lhtAhh  luscoGhGh
COG1769: hhhh+Ph-hhh  ss-hhGhlhsh  hGh  hGtcp+hsthchp
COG1688 (Cas5): hhhhhts  sssshhGhlsh  lGttphh
COGs 1583, 5551: hhhhoPhhl  hGtppshGFGl
YgcH-like: hHphlh  hGu+uhGhGhh
y1727-like: LHphLh  GFsthGLStss
MJ0978-like: hHNH  lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition, and sample secondary structures where applicable. Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful    GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe    GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby    GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor    GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol    GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor    GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby    GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer    GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor    GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer    GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace    GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok    GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci    GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor    GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar    GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan    ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan    GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ    CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar    GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten    GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe    GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo    GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal    GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus  GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep    GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio    GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet    GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena  GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub    GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes    GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap    GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal    GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul    GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul    GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul    GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005 Feb;60(2):174-82
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
… The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
CRISPR clustered regularly interspaced short palindromic repeats
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2COG1343
Cas1COG1518
Cas4COG1468
Cas3COG1203
ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo
The cas genes are our ldquorepairrdquo system
ldquoHelicaserdquo cassette
ldquoPolymeraserdquo cassette
paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins
extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol
Ferredoxin foldRNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Figure labels, in extracted order: transcription; (protein-guided) RNA folding; specific transcription factor (COG1517); cellular RNApol; polycistronic pre-psiRNA; p-dicer (Helicase + HD-hydrolase; COG1203, COG2254); RNA processing (fast); pre-psiRNA, 75-100 nt; RNA processing (slow); p-dicer; psiRNA, 25-45 nt; p-slicer (COG1468, COG4343, COG1857); RAMP; target RNA cleavage; p-RISC; plasmid or phage mRNA; annealing to RNA target.]
Variant of CASS functioning with polymerase: psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Figure labels, in extracted order: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; primer elongation, CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); p-dicer, duplex degradation; RAMP binding; cycle continues; annealing to RNA target.]
[Figure labels, in extracted order: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice; random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer; OR; new psiRNA generation.]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
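The sequence specificity described above can be illustrated with a toy model. This is a sketch under strong simplifying assumptions, not a model of the actual mechanism: resistance is treated as an exact, single-strand match between a spacer and the phage genome, and the genome string, the spacer, and the function names `is_targeted` and `point_mutant` are all hypothetical.

```python
def is_targeted(spacers, phage_genome):
    """Toy model: the phage is targeted if any spacer occurs verbatim in its genome."""
    return any(spacer in phage_genome for spacer in spacers)


def point_mutant(genome, pos, base):
    """Return a copy of the genome with a single substitution at position pos."""
    return genome[:pos] + base + genome[pos + 1:]


# Hypothetical phage genome; the spacer is a fragment acquired from it.
phage = "ATGCGTACGTTAGCCTAGGATCCA"
spacer = phage[8:18]

# The wild-type phage matches the spacer; a single substitution inside
# the matched region removes the match in this exact-match model.
escape_phage = point_mutant(phage, 12, "A")
print("wild type targeted:", is_targeted([spacer], phage))
print("point mutant targeted:", is_targeted([spacer], escape_phage))
```

The exact-match rule is chosen only because it reproduces the reported behavior that a single substitution restores sensitivity; it says nothing about which Cas proteins carry out the recognition.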
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR leader; array positions 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3.]
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: cas operon organization for Type I (Helicase cassette; Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Genes are labeled Cas1 to Cas13 and annotated with COG numbers or domains: Helicase (COG1203), HD, RecB (COG1468), RuvC, HNH, RAMP (COG1583, COG1688, COG1567, COG1769), COG1343, COG3513, COG1857, COG1517, COG1421, COG3337. Subtype labels: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5. Additional gene labels: y1724, ygcK, ygcL, BH0338, MTH1090, LA3191, SPy1049, AF0070.]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: representative cas loci from archaeal and bacterial genomes (e.g. Esccol b2755, Strthe str0658, Pyrfur PF1129, Staepi SERP2463), annotated with COG numbers, domain labels (Helicase, HD, RecB, RuvC, HNH, HTH, RAMP) and CASS subtype numbers 1 to 7, 4a and 7a; panels (A), (B), (C), with the "Helicase" and "Polymerase" cassettes marked.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315, 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III, and subtype labels Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase/RAMP module), Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5. Gene labels as extracted: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP).]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II – 9, Type III – 20
• Two or three systems of different types are present in 128 (20) genomes
[Figure: two bar charts with 0-100 axes; legend: Type I, Type II, Type III, Cas1 presence/absence. The bar values cannot be matched to categories in the extracted text.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin, Wolf, Biol Direct 2009

Columns: Phenomenon | Biological role/function | Phyletic spread | Lamarckian criteria: (a) genomic changes caused by environmental factor | (b) changes are specific to relevant genomic loci | (c) changes provide adaptation to the causative factor

Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes

Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
CRISPR repeats
TB and TA
Strepto-like
Ecoli-like
Pasteurella-like
Cas2 (COG1343)
Cas1 (COG1518)
Cas4 (COG1468)
Cas3 (COG1203)
“The common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats, which show no or very little sequence variation within a given locus; (ii) the presence of non-repetitive spacer sequences between the repeats of similar size; (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci; (iv) the absence of long open reading frames within the locus; and (v) the presence of the cas1 gene accompanied by the cas2, cas3 or cas4 genes in CRISPR-containing species”
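Characteristics (i) and (ii) suggest a simple way to pull spacers out of a locus once the repeat is known. The sketch below is only illustrative: it assumes exact copies of the repeat (the quoted definition allows slight variation), the function names `split_crispr_array` and `spacers_similar_size` are hypothetical, and the locus and spacer sequences are invented for the example.

```python
def split_crispr_array(locus, repeat):
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    starts = []
    i = locus.find(repeat)
    while i != -1:
        starts.append(i)
        # continue searching after the end of the copy just found
        i = locus.find(repeat, i + len(repeat))
    return [locus[s + len(repeat):e] for s, e in zip(starts, starts[1:])]


def spacers_similar_size(spacers, tolerance=2):
    """Check criterion (ii): spacer lengths differ by at most `tolerance` nt."""
    lengths = [len(s) for s in spacers]
    return bool(lengths) and max(lengths) - min(lengths) <= tolerance


# Invented toy locus: three copies of a 9-nt repeat with two 8-nt spacers.
repeat = "GTTTCAATC"
locus = "TT" + repeat + "AAAACCCC" + repeat + "GGGGTTTA" + repeat + "CC"
spacers = split_crispr_array(locus, repeat)
print(spacers, spacers_similar_size(spacers))
```

A realistic detector would have to discover the repeat itself and tolerate mismatches; this sketch only shows how the repeat-spacer layout maps onto string positions.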
The cas genes are our “repair” system
“Helicase” cassette
“Polymerase” cassette
Paragon of prokaryotic genome plasticity:
- extensive gene shuffling
- evidence of multiple horizontal transfers
- widespread gene loss
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
MOTIFs: specific, I, II, III, IV, V (motifs listed per family in extracted order)
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh
y1726-like: slhlpEKuVRGT lRTIDs YGuVTsGhuh
COG1851: hGphpGpsaFh hGFGRh
BH0337-like: TpA-h+GIh-uIh hhLpDV LGsREhuht
COG1567: hhhpp ups-lhtAhh luscoGhGh
COG1769: hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp
COG1688 (Cas5): hhhhhts sssshhGhlsh lGttphh
COGs 1583, 5551: hhhhoPhhl hGtppshGFGl
YgcH-like: hHphlh hGu+uhGhGhh
y1727-like: LHphLh GFsthGLStss
MJ0978-like: hHNH lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
The sequence similarity space of CRISPR repeats, visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable. Kunin et al. Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
CRISPRs show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
[Figure: flow diagram. Labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice; random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer; OR; new psiRNA generation]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
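The sequence specificity described on this slide can be illustrated with a toy model. This is a minimal sketch, not the mechanism established by Barrangou et al.: it assumes that resistance requires an exact match between a spacer and the phage genome, and the function names and the 40-nt "phage genome" are hypothetical, chosen only for illustration.

```python
def acquire_spacer(spacers, phage_genome, start, length=20):
    """Adaptation (toy): copy a phage fragment into the array as the newest spacer."""
    fragment = phage_genome[start:start + length]
    return [fragment] + list(spacers)


def is_resistant(spacers, phage_genome):
    """Interference (toy assumption): resistant if any spacer matches the phage exactly."""
    return any(spacer in phage_genome for spacer in spacers)


def substitute(genome, position, base):
    """Return a copy of the genome carrying a single-base substitution."""
    return genome[:position] + base + genome[position + 1:]


# Hypothetical 40-nt "phage genome"; not a real sequence.
PHAGE = "ATGCGTACGTTAGCCGATAGCTAGGCTTAACGGATCCGTA"

naive = []                                 # no spacers yet
immune = acquire_spacer(naive, PHAGE, 5)   # spacer copied from positions 5-24
escape = substitute(PHAGE, 10, "C")        # T -> C inside the targeted region

print(is_resistant(naive, PHAGE))    # False
print(is_resistant(immune, PHAGE))   # True
print(is_resistant(immune, escape))  # False
```

The exact-match rule is the simplest way to reproduce the "single substitution reverts to sensitivity" observation; it says nothing about which positions of a real spacer tolerate mismatches.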
[Figure: the three stages of CRISPR/Cas action: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; leader; CRISPR; spacer arrays 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
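Stage (2), Expression & Processing, can be pictured as cutting one long array transcript into single-spacer units. The following is a minimal sketch under an idealized assumption: identical repeats separate the spacers and are removed entirely, whereas real processing cleaves within the repeats. The sequences and the function name are hypothetical.

```python
def process_transcript(transcript, repeat):
    """Split an idealized CRISPR transcript at each repeat; return spacers in order."""
    return [piece for piece in transcript.split(repeat) if piece]


# Made-up repeat and spacers, used only to build an example transcript.
REPEAT = "GTTTCAATCC"
SPACERS = ["AAACCCGGGT", "TTGACCATGA", "CGCGATATCG"]
transcript = REPEAT + REPEAT.join(SPACERS) + REPEAT

for unit in process_transcript(transcript, REPEAT):
    print(unit)
```

Each printed unit corresponds to one guide in the toy picture; the order of the units follows the order of spacers in the array.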
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon schemes annotated with COG numbers and domain labels (Helicase, HD-f, RecB-f, RuvC-f, HNH, RAMP). Types: Type I (Helicase cassette; Cascade), Type II (HNH-type), Type III (RAMP module or Polymerase cassette). Subtypes shown: Dvulg CASS1, Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Apern CASS5, Mtube CASS6 (RAMP module), Tneap and Hmari CASS7. Gene labels: Cas1 to Cas13]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: tree with species and gene-identifier leaf labels and operon schemes annotated with COG numbers; subtypes numbered 1 to 7 (with 4a and 7a); "Polymerase" cassette and "Helicase" cassette; panels (A), (B), (C); label "mRNA targeting". Leaf labels omitted. Experimentally studied systems marked on the figure:]
Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al., Science 2007;315:1709-12
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labeled Type I, Type II and Type III; subtypes Dvulg CASS1, Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Apern CASS5, Mtube CASS6 (Polymerase-RAMP module), Tneap-Hmari CASS7. Representative operons:
I: cse1 cse2 cas7 cas5 cas3 cas1 cas2 cas6e
II: cas1 cas9 cas2 cas4
III: cas1 csm6 csm5 csm4 csm3 csm2 cas10 cas6 cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: bar charts (axis 0 to 100) for Type I, Type II, Type III and Cas1 presence/absence; bar values not recoverable]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin, Wolf, Biol Direct 2009
Lamarckian criteria: (1) genomic changes caused by environmental factor; (2) changes are specific to relevant genomic loci; (3) changes provide adaptation to the causative factor

Phenomenon | Biological role/function | Phyletic spread | (1) | (2) | (3)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
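The way the three criteria separate the two groups in the table can be written as a small rule. This is a sketch under a stated assumption: a phenomenon counts as bona fide Lamarckian only when all three criteria are a clear "Yes", and entries such as "No or partially" or "Yes/no" are coded as False. That rule is read off the table, not quoted from Koonin and Wolf, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: bool  # criterion (1)
    locus_specific: bool         # criterion (2)
    adaptive_to_cause: bool      # criterion (3)


def classify(p):
    """Assumed rule: all three criteria met -> bona fide, otherwise quasi-Lamarckian."""
    if p.caused_by_environment and p.locus_specific and p.adaptive_to_cause:
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Rows from the table; ambiguous entries are coded as False (an assumption).
ROWS = [
    Phenomenon("CRISPR/Cas", True, True, True),
    Phenomenon("piRNA", True, True, True),
    Phenomenon("HGT (specific cases)", True, True, True),
    Phenomenon("HGT (general phenomenon)", True, False, False),
    Phenomenon("Stress-induced mutagenesis", True, False, True),
]

for row in ROWS:
    print(f"{row.name}: {classify(row)}")
```

The function is only meant for the phenomena listed in the table; it does not attempt to label processes that fail criterion (1).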
Stress as a gauge of evolutionary modality
Koonin, Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
RAMP (Repeat-Associated Mysterious Proteins) superfamily: numerous families of Cas proteins, extreme sequence diversity
Motifs by family (columns: specific, I, II, III, IV, V; column assignment not recoverable):
COGs 1336, 1367, 1604, 1337, 1332: hhshhGs; ust-lKGhh+hh; hhGtt; hD; lGhttsGh
y1726-like: slhlpEKuVRGT; lRTIDs; YGuVTsGhuh
COG1851: hGphpGpsaFh; hGFGRh
BH0337-like: TpA-h+GIh-uIh; hhLpDV; LGsREhuht
COG1567: hhhpp; ups-lhtAhh; luscoGhGh
COG1769: hhhh+Ph-hhh; ss-hhGhlhsh; hGh; hGtcp+hsthchp
COG1688 (Cas5): hhhhhts; sssshhGhlsh; lGttphh
COGs 1583, 5551: hhhhoPhhl; hGtppshGFGl
YgcH-like: hHphlh; hGu+uhGhGhh
y1727-like: LHphLh; GFsthGLStss
MJ0978-like: hHNH; lG+tsuhGhGol
Ferredoxin fold / RNA-recognition motif (RRM)
CRISPRs show extreme diversity and complex clustering
The sequence similarity space of CRISPR repeats visualized with the BioLayout program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster association as derived from MCL clustering. The 12 largest clusters are indicated by circles, together with their sequence logos, coarse phylogenetic composition and sample secondary structures where applicable. Kunin et al., Genome Biology 2007, 8:R61, doi:10.1186/gb-2007-8-4-r61
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
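How similar two repeats in the alignment are can be checked with a simple column-wise identity. This is a minimal sketch, not the Smith-Waterman scoring used by Kunin et al.: it counts identical bases over alignment columns where neither sequence has a gap. The three rows are copied from the alignment above; the function name is hypothetical.

```python
def aligned_identity(a, b):
    """Fraction of identical bases over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# Rows taken from the repeat alignment on this slide.
ARCFUL = "GTTGAAATC-------AGACCAAAATGGGATTGAAAG-"
METTHE = "GTTAAAATC-------AGACCAAAATGGGATTGAAAT-"
THEMAR = "GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-"

print(f"Arcful vs Metthe: {aligned_identity(ARCFUL, METTHE):.2f}")
print(f"Arcful vs Themar: {aligned_identity(ARCFUL, THEMAR):.2f}")
```

Column identity ignores gap placement and scoring matrices, so it is only a rough proxy for the similarity measure behind the clustering figure.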
The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61
CRISPR show extreme diversity and complex clustering
Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that
CRISPRs are horizontally transferred together with cas genes
Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82
Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids
helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA
HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the
RNAi principle
bull integrates short fragments of essential phageplasmid genes into CRISPRs
bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
bull contains all or most of the protein activities involved in these processes
bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)
Transcription
(protein-guided) RNA folding
specific transcription factor (COG1517)
cellular RNApol
polycistronic pre-psiRNA
p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)
RNA processing(fast)
pre-psiRNA75-100 nt
RNA processing(slow)
p-dicer
psiRNA25-45 nt
p-slicer(COG1468 COG4343
COG1857)
RAMP
target RNA cleavage
p-RISC
plasmid or phage mRNA
Annealing to RNA target
p-RISC
The basic scheme of CASS functioning
psiRNA25-45 nt
RAMP
RAMP-RNA complex(unstable)plasmid or phage
mRNA
Primer elongationCASS RNApol(COG1353)
Amplified annealing
long dsRNA (stable)
p-dicerDuplex degradation
RAMPRAMP binding
Cycle continues
Annealing to RNA target
Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)
plasmid or phage mRNA
pre-psiRNA75-100 nt
CASS RNApol(COG1353)Or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase(COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: operon diagrams of CRISPR/Cas subtypes grouped into three types. Genes are labelled by Cas name (Cas1 to Cas13), by COG number (e.g. 1343, 1857, 3513, Helicase 1203, RecB-f 1468, RAMP 1583, RAMP 1688) and by domain (HD-f, RuvC-f, HNH).
• Type I (Helicase cassette; Cascade)
• Type II (HNH-type)
• Type III (RAMP module or Polymerase cassette)
Subtype labels: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5.]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: tree and operon diagrams (panels A, B, C) with entries labelled by species and locus tag, grouped into numbered subtypes, with the "Polymerase" cassette and the "Helicase" cassette marked. One entry is labelled "mRNA targeting". Experimentally studied systems marked on the figure:]
• Pseudomonas aeruginosa: Haurwitz RE et al., Science 2010 Sep 10;329(5997):1355-8
• Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ, Science 2008 Dec 19;322(5909):1843-5
• E. coli: Brouns SJ et al., Science 2008 Aug 15;321(5891):960-4
• Pyrococcus furiosus: Hale CR et al., Cell 2009 Nov 25;139(5):945-56
• Streptococcus thermophilus: Barrangou R et al., Science 2007;315:1709-12
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches marked Type I, Type II or Type III and subtype labels Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase-RAMP module), Dvulg CASS1, Tneap-Hmari CASS7 and Apern CASS5. Example operons, with genes listed as labelled in the figure:
I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e
II: cas1, cas9, cas2, cas4
III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP)]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42 genomes, Type II in 9, Type III in 20
• Two or three systems of different types are present in 128 (20) genomes
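A quick arithmetic check of the counts in these bullets, assuming the bracketed figures are meant as percentages of the 703 selected genomes (an assumption; the slide does not state the denominator for each figure). The helper name `percent` is introduced here for illustration only.

```python
TOTAL_GENOMES = 703  # selected complete genomes, from the slide title


def percent(count: int, total: int = TOTAL_GENOMES) -> float:
    """Share of `total` represented by `count`, in percent."""
    return 100.0 * count / total


cas1_share = percent(310)   # genomes with Cas1
multi_share = percent(128)  # genomes with two or three system types

print(f"Cas1 present: {cas1_share:.1f}% of {TOTAL_GENOMES} genomes")
print(f"Two or three types: {multi_share:.1f}% of {TOTAL_GENOMES} genomes")
print(f"Two or three types: {percent(128, 310):.1f}% of Cas1-positive genomes")
```

The first figure reproduces the 44 quoted for Cas1. For the 128 genomes with several system types, the share of all 703 genomes comes out slightly below the bracketed 20, so that figure may be rounded or may refer to a different denominator.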
[Figure: two bar charts (vertical axis 0 to 100) showing Type I, Type II, Type III and Cas1 presence/absence across groups of genomes.]
Makarova et al., in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol. 2011 Jan;79(2):484-502.
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin and Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin and Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin and Wolf, Biol Direct 2009
Lamarckian criteria: (a) genomic changes caused by environmental factor; (b) changes are specific to relevant genomic loci; (c) changes provide adaptation to the causative factor.

Phenomenon | Biological role/function | Phyletic spread | (a) | (b) | (c)
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
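The way the three Lamarckian criteria separate the two groups in this table can be written down as a small rule. This is a reading of the table, not a definition from the cited paper: a phenomenon is treated as bona fide Lamarckian only when all three criteria are an unqualified "yes", and as quasi-Lamarckian when the change is environmentally caused but another criterion is "no" or only partly met. The third label, "not Lamarckian", and all names in the code are assumptions added for completeness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # "yes", "no", or a qualified answer
    locus_specific: str
    adaptive_to_cause: str


def classify(p: Phenomenon) -> str:
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    if p.caused_by_environment == "yes":
        return "quasi-Lamarckian"
    return "not Lamarckian"  # assumed fallback, not a row of the table


# Rows transcribed from the table.
rows = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes"),
]

for row in rows:
    print(f"{row.name}: {classify(row)}")
```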
Stress as a gauge of evolutionary modality
Koonin and Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
Arcful   GTTGAAATC-------AGACCAAAATGGGATTGAAAG-
Metthe   GTTAAAATC-------AGACCAAAATGGGATTGAAAT-
Pyraby   GTTCCAATA-------AGACTAGAATAGAATTGAAAG-
Pyrhor   GTTTAATAA--------GACTAAAATAGAATTGAAAG-
Sulsol   GATTAATCC-------------CAAAAGAATTGAAAG-
Pyrhor   GTTTCCGTA--------GAACTAAATAGTGTGGAAAG-
Pyraby   GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG-
Pyraer   GTTTCAACT------------ATCTTTTGATTTTTGG-
Pictor   GTTAAATAA-------TAACCTAAATAGGATTGAAAG-
Theaer   GTAAAATAG---------ACCTTAATAGGATTGAAAG-
Metace   GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC-
Sultok   GATGAATCC------------CAAAAGGAATTGAAAG-
Sulaci   GTTTTAGTT-------------TCTTGTCGTTATTAC-
Pictor   GTTTAAGAA--------TTACTAGATAGTATGGAGT--
Halmar   GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC-
Metjan   ATTAAAATC-------AGACCGTTTCGGAATGGAAAT-
Metkan   GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG-
Nanequ   CTTTCAATA---------TTTCTAATATATTAGAAAC-
Themar   GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC-
Theten   GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC
Thethe   GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC-
Aquaeo   GTTTTAACT--------CCACACGGTACATTAGAAAC-
Bachal   GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT-
Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG--
Chltep   GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC-
Chrvio   GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG
Clotet   GTATTAGTA-------GCACCA-TATTGGAATGTAAAT
Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC-
Myctub   GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC
Yerpes   GTTCACTGC--------CGCACAGGCAGCTTAGAAA--
Metcap   GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC-
Mycgal   GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC-
Pasmul   GTTAACTGC--------CGTATAGGCAGCTTAGAAA--
Pasmul   GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT--
Pasmul   GTTCACCAT--------CGTGTAGATGGCTTAGAAA--
All
A large subset of CRISPRs is conserved among diverse species, even between archaea and bacteria, suggesting that CRISPRs are horizontally transferred together with cas genes
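The conservation claim can be made concrete by measuring identity between two rows of the repeat alignment. The sketch below uses a simple measure chosen for illustration (fraction of identical bases over columns where neither sequence has a gap); the function name `aligned_identity` is hypothetical, and the two sequences are the Arcful and Metthe rows copied from the alignment.

```python
def aligned_identity(a: str, b: str) -> float:
    """Fraction of identical bases over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# Two rows of the alignment, gaps included.
arcful = "GTTGAAATC-------AGACCAAAATGGGATTGAAAG-"
metthe = "GTTAAAATC-------AGACCAAAATGGGATTGAAAT-"

print(f"Arcful vs Metthe identity: {aligned_identity(arcful, metthe):.3f}")
```

The same function can be applied to any pair of rows, for example to compare an archaeal repeat with a bacterial one.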
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E.
Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol. 2005 Feb;60(2):174-82.
Here we show that CRISPR spacers derive from preexisting sequences, either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids.
…The transcription of the CRISPR loci (Tang et al. 2002) suggests that such activity could be executed by CRISPR-RNA molecules, acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence, similarly to the eukaryotic interference RNA.
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
The basic scheme of CASS functioning
[Diagram, labels in order:]
• Transcription: cellular RNApol; specific transcription factor (COG1517); polycistronic pre-psiRNA
• (Protein-guided) RNA folding
• RNA processing (fast): p-dicer (Helicase + HD-Hydrolase) (COG1203, COG2254); pre-psiRNA, 75-100 nt
• RNA processing (slow): p-dicer; psiRNA, 25-45 nt
• Annealing to RNA target: p-RISC; RAMP; plasmid or phage mRNA
• Target RNA cleavage: p-slicer (COG1468, COG4343, COG1857); p-RISC
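The processing steps in this scheme turn one long array transcript into short units that each carry a single spacer. A minimal sketch of that idea, under a strong simplification: the transcript is cut wherever an exact copy of the repeat occurs and the repeat itself is discarded, whereas the scheme only says that the pre-psiRNA is processed down to 25-45 nt psiRNAs. The repeat, spacers, flanks and the function name `split_array` are made up for illustration.

```python
from typing import List


def split_array(array: str, repeat: str) -> List[str]:
    """Return the spacer segments lying between consecutive repeat copies."""
    parts = array.split(repeat)
    # parts[0] is the leader-side flank and parts[-1] the trailing flank;
    # everything in between is a spacer.
    return [p for p in parts[1:-1] if p]


repeat = "GTTTCAATCC"                      # made-up repeat
spacers = ["ACGTACGTTAGC", "TTGACCAGGTCA"]  # made-up spacers
array = "AAAA" + repeat + spacers[0] + repeat + spacers[1] + repeat + "TTTT"

for unit in split_array(array, repeat):
    print(unit)
```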
Variant of CASS functioning with polymerase: psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Diagram, labels in order:]
• psiRNA, 25-45 nt; RAMP
• RAMP-RNA complex (unstable); plasmid or phage mRNA
• Annealing to RNA target
• Primer elongation: CASS RNApol (COG1353)
• Amplified annealing: long dsRNA (stable)
• Duplex degradation: p-dicer
• RAMP binding
• Cycle continues
New psiRNA generation
[Diagram, labels in order:]
• plasmid or phage mRNA; pre-psiRNA, 75-100 nt
• CASS RNApol (COG1353) or other RT
• Reverse transcription with random copy choice, OR random RNA recombination and reverse transcription
• dsDNA with CRISPRs and target-derived spacers
• Homologous recombination with genomic CRISPR region: integrase (COG1518); genomic DNA
• genomic DNA with new target-derived spacer
New psiRNA generation
Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity
Key validation
bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPRCas systems
1857
1857
1857
RAMP1688
10
Cordip DIP0037Bacfra BF3951
Wolsuc WS1444Camjej Cj1522c
Neimen NMA0630Pasmul PM1126
Strthe str0658Strpyo Spy 0770
Treden TDE0328Staepi SERP2463
Mycgal MGA 0523Mycmob MMOB0320
Porgin PG1982Legpne lpp0161
Wolsuc WS1615Sulsol SSO1450
Aerper APE1240Sulsol SSO1405
Pyraer PAE0200Pyraer PAE0081
Metmaz MM0559Metmaz MM3249
Arcful AF1878Mansuc MS1635
Niteur NE0111Myctub Rv2817c
Nostoc alr0381Synech slr7071Thethe TTHB145
Chrvio CV1229Xanaxo XAC3842
Chltep CT1130Desvul DVUA0134
Metcap MCA0651Azoarc ebA3283
Deigeo_DRAFT_1682Mansuc MS0981
Baccla ABC3592Bachal BH0341
Strpyo Spy 1286Vibvul VVA1544
Nostoc alr1468Synech slr7092
Synech slr7016Geosul GSU0057
Thethe TTHB224Nostoc alr1568
Metkan MK1312
Esccol b2755Saltyp STM2938
Phopro PBPRC0034Geosul GSU1392
Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639
Symthe_STH663Corjei jk0643
Nocfar nfa44220Strave SAV7537
Metcap MCA0930Cordip DIP2214
Chrvio CV1756Zymmob ZMO0680
Pasmul PM0311Erwcar ECA3679
Yerpes y1722Legpne lpl2837Phopro PBPRB1995
Acinet ACIAD2484
Clotet CTC01148Fusnuc FN1177
Aquaeo aq 369Theten TTE2658Themar TM1797
Bacfra BF2544Porgin PG2014
Halmar pNG4053Clotet CTC01463
Metjan MJ0378Pyrhor PH1245
Thekod TK0455Metthe MTH1084
Arcful AF2435Metace MA3670
Nanequ NEQ017Pyrhor PH0173
Pictor PTO0003Pictor PTO0049Thevol TVN0106
1343
1343
1343
1343
1518
1518
1343
1518
RecB-f4343
RecB-f1468
RecB-f1468
RecB-f1468
Helicase12031857 RAMP
1688
RAMP1688
Helicase1203
Helicase1203
RAMPBH0337
RAMPYgcH
RAMPy1727
RAMPy17261857
Helicase1203
HD-f
Helicase1203
HD-f
1343
HD-f
HD-f
HD-f
RuvC-f
AF0070
BH0338
y1724
ygcKygcL
RAMP1583
HTH
AF1870
SPy1049
3574
AF1870
PH0918
HTH 3574
1857 RAMP1688
RecB-f1468
Helicase1203
HD-f
AF1870
MTH1090
AF1873
RAMP1583
3574
AF1870
3574
1343
1
7
2
3
4
5
6
7a
Sulsol SSO1999Sulsol SSO1440
Sultok ST2639Sultok ST0032
Sulsol SSO1402Pyraer PAE0068
Arcful AF1874Pyrhor PH0917Thekod TK0450
Pyraby PAB1689
13433513
3513
1518
RuvC-f
HNH
4aHNH
1 0
Pyrhor PH0162
6
64
6
4
16
17
7
7
7
7
1
2
7
7
7
7
5 7
5 75 7
5 75 7
5 7
6 1
7
Metkan MK1314
Metace MA1931
Metmaz MM3356
Metbar Mbar_A1355
Strthe stu0960Myctub Rv2823cStaepi SERP2461
Themar TM1811Mansuc MS1651
Niteur NE0123Thethe TTP0102
Metthe MTH1082
Metthe MTH326
Metjan MJ1672Pictor PTO0053Thevol TVN0112
Metkan MK1297Thethe TTP0115
Bachal BH0328Synech sll7090
Aquaeo aq 387Theten TTE2639
Themar TM1794Arcful AF1867
Pyrfur PF1129Thefus Tfu1578
Porgin PG1987Sultok ST1979
Sulsol SSO1991Sulsol SSO1729
Sultok ST0012Sulaci Saci 2046Nostoc all1479
1421 RAMP1567
RAMP1583
RAMPMTH323
RAMP1769
XMTH324
ldquoPolymeraserdquo cassette
ldquoHelicaserdquo cassette(A)
(B)
(C)
Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8
Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5
Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5
E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4
Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56
Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPRCas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
EcoliCASS2Type IType I
Type I
Type I
Type I
Type III
Type III
Type III
Type III
Type III
Type III
Type II
YpestCASS3
NmeniCASS4
PolymeraseRAMPmoduleMtubeCASS6
PolRAMP
DvulgCASS1
TneapHmariCASS7
ApernCASS5
cas1cas9 cas2 cas4
cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI
II
IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2
RAMP
Makarova et al NRM submitted
CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes
54
256
1537
26
516
2
9
7 56107
3
10
13
383
210
46
216
8
1
7 70211
10
1
0102030405060708090
100
1623
84 6
17 7
2322
0
9
0
0
151
2
1
21
17
20 1
0
1533 28 7 14
0
9 7 40
117 210
0102030405060708090
100
Type I Type II Type IIICas1presenceabsence
Makarova et al in preparation
Modular evolution of the 3 types of CRISPRCas
Makarova et al in preparation
Mol Microbiol 2011 Jan79(2)484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair
Back to repair functions
The 3 major modalities of evolution
Koonin & Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin & Wolf, Biol Direct 2009
Diverse Lamarckian and quasi-Lamarckian phenomena

Columns: Phenomenon | Biological role/function | Phyletic spread | Lamarckian criteria: (1) genomic changes caused by environmental factor | (2) changes are specific to relevant genomic loci | (3) changes provide adaptation to the causative factor

Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes

Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)

Koonin & Wolf, Biol Direct 2009
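The three Lamarckian criteria can be read as a simple decision rule. The sketch below encodes the table rows as data and assumes that a phenomenon counts as bona fide Lamarckian only when all three criteria are an unqualified "Yes"; that rule, and all class and function names, are illustrative assumptions rather than anything stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion 1, as written in the table
    locus_specific: str         # criterion 2
    adaptive_to_cause: str      # criterion 3


def modality(p: Phenomenon) -> str:
    """Assumed rule: bona fide only if every criterion is an unqualified 'Yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "Yes" for a in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Rows copied from the table above.
TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon(
        "Stress-induced mutagenesis",
        "Yes",
        "No or partially",
        "Yes (but general evolvability enhanced as well)",
    ),
]

for row in TABLE:
    print(f"{row.name}: {modality(row)}")
```

Under this reading, the second criterion (locus specificity) is what separates the two groups in the table.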
Stress as a gauge of evolutionary modality
Koonin & Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and have coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• The perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L. Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI
Hypothesis: CRISPR/Cas
• is a prokaryotic immunity system that functions on the RNAi principle
• integrates short fragments of essential phage/plasmid genes into CRISPRs
• When expressed, these fragments (psiRNA, after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent
• contains all or most of the protein activities involved in these processes
• Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi, in particular components of RISC (RNA-Induced Silencing Complex), and form prokaryotic analogs of RISC (pRISCs)
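The logic of the hypothesis (integrate a fragment of an invader gene, express it as a guide, silence any target that carries the same sequence) can be illustrated with a toy model. This is a minimal sketch under strong simplifying assumptions: sequences are plain strings, strand and complementarity are ignored, the 8-nt fragment length is chosen only to keep the example short (the slides give 25-45 nt for psiRNA), and all function names and sequences are hypothetical.

```python
from typing import List


def integrate_spacer(array: List[str], invader_gene: str, start: int, length: int = 8) -> List[str]:
    """Copy a short fragment of an invader gene into the array (newest first)."""
    fragment = invader_gene[start:start + length]
    return [fragment] + array


def express_psirnas(array: List[str]) -> List[str]:
    """Toy expression step: each stored fragment yields one guide of the same sequence."""
    return list(array)


def is_immune(array: List[str], invader_gene: str) -> bool:
    """The cell is immune if any guide matches a stretch of the invader gene."""
    return any(guide in invader_gene for guide in express_psirnas(array))


# Made-up sequences, for illustration only.
phage_a = "ATGGCTAGCTTACGGATCCGTA"
phage_b = "ATGCCGTTAAACCGGTTTAGCA"

naive_array: List[str] = []
immunized_array = integrate_spacer(naive_array, phage_a, start=4)

print(is_immune(naive_array, phage_a))      # no stored fragment yet
print(is_immune(immunized_array, phage_a))  # fragment of phage_a is stored
print(is_immune(immunized_array, phage_b))  # unrelated invader
```

The point of the sketch is that immunity is specific to "the respective agent": the stored fragment protects against the phage it came from and not against an unrelated one.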
The basic scheme of CASS functioning
[Diagram labels: Transcription; specific transcription factor (COG1517); cellular RNApol; polycistronic pre-psiRNA; (protein-guided) RNA folding; p-dicer (Helicase + HD-Hydrolase; COG1203, COG2254); RNA processing (fast); pre-psiRNA, 75-100 nt; RNA processing (slow); psiRNA, 25-45 nt; p-slicer (COG1468, COG4343, COG1857); RAMP; p-RISC; annealing to RNA target; plasmid or phage mRNA; target RNA cleavage]
Variant of CASS functioning with polymerase/psi-RNA amplification (mostly in thermophiles but also in Mycobacteria)
[Diagram labels: psiRNA, 25-45 nt; RAMP; RAMP-RNA complex (unstable); plasmid or phage mRNA; annealing to RNA target; primer elongation; CASS RNApol (COG1353); amplified annealing; long dsRNA (stable); p-dicer; duplex degradation; RAMP binding; cycle continues]
New psiRNA generation
[Diagram labels: plasmid or phage mRNA; pre-psiRNA, 75-100 nt; CASS RNApol (COG1353) or other RT; reverse transcription with random copy choice OR random RNA recombination and reverse transcription; dsDNA with CRISPRs and target-derived spacers; homologous recombination with genomic CRISPR region; integrase (COG1518); genomic DNA; genomic DNA with new target-derived spacer]
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007 Mar 23;315(5819):1709-12.
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR repeats
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
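The sequence specificity in the first validation point can be sketched as an exact-match test: resistance holds only while some stretch of the phage genome is identical to the spacer, so one substitution inside that stretch restores sensitivity. This is a toy model under stated assumptions; the sequences are made up, the exact-match rule is a simplification, and the function names are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_resistant(spacer: str, phage_genome: str) -> bool:
    """Resistant only if some window of the phage genome matches the spacer exactly."""
    n = len(spacer)
    windows = (phage_genome[i:i + n] for i in range(len(phage_genome) - n + 1))
    return any(mismatches(spacer, w) == 0 for w in windows)


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with a single-base substitution at position pos."""
    return seq[:pos] + base + seq[pos + 1:]


# Made-up phage sequence; the spacer is copied from positions 10-29.
phage = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCATGCAAGT"
spacer = phage[10:30]

# Escape mutant: one substitution inside the region matched by the spacer.
new_base = "A" if phage[15] != "A" else "C"
escape_mutant = substitute(phage, 15, new_base)

print(is_resistant(spacer, phage))
print(is_resistant(spacer, escape_mutant))
```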
[Figure: (1) Adaptation, (2) Expression & Processing, (3) Interference. Labels: invading virus/plasmid; host; CRISPR; leader; spacers numbered 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3]
Van der Oost et al., TIBS 2009
The three types of CRISPR/Cas systems and their signature genes
[Figure: cas operon architectures for Type I (Helicase cassette, Cascade), Type II (HNH-type) and Type III (RAMP module or Polymerase cassette). Systems shown: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6, RAMP module, Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5. Gene labels: Cas1 to Cas13. Domain and COG labels: Helicase (1203), HD-f, RuvC-f, HNH, RecB-f (1468), RAMP (1567, 1583, 1688, 1769), 1343, 1421, 1517, 1857, 3337, 3513. Individual genes: ygcK, ygcL, y1724, BH0338, MTH1090, LA3191, SPy1049, AF0070]
Makarova et al., NRM, submitted
Experimental data on CRISPR/Cas systems
[Figure: genes labelled by organism abbreviation and locus tag (for example Esccol b2755, Strthe str0658, Myctub Rv2817c), shown with cas operon architectures annotated with COG numbers and domain labels (Helicase 1203, HD-f, RecB-f 1468, RuvC-f, HNH, HTH, RAMP 1567/1583/1688/1769, 1343, 1421, 1518, 1857, 3513, 3574); systems numbered 1 to 7, 4a and 7a; "Polymerase" cassette and "Helicase" cassette marked; panels (A), (B), (C)]
Pseudomonas aeruginosa: Haurwitz RE et al. Science. 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science. 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science. 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell. 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science. 2007;315:1709-12
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with branches labelled Type I, Type II and Type III, and system labels Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6, Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5 and Polymerase/RAMP module (Pol RAMP). Genes shown for each type, in extraction order: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2; RAMP]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes
54
256
1537
26
516
2
9
7 56107
3
10
13
383
210
46
216
8
1
7 70211
10
1
0102030405060708090
100
1623
84 6
17 7
2322
0
9
0
0
151
2
1
21
17
20 1
0
1533 28 7 14
0
9 7 40
117 210
0102030405060708090
100
Type I Type II Type IIICas1presenceabsence
Makarova et al in preparation
Modular evolution of the 3 types of CRISPRCas
Makarova et al in preparation
Mol Microbiol 2011 Jan79(2)484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Phenomenon | Biological role/function | Phyletic spread | Lamarckian criterion 1: genomic changes caused by environmental factor | Lamarckian criterion 2: changes are specific to relevant genomic loci | Lamarckian criterion 3: changes provide adaptation to the causative factor
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
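The table can be read as a simple rule: a phenomenon is listed as bona fide Lamarckian when all three criteria are answered with an unqualified "Yes", and as quasi-Lamarckian otherwise. The sketch below encodes that reading. The rule is inferred from the table rows rather than stated on the slide, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion 1
    locus_specific: str         # criterion 2
    adaptive_to_cause: str      # criterion 3

    def is_bona_fide_lamarckian(self) -> bool:
        """True only when all three criteria are an unqualified 'Yes'."""
        answers = (self.caused_by_environment, self.locus_specific,
                   self.adaptive_to_cause)
        return all(answer == "Yes" for answer in answers)


# Rows transcribed from the table on the slide.
TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

for row in TABLE:
    label = "bona fide" if row.is_bona_fide_lamarckian() else "quasi"
    print(f"{row.name}: {label} Lamarckian")
```

Keeping the answers as strings preserves the qualified entries ("Yes/no", "No or partially") exactly as they appear in the table instead of forcing them into booleans.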
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al, Universite de la Mediterranee-Marseille
Bill Martin, Univ Duesseldorf
Kira Makarova, NCBI
plasmid or phage mRNA
pre-psiRNA, 75-100 nt
CASS RNA pol (COG1353) or other RT
Reverse transcription with random copy choice
Random RNA recombination and reverse transcription
dsDNA with CRISPRs and target-derived spacers
Homologous recombination with genomic CRISPR region
integrase (COG1518)
genomic DNA
genomic DNA with new target-derived spacer
OR
New psiRNA generation
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007 Mar 23;315(5819):1709-12
Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages. We found that, after viral challenge, bacteria integrated new spacers derived from phage genomic sequences. Removal or addition of particular spacers modified the phage-resistance phenotype of the cell. Thus, CRISPR, together with associated cas genes, provided resistance against phages, and resistance specificity is determined by spacer-phage sequence similarity.
Key validation
• Phage-specific inserts confer resistance that is highly sequence-specific: a single substitution (SNP) reverts to sensitivity
• The spacers worked only when inserted between CRISPR
• Resistance required COG3513 (cas5), a predicted nuclease
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
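The sequence specificity described in these validation points can be illustrated with a toy model. The sketch below assumes that resistance requires an exact match between a spacer and a stretch of the phage genome, which is a deliberate simplification of the finding that a single substitution restores sensitivity. The genome string, the spacer length and all function names are hypothetical, and only one strand is considered.

```python
# Hypothetical 61-nt "phage genome" used only for illustration.
PHAGE = "ATGCGTACCTGAAGTCTAGGCTAACGTTAGCCATGCAAGTCGATCGGATTACAGCTTGACC"


def acquire_spacer(phage_genome: str, start: int, length: int = 30) -> str:
    """Copy a stretch of the phage genome, as a new spacer would."""
    return phage_genome[start:start + length]


def is_resistant(spacers: list, phage_genome: str) -> bool:
    """Toy rule: resistant if any spacer occurs exactly in the genome."""
    return any(spacer in phage_genome for spacer in spacers)


def point_mutant(genome: str, position: int) -> str:
    """Return the genome with a single substitution at the given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return genome[:position] + swap[genome[position]] + genome[position + 1:]


spacer = acquire_spacer(PHAGE, start=10)      # covers positions 10 to 39
escape_phage = point_mutant(PHAGE, 20)        # substitution inside the match
silent_phage = point_mutant(PHAGE, 50)        # substitution outside the match

print(is_resistant([spacer], PHAGE))          # wild-type phage
print(is_resistant([spacer], escape_phage))   # mutated inside the target
print(is_resistant([spacer], silent_phage))   # mutated elsewhere
```

In this model the host stays resistant to the wild-type phage and to the phage mutated outside the matched region, and becomes sensitive to the phage carrying a single substitution inside it.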
Figure: the three stages of CRISPR/Cas action
(1) Adaptation
(2) Expression & Processing
(3) Interference
Diagram labels: invading virus / plasmid; host; CRISPR leader; spacer arrays 5 4 3 2 1 and 6 5 4 3 2 1; cas operon (cas3, cas2, cas1, cas5); Cascade; Cas3
Van der Oost et al TIBS 2009
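The three stages named in the diagram can be sketched as operations on a small data structure. This is a minimal illustration under stated assumptions, not a model of the actual molecular machinery: new spacers are placed at the leader-proximal end (index 0 of the list), each spacer yields one guide, and interference requires an exact match. The class, method names and sequences are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprLocus:
    # Index 0 is the leader-proximal position (an assumption of this sketch).
    spacers: list = field(default_factory=list)

    def adapt(self, invader: str, start: int, length: int = 8) -> str:
        """(1) Adaptation: copy a piece of the invader in as the newest spacer."""
        spacer = invader[start:start + length]
        self.spacers.insert(0, spacer)
        return spacer

    def express(self) -> list:
        """(2) Expression and processing: one guide per spacer."""
        return list(self.spacers)

    def interferes_with(self, invader: str) -> bool:
        """(3) Interference: any guide that matches the invader exactly."""
        return any(guide in invader for guide in self.express())


# Five pre-existing spacers, as in the "5 4 3 2 1" array of the diagram.
locus = CrisprLocus(spacers=["AAAAAAAA", "CCCCCCCC", "GGGGGGGG",
                             "TTTTTTTT", "ACACACAC"])
virus = "TTGACCGTAAGCTTGGCATCAGGT"

before = locus.interferes_with(virus)   # no matching spacer yet
newest = locus.adapt(virus, start=4)    # array grows to six spacers
after = locus.interferes_with(virus)    # the new spacer now matches

print(before, newest, len(locus.spacers), after)
```

The list grows from five to six entries with the newest spacer first, mirroring the "5 4 3 2 1" and "6 5 4 3 2 1" arrays in the diagram.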
The three types of CRISPR/Cas systems and their signature genes
Figure: gene architectures of the CRISPR/Cas subtypes
Type labels: Type I (Helicase cassette), Type II (HNH-type), Type III (RAMP module or Polymerase cassette); Cascade
Subtype labels: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, CASS4a, Mtube CASS6 (RAMP module), Dvulg CASS1, Tneap and Hmari CASS7, Apern CASS5
Protein labels: Cas1 to Cas13
Domain and family labels: Helicase, HD-f, RuvC-f, HNH, RecB-f, RAMP; numeric labels 1203, 1343, 1421, 1468, 1517, 1567, 1583, 1688, 1769, 1857, 3337, 3513
Locus labels: y1724, ygcK, ygcL, BH0338, MTH1090, LA3191, SPy1049, AF0070
Makarova et al NRM submitted
Experimental data on CRISPR/Cas systems
Figure: cas gene architectures labeled by organism abbreviation and locus tag (for example Esccol b2755, Strthe str0658, Pyrfur PF1129), with subtype numbers 1 to 7a, the "Polymerase" and "Helicase" cassettes, and panels (A), (B), (C)
Experimentally studied systems marked on the figure:
Pseudomonas aeruginosa: Haurwitz RE et al. Science. 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science. 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science. 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell. 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315, 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
Figure: Cas1 tree with branches labeled Type I, Type II and Type III
Subtype labels: Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 (Polymerase-RAMP module), Dvulg CASS1, Tneap-Hmari CASS7, Apern CASS5
Gene labels by type:
I: cse1 cse2 cas7 cas5 cas3 cas1 cas2 cas6e
II: cas1 cas9 cas2 cas4
III: cas1 csm6 csm5 csm4 csm3 csm2 cas10 cas6 cas2 (RAMP)
Makarova et al NRM submitted
Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes
(1) Adaptation
(2) Expression amp Processing
(3) Interference
invading virus plasmid
CRISPRleader
5 4 3 2 1
6 5 4 3 2 1
invading virus plasmid
invading virus plasmid
host
host
cas operon
Cascade
Cas3
cas3 cas2cas1cas5
Van der Oost et al TIBS 2009
The three types of CRISPRCas systems and their signature genes
1343
1343
1343
1343
Helicase1203
HD-f
Cas2
Cas6
Cas6
Cas5
Cas8
Cas9
Cas1
Cas3
Cas10
Cas11
Cas7
Cas6
Cas6
Cas5 Cas6
Cas4
Cas6
13433513
1343
RuvC-f
3513
HNH
13433513
1857
18571857 RAMP1688
RAMP1688
RecB-f1468
RecB-f1468
Helicase1203
R AMP1583
1857 1857Helicase1203
1857Helicase1203
1343 y1724
ygcKygcL
EcoliCASS2
YpestCASS3
NmeniCASS4
MtubeCASS6 RAMP module
DvulgCASS1
Tneap and HmariCASS7
ApernCASS5
Type I (Helicase cassette)
Cascade
Type II (HNH-type)
Type III (RAMP module or Polymerase cassette)
CASS4a
1857 1343RecB-f1468
Helicase1203
BH0338MTH1090LA3191
1421 RAMP1567
RAMP15833337RAMP
1583
RAMP1583
RAMP1769
SPy1049
Cas6
Cas61857
RecB-f1468
RAMP1688
Helicase1203
AF0070
R AMP1583
1343
1517
Cas12 Cas13
RAMP1688
RAMP1688
RAMP1688
Makarova et al NRM submitted
Experimental data on CRISPR/Cas systems
[Figure: tree with leaves labeled by species abbreviation and locus tag (e.g. Esccol b2755, Strthe str0658, Pyrfur PF1129), shown alongside cas operon architectures with domain labels (Helicase, HD-f, RecB-f, RuvC-f, HNH, HTH, RAMP), subtype numbers 1 to 7, 4a and 7a, and panels (A), (B), (C) marking the "Polymerase" and "Helicase" cassettes. Systems with experimental data are annotated with the references below.]
Pseudomonas aeruginosa: Haurwitz RE et al. Science 2010 Sep 10;329(5997):1355-8
Staphylococcus epidermidis: Marraffini LA, Sontheimer EJ. Science 2008 Dec 19;322(5909):1843-5
E. coli: Brouns SJ et al. Science 2008 Aug 15;321(5891):960-4
Pyrococcus furiosus: Hale CR et al. Cell 2009 Nov 25;139(5):945-56
Streptococcus thermophilus: Barrangou R et al. Science 315, 1709-12 (2007)
mRNA targeting
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 tree with clades labeled Type I, Type II and Type III and by subtype (Ecoli CASS2, Ypest CASS3, Nmeni CASS4, Mtube CASS6 Polymerase/RAMP module, Dvulg CASS1, Tneap/Hmari CASS7, Apern CASS5). Example operons, with genes listed as extracted: I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; II: cas1, cas9, cas2, cas4; III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2. RAMP genes are marked.]
Makarova et al NRM submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% of genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20) genomes
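The Cas1 share quoted above follows directly from the two counts on the slide (310 of 703 genomes). A minimal arithmetic check, in which the helper name `percent` is hypothetical:

```python
def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


TOTAL_GENOMES = 703  # selected complete genomes (from the slide)
CAS1_GENOMES = 310   # genomes containing Cas1 (from the slide)

cas1_share = percent(CAS1_GENOMES, TOTAL_GENOMES)
print(f"Cas1-containing genomes: {cas1_share:.1f}%")  # prints 44.1%
```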
[Figure: bar charts on a 0 to 100 scale, with per-group counts, showing Type I, Type II, Type III and Cas1 presence/absence across the analyzed genomes.]
Makarova et al in preparation
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al in preparation
Mol Microbiol 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin Wolf Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin Wolf Biol Direct 2009
Lamarckian criteria: (1) genomic changes caused by environmental factor; (2) changes are specific to relevant genomic loci; (3) changes provide adaptation to the causative factor

Phenomenon | Biological role/function | Phyletic spread | Criterion 1 | Criterion 2 | Criterion 3
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin Wolf Biol Direct 2009
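The grouping in the table above can be read as a simple rule: a phenomenon counts as bona fide Lamarckian when all three criteria are answered with an unqualified "Yes", and as quasi-Lamarckian otherwise. The sketch below encodes that reading. The rule itself, the class and the field names are assumptions made for illustration; only the row values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    name: str
    caused_by_environment: str  # criterion 1, as written in the table
    locus_specific: str         # criterion 2
    adaptive_to_cause: str      # criterion 3


def classify(p):
    """Assumed rule: three unqualified 'Yes' answers mean bona fide."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(answer == "Yes" for answer in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


TABLE = [
    Phenomenon("CRISPR/Cas", "Yes", "Yes", "Yes"),
    Phenomenon("piRNA", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (specific cases)", "Yes", "Yes", "Yes"),
    Phenomenon("HGT (general phenomenon)", "Yes", "No", "Yes/no"),
    Phenomenon("Stress-induced mutagenesis", "Yes", "No or partially",
               "Yes (but general evolvability enhanced as well)"),
]

for p in TABLE:
    print(f"{p.name}: {classify(p)}")
```

Under this reading the second criterion, locus specificity, is what separates the two groups in every row.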
Stress as a gauge of evolutionary modality
Koonin Wolf Biol Direct 2009
bullEvolution of parasites is intrinsic to any replicator system
bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since
bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms
bullPerennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian DoljaOregon State University
Yuri Wolf NCBITatiana Senkevich NIAID NIH
Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille
Didier Raoult et al
Bill MartinUniv Duesseldorf
Kira Makarova NCBI
Phylogeny of Cas1 and the 3 types of CRISPR/Cas systems
228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes
[Figure: Cas1 phylogenetic tree with branches labeled Type I, Type II and Type III. Subtype labels: Ecoli (CASS2), Ypest (CASS3), Nmeni (CASS4), Mtube (CASS6; polymerase/RAMP module), Dvulg (CASS1), Tneap/Hmari (CASS7), Apern (CASS5). Genes shown for each type, in the order extracted: Type I: cse1, cse2, cas7, cas5, cas3, cas1, cas2, cas6e; Type II: cas1, cas9, cas2, cas4; Type III: cas1, csm6, csm5, csm4, csm3, csm2, cas10, cas6, cas2 (RAMP).]
Makarova et al., NRM, submitted
CRISPR/Cas systems in 703 selected complete genomes of archaea and bacteria
• Cas1 is present in 310 (44%) genomes
• ~90% of archaea but only ~35% of bacteria
• Type I is present in 42% of genomes, Type II in 9%, Type III in 20%
• Two or three systems of different types are present in 128 (20%) genomes
[Figure: two bar charts on a 0-100 scale; legend: Type I, Type II, Type III, Cas1 presence/absence]
Makarova et al., in preparation
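The percentage in the first bullet of the slide above follows directly from the two counts given there. A minimal Python sketch of that arithmetic, assuming 310 genomes with Cas1 out of the 703 analyzed (the function name percent is illustrative only, not from the source):

```python
def percent(count: int, total: int) -> int:
    """Return count as a whole-number percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


# Counts taken from the slide: genomes encoding Cas1, and genomes analyzed.
cas1_genomes = 310
all_genomes = 703

cas1_percent = percent(cas1_genomes, all_genomes)
print(f"Cas1 present in {cas1_percent}% of genomes")  # prints: Cas1 present in 44% of genomes
```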
Modular evolution of the 3 types of CRISPR/Cas
Makarova et al., in preparation
Mol Microbiol. 2011 Jan;79(2):484-502
A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair
Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, Phanse S, Joachimiak A, Koonin EV, Savchenko A, Emili A, Greenblatt J, Edwards AM, Yakunin AF
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes, but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs, including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
Back to repair functions
The 3 major modalities of evolution
Koonin, Wolf, Biol Direct 2009
CRISPR/Cas as a bona fide Lamarckian system
Koonin, Wolf, Biol Direct 2009
Columns: Phenomenon | Biological role/function | Phyletic spread | Lamarckian criteria: (1) Genomic changes caused by environmental factor | (2) Changes are specific to relevant genomic loci | (3) Changes provide adaptation to the causative factor
Bona fide Lamarckian
CRISPR/Cas | Defense against viruses and other mobile elements | Most of the Archaea and many bacteria | Yes | Yes | Yes
piRNA | Defense against transposable elements in germline | Animals | Yes | Yes | Yes
HGT (specific cases) | Adaptation to new environment, stress response, resistance | Archaea, bacteria, unicellular eukaryotes | Yes | Yes | Yes
Quasi-Lamarckian
HGT (general phenomenon) | Diverse innovations | Archaea, bacteria, unicellular eukaryotes | Yes | No | Yes/no
Stress-induced mutagenesis | Stress response/resistance/adaptation to new conditions | Ubiquitous | Yes | No or partially | Yes (but general evolvability enhanced as well)
Diverse Lamarckian and quasi-Lamarckian phenomena
Koonin, Wolf, Biol Direct 2009
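The three Lamarckian criteria in the table above can be read as a simple decision rule. The Python sketch below is one way to express it, under the assumption that a phenomenon counts as bona fide Lamarckian only when all three criteria are an unqualified "yes", and as quasi-Lamarckian otherwise. The class, field and function names are hypothetical, and the qualified third answer for stress-induced mutagenesis ("Yes, but general evolvability enhanced as well") is simplified to "yes".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    """One row of the table; field names are illustrative, not from the source."""
    name: str
    caused_by_environment: str  # criterion 1: changes caused by environmental factor
    locus_specific: str         # criterion 2: changes specific to relevant loci
    adaptive_to_cause: str      # criterion 3: changes adapt to the causative factor


def classify(p: Phenomenon) -> str:
    """Assumed rule: bona fide only if all three answers are a plain 'yes'."""
    answers = (p.caused_by_environment, p.locus_specific, p.adaptive_to_cause)
    if all(a == "yes" for a in answers):
        return "bona fide Lamarckian"
    return "quasi-Lamarckian"


# Answers transcribed from the table (third answer of the last row simplified).
ROWS = [
    Phenomenon("CRISPR/Cas", "yes", "yes", "yes"),
    Phenomenon("piRNA", "yes", "yes", "yes"),
    Phenomenon("HGT (specific cases)", "yes", "yes", "yes"),
    Phenomenon("HGT (general phenomenon)", "yes", "no", "yes/no"),
    Phenomenon("Stress-induced mutagenesis", "yes", "no or partially", "yes"),
]

for row in ROWS:
    print(f"{row.name}: {classify(row)}")
```

Under this rule the first three rows come out as bona fide Lamarckian and the last two as quasi-Lamarckian, matching the grouping in the table.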
Stress as a gauge of evolutionary modality
Koonin, Wolf, Biol Direct 2009
• Evolution of parasites is intrinsic to any replicator system
• Defense systems, in particular those based on the RNAi principle, appeared concomitantly with cells and coevolved with cells and viruses ever since
• Defense systems occupy a substantial fraction of the genomes in all cellular life forms
• Perennial arms race between parasites and hosts is one of the principal factors of evolution
Valerian Dolja, Oregon State University
Yuri Wolf, NCBI
Tatiana Senkevich, NIAID, NIH
Laks Iyer, NCBI
L. Aravind, NCBI
Natalia Yutin, NCBI
Didier Raoult et al., Universite de la Mediterranee, Marseille
Bill Martin, Univ. Duesseldorf
Kira Makarova, NCBI