Top Banner
The Virus World, its evolution, evolution of antiviral defense, and the role of viruses in the evolution of cells Eugene V. Koonin National Center for Biotechnology Information, NIH, Bethesda KITP, Santa Barbara , February 17, 2011
69

The Virus World, its evolution, evolution of antiviral defense

Feb 03, 2022

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: The Virus World, its evolution, evolution of antiviral defense

The Virus World its evolution evolution of antiviral defense and

the role of viruses in the evolution of cells

Eugene V KooninNational Center for Biotechnology Information NIH Bethesda

KITP Santa Barbara February 17 2011

What is a virus

A virus is a small infectious agent that can replicate only inside the living cells of organismshttpenwikipediaorgwikiVirus

Raoult D Forterre P Redefining viruses lessons from Mimivirus Nat Rev Microbiol 2008 6(4)315-9

Viruses and virus-like agents possessbullgenomes bullvery often ndashthough not always ndashcapsids that encase the genome

but lackbullfunctional translation machinerybullmembranes with transportsecretionsystems

bullViruses are the most abundant biological entities in the biosphere there are 10-100 virus particles per cellbullThe pangenomes of viruses and cellular organismshave [at least] comparable complexities

1 cm3 of seawater contains 106-109 virus particles

There are millions of diverse bacteriophage speciesin the water soil and gut

Suttle CA (2005) Nature 437356

Edwards and Rohwer (2005) Nat Rev Microbiol 3504

Viruses are the dominant entities in the biosphere ndash physically and genetically ndash as shown by viral metagenomics ndash virome studies

Mean of sequences with matches to major functional categories

0

5

10

15

Microbial Metagenomes

Vira

l Met

agen

omes

0 5 10 15

Nucleosides and nucleotides

Cell wall and capsule

Fatty acids and lipids

Membrane transport

Stress response

Aromatic compound

Cell division and cell cycle

Nitrogen metabolism

Sulphur metabolism

Motility and chemotaxis

Phosphorus metabolism

Potassium metabolism

Cell signalling

Secondary metabolismDNA metabolism

Carbohydrates

Amino acids

Virulence

Protein metabolism

Respiration

Photosynthesis

Cofactors etc

RNA metabolism

20

Key

Kristensen Mushegian Dolja Koonin Trends Microbiol 2010

Most of the viromes might not even consists of typical viruses butrather of pseudovirus particles that carry microbial genes (GTAs)

Comparative genomics shows that viruses that cause human diseases

belong to families that evolved hundreds of millions or even billions

years ago

Viruses accompany the evolving cellular life throughout its history and

might even predate it

Some viruses are comparable to cellular life forms in size and genetic complexity

Mimivirus genome (~12 Mbp ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp ~500 genes)

The largest most complex viruses NCLDV (Nucleo-Cytoplasmic Large DNA viruses of eukaryotes) ndashthis is where the smallpox virus belongs

Raoult et al Science 2004 Nov 19306(5700)1344 Suzan-Monti et al PLoS ONE 2007 Mar 282(3)

Iyer Aravind KooninCommon origin of four diverse families of large eukaryotic DNA viruses J Virol 2001 75 11720Iyer et al Evolutionary genomics of nucleo-cytoplasmic large DNA viruses Virus Res 2006 Apr117(1)156

The largest most complex viruses the Nucleocytoplasmic Large DNA Viruses (NCLDV)

(this is where the smallpox virus AND themimivirus belong)

6 families of NCLDVhellipand counting

size kb hosts-poxviridae 26 [134-360] vertebrates insects-asfarviridae 1 [170] vertebrates protists()-iridoviridae 8 [103-212] vertebrates insects protists()-ascoviridae 4 [119-174] insects-phycodnaviridae 9 [155-407] algae haptophytes stramenopiles -mimiviridae 2 [1181-1200] amoebozoa algae()-[Marseille virus] 1 [368] amoebozoa ndash new family

The case for the monophyly (common origin) of NCLDV

9 universally conserved hallmark genes (vaccinia gene names)-primase (D5-N)-helicase (D5-C)-DNA polymerase (E9)-packaging ATPase (A32)-Major capsid protein (D13 non-capsid in poxviruses)-Thiol-oxidoreductase (E10)-Helicase (D6 D11)-ST protein kinase (F10)-Transcription factor VLTF2 (A1)

47 genes mapped to the last common ancestor of NCLDVby maximum likelihood ndash all main functional systems represented

Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 2: The Virus World, its evolution, evolution of antiviral defense

What is a virus

A virus is a small infectious agent that can replicate only inside the living cells of organismshttpenwikipediaorgwikiVirus

Raoult D Forterre P Redefining viruses lessons from Mimivirus Nat Rev Microbiol 2008 6(4)315-9

Viruses and virus-like agents possessbullgenomes bullvery often ndashthough not always ndashcapsids that encase the genome

but lackbullfunctional translation machinerybullmembranes with transportsecretionsystems

bullViruses are the most abundant biological entities in the biosphere there are 10-100 virus particles per cellbullThe pangenomes of viruses and cellular organismshave [at least] comparable complexities

1 cm3 of seawater contains 106-109 virus particles

There are millions of diverse bacteriophage speciesin the water soil and gut

Suttle CA (2005) Nature 437356

Edwards and Rohwer (2005) Nat Rev Microbiol 3504

Viruses are the dominant entities in the biosphere ndash physically and genetically ndash as shown by viral metagenomics ndash virome studies

Mean of sequences with matches to major functional categories

0

5

10

15

Microbial Metagenomes

Vira

l Met

agen

omes

0 5 10 15

Nucleosides and nucleotides

Cell wall and capsule

Fatty acids and lipids

Membrane transport

Stress response

Aromatic compound

Cell division and cell cycle

Nitrogen metabolism

Sulphur metabolism

Motility and chemotaxis

Phosphorus metabolism

Potassium metabolism

Cell signalling

Secondary metabolismDNA metabolism

Carbohydrates

Amino acids

Virulence

Protein metabolism

Respiration

Photosynthesis

Cofactors etc

RNA metabolism

20

Key

Kristensen Mushegian Dolja Koonin Trends Microbiol 2010

Most of the viromes might not even consists of typical viruses butrather of pseudovirus particles that carry microbial genes (GTAs)

Comparative genomics shows that viruses that cause human diseases

belong to families that evolved hundreds of millions or even billions

years ago

Viruses accompany the evolving cellular life throughout its history and

might even predate it

Some viruses are comparable to cellular life forms in size and genetic complexity

Mimivirus genome (~12 Mbp ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp ~500 genes)

The largest most complex viruses NCLDV (Nucleo-Cytoplasmic Large DNA viruses of eukaryotes) ndashthis is where the smallpox virus belongs

Raoult et al Science 2004 Nov 19306(5700)1344 Suzan-Monti et al PLoS ONE 2007 Mar 282(3)

Iyer Aravind KooninCommon origin of four diverse families of large eukaryotic DNA viruses J Virol 2001 75 11720Iyer et al Evolutionary genomics of nucleo-cytoplasmic large DNA viruses Virus Res 2006 Apr117(1)156

The largest most complex viruses the Nucleocytoplasmic Large DNA Viruses (NCLDV)

(this is where the smallpox virus AND themimivirus belong)

6 families of NCLDVhellipand counting

size kb hosts-poxviridae 26 [134-360] vertebrates insects-asfarviridae 1 [170] vertebrates protists()-iridoviridae 8 [103-212] vertebrates insects protists()-ascoviridae 4 [119-174] insects-phycodnaviridae 9 [155-407] algae haptophytes stramenopiles -mimiviridae 2 [1181-1200] amoebozoa algae()-[Marseille virus] 1 [368] amoebozoa ndash new family

The case for the monophyly (common origin) of NCLDV

9 universally conserved hallmark genes (vaccinia gene names)-primase (D5-N)-helicase (D5-C)-DNA polymerase (E9)-packaging ATPase (A32)-Major capsid protein (D13 non-capsid in poxviruses)-Thiol-oxidoreductase (E10)-Helicase (D6 D11)-ST protein kinase (F10)-Transcription factor VLTF2 (A1)

47 genes mapped to the last common ancestor of NCLDVby maximum likelihood ndash all main functional systems represented

Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 3: The Virus World, its evolution, evolution of antiviral defense

bullViruses are the most abundant biological entities in the biosphere there are 10-100 virus particles per cellbullThe pangenomes of viruses and cellular organismshave [at least] comparable complexities

1 cm3 of seawater contains 106-109 virus particles

There are millions of diverse bacteriophage speciesin the water soil and gut

Suttle CA (2005) Nature 437356

Edwards and Rohwer (2005) Nat Rev Microbiol 3504

Viruses are the dominant entities in the biosphere ndash physically and genetically ndash as shown by viral metagenomics ndash virome studies

Mean of sequences with matches to major functional categories

0

5

10

15

Microbial Metagenomes

Vira

l Met

agen

omes

0 5 10 15

Nucleosides and nucleotides

Cell wall and capsule

Fatty acids and lipids

Membrane transport

Stress response

Aromatic compound

Cell division and cell cycle

Nitrogen metabolism

Sulphur metabolism

Motility and chemotaxis

Phosphorus metabolism

Potassium metabolism

Cell signalling

Secondary metabolismDNA metabolism

Carbohydrates

Amino acids

Virulence

Protein metabolism

Respiration

Photosynthesis

Cofactors etc

RNA metabolism

20

Key

Kristensen Mushegian Dolja Koonin Trends Microbiol 2010

Most of the viromes might not even consists of typical viruses butrather of pseudovirus particles that carry microbial genes (GTAs)

Comparative genomics shows that viruses that cause human diseases

belong to families that evolved hundreds of millions or even billions

years ago

Viruses accompany the evolving cellular life throughout its history and

might even predate it

Some viruses are comparable to cellular life forms in size and genetic complexity

Mimivirus genome (~12 Mbp ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp ~500 genes)

The largest most complex viruses NCLDV (Nucleo-Cytoplasmic Large DNA viruses of eukaryotes) ndashthis is where the smallpox virus belongs

Raoult et al Science 2004 Nov 19306(5700)1344 Suzan-Monti et al PLoS ONE 2007 Mar 282(3)

Iyer Aravind KooninCommon origin of four diverse families of large eukaryotic DNA viruses J Virol 2001 75 11720Iyer et al Evolutionary genomics of nucleo-cytoplasmic large DNA viruses Virus Res 2006 Apr117(1)156

The largest most complex viruses the Nucleocytoplasmic Large DNA Viruses (NCLDV)

(this is where the smallpox virus AND themimivirus belong)

6 families of NCLDVhellipand counting

size kb hosts-poxviridae 26 [134-360] vertebrates insects-asfarviridae 1 [170] vertebrates protists()-iridoviridae 8 [103-212] vertebrates insects protists()-ascoviridae 4 [119-174] insects-phycodnaviridae 9 [155-407] algae haptophytes stramenopiles -mimiviridae 2 [1181-1200] amoebozoa algae()-[Marseille virus] 1 [368] amoebozoa ndash new family

The case for the monophyly (common origin) of NCLDV

9 universally conserved hallmark genes (vaccinia gene names)-primase (D5-N)-helicase (D5-C)-DNA polymerase (E9)-packaging ATPase (A32)-Major capsid protein (D13 non-capsid in poxviruses)-Thiol-oxidoreductase (E10)-Helicase (D6 D11)-ST protein kinase (F10)-Transcription factor VLTF2 (A1)

47 genes mapped to the last common ancestor of NCLDVby maximum likelihood ndash all main functional systems represented

Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 4: The Virus World, its evolution, evolution of antiviral defense

Mean of sequences with matches to major functional categories

0

5

10

15

Microbial Metagenomes

Vira

l Met

agen

omes

0 5 10 15

Nucleosides and nucleotides

Cell wall and capsule

Fatty acids and lipids

Membrane transport

Stress response

Aromatic compound

Cell division and cell cycle

Nitrogen metabolism

Sulphur metabolism

Motility and chemotaxis

Phosphorus metabolism

Potassium metabolism

Cell signalling

Secondary metabolismDNA metabolism

Carbohydrates

Amino acids

Virulence

Protein metabolism

Respiration

Photosynthesis

Cofactors etc

RNA metabolism

20

Key

Kristensen Mushegian Dolja Koonin Trends Microbiol 2010

Most of the viromes might not even consists of typical viruses butrather of pseudovirus particles that carry microbial genes (GTAs)

Comparative genomics shows that viruses that cause human diseases

belong to families that evolved hundreds of millions or even billions

years ago

Viruses accompany the evolving cellular life throughout its history and

might even predate it

Some viruses are comparable to cellular life forms in size and genetic complexity

Mimivirus genome (~12 Mbp ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp ~500 genes)

The largest most complex viruses NCLDV (Nucleo-Cytoplasmic Large DNA viruses of eukaryotes) ndashthis is where the smallpox virus belongs

Raoult et al Science 2004 Nov 19306(5700)1344 Suzan-Monti et al PLoS ONE 2007 Mar 282(3)

Iyer Aravind KooninCommon origin of four diverse families of large eukaryotic DNA viruses J Virol 2001 75 11720Iyer et al Evolutionary genomics of nucleo-cytoplasmic large DNA viruses Virus Res 2006 Apr117(1)156

The largest most complex viruses the Nucleocytoplasmic Large DNA Viruses (NCLDV)

(this is where the smallpox virus AND themimivirus belong)

6 families of NCLDVhellipand counting

size kb hosts-poxviridae 26 [134-360] vertebrates insects-asfarviridae 1 [170] vertebrates protists()-iridoviridae 8 [103-212] vertebrates insects protists()-ascoviridae 4 [119-174] insects-phycodnaviridae 9 [155-407] algae haptophytes stramenopiles -mimiviridae 2 [1181-1200] amoebozoa algae()-[Marseille virus] 1 [368] amoebozoa ndash new family

The case for the monophyly (common origin) of NCLDV

9 universally conserved hallmark genes (vaccinia gene names)-primase (D5-N)-helicase (D5-C)-DNA polymerase (E9)-packaging ATPase (A32)-Major capsid protein (D13 non-capsid in poxviruses)-Thiol-oxidoreductase (E10)-Helicase (D6 D11)-ST protein kinase (F10)-Transcription factor VLTF2 (A1)

47 genes mapped to the last common ancestor of NCLDVby maximum likelihood ndash all main functional systems represented

Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 5: The Virus World, its evolution, evolution of antiviral defense

Comparative genomics shows that viruses that cause human diseases

belong to families that evolved hundreds of millions or even billions

years ago

Viruses accompany the evolving cellular life throughout its history and

might even predate it

Some viruses are comparable to cellular life forms in size and genetic complexity

Mimivirus genome (~12 Mbp ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp ~500 genes)

The largest most complex viruses NCLDV (Nucleo-Cytoplasmic Large DNA viruses of eukaryotes) ndashthis is where the smallpox virus belongs

Raoult et al Science 2004 Nov 19306(5700)1344 Suzan-Monti et al PLoS ONE 2007 Mar 282(3)

Iyer Aravind KooninCommon origin of four diverse families of large eukaryotic DNA viruses J Virol 2001 75 11720Iyer et al Evolutionary genomics of nucleo-cytoplasmic large DNA viruses Virus Res 2006 Apr117(1)156

The largest most complex viruses the Nucleocytoplasmic Large DNA Viruses (NCLDV)

(this is where the smallpox virus AND themimivirus belong)

6 families of NCLDVhellipand counting

size kb hosts-poxviridae 26 [134-360] vertebrates insects-asfarviridae 1 [170] vertebrates protists()-iridoviridae 8 [103-212] vertebrates insects protists()-ascoviridae 4 [119-174] insects-phycodnaviridae 9 [155-407] algae haptophytes stramenopiles -mimiviridae 2 [1181-1200] amoebozoa algae()-[Marseille virus] 1 [368] amoebozoa ndash new family

The case for the monophyly (common origin) of NCLDV

9 universally conserved hallmark genes (vaccinia gene names)-primase (D5-N)-helicase (D5-C)-DNA polymerase (E9)-packaging ATPase (A32)-Major capsid protein (D13 non-capsid in poxviruses)-Thiol-oxidoreductase (E10)-Helicase (D6 D11)-ST protein kinase (F10)-Transcription factor VLTF2 (A1)

47 genes mapped to the last common ancestor of NCLDVby maximum likelihood ndash all main functional systems represented

Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 6: The Virus World, its evolution, evolution of antiviral defense

Some viruses are comparable to cellular life forms in size and genetic complexity

Mimivirus genome (~12 Mbp ~1000 genes) is twice as large as that of Mycoplasma genitalium (580 kbp ~500 genes)

The largest most complex viruses NCLDV (Nucleo-Cytoplasmic Large DNA viruses of eukaryotes) ndashthis is where the smallpox virus belongs

Raoult et al Science 2004 Nov 19306(5700)1344 Suzan-Monti et al PLoS ONE 2007 Mar 282(3)

Iyer Aravind KooninCommon origin of four diverse families of large eukaryotic DNA viruses J Virol 2001 75 11720Iyer et al Evolutionary genomics of nucleo-cytoplasmic large DNA viruses Virus Res 2006 Apr117(1)156

The largest most complex viruses the Nucleocytoplasmic Large DNA Viruses (NCLDV)

(this is where the smallpox virus AND themimivirus belong)

6 families of NCLDVhellipand counting

size kb hosts-poxviridae 26 [134-360] vertebrates insects-asfarviridae 1 [170] vertebrates protists()-iridoviridae 8 [103-212] vertebrates insects protists()-ascoviridae 4 [119-174] insects-phycodnaviridae 9 [155-407] algae haptophytes stramenopiles -mimiviridae 2 [1181-1200] amoebozoa algae()-[Marseille virus] 1 [368] amoebozoa ndash new family

The case for the monophyly (common origin) of NCLDV

9 universally conserved hallmark genes (vaccinia gene names)-primase (D5-N)-helicase (D5-C)-DNA polymerase (E9)-packaging ATPase (A32)-Major capsid protein (D13 non-capsid in poxviruses)-Thiol-oxidoreductase (E10)-Helicase (D6 D11)-ST protein kinase (F10)-Transcription factor VLTF2 (A1)

47 genes mapped to the last common ancestor of NCLDVby maximum likelihood ndash all main functional systems represented

Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 7: The Virus World, its evolution, evolution of antiviral defense

Iyer Aravind KooninCommon origin of four diverse families of large eukaryotic DNA viruses J Virol 2001 75 11720Iyer et al Evolutionary genomics of nucleo-cytoplasmic large DNA viruses Virus Res 2006 Apr117(1)156

The largest most complex viruses the Nucleocytoplasmic Large DNA Viruses (NCLDV)

(this is where the smallpox virus AND themimivirus belong)

6 families of NCLDVhellipand counting

size kb hosts-poxviridae 26 [134-360] vertebrates insects-asfarviridae 1 [170] vertebrates protists()-iridoviridae 8 [103-212] vertebrates insects protists()-ascoviridae 4 [119-174] insects-phycodnaviridae 9 [155-407] algae haptophytes stramenopiles -mimiviridae 2 [1181-1200] amoebozoa algae()-[Marseille virus] 1 [368] amoebozoa ndash new family

The case for the monophyly (common origin) of NCLDV

9 universally conserved hallmark genes (vaccinia gene names)-primase (D5-N)-helicase (D5-C)-DNA polymerase (E9)-packaging ATPase (A32)-Major capsid protein (D13 non-capsid in poxviruses)-Thiol-oxidoreductase (E10)-Helicase (D6 D11)-ST protein kinase (F10)-Transcription factor VLTF2 (A1)

47 genes mapped to the last common ancestor of NCLDVby maximum likelihood ndash all main functional systems represented

Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 8: The Virus World, its evolution, evolution of antiviral defense

The case for the monophyly (common origin) of NCLDV

9 universally conserved hallmark genes (vaccinia gene names)-primase (D5-N)-helicase (D5-C)-DNA polymerase (E9)-packaging ATPase (A32)-Major capsid protein (D13 non-capsid in poxviruses)-Thiol-oxidoreductase (E10)-Helicase (D6 D11)-ST protein kinase (F10)-Transcription factor VLTF2 (A1)

47 genes mapped to the last common ancestor of NCLDVby maximum likelihood ndash all main functional systems represented

Iyer et al J Virol 2001 Virus Res 2006 Yutin et al Virol J 2009

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 9: The Virus World, its evolution, evolution of antiviral defense

b1 Helvib1 Spofr

b1 Trinil2 Invir

l1 Aedtal3 Lymch

l3 Lymdil5 Ambtil5 Frovi

l5 Singrl4 Infsp

m6 Marseillevirusn1 Acapon2 Mamav

q1 Acatuq1 ParFRq1 ParMTq1 ParAR

q1 ParNYq1 Parbu

q6 Ostviq3 Ectsiq3 Felspq2 Emihu

u2 Amsmou2 Melsa

u1 Bovpau1 Orfviu1 Deevi

u1 Swiviu1 Myxviu1 Rabfi

u1 Goaviu1 Lumsku1 Shevi

u1 Tanviu1 Yabliu1 Yabmo

u1 Vacviu1 Varvi

u1 Molcou1 Crovi

u1 Canviu1 Fowvi

c1 Afrsw

1000010000

93159970

10000

1000010000

6702

9983

10000

7557

10000

1000010000

999310000

10000

9910

10000

7074

10000

6735

10000

10000

7417

10000

4747

454010000

9840

1000010000

10000

10000

10000

9720

9911

5452

9871

10000

10000

10000

05

Phycodnaviridae

Poxviridae

Ascoviridaeamp

Iridoviridae

MimiviridaeMarseillevirus

Asfarviridae

Phylogeny of NCLDV based on concatenation of 4 universal genes primase-helicase DNA polymerase packaging ATPase VLTF2 transcription factor

HOSTAnimals+ diverse protists

Amoebozoa Algae animals()

ChlorophytesHaptophytesstramenopiles

Animals haptophytes

bullDivergence of NCLDVmost likely antedatesdivergence of eukaryoticsupergroupsbullAlternativeaddition extensive horizontal transfer of viruses

Boyer Yutin et al PNAS 2009Koonin Yutin Intervirology2010

Animals

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 10: The Virus World, its evolution, evolution of antiviral defense

Boyer M Yutin N Pagnier I Barrassi L Fournous G Espinosa L Robert C AzzaS Sun S Rossmann MG Suzan-Monti M La Scola B Koonin EV Raoult DPNAS 2009106(51)21848-53

Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms

Giant viruses such as Mimivirus isolated from amoeba found in aquatic habitatsshow biological sophistication comparable to that of simple cellular life formsand seem to evolve by similar mechanisms including extensive gene duplicationand horizontal gene transfer (HGT) possibly in part through a viral parasitethe virophage We report here the isolation of Marseille virus a previouslyuncharacterized giant virus of amoeba The virions of Marseillevirus encompass a 368-kb genome a minimum of 49 proteins and some messenger RNAs Phylogenetic analysis of core genes indicates that Marseillevirus is the prototypeof a family of nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes The genome repertoire of the virus is composed of typical NCLDV core genes and genes apparently obtained from eukaryotic hosts and their parasites or symbionts both bacterial and viral We propose that amoebae are melting potsldquoof microbial evolution where diverse forms emerge including giant viruses with complex gene repertoires of various origins

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 11: The Virus World, its evolution, evolution of antiviral defense

The mosaic composition of the Marseille virus genome

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 12: The Virus World, its evolution, evolution of antiviral defense

Gene content tree intervirus gene transfer

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 13: The Virus World, its evolution, evolution of antiviral defense

Amoeba as a melting pot for HGT between viruses andbacterial endosymbionts

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 14: The Virus World, its evolution, evolution of antiviral defense

There are really weird creatures out thereSome NCLDV host their own parasites

La Scola et al The virophage as a unique parasite of the giant mimivirus Nature 2008

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 15: The Virus World, its evolution, evolution of antiviral defense

Chimeric origin of the virophage genome

NCLDV

Mimivirus

Archaeal virus

NCLDVenvironmental only

Bacterialtransposon

La Scola et al Nature 2008 455100-4

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 16: The Virus World, its evolution, evolution of antiviral defense

Hypothesisorigin of the NCLDV (and other viruses ofEukaryotes) in the

the second melting pot of virus evolution ndasheukaryogenesis

Phage scaffold(virus hallmark genes)

Eukaryotic additionsdisplacements

Koonin Yutin Intervirology 2010

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 17: The Virus World, its evolution, evolution of antiviral defense

Poliovirus74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

Viral hallmark genes

3C-Pro3DPol

RdRp3CPro g

VPg2C

S3H

Picornaviral lsquosignaturersquo genes

Protein sequence conservation

Some of the smallest viruses (this is where poliovirus belongs) The Big Bang of picorna-like virus evolution

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

Picorna-likeviruses possessdiverse arraysof hallmark andunique genes

Picorna-likeviral superfamily

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 18: The Virus World, its evolution, evolution of antiviral defense

Marine eukaryotic plankton carries a wealth ofpositive-strand RNA viruses nearly all belong to the picorna-like superfamily

Lang et al (2004) Virology 320206

HcRNAV44 kb

VP

HaRNAV86 kb

S-Pro Pol

VP1VP3VP2PolHel3

RsRNAV89 kb AnPolHel3 VP 1-3

SssRNAV90 kb An

PolHel3 VP 1-3C-Pro

sg sg

An

Nagasaki et al (2005) Appl Env Microbiol 718888

Culley et al (2006) Science 3121795

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 19: The Virus World, its evolution, evolution of antiviral defense

The amended Picornavirus-like

superfamily includes 14 recognized viral families

4 floating genera and15 unclassified positive-

strand anddouble-strand RNA

viruses that infect hosts from 4 of the 5 eukaryotic

supergroups-6 distinct clades

from RdRp phylogeny-diverse genome layouts

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 20: The Virus World, its evolution, evolution of antiviral defense

View from the viral side

6 major clades of picorna-likeviruses - 5 infect eukaryotesfrom different supergroups

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 21: The Virus World, its evolution, evolution of antiviral defense

The 5 supergroups of eukaryotes and their picorna-like viruses

Complementary view from thehost side

4 of the 5 eukaryoticsupergroups host diverse picorna-like viruses (no onechecked the 5th)

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 22: The Virus World, its evolution, evolution of antiviral defense

PolioV74 kb

VP0 3DPol2C 3CPro Angg

IRESVP1VP3 2APro 2B

3A 3B

Jelly-roll CPsSuperfamily 3

helicase RdRp

DNA phages Bacterial group II

retroelements

Bacterial and mitochondrialHtrA family proteases

Likely origins of the signature genes of the picorna-like superfamily were inferred usingsignature PSSMs and PSI-BLAST searches

3C-Pro

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 23: The Virus World, its evolution, evolution of antiviral defense

Hallmark genes of picorna-like viruses have distinct prokaryoticorigins

Radiation of major viralclades occurred in aldquoBig Bangrdquo during eukaryogenesis and antedates the divergenceof eukaryotic supergroups-the viruses then ldquosampledrdquo the hosts

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 24: The Virus World, its evolution, evolution of antiviral defense

Conclusions on the picorna and NCLDV storiesBig Bangs of virus evolution

bullPhylogenomic analysis of the picorna-like superfamily of eukaryotic RNA viruses indicates that the major lineages within this superfamily diverged prior to the divergence of the eukaryotic supergroupsbullBig Bang ndash an explosive early phase of viral evolutionbullMost likely the same pattern holds for other major groups ofviruses as illustrated by the evolutionary study of NCLDVa completely different group of virusesbullThe Big Bangs of eukaryotic virus evolution occurred concomitantly with a similar rapid phase of host evolution andcould be a manifestation of a general model of major evolutionarytransitions

Koonin The Biological Big Bang model for the major transitions in evolutionBiol Direct 2007 Aug 20221

Koonin Wolf Nagasaki Dolja Nat Rev Microbiol 2008 Dec6(12)925-39

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 25: The Virus World, its evolution, evolution of antiviral defense

+

1 Positive-strandRNA

3-30 kb - +R R ET

T+R

Class Replication cycle

2 Double-strandRNA

4-25 kb plusmnTr R E

T

+ plusmn

3 Negative-strandRNA

11-20 kbR R E

RdRp CPT

+-Tr -

+

Genetic cycles of RNA viruses

Diversity of viral genetic cycles versus the uniform genetic cycle of all cellular organisms Viruses are the

biospherersquos laboratory of genomic strategies

RdRp CP

RdRp CP

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 26: The Virus World, its evolution, evolution of antiviral defense

+4 Retroid

RNA viruses 7-12 kb +RT Tr E

T+Tr

plusmn

5 Retroid DNAviruses

elements2-10 kb +Tr

T+Tr

plusmnE RT

plusmn

RT CP

RT

Class Replication cycle

Genetic cycles of retroid viruses and retroelements

Genetic cycles of DNA viruses and plasmids 6 ssDNAvirusesplasmids2-11 kb

7 dsDNA virusesplasmids

5-1200 kb

+RCR RCR E

+Tr

plusmn +

TRCE

Tr

plusmn

+T

plusmnR E

Pr-Pol ALL CELLULARORGANISMS

YOU ARE HERE

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 27: The Virus World, its evolution, evolution of antiviral defense

I Genes with readily detectable homologs from cellular life forms 1 Genes with closely related homologs from cellular organisms

(typically the host of the given virus) present in a narrow group of viruses2 Genes that are conserved within a virus lineage or even several

lineages and have moderately close cellular homologsOrigin relatively recent (1) or ancient (2) acquisition from host

II Virus-specific genes3 ORFans ie genes without detectable homologs except possibly

in closely related viruses 4 Virus-specific genes that are conserved within a virus lineageAcquisition from host but with rapid divergence from ancestor once within viral genomes

III Viral hallmark genes5 Genes shared by many diverse virus lineages with only very distant

homologs in cellular organisms

Natural history of viral genes a one-page summary of viral comparative genomics

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 28: The Virus World, its evolution, evolution of antiviral defense

Contributions of different classes of viral genes to the genomes of different classes of viruses strong dependence on genome size

Genome size (log) 0 1 2 3

Recent acquisitions

Old acquisitions

Virus-specific ORFans

Virus-specific conserved

Virus hallmark

Most RNAvirusesretroelementsRCR replicons

Large RNA virusesadenotailedphages

NCLDVherpeslargephages

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 29: The Virus World, its evolution, evolution of antiviral defense

Natural history of viral genesViral Hallmark Genes

Shared by many diverse groups of viruses from the smallest RNA viruses to the giant DNA viruses

Strong support for monophyly of all viral members of the respective gene families

Only distant homologs in cellular organisms

Can be viewed as signatures of the lsquovirus statersquo

Play major roles in genome replication packagingand assembly

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 30: The Virus World, its evolution, evolution of antiviral defense

4 Rolling circle replicationinitiation endonuclease

1 Jelly-roll capsid protein

2 Superfamily 3 helicase

Protein products of viral hallmark genes

3 RNA-dependent RNA polymeraseand Reverse transcriptase

5 Viral archaeo-eukaryotic DNA primase

6 UL9-like superfamily 2 helicase

7 Packaging ATPase of the FtsK family

8 ATPase subunit of terminase

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 31: The Virus World, its evolution, evolution of antiviral defense

The primordial gene pool hypothesis (extremely counterintuitive ndash Santiago Elena Feb 16 2011)

The hallmark genes AND by implication the major lineages of modern viruses (at least viruses of prokaryotes) descend directly from a primordial gene pool

Koonin Senkevich Dolja Biol Direct 2006 129

Viral hallmark genes arebullpresent in a huge diversity of viruses and other selfish elementsbullrepresented only by remote homologs in cellular life forms

-synergy with the diversity of genomic strategies

A crucial corollary If viruses come directly from a primordial gene pool then origin of viruses isinextricably linked to the origin of cells

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 32: The Virus World, its evolution, evolution of antiviral defense

Cell degeneration Escaped genes Pr imordial geneticsystems

CELL

SMALLPARASITICCELL

VERYSMALLVIRUS

CHROMOSOME

PLASMID

VIRUS

RNA

VIRUS

PRE-CELLULAR LIFE FORMS

DNA

VIRUS

mRNA

VIRUS

Competing concepts of the or igin of viruses

CELLS

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 33: The Virus World, its evolution, evolution of antiviral defense

Origin and evolution of virus-like genetic

elementsin the pre-cellular

era

Koonin Martin TIG 2005Koonin Senkevich Dolja Biol Direct 2006 129Koonin Ann NY Acad Sci 2009

Replicon fusionyields chromosomes

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 34: The Virus World, its evolution, evolution of antiviral defense

bullViruses and virus-like genetic elements are not ldquojustrdquo pathogens they aredominant entities in the biospherebullEmergence of virus-like parasites is inevitable in any replicating systembullIn the pre-cellular epoch the genetic elements that later became viral andcellular genomes comprised a single pool in which they mixed matched andevolved new increasingly complex gene ensemblesbullDifferent replication strategies including RNA replication reverse transcriptionand DNA replication evolved already in the primordial genetic poolbullWith the emergence of prokaryotic cells a distinct pool of viral genes formed that retained its identity ever since as evidenced by the extant distribution ofviral hallmark genes ldquovirus worldrdquo or the virospherebullThe emergence of the eukaryotic cell was a second melting pot of virus evolution from which viruses of eukaryotes originated via recombination of genes from prokaryote viruses retroelements and the evolving eukaryotic hostbullViruses make essential contributions to the evolution of the genomes of cellularlife forms in particular as vehicles of HGT GTAs transducing phages

The ancient Virus World (VW)

Koonin EV Senkevich TG Dolja VV The ancient Virus World and evolution of cells Biol Direct 2006 Koonin EV Wolf YI Nagasaki K Dolja VV The complexity of the virus world Nat Rev Microbiol 2009

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 35: The Virus World, its evolution, evolution of antiviral defense

Evolution of antivirus defense systems

bull CRISPRCas system of adaptive immunity in prokaryotes

bull A case for Lamarckian evolutionbull The perennial virus-host arms race

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 36: The Virus World, its evolution, evolution of antiviral defense

CRISPR repeats and Cas genesCRISPR Clustered Regularly interspaced short palindromic repeatsCas CRISPR-associated (genes)

Sorek et al Nature Rev Microbiol 2008

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 37: The Virus World, its evolution, evolution of antiviral defense

Makarova KS Aravind L Grishin NV Rogozin IB Koonin EVA DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysisNucleic Acids Res 2002 Jan 1530(2)482-96

During a systematic analysis of conserved gene context in prokaryotic genomes a previously undetected complex partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea and some bacteria including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus The gene composition and gene order in this neighborhood vary greatly between species but all versions have a stable conserved core that consists of five genes One of the core genes encodes a predicted DNA helicase often fused to a predicted HD-superfamily hydrolase and another encodes a RecB family exonuclease three core genes remain uncharacterized but one of these might encode a nuclease of a new family helliphelliphelliphellipThe functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system which to our knowledge is the first repair system largely specific for thermophiles to be identified

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 38: The Virus World, its evolution, evolution of antiviral defense

Protein components of the system an update with unification of many diverse families

~25 families altogetherFamily SubfamilyA Phyletic

distributionBComments

1 COG1518 COG1518 (cas1) All Putative novel nucleaseintegrase Mostly α-helical protein

2 COG1343 COG1343 (cas2) COG3512ygbF-like

MTH324-likey1723_N-like

All Small protein related to VapD fused to helicase (COG1203) iny1723-like proteins

3 COG1203 COG1203 (cas3) All DNA helicase Most proteins have fusion to HD nuclease

4 RecB-like nuclease

COG1468 (cas4) COG4343 All RecB-like nuclease Contains three-cysteine C-terminal cluster

5 RAMP Repair-associated mysterious

protein

COG1688 (cas5)COG1769 COG1583COG1567 COG1336COG1367 COG1604COG1337 COG1332COG5551

BH0337-likeMJ0978-likeYgcH-like

y1726-like y1727-like

All Belong to ldquoRAMPrdquo family possibly RNA-binding proteinstructurally related to duplicated ferredoxin fold (PDB 1wj9)

6 COG1857 COG1857 COG3649 YgcJ-like y1725-like

All αβ protein predicted nuclease or integrase

7 HD-like nuclease

COG1203 (N-terminus) COG2254 All HD-like nuclease

8 BH0338 BH0338-likeMTH1090-like

All mostlyarchaea and FIRM

Large Zn-finger containing proteins probably nucleases (nucleaseactivity was shown for MTH1090 (ref)

9 COG1353 predicted

polymerase

COG1353 Most archaea some bacteria

Predicted palm-domain polymerase distantly related to viral RdRp and RT

hellipand ~15 other less common proteins Makarova et al BD 2006

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 39: The Virus World, its evolution, evolution of antiviral defense

CRISPR clustered regularly interspaced short palindromic repeats

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 40: The Virus World, its evolution, evolution of antiviral defense

CRISPR repeats

TB and TA

Strepto-like

Ecoli-like

Pasteurella-like

Cas2COG1343

Cas1COG1518

Cas4COG1468

Cas3COG1203

ldquoThe common structural characteristics of CRISPR loci are (i) the presence of multiple short direct repeats which show no or very little sequence variation within a given locus (ii) the presence of non-repetitive spacer sequences between the repeats of similar size (iii) the presence of a common leader sequence of a few hundred basepairs in most species harbouring multiple CRISPR loci (iv) the absence of long open reading frames within the locus and (v) the presence of the cas1 gene accompanied by the cas2 cas3 or cas4 genes in CRISPR-containing speciesrdquo

The cas genes are our ldquorepairrdquo system

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 41: The Virus World, its evolution, evolution of antiviral defense

ldquoHelicaserdquo cassette

ldquoPolymeraserdquo cassette

paragon of prokaryotic genome plasticity-extensive gene shuffling-evidence of multiple horizontal transfers-widespread gene loss

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 42: The Virus World, its evolution, evolution of antiviral defense

RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins

extreme sequence diversity MOTIFs specific I II III IV V COGs 13361367 hhshhGs ust-lKGhh+hh hhGtt hD lGhttsGh 160413371332 y1726-like slhlpEKuVRGT lRTIDs YGuVTsGhuh COG1851 hGphpGpsaFh hGFGRh BH0337-like TpA-h+GIh-uIh hhLpDV LGsREhuht COG1567 hhhpp ups-lhtAhh luscoGhGh COG1769 hhhh+Ph-hhh ss-hhGhlhsh hGh hGtcp+hsthchp COG1688 (Cas5) hhhhhts sssshhGhlsh lGttphh COGs 15835551 hhhhoPhhl hGtppshGFGl YgcH-like hHphlh hGu+uhGhGhh y1727-like LHphLh GFsthGLStss MJ0978-like hHNH lG+tsuhGhGol

Ferredoxin foldRNA-recognition motif (RRM)

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 43: The Virus World, its evolution, evolution of antiviral defense

The sequence similarity space of CRISPR repeats visualized with the BioLayoutprogram [26] Dots denote individual repeat sequences connecting lines represent Smith-Waterman similarities such that closer dots represent more similar sequences Dot colors denote cluster association as derived from MCL clustering The 12 largest clusters are indicated by circles together with their sequence logos coarse phylogenetic composition and sample secondary structures where applicableKunin et al Genome Biology 2007 8R61 doi101186gb-2007-8-4-r61

CRISPR show extreme diversity and complex clustering

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 44: The Virus World, its evolution, evolution of antiviral defense

Arcful GTTGAAATC-------AGACCAAAATGGGATTGAAAG- Metthe GTTAAAATC-------AGACCAAAATGGGATTGAAAT- Pyraby GTTCCAATA-------AGACTAGAATAGAATTGAAAG- Pyrhor GTTTAATAA--------GACTAAAATAGAATTGAAAG- Sulsol GATTAATCC-------------CAAAAGAATTGAAAG- Pyrhor GTTTCCGTA--------GAACTAAATAGTGTGGAAAG- Pyraby GTTTCCGTA--------GAACTTAGTAGTGTGGAAAG- Pyraer GTTTCAACT------------ATCTTTTGATTTTTGG- Pictor GTTAAATAA-------TAACCTAAATAGGATTGAAAG- Theaer GTAAAATAG---------ACCTTAATAGGATTGAAAG- Metace GTTTCAATC-------CCTCAAAGGTCTGATTTTAAC- Sultok GATGAATCC------------CAAAAGGAATTGAAAG- Sulaci GTTTTAGTT-------------TCTTGTCGTTATTAC- Pictor GTTTAAGAA--------TTACTAGATAGTATGGAGT-- Halmar GTTTCAGAC-------GGACCCTTGTGGGGTTGAAGC- Metjan ATTAAAATC-------AGACCGTTTCGGAATGGAAAT- Metkan GTTTCATTA-CCCGTATTATTACGGGTTAATTGCGAG- Nanequ CTTTCAATA---------TTTCTAATATATTAGAAAC- Themar GTTTCAATA-------CTTCCTTAGAGGTATGGAAAC- Theten GTTTCAATC-------CCTTTTAGGTAGGCTAAAAAC Thethe GTTGCAAAC-CTCGTTAGCCTCGTAGAGGATTGAAAC- Aquaeo GTTTTAACT--------CCACACGGTACATTAGAAAC- Bachal GTCGCACTC----TATATGGGT-GCGTGGATTGAAAT- Azoarcus GTGTTCCCC-------GCGCATCGCGGGGGTTGAAG-- Chltep GTCTTCCCC--------ACGCC-CGTGGGGGTGTTTC- Chrvio GTGCTCCCC-------CACGCA-CGTGGGGATGAACCG Clotet GTATTAGTA-------GCACCA-TATTGGAATGTAAAT Anabaena GTTTTAATTAACAAAAATCCCTATCAGGGATTGAAAC- Myctub GTTTCCGTC--CCCTCTCGGGGTTTTGGGTCTGACGAC Yerpes GTTCACTGC--------CGCACAGGCAGCTTAGAAA-- Metcap GTTTCAATCCACTCCCGGCTATTTAGCCGGGAGATAC- Mycgal GTTTTAGCA-CTGTACAATACTTGTGTAAGCAATAAC- Pasmul GTTAACTGC--------CGTATAGGCAGCTTAGAAA-- Pasmul GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAAT-- Pasmul GTTCACCAT--------CGTGTAGATGGCTTAGAAA--

All

A large subset of CRISPRs is conserved among diverse specieseven between archaea and bacteria suggesting that

CRISPRs are horizontally transferred together with cas genes

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 45: The Virus World, its evolution, evolution of antiviral defense

Mojica FJ Diez-Villasenor C Garcia-Martinez J Soria E

Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elementsJ Mol Evol 2005 Feb60(2)174-82

Here we show that CRISPR spacers derive from preexisting sequences either chromosomal or within transmissible genetic elements such as bacteriophages and conjugative plasmids

helliphelliphelliphellipThe transcription of the CRISPR loci (Tang et al 2002) suggests that such activity could be executed by CRISPR-RNA molecules acting as regulatory RNA that specifically recognizes the target through the homologous RNA-spacer sequence similarly to the eukaryotic interference RNA

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 46: The Virus World, its evolution, evolution of antiviral defense

HypothesisCRISPRCasbull is a prokaryotic immunity system that functions on the

RNAi principle

bull integrates short fragments of essential phageplasmid genes into CRISPRs

bull When expressed these fragments (psiRNA ndash after prokaryotic siRNA) silence the target gene and make the organism immune to the respective agent

bull contains all or most of the protein activities involved in these processes

bull Some of the Cas proteins are functional analogs of the eukaryotic proteins involved in RNAi in particular components of RISC (RNA-Induced Silencing Complex) and form prokaryotic analogs of RISC (pRISCs)

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 47: The Virus World, its evolution, evolution of antiviral defense

Transcription

(protein-guided) RNA folding

specific transcription factor (COG1517)

cellular RNApol

polycistronic pre-psiRNA

p-dicer (Helicase+HD-Hydrolase)(COG1203 COG2254)

RNA processing(fast)

pre-psiRNA75-100 nt

RNA processing(slow)

p-dicer

psiRNA25-45 nt

p-slicer(COG1468 COG4343

COG1857)

RAMP

target RNA cleavage

p-RISC

plasmid or phage mRNA

Annealing to RNA target

p-RISC

The basic scheme of CASS functioning

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 48: The Virus World, its evolution, evolution of antiviral defense

psiRNA25-45 nt

RAMP

RAMP-RNA complex(unstable)plasmid or phage

mRNA

Primer elongationCASS RNApol(COG1353)

Amplified annealing

long dsRNA (stable)

p-dicerDuplex degradation

RAMPRAMP binding

Cycle continues

Annealing to RNA target

Variant of CASS functioning with polymerasepsi-RNA amplification(mostly in thermophiles but also in Mycobacteria)

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 49: The Virus World, its evolution, evolution of antiviral defense

plasmid or phage mRNA

pre-psiRNA75-100 nt

CASS RNApol(COG1353)Or other RT

Reverse transcription with random copy choice

Random RNA recombination and reverse transcription

dsDNA with CRISPRs and target-derived spacers

Homologous recombination with genomic CRISPR region

integrase(COG1518)

genomic DNA

genomic DNA with new target-derived spacer

OR

New psiRNA generation

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 50: The Virus World, its evolution, evolution of antiviral defense

Barrangou R Fremaux C Deveau H Richards M Boyaval P Moineau S Romero DA Horvath P CRISPR provides acquired resistance against viruses in prokaryotes Science 2007 Mar 23315(5819)1709-12

Clustered regularly interspaced short palindromic repeats (CRISPR) are a distinctive feature of the genomes of most Bacteria and Archaea and are thought to be involved in resistance to bacteriophages We found that after viral challenge bacteria integrated new spacers derived from phage genomic sequences Removal or addition of particular spacers modified the phage-resistance phenotype of the cell Thus CRISPR together with associated cas genes provided resistance against phages and resistance Specificity is determined by spacer-phage sequence similarity

Key validation

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 51: The Virus World, its evolution, evolution of antiviral defense

bullPhage-specific inserts confer resistance that is highly sequence-specific a single substitution (SNP) reverts to sensitivitybullThe spacers worked only when inserted between CRISPRbullResistance required COG3513 (cas5) a predicted nuclease

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 52: The Virus World, its evolution, evolution of antiviral defense

Inserts from phage-resistant mutants were homologous to regions scattered over the phage genomes

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 53: The Virus World, its evolution, evolution of antiviral defense

(1) Adaptation

(2) Expression amp Processing

(3) Interference

invading virus plasmid

CRISPRleader

5 4 3 2 1

6 5 4 3 2 1

invading virus plasmid

invading virus plasmid

host

host

cas operon

Cascade

Cas3

cas3 cas2cas1cas5

Van der Oost et al TIBS 2009

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 54: The Virus World, its evolution, evolution of antiviral defense

The three types of CRISPRCas systems and their signature genes

1343

1343

1343

1343

Helicase1203

HD-f

Cas2

Cas6

Cas6

Cas5

Cas8

Cas9

Cas1

Cas3

Cas10

Cas11

Cas7

Cas6

Cas6

Cas5 Cas6

Cas4

Cas6

13433513

1343

RuvC-f

3513

HNH

13433513

1857

18571857 RAMP1688

RAMP1688

RecB-f1468

RecB-f1468

Helicase1203

R AMP1583

1857 1857Helicase1203

1857Helicase1203

1343 y1724

ygcKygcL

EcoliCASS2

YpestCASS3

NmeniCASS4

MtubeCASS6 RAMP module

DvulgCASS1

Tneap and HmariCASS7

ApernCASS5

Type I (Helicase cassette)

Cascade

Type II (HNH-type)

Type III (RAMP module or Polymerase cassette)

CASS4a

1857 1343RecB-f1468

Helicase1203

BH0338MTH1090LA3191

1421 RAMP1567

RAMP15833337RAMP

1583

RAMP1583

RAMP1769

SPy1049

Cas6

Cas61857

RecB-f1468

RAMP1688

Helicase1203

AF0070

R AMP1583

1343

1517

Cas12 Cas13

RAMP1688

RAMP1688

RAMP1688

Makarova et al NRM submitted

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 55: The Virus World, its evolution, evolution of antiviral defense

Experimental data on CRISPRCas systems

1857

1857

1857

RAMP1688

10

Cordip DIP0037Bacfra BF3951

Wolsuc WS1444Camjej Cj1522c

Neimen NMA0630Pasmul PM1126

Strthe str0658Strpyo Spy 0770

Treden TDE0328Staepi SERP2463

Mycgal MGA 0523Mycmob MMOB0320

Porgin PG1982Legpne lpp0161

Wolsuc WS1615Sulsol SSO1450

Aerper APE1240Sulsol SSO1405

Pyraer PAE0200Pyraer PAE0081

Metmaz MM0559Metmaz MM3249

Arcful AF1878Mansuc MS1635

Niteur NE0111Myctub Rv2817c

Nostoc alr0381Synech slr7071Thethe TTHB145

Chrvio CV1229Xanaxo XAC3842

Chltep CT1130Desvul DVUA0134

Metcap MCA0651Azoarc ebA3283

Deigeo_DRAFT_1682Mansuc MS0981

Baccla ABC3592Bachal BH0341

Strpyo Spy 1286Vibvul VVA1544

Nostoc alr1468Synech slr7092

Synech slr7016Geosul GSU0057

Thethe TTHB224Nostoc alr1568

Metkan MK1312

Esccol b2755Saltyp STM2938

Phopro PBPRC0034Geosul GSU1392

Chltep CT1977Thethe TTHB193Deigeo_DRAFT_2639

Symthe_STH663Corjei jk0643

Nocfar nfa44220Strave SAV7537

Metcap MCA0930Cordip DIP2214

Chrvio CV1756Zymmob ZMO0680

Pasmul PM0311Erwcar ECA3679

Yerpes y1722Legpne lpl2837Phopro PBPRB1995

Acinet ACIAD2484

Clotet CTC01148Fusnuc FN1177

Aquaeo aq 369Theten TTE2658Themar TM1797

Bacfra BF2544Porgin PG2014

Halmar pNG4053Clotet CTC01463

Metjan MJ0378Pyrhor PH1245

Thekod TK0455Metthe MTH1084

Arcful AF2435Metace MA3670

Nanequ NEQ017Pyrhor PH0173

Pictor PTO0003Pictor PTO0049Thevol TVN0106

1343

1343

1343

1343

1518

1518

1343

1518

RecB-f4343

RecB-f1468

RecB-f1468

RecB-f1468

Helicase12031857 RAMP

1688

RAMP1688

Helicase1203

Helicase1203

RAMPBH0337

RAMPYgcH

RAMPy1727

RAMPy17261857

Helicase1203

HD-f

Helicase1203

HD-f

1343

HD-f

HD-f

HD-f

RuvC-f

AF0070

BH0338

y1724

ygcKygcL

RAMP1583

HTH

AF1870

SPy1049

3574

AF1870

PH0918

HTH 3574

1857 RAMP1688

RecB-f1468

Helicase1203

HD-f

AF1870

MTH1090

AF1873

RAMP1583

3574

AF1870

3574

1343

1

7

2

3

4

5

6

7a

Sulsol SSO1999Sulsol SSO1440

Sultok ST2639Sultok ST0032

Sulsol SSO1402Pyraer PAE0068

Arcful AF1874Pyrhor PH0917Thekod TK0450

Pyraby PAB1689

13433513

3513

1518

RuvC-f

HNH

4aHNH

1 0

Pyrhor PH0162

6

64

6

4

16

17

7

7

7

7

1

2

7

7

7

7

5 7

5 75 7

5 75 7

5 7

6 1

7

Metkan MK1314

Metace MA1931

Metmaz MM3356

Metbar Mbar_A1355

Strthe stu0960Myctub Rv2823cStaepi SERP2461

Themar TM1811Mansuc MS1651

Niteur NE0123Thethe TTP0102

Metthe MTH1082

Metthe MTH326

Metjan MJ1672Pictor PTO0053Thevol TVN0112

Metkan MK1297Thethe TTP0115

Bachal BH0328Synech sll7090

Aquaeo aq 387Theten TTE2639

Themar TM1794Arcful AF1867

Pyrfur PF1129Thefus Tfu1578

Porgin PG1987Sultok ST1979

Sulsol SSO1991Sulsol SSO1729

Sultok ST0012Sulaci Saci 2046Nostoc all1479

1421 RAMP1567

RAMP1583

RAMPMTH323

RAMP1769

XMTH324

ldquoPolymeraserdquo cassette

ldquoHelicaserdquo cassette(A)

(B)

(C)

Pseudomonas aeruginosaHaurwitz RE et al Science 2010 Sep 10329(5997)1355-8

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

Staphylococcus epidermidisMarraffini LA Sontheimer EJ Science 2008 Dec 19322(5909)1843-5

E coliBrouns SJ et al Science 2008 Aug15321(5891)960-4

Pyrococcus furiosusHale CR et alCell 2009 Nov25139(5)945-56

Streptococcus thermophilusBarrangou R et al Science 315 1709-12 (2007)

mRNA targeting

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 56: The Virus World, its evolution, evolution of antiviral defense

Phylogeny of Cas1 and the 3 types of CRISPRCas systems

228 representative Cas1 sequences selected out of 643 Cas1 proteins in 442 genomes

EcoliCASS2Type IType I

Type I

Type I

Type I

Type III

Type III

Type III

Type III

Type III

Type III

Type II

YpestCASS3

NmeniCASS4

PolymeraseRAMPmoduleMtubeCASS6

PolRAMP

DvulgCASS1

TneapHmariCASS7

ApernCASS5

cas1cas9 cas2 cas4

cse1 cse2 cas7 cas5cas3 cas1 cas2cas6eI

II

IIIcas1csm6csm5csm4csm3csm2cas10cas6 cas2

RAMP

Makarova et al NRM submitted

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 57: The Virus World, its evolution, evolution of antiviral defense

CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria

bull Cas1 is present in 310 (44) genomesbull ~90 archaea but only ~35 of bacteria bull Type I is present in 42 genomes Type II ndash 9 Type III ndash 20bull Two or three systems of different types are present in 128 (20) genomes

54

256

1537

26

516

2

9

7 56107

3

10

13

383

210

46

216

8

1

7 70211

10

1

0102030405060708090

100

1623

84 6

17 7

2322

0

9

0

0

151

2

1

21

17

20 1

0

1533 28 7 14

0

9 7 40

117 210

0102030405060708090

100

Type I Type II Type IIICas1presenceabsence

Makarova et al in preparation

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 58: The Virus World, its evolution, evolution of antiviral defense

Modular evolution of the 3 types of CRISPRCas

Makarova et al in preparation

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 59: The Virus World, its evolution, evolution of antiviral defense

Mol Microbiol 2011 Jan79(2)484-502

A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair

Babu M Beloglazova N Flick R Graham C Skarina T Nocek B Gagarinova APogoutse O Brown G Binkowski A Phanse S Joachimiak A Koonin EV Savchenko AEmili A Greenblatt J Edwards AM Yakunin AF

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and theassociated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes Cas1 is a CRISPR-associated protein that is commonto all CRISPR-containing prokaryotes but its function remains obscure Here weshow that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions replication forks and 5-flaps The crystal structure of YgbT and site-directedmutagenesis have revealed the potential active site Genome-wide screens showthat YgbT physically and genetically interacts with key components of DNA repair systems including recB recC and ruvB Consistent with these findings the ygbT deletion strain showed increased sensitivity to DNA damage and impairedchromosomal segregation Similar phenotypes were observed in strains withdeletion of CRISPR clusters suggesting that the function of YgbT in repairinvolves interaction with the CRISPRs These results show that YgbT belongs to a novel structurally distinct family of nucleases acting on branched DNAs andsuggest that in addition to antiviral immunity at least some components of the CRISPR-Cas system have a function in DNA repair

Back to repair functions

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 60: The Virus World, its evolution, evolution of antiviral defense

The 3 major modalities of evolution

Koonin Wolf Biol Direct 2009

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 61: The Virus World, its evolution, evolution of antiviral defense

CRISPRCas as a bona fide Lamarckian system

Koonin Wolf Biol Direct 2009

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 62: The Virus World, its evolution, evolution of antiviral defense

Phenomenon Biological rolefunction

Phyletic spread

Lamarckian criteria

Genomic changes caused by environmental factor

Changes are specific to relevant genomic loci

Changes provide adaptation to the causative factor

Bona fide LamarckianCRISPRCas Defense against

viruses and other mobile elements

Most of the Archaeaand many bacteria

Yes Yes Yes

piRNA Defense against transposable elements in germline

Animals Yes Yes Yes

HGT (specific cases)

Adaptation to new environment stress response resistance

Archaea bacteria unicellular eukaryotes

Yes Yes Yes

Quasi-LamarckianHGT (general phenomenon)

Diverse innovations Archaea bacteria unicellular eukaryotes

Yes No Yesno

Stress-induced mutagenesis

Stress responseresistanceadaptation to new conditions

Ubiquitous Yes No or partially

Yes (but general evolvabilityenhanced as well)

Diverse Lamarckian and quasi-Lamarckian phenomena

Koonin Wolf Biol Direct 2009

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 63: The Virus World, its evolution, evolution of antiviral defense

Stress as a gauge of evolutionary modality

Koonin Wolf Biol Direct 2009

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 64: The Virus World, its evolution, evolution of antiviral defense

bullEvolution of parasites is intrinsic to any replicator system

bullDefense systems in particular those based on the RNAi principle appeared concomitantly with cells and coevolved with cells and viruses ever since

bullDefense systems occupy a substantial fraction of the genomes inall cellular life forms

bullPerennial arms race between parasites and hosts is one of the principal factors of evolution

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69
Page 65: The Virus World, its evolution, evolution of antiviral defense

Valerian DoljaOregon State University

Yuri Wolf NCBITatiana Senkevich NIAID NIH

Laks Iyer NCBIL Aravind NCBI Natalia Yutin NCBIUniversite de la Mediterranee-Marseille

Didier Raoult et al

Bill MartinUniv Duesseldorf

Kira Makarova NCBI

  • Slide Number 1
  • What is a virus
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Comparative genomics shows that viruses that cause human diseases belong to families that evolved hundreds of millions or even billions years agoViruses accompany the evolving cellular life throughout its history and might even predate it
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Slide Number 10
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Evolution of antivirus defense systems
  • Slide Number 38
  • Slide Number 39
  • Slide Number 40
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • RAMP (Repeat-Associated Mysterious Proteins) superfamily numerous families of Cas proteins extreme sequence diversity
  • Slide Number 46
  • Slide Number 47
  • Slide Number 48
  • Slide Number 49
  • Hypothesis
  • Slide Number 51
  • Slide Number 52
  • Slide Number 53
  • Slide Number 54
  • Slide Number 55
  • Slide Number 56
  • Slide Number 57
  • The three types of CRISPRCas systems and their signature genes
  • Experimental data on CRISPRCas systems
  • Phylogeny of Cas1 and the 3 types of CRISPRCas systems
  • CRISPRCAS systems in 703 selected complete genomes of archaea and bacteria
  • Slide Number 62
  • Slide Number 63
  • Slide Number 64
  • Slide Number 65
  • Slide Number 66
  • Slide Number 67
  • Slide Number 68
  • Slide Number 69