Dramatic variation in phage genome structures revealed by whole genome comparisons
Welkin Pope1, Charles Bowman1, SEA-PHAGES2, PHIRE3, K-RITH MGC4, Deborah Jacobs-Sera1, Daniel A. Russell1, Steven Cresawn5, William R. Jacobs Jr.6, Jeffrey G. Lawrence1, Roger W. Hendrix1, and Graham F. Hatfull1*
1 Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
2 Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science
3 Phage Hunters Integrating Research and Education
4 KwaZulu-Natal Institute for TB and HIV research Mycobacterial Genetics Course
5 Department of Biology, James Madison University, Harrisonburg, VA
6 Department of Microbiology and Immunology, Albert Einstein College of Medicine, NY
*Corresponding Author
Bacteriophages are the dark matter of the biological universe1, forming a vast, dynamic,
old, and genetically diverse population2. Horizontal exchange generates pervasive
genome mosaicism, with different genome segments having distinct evolutionary
histories3. Phages of phylogenetically distant hosts typically share low nucleic acid
sequence similarity, and few share genes with amino acid sequence similarity2. Phages
of a single common host can also span considerable sequence diversity even though
they are in direct genetic contact1. Comparative genomics of a large collection of phages
isolated on Mycobacterium smegmatis provides insights into the size and diversity of
groups of related phages and the extent to which the groups are discrete and genetically
isolated from other phages. We show that both the diversity and genetic isolation of
phage groups vary enormously. Some are discrete and share few genes with other
phages, whereas others are genetically connected to many other phages. The phage
population thus spans a continuum of relationships, but with phages of different types
varying enormously in prevalence. The reticulate relationships resulting from pervasively
mosaic architectures confound hierarchical taxonomic phage classification or
application of simple numerical values to distinguish among phage genomic types.
Bacteriophages are the most abundant organisms in the biosphere, and the ~10^31 tailed phage
particles participate in ~10^23 infections per second on a global scale, with the entire population
turning over every few days4. Virion structures suggest the population is also extremely old5 and
thus the great genetic diversity of phages is not surprising2. Phages likely evolved with common
ancestry and access to a large common gene pool3, although rates of horizontal exchange are
heterogeneous, being influenced by host range, varying phage migration rates across the
microbial landscape, and lifestyle (temperate or virulent)6. Host range is itself determined by
multiple processes, including local host diversity and mutation rates, as well as resistance
mechanisms such as receptor availability, restriction, CRISPRs, and abortive infection systems6,7. Constraints on
gene acquisition may also be imposed by synteny – particularly among virion structural genes –
and by size limits of DNA packaging2,8.
Genomic comparison of phages infecting a common host provides insights into evolutionary
mechanisms and the structure of their genetic diversity9. Relatively small numbers of phage
genomes have been sequenced for hosts such as Escherichia coli, Salmonella,
Staphylococcus, Pseudomonas, and Propionibacterium10-13 revealing varying degrees of genetic
diversity. Mycobacteriophages isolated from environmental samples using Mycobacterium
smegmatis mc²155 as a host are architecturally mosaic1 and span considerable diversity, but
can be grouped into ‘clusters’ of related phages that share little or no nucleotide sequence
similarity with other phages1,14-18. Some clusters are heterogeneous and can be readily divided
into subclusters by their nucleotide similarities. Recent analysis of phages adsorbed to
Synechococcus revealed 26 discrete ‘populations’, although they were obtained from a single
sample and are predominantly morphologically myoviral (T4-like)9. However, these populations
likely represent only a small portion of Synechococcus phages because the genomes of 17 fully
sequenced phages infecting Synechococcus or closely-related hosts fail to associate with these
“populations”9. These populations may thus reflect sampling bias of the single environment
examined, and extensive genomic mosaicism found in phages of Synechococcus and other
hosts1,3,19 warrants caution in extrapolation of the concept of discrete phage populations in the
absence of complete genome sequences.
The Howard Hughes Medical Institute (HHMI) Science Education Alliance Phage Hunters
Advancing Genomics and Evolutionary Science (SEA-PHAGES) program has facilitated
expansion of the number of sequenced mycobacteriophage genomes to 627 (Table S1) by
engaging large numbers of undergraduates in phage discovery and genomics20. The size of this
collection now provides sufficient resolution to offer insights into the diversity and genetic
isolation of phage genome types. Here we address the question of whether the groups of
related phages represent primarily discrete populations or genetically intermixed groups.
Although the collection excludes viruses that don’t form plaques under laboratory conditions, the
phages were isolated from widely dispersed geographical locations, including nine countries
and 36 of the continental United States (Fig. S1), over a dozen or more years. All are dsDNA
tailed phages (Caudovirales), and are morphologically siphoviral, except cluster C myoviruses.
Most have isometric heads except for singleton MooMoo and the Cluster I and O phages, which
have prolate heads21.
Using previously reported parameters15 the 627 genomes were assembled into 20 clusters (A –
T) and 8 singletons (with no close relatives) with large variations in Cluster sizes (Table 1, Fig.
S2); 11 clusters can be subdivided into 2 to 11 subclusters (Table 1). Clustered phages typically
share genome architectures; for example, Cluster A phages are similar in size and transcriptional
organization, and share an unusual immunity system16,22. A different set of clustering
parameters would generate different profiles, but not alter the core observation that there are
large variations among the different phage types. Cluster designation is simple for some phage
types because of extensive nucleotide similarity (e.g. Cluster C; Fig. S2), and if all clusters
resembled Cluster C, our data would be congruent with the Synechococcus populations9. But
many do not, revealing more complex relationships.
To compare mycobacteriophage gene contents we grouped related genes into phamilies using
Phamerator23, modified to use kclust24. The 69,633 genes assembled into 5,205 phams of which
1,613 (31%) are orphams14 (single-gene phamilies), and the gene content relationships are
represented as a network phylogeny in Fig. 1. In general, branch lengths provide strong support
for cluster and subcluster designations (Table 1, Fig. S2); the proportions of orphams per
genome provide additional support and are, as expected, highest for singletons and
single-genome subclusters (Fig. S3). Determination of the proportions of shared genes by pairwise
comparisons reveals the complexity of the genetic relationships (Fig. 2), and three major
features are apparent.
First, the overall phage relationships closely mirror the cluster and subcluster designations
derived by DNA similarities (Fig. S2). Secondly, the intra-cluster and intra-subcluster diversity
varies enormously, and this is quantified as the Cluster Cohesion Index (CCI, average number
of genes/genome divided by the total number of phamilies in the cluster; Table 1, Fig. 3). Thus
in clusters such as Cluster A (CCI, 0.08), the total number of phamilies is vastly greater than the
average number of genes per genome, indicating high diversity. The diversity of the A
subclusters is also highly varied with CCI values ranging from 0.22 to 0.91 (Table S1). In
contrast, Clusters G and O have low diversity (high CCI values) and closely related genomes
(Table 1; Fig. 3).
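The CCI calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each genome is assumed to be available as a list of pham identifiers (one per gene), and the function name `cluster_cohesion_index` and the toy clusters are hypothetical, not part of Phamerator or of the data set analyzed here.

```python
def cluster_cohesion_index(genomes):
    """Average genes per genome divided by the number of distinct phams in the cluster.

    `genomes` maps a phage name to a list of pham identifiers, one entry per gene.
    """
    if not genomes:
        raise ValueError("cluster must contain at least one genome")
    avg_genes = sum(len(genes) for genes in genomes.values()) / len(genomes)
    total_phams = len({pham for genes in genomes.values() for pham in genes})
    return avg_genes / total_phams


# Toy clusters with invented pham identifiers, for illustration only.
cohesive = {"phageA": [1, 2, 3, 4], "phageB": [1, 2, 3, 4]}  # identical gene content
diverse = {"phageC": [1, 2, 3, 4], "phageD": [5, 6, 7, 8]}   # no shared phams

cci_cohesive = cluster_cohesion_index(cohesive)  # 4 genes on average / 4 phams
cci_diverse = cluster_cohesion_index(diverse)    # 4 genes on average / 8 phams
```

In this toy case the cohesive cluster scores 1.0 and the diverse cluster 0.5, matching the reading that low CCI values indicate high diversity.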
Thirdly, the degree to which clusters are genetically connected to other phages varies greatly,
and is quantified as the Cluster Isolation Index (CII, the percentage of phamilies not present in
genomes outside of the cluster; Table 1, Fig. 3). Some clusters such as Clusters A, B, C, and Q
share relatively few genes (<25%) with other phages and have high CII values (Fig. 3). Other
groups, such as Clusters I and P, share >60% of their genes with other phages (Table 1),
reflecting the DNA relationships (Fig. S4). There are therefore no universally applicable values
of either diversity or isolation for different phage groups, and the most striking picture emerging
is one of great diversity with unequal representation of different types (Fig. 3). This is in marked
contrast to the discrete populations reported for Synechococcus phages9.
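A comparable sketch for the CII, again assuming genomes are represented as lists of pham identifiers. The function name `cluster_isolation_index` and the example data are hypothetical and serve only to make the definition (the percentage of a cluster's phamilies not present in genomes outside the cluster) concrete.

```python
def cluster_isolation_index(cluster_genomes, other_genomes):
    """Percentage of the cluster's phams that occur in no genome outside the cluster.

    Both arguments map a phage name to a list of pham identifiers.
    """
    cluster_phams = {p for genes in cluster_genomes.values() for p in genes}
    outside_phams = {p for genes in other_genomes.values() for p in genes}
    if not cluster_phams:
        raise ValueError("cluster contains no phams")
    unique_to_cluster = cluster_phams - outside_phams
    return 100.0 * len(unique_to_cluster) / len(cluster_phams)


# Toy data with invented pham identifiers, for illustration only.
cluster = {"phageA": [1, 2, 3], "phageB": [1, 2, 4]}  # cluster phams: 1, 2, 3, 4
others = {"phageX": [4, 7], "phageY": [8, 9]}         # pham 4 also occurs outside

cii = cluster_isolation_index(cluster, others)  # 3 of 4 phams are cluster-specific
```

Here three of the four phams are found only inside the cluster, giving a CII of 75%.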
These comparisons reveal additional complexities arising from highly mosaic genomes (Figs.
S5-S8). For example, Dori is clearly related to Cluster B phages (Fig. 1) with which it shares
20-26% of its genes and limited DNA similarity (Fig. S5), but also has nucleotide similarity and
shares genes with Cluster N and I2 phages, among others (Fig. S5, S7A), as reflected in its low
CII (Table 1, Fig. 3). Likewise, the singleton MooMoo has segments of DNA similarity and
shares ~20% of its genes with Cluster F phages (Fig. 1, S6, S7B), but also has similarity to
Clusters N and I; it also has a low CII (Table 1, Fig. 3). It has low DNA similarity to Cluster O
(Fig. S6), but shares several genes and has the same unusual prolate morphology (Fig. 1).
Complex relationships are also seen in the singletons Gaia and Sparky (Fig. S8).
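Statements such as "shares 20-26% of its genes" can be made concrete with a small pairwise calculation. The text does not give the exact formula behind the pairwise comparisons, so the definition below (the percentage of genes in a query genome whose pham also occurs in another genome) is an assumption, and the function name `percent_genes_shared` and the pham labels are hypothetical.

```python
def percent_genes_shared(query_genes, other_genes):
    """Percentage of genes in the query genome whose pham also occurs in the other genome.

    Assumed definition: the measure is asymmetric, normalized by the query genome.
    """
    if not query_genes:
        raise ValueError("query genome has no genes")
    other_phams = set(other_genes)
    shared = sum(1 for pham in query_genes if pham in other_phams)
    return 100.0 * shared / len(query_genes)


# Toy genomes with invented pham labels, for illustration only.
query = ["p1", "p2", "p3", "p4", "p5"]
cluster_members = {
    "member1": ["p1", "p6", "p7"],
    "member2": ["p1", "p2", "p8"],
}

shared = {name: percent_genes_shared(query, genes)
          for name, genes in cluster_members.items()}
lowest, highest = min(shared.values()), max(shared.values())
```

Reporting the lowest and highest values across cluster members yields a range of the kind quoted for Dori and the Cluster B phages; in this toy case the range is 20% to 40%.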
2 Total phams is the sum of all phamilies (groups of homologous mycobacteriophage genes) in that cluster.
3 Cluster Cohesion Index (CCI) is generated by dividing the average number of genes per genome by the total number of phamilies (phams) in that cluster. For singleton phages (bottom eight rows) the number of phams is equivalent to the number of genes (i.e. CCI is one), except where phams are represented by two or more genes in the same genome.
4 Cluster Isolation Index (CII) is the percentage of phams that are present only in that cluster, and not present in other mycobacteriophages.
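[Figure 1. Network phylogeny of mycobacteriophage gene content relationships. Labels recovered from the image: Clusters A to T; singletons Dori, DS6A, Gaia, MooMoo, Muddy, Patience, Sparky, and Wildcat; Morgushi; scale bar 0.01.]
[Figure 2. Pairwise comparison of shared gene content. Labels recovered from the image: MooMoo, Corndog, Mozy.]
[Figure 3. Cluster Isolation Index (axis ticks 20 to 90; Less Isolated to More Isolated) plotted against Cluster Cohesion Index (axis ticks 0 to 1.0; More Diverse to Less Diverse) for Clusters A to T and the singletons Wildcat, Muddy, MooMoo, Dori, Sparky, Gaia, DS6A, and Patience. Legend classes: >200, 100-200, 50-100, 10-50, 5-10, 2-5, Singleton.]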
SUPPLEMENTARY DATA
Supplementary Tables
Table S1. Phages used in this study and their cluster designation
Table S2. Genometrics and Cluster Cohesion Index of mycobacteriophages.
Supplementary Figures
Figure S1. Geographical distribution of sequenced mycobacteriophages. (A) Locations of
sequenced mycobacteriophages across the globe. (B) Locations of sequenced
mycobacteriophages across the United States. Data from www.phagesDB.org.
Figure S2. Nucleotide sequence comparison of 627 mycobacteriophages displayed as a
dotplot. Complete genome sequences of 627 mycobacteriophages were concatenated into a
single file and compared with itself using Gepard1 and displayed as a dotplot. The order of the
genomes is as listed in Table S1. Nucleotide similarity is a primary component in assembling
phages into Clusters, which typically requires evident DNA similarity spanning more than 50% of
the genome lengths.
Figure S3. Proportions of orphams in mycobacteriophage genomes. The proportions of
genes that are orphams (i.e. single-gene phamilies with no homologues within the
mycobacteriophage dataset) are shown for each phage. The order of the phages is as shown in
Table S1. All of the singleton genomes have >30% orphams, and most of the other genomes
with relatively high proportions of orphams are the single-genome subclusters (see Table S2).
Table S1 (partial listing as recovered; phages grouped by cluster or subcluster designation, Sin = singleton)
A8: Saintus, Smeadley
A9: Alma, Catalina, Myxus, PackMan
A10: Goose, KittenMittens, Rebeuca, RhynO, Severus, Trike, Twister
A11: Bachome, Et2Brutus, Fibonacci, Mulciber
D1: Adjutor, BigMama, Butterscotch, Gumball, Nova, PBI1, PLot, SirHarley, Troll4
D2: Hawkeye
E: 244, ABCat, Bask21, Cactus, Cjw1, Contagion, Czyszczon1, DrDrey, Dumbo, Dusk, Elph10, Eureka, Goku, Henry, Hopey, Kostya, Lilac, MadamMonkfish, Murphy, NelitzaMV, NoSleep, Pharsalus, Phaux, Phrux, Porky, Pumpkin, Rakim, RiverMonster, Simpliphy, SirDuracell, Stark, TeardropMSU, Toto, Tuco, Ukulele
F1: Ardmore, Batiatus, Bipolar, Bobi, Boomer, Brocalys, Bubbles123, BuzzLyseyear, Cabrinians, CaptainTrips, Cerasum, Che8, DLane, Daenerys, Dante, DeadP, Dorothy, DotProduct, Drago, Empress, Estave1, Fruitloop, GUmbie, Girr, Hades, Hamulus, Hegedechwinu, Ibhubesi, Inventum, Job42, Krakatau, Llama, Llij, Mantra, MilleniumForce, Minnie, MisterCuddles, Mozy, Mutaforma13, Ogopogo, Ovechkin, PMC, Pacc40, Pippy, Ramsey, RockyHorror, Ruby, SG4, Saal, Shauna1, ShiLan, SiSi, Spartacus, Spoonbill, SuperGrey, Taj, Tweety, Velveteen, Wee, dirtMcgirt
F2: Avani, Che9d, Jabbawokkie, Yoshi, Zapner
F3: Squirty
G: Angel, Annihilator, Avrafan, BPs, BQuat, BruceB, Cherrybomb426, Frosty24, Gomashi, Halo, Hope, Liefie, Phreak, Zombie
H1: Damien, Konstantine
L1: JoeDirt, LeBron, UPIE
L2: Archie, Breezona, Crossroads, Faith1, Loadrie, MkaliMitinis3, Nicholasp3, Rumpelstiltskin, Winky
L3: Whirlwind
M: Bongo, PegLeg, Rey
N: Butters, Carcharodon, Charlie, MichelleMyBell, Redi, SkinnyPete, Xerxes
Sin: DS6A, Dori, Gaia, MooMoo, Muddy, Patience, Sparky, Wildcat
O: Catdawg, Corndog, Dylan, Firecracker, YungJamal
P1: Donovan, Fishburne, HUHilltop, Jebeks, Malithi, Phineas, Shipwreck, BigNuz
P2: Purky
Q: Evanesce, Giles, HH92, Kinbote, OBUPride
R: Nilo, Papyrus, Send513, Weiss13
S: Marvin, MosMoris