Comparative genome analysis of Lactobacillus mudanjiangensis, an understudied member of the Lactobacillus plantarum group
Sander Wuyts 1,2, Camille Nina Allonsius 1, Stijn Wittouck 1, Sofie Thys 3, Bart Lievens 4, Stefan Weckx 2, Luc De Vuyst 2*, Sarah Lebeer 1*
1 University of Antwerp, Research Group Environmental Ecology and Applied Microbiology (ENdEMIC), Department of Bioscience Engineering, Antwerp, Belgium
2 Vrije Universiteit Brussel, Research Group of Industrial Microbiology and Food Biotechnology (IMDO), Faculty of Sciences and Bioengineering Sciences, Brussels, Belgium
3 University of Antwerp, Laboratory of Cell Biology and Histology, Antwerp Centre for Advanced Microscopy (ACAM), Antwerp, Belgium
4 KU Leuven, Laboratory for Process Microbial Ecology and Bioinspirational Management (PME&BIM), Department of Microbial and Molecular Systems (M2S), Campus De Nayer, Sint-Katelijne-Waver, Belgium
* [email protected]
bioRxiv preprint, February 5, 2019. doi: https://doi.org/10.1101/549451. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Abstract
The genus Lactobacillus is known to be extremely diverse and consists of different phylogenetic groups that show a diversity roughly equal to the expected diversity of a typical bacterial genus. One of the most prominent phylogenetic groups within this genus is the Lactobacillus plantarum group, which contains the understudied species Lactobacillus mudanjiangensis. Before this study, only one L. mudanjiangensis strain, DSM 28402T, had been described, but without whole-genome analysis. In this study, three strains classified as L. mudanjiangensis were isolated from three different carrot juice fermentations and their whole-genome sequences were determined, together with the genome sequence of the type strain. The genomes of all four strains were compared with publicly available L. plantarum group genome sequences. This analysis showed that L. mudanjiangensis harbored the second largest genome size and gene count of the whole L. plantarum group. In addition, all members of this species showed the presence of a gene coding for a putative cellulose-degrading enzyme. Finally, three of the four L. mudanjiangensis strains studied showed the presence of pili on scanning electron microscopy (SEM) images, which were linked to conjugative gene regions, coded on plasmids in at least two of the strains studied.
Author summary
Lactobacillus mudanjiangensis is an understudied species within the Lactobacillus plantarum group. Since its first description, no other studies have reported its isolation. Here, we present the first four genome sequences of this species, which include the genome sequence of the type strain and three new L. mudanjiangensis strains isolated from fermented carrot juice. The genomes of all four strains were compared with publicly available L. plantarum group genome sequences. We found that this species harbored the second largest genome size and gene count of the whole L. plantarum group. Furthermore, we present the first scanning electron microscopy (SEM) images of L. mudanjiangensis, which showed the formation of pili in three strains that we linked to genes related to conjugation. Finally, we found the presence of a unique putative cellulose-degrading enzyme, opening the door for different industrial applications of these Lactobacillus strains.
Introduction
The genus Lactobacillus is known to be extremely diverse [1]. Furthermore, it has been shown that different phylogenetic groups within this genus display a diversity roughly equal to the expected diversity of a typical bacterial genus [2–6]. Each of these phylogenetic groups can be recognized as an entity with unique properties and a distinct natural history, ecology, function and physiology [5]. Therefore, studying each of these phylogenetic groups separately, as if it were a genus of its own, can be an interesting approach that might reveal new, previously overlooked phylogenetic relationships and functional properties.
One of the more extensively studied species within the genus Lactobacillus is Lactobacillus plantarum. Previous genome-based phylogenetic studies have defined L. plantarum as a member of the L. plantarum group together with Lactobacillus fabifermentans, Lactobacillus paraplantarum, Lactobacillus pentosus and Lactobacillus xiangfangensis [1, 7]. In addition, the species Lactobacillus herbarum [8], Lactobacillus plajomi [9], Lactobacillus modestisalitolerans [9] and Lactobacillus mudanjiangensis [10] are closely related to L. plantarum and thus should be regarded as members of the L. plantarum group. Lactobacillus mudanjiangensis was first described in 2013, after its isolation from a traditional pickle fermentation in the Heilongjiang province in China [10]. Since its first description, no other study has provided additional characterization or reported the isolation of other strains of the L. mudanjiangensis species. Consequently, not a single genome assembly of this species is currently publicly available. However, in this study, three strains isolated from three different spontaneous carrot juice fermentations [11] were putatively classified as members of this Lactobacillus species.
Since the discovery of mucus-binding pili (also referred to as fimbriae or adhesins) in Lactobacillus rhamnosus GG [12, 13], several comparative genomic studies have focused on exploring similar gene clusters in other lactobacilli, including the members of the L. plantarum group [1, 14–18]. Whereas these specific pili play an important role in cell surface adhesion, pili can be of importance for an array of other functions as well, ranging from biofilm formation to uptake of extracellular DNA via natural competence (type IV pili) or facilitation of DNA transfer via conjugation [19–21]. The latter is a process that uses conjugative pili to bring bacterial cells together and provide an interface to exchange macromolecules, such as DNA or DNA-protein complexes [20]. In general, such a conjugation system consists of three major components, namely (i) a relaxase (MOB) that binds and nicks the DNA at the origin of transfer, (ii) a coupling protein (T4CP) that couples the relaxase-DNA complex to (iii) a type IV secretion system (T4SS), which ultimately transfers the whole complex to the recipient cell [22–25]. Historically, these conjugation systems and their pili have been associated only with conjugative plasmids [23], one of the main drivers of horizontal gene transfer [22, 26]. More recently, however, integrative and conjugative elements (ICEs), which also harbor conjugation systems, have been found to be another important driver of horizontal gene transfer [26–28].
This study aimed to provide more insights into the genomic features of the understudied Lactobacillus mudanjiangensis species, in relation to the other members of the L. plantarum group, using a comparative genomics approach. Therefore, the genome of the type strain of L. mudanjiangensis was sequenced together with three strains isolated from fermented carrot juice. These and other publicly available genome sequences were used to screen for L. mudanjiangensis species-specific properties, which included an analysis for the presence of genes related to pili formation and conjugation. In total, 304 genomes were subjected to an in-depth analysis focusing on the phylogenetic relationships as well as the predicted functional capacity of these strains.
Results
The genome assembly of the type strain L. mudanjiangensis DSM 28402T was analyzed together with the genome sequences of three putative L. mudanjiangensis strains isolated from carrot juice fermentations, namely AMBF197, AMBF209 and AMBF249, to confirm their putative classification as L. mudanjiangensis members. Furthermore, to allow comparison with other closely related Lactobacillus species and detection of L. mudanjiangensis species-specific properties, all publicly available genome sequences (NCBI Assembly database, 24/07/2018) of L. plantarum group members were included in this comparative genomics study, bringing the total number of genomes analyzed to 304 (Table 1).
Table 1. An overview of the studied species and strains.
Public data
Species               Number of genomes   Type strain    Reference
L. fabifermentans     2                   DSM 21115T     [29]
L. herbarum           1                   DSM 100358T    [8]
L. mudanjiangensis    0                   DSM 28402T     [10]
L. paraplantarum      5                   DSM 10667T     [30]
L. pentosus           13                  DSM 20314T     [31]
L. plantarum          278                 DSM 20174T     [32]
L. xiangfangensis     1                   DSM 27103T     [33]
Type strains sequenced in this study
Species               Number of genomes   Type strain    Reference
L. mudanjiangensis    1                   DSM 28402T     [10]
In-house isolates
Species               Number of genomes   Isolation source                        Reference
L. mudanjiangensis    3                   Spontaneously fermented carrot juice    [11]
Phylogeny of the Lactobacillus plantarum group
To obtain a detailed view on the phylogeny of L. mudanjiangensis in relation to the whole L. plantarum group, a maximum likelihood phylogenetic tree was constructed, based on 612 single-copy core orthogroups found with OrthoFinder (Fig 1 and S1 Fig). The topology of this tree showed seven major clades, mostly following the species annotation in the NCBI Assembly database. However, these results also exposed a few wrongly annotated genome assemblies. For example, L. plantarum MPL16 and L. plantarum AY01 were both annotated as L. plantarum, whereas here they were found within a clade that contained the L. paraplantarum type strain. Similarly, L. plantarum EGD-AQ4, also annotated as L. plantarum, was found within the clade of the L. pentosus type strain.
The type strain of L. mudanjiangensis formed a separate clade together with the strains AMBF197, AMBF209 and AMBF249 (Fig 1). Based on its single-copy core orthogroups, this species was phylogenetically the most distant to L. plantarum, whereas its closest relative was L. fabifermentans, followed by L. xiangfangensis.
Fig 1. Maximum likelihood phylogenetic tree of the Lactobacillus plantarum group. The tree is based on the amino acid sequences of 612 single-copy marker genes. Lactobacillus algidus DSM 15638 was used as an outgroup. The tree was pruned to only keep 70 L. plantarum strains to avoid over-plotting. A complete tree can be found in the Appendix (S1 Fig). The branch length of the outgroup was shortened for better visualization. Each tip is colored, based on its species name as annotated in the NCBI Assembly database (where applicable), with dark blue for L. plantarum, light green for Lactobacillus paraplantarum, light blue for Lactobacillus pentosus, pink for Lactobacillus herbarum, light orange for Lactobacillus xiangfangensis, dark orange for Lactobacillus mudanjiangensis and red for the isolates obtained from the carrot juice fermentations. Type strains of each species are annotated with a triangle (NCBI) or a square (sequenced in this study).
Low intraclade ANI values for Lactobacillus pentosus and Lactobacillus plantarum
To confirm that each major phylogenetic clade represented at least one different species, the pairwise average nucleotide identity (ANI) values of all genome assemblies were calculated (Fig 2). Intraclade ANI values all exceeded the commonly used 95% species-level threshold [34] for L. mudanjiangensis (99.0-99.4%), L. fabifermentans (99.7-99.9%) and L. paraplantarum (99.7-99.9%), whereas their interclade ANI values were far below this threshold, showing that each of these clades represented a single species. However, this was not the case for L. pentosus, for which multiple pairwise comparisons led to intraclade ANI values below this threshold, suggesting that this phylogenetic clade contained at least two species (Fig 2 and S2 Fig). The same was found for some L. plantarum assemblies, although to a much lesser extent than for L. pentosus. Therefore, for subsequent analyses, L. plantarum was kept as one species, whereas L. pentosus was split into two groups, each representing one species. One species was designated as L. pentosus, represented by a clade containing twelve genome assemblies, including the type strain L. pentosus DSM 20314T. The other species, here referred to as clade 5a, was represented by two genome assemblies (L. pentosus KCA1 and L. plantarum EGD-AQ4) (Fig 1 and S2 Fig). Finally, for L. xiangfangensis and L. herbarum, no intraclade comparisons could be performed, as only one genome assembly was available for these species.
Fig 2. Density plot of all pairwise average nucleotide identity (ANI) comparisons for each Lactobacillus plantarum group species. In green all interclade comparisons are shown, whereas orange shows all intraclade comparisons. For Lactobacillus xiangfangensis and Lactobacillus herbarum, no intraclade comparisons could be performed, as only one genome assembly was available for these species.
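The intraclade versus interclade comparison described above can be illustrated with a minimal Python sketch. It assumes that pairwise ANI values are already available as a dictionary keyed by genome pairs; the genome names, clade labels and ANI values below are hypothetical toy data and are not taken from this study. Only the 95% threshold comes from the text.

```python
# Commonly used ANI species-level threshold (%), as cited in the text.
SPECIES_THRESHOLD = 95.0


def split_comparisons(ani, clade_of):
    """Sort pairwise ANI values into intraclade and interclade lists per clade."""
    intra, inter = {}, {}
    for (genome_a, genome_b), value in ani.items():
        clade_a, clade_b = clade_of[genome_a], clade_of[genome_b]
        if clade_a == clade_b:
            intra.setdefault(clade_a, []).append(value)
        else:
            # An interclade comparison counts for both clades involved.
            inter.setdefault(clade_a, []).append(value)
            inter.setdefault(clade_b, []).append(value)
    return intra, inter


def clades_below_threshold(intra, threshold=SPECIES_THRESHOLD):
    """Return clades with at least one intraclade pair below the threshold."""
    return sorted(c for c, values in intra.items() if min(values) < threshold)


# Hypothetical toy data (illustration only, not values from the study).
clade_of = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B"}
ani = {
    ("g1", "g2"): 99.2,
    ("g3", "g4"): 99.5,
    ("g3", "g5"): 92.3,
    ("g4", "g5"): 92.1,
    ("g1", "g3"): 80.4,
    ("g2", "g5"): 80.9,
}

intra, inter = split_comparisons(ani, clade_of)
flagged = clades_below_threshold(intra)
print(flagged)
```

In this toy example, clade B would be flagged as potentially containing more than one species, which mirrors the reasoning applied to the L. pentosus clade.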
Genomic features of Lactobacillus mudanjiangensis
The above results confirmed the initial classification of strains AMBF197, AMBF209 and AMBF249 as members of the L. mudanjiangensis species. The first four genomes of this species are thus presented here. Their genome size varied between 3.4 Mb (strain DSM 28402T) and 3.6 Mb (strain AMBF209), whereas their GC content varied between 42.73% (strain AMBF209) and 43.06% (strain DSM 28402T) (Table 2). Finally, a high number of transfer RNA (tRNA) genes was found in all four strains.
A substantial difference in total genome length between the different species of the L. plantarum group was found (Fig 3A). Lactobacillus mudanjiangensis showed a median estimated genome size of 3.53 Mb, the second largest of the whole L. plantarum group to date. Lactobacillus pentosus showed the largest median estimated genome size (3.77 Mb) and L. fabifermentans the third largest (3.43 Mb), whereas the genomes of L. xiangfangensis (3.0 Mb) and L. herbarum (2.9 Mb) were much smaller. A remarkably high spread in genome length was found among strains belonging to the L. plantarum species, as their genome size ranged between 2.9 Mb and 3.8 Mb. Furthermore, L. mudanjiangensis showed a GC content of 42.9%, the lowest value within the whole L. plantarum group (Fig 3B). Finally, the median gene count followed the same trend as the genome length, with L. pentosus showing the highest count, followed by L. mudanjiangensis and L. fabifermentans, whereas L. xiangfangensis and L. herbarum harbored the lowest numbers of genes (Fig 3C).
Fig 3. Estimated genome sizes, GC contents and gene counts of all genomes of Lactobacillus plantarum group species studied and predicted functional capacity of all unique Lactobacillus mudanjiangensis orthogroups. (A) Total genome size, (B) GC content and (C) gene counts of all genomes studied, colored by species. (D) Upset plot comparing shared orthogroup counts between the eight L. plantarum group species. Species-specific orthogroups for L. mudanjiangensis are colored in blue and the inset shows their functional category based on EggNOG classification. Uniquely shared orthogroups between L. mudanjiangensis and one other L. plantarum group member are colored in orange, whereas uniquely shared orthogroups between L. mudanjiangensis and two other species are colored in red.
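As an illustration of how the two simplest genome statistics above (estimated genome size and GC content) can be derived from an assembly, the following Python sketch computes both from a list of contig sequences. It is a sketch under stated assumptions and not the pipeline used in this study: the function names and the toy contigs are hypothetical, and ambiguous bases (such as N) are simply counted toward the total length.

```python
def genome_size_mb(contigs):
    """Total assembly length in megabases (sum of all contig lengths)."""
    return sum(len(seq) for seq in contigs) / 1e6


def gc_content(contigs):
    """GC content in percent, computed over all contigs together."""
    total = sum(len(seq) for seq in contigs)
    if total == 0:
        raise ValueError("no sequence data provided")
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    return 100.0 * gc / total


# Hypothetical toy contigs (illustration only).
contigs = ["ATGC", "GGCCAT"]
print(gc_content(contigs), genome_size_mb(contigs))
```

Computing the GC content over the concatenated contigs, rather than averaging per-contig values, avoids giving short contigs the same weight as long ones.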
In total, 947,588 genes were found in the whole L. plantarum group, with an average of 3,110 genes per genome. These genes were further clustered into 8,005 different orthogroups, leading to an average count of 2,924 orthogroups per genome. The difference between these numbers was due to the fact that some genes were found in multiple copies within one genome, which clustered together in a single orthogroup. Of all these orthogroups, 2,172 were defined as core orthogroups and 5,833 as accessory orthogroups (see Sections 1.2.2 and 4.2.3 for the definition of these terms). A detailed overview of the number of genes and core and accessory orthogroups can be found in S2 Table. Subsequently, the distribution of orthogroups between the different L. plantarum group members was explored (Fig 3D). The species with the highest number of species-specific orthogroups was L. plantarum. With 2,065 species-specific orthogroups, it greatly outnumbered all other species, although this number was most probably biased by the higher number of sequenced genomes available for L. plantarum compared with the other L. plantarum group species. It was followed by L. mudanjiangensis (Fig 3D, blue), which contained 372 species-specific orthogroups, and L. fabifermentans, which harbored 213 species-specific orthogroups. Furthermore, L. plantarum and L. pentosus shared the highest number of uniquely shared orthogroups (286), followed by the combination of L. plantarum and L. paraplantarum (219 uniquely shared orthogroups), which seemed to be in line with the phylogeny described in Fig 1. In contrast, L. mudanjiangensis shared more unique orthogroups with the phylogenetically distant L. plantarum (166 orthogroups; Fig 3D, yellow) than it did with the most closely related species, L. fabifermentans (59 orthogroups).
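The distinction between core, accessory and species-specific orthogroups can be sketched in Python as follows. The definitions used in this study are given in sections that are not reproduced here, so the sketch makes two explicit assumptions: an orthogroup is treated as core when it is present in every genome, and as species-specific when all genomes carrying it belong to a single species. Genome names, species labels and orthogroup identifiers are hypothetical toy data.

```python
def classify_orthogroups(presence, species_of):
    """Classify orthogroups from a presence map.

    presence   -- dict: orthogroup id -> set of genomes that contain it
    species_of -- dict: genome -> species label
    Assumption: core = present in all genomes; species-specific = found in
    genomes of exactly one species.
    """
    all_genomes = set(species_of)
    core, accessory, species_specific = set(), set(), {}
    for orthogroup, genomes in presence.items():
        if genomes == all_genomes:
            core.add(orthogroup)
        else:
            accessory.add(orthogroup)
        species = {species_of[g] for g in genomes}
        if len(species) == 1:
            species_specific.setdefault(species.pop(), set()).add(orthogroup)
    return core, accessory, species_specific


# Hypothetical toy data (illustration only).
species_of = {"m1": "mud", "m2": "mud", "p1": "pla", "p2": "pla"}
presence = {
    "OG1": {"m1", "m2", "p1", "p2"},  # present everywhere
    "OG2": {"m1", "m2"},              # only in species "mud"
    "OG3": {"m1", "p1"},              # shared between both species
    "OG4": {"p2"},                    # only in one genome of species "pla"
}

core, accessory, species_specific = classify_orthogroups(presence, species_of)
print(sorted(core), sorted(accessory), species_specific)
```

Because a species with many sequenced genomes has more opportunities to contribute rare orthogroups such as OG4, this kind of count is sensitive to sampling depth, in line with the bias noted above for L. plantarum.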
To get more insights into the unique properties of L. mudanjiangensis, all 372 [...]
Lactobacillus mudanjiangensis harbors a potential cellulose-degrading enzyme
Carbohydrate transport and metabolism (category G) was found to be the most abundant characterized category among the L. mudanjiangensis species-specific orthogroups. Further examination of the 14 unique orthogroups detected in this category revealed a gene, present in all four strains, annotated as endoglucanase E1, which is involved in the conversion of cellulose polymers into simple saccharides [35]. A BLAST search of the DNA sequence of this gene against the NCBI nt database showed a best scoring hit (26% coverage and 69% identity) with a Herbinix species (GenBank accession number LN879430). The Herbinix genus contains cellulose-degrading bacteria [36]. This result also showed that this gene was not found in any other member of the Lactobacillus Genus Complex (LGC), or in any other lactic acid bacteria (LAB), and confirmed its uniqueness to L. mudanjiangensis. Since endoglucanases are classified as glycosyl hydrolases (GHs), GHs were predicted for all genomes. Indeed, for all four L. mudanjiangensis strains, this endoglucanase E1 gene was classified as belonging to the GH5_1 family, a family found only in L. mudanjiangensis. Although this GH family shows some degree of polyspecificity, the majority of its characterized enzymes (22 of 24) are reported as endoglucanases [37]. Together, these results thus pointed towards the presence of a novel putative cellulose-degrading enzyme in all four L. mudanjiangensis strains.
Presence of a putative conjugative system in Lactobacillus mudanjiangensis
The second most abundant category of L. mudanjiangensis-specific orthogroups, excluding category S (function unknown), comprised genes related to 'cell wall, membrane, or envelope biogenesis' (category M). Examination of the annotation of the genes belonging to these orthogroups did not reveal any new insights, as many of them were annotated as hypothetical proteins. Therefore, scanning electron microscopy (SEM) was performed to examine the cell surfaces of these four strains in more detail. This analysis revealed that three of the four strains (L. mudanjiangensis DSM 28402T, AMBF209 and AMBF249) formed pili or fimbriae, connecting different cells to each other as well as cells to an undefined structure (Fig 4A).
Fig 4. Scanning electron microscopy (SEM) and genes related to conjugation. (A) SEM images of all four Lactobacillus mudanjiangensis strains studied. White arrows indicate putative conjugative pili. (B) Gene clusters encoding a putative conjugation system, colored by their potential function, as classified by CONJScan. The text above each gene shows its matching orthogroup. (C) Schematic model representing the process of bacterial conjugation with all three mandatory elements. Scheme adapted from [23]. (R, relaxase; T4CP, Type IV coupling protein; T4SS, Type IV secretion system).
To identify the genes encoding these pili, all genome sequences of L. mudanjiangensis were screened for the presence of genes associated with these kinds of phenotypes. These included the spaCBA gene cluster, which has been linked with probiotic properties in L. rhamnosus, due to better adhesion to intestinal epithelial cells [38, 39], as well as secretion systems based on pili, such as the type II and type IV secretion systems [25, 40]. In this study, no spaCBA gene cluster was found. However, further exploration revealed the presence of a conjugation system in at least three of the four L. mudanjiangensis strains examined (AMBF209, AMBF249 and DSM 28402T).
Two complete conjugation systems containing all three mandatory parts (Fig 4C) were found in each of L. mudanjiangensis AMBF209 and AMBF249, whereas one complete conjugation system was found in L. mudanjiangensis DSM 28402T (Fig 4B). For all four L. mudanjiangensis strains, the relaxase gene of this conjugation system was classified as a member of the MOBQ class, whereas the coupling protein was classified as a T4CP2. The mating pair formation (MPF) system, which harbored the putative pilus, was further classified as belonging to the class MPF_FATA, which groups the MPF systems of Gram-positive bacteria [24]. VirB4 was identified as the ATPase motor of this MPF system. Furthermore, this MPF system contained three accessory genes (trsC, trsD and trsJ) in L. mudanjiangensis AMBF209 and AMBF249, whereas four accessory genes were annotated in L. mudanjiangensis DSM 28402T (trsC, trsD, trsF and trsJ) (Fig 4B). Homologs of the genes trsC and trsD have been identified previously, with trsC coding for a VirB3 homolog, which is linked to the formation of the membrane pore, and trsD coding for another homolog of VirB4, the conjugation ATPase [24]. In contrast, both trsF and trsJ are poorly characterized.
Further analysis of the genes surrounding the annotated conjugation genes showed that this genomic region contained 18 to 19 open reading frames, most of them annotated as hypothetical proteins (Fig 4B and S3 Table). However, a bacteriophage peptidoglycan hydrolase domain was found in orthogroup OG0002812 in both L. mudanjiangensis AMBF209 conjugation region 1 (AMBF209 CR1) and L. mudanjiangensis AMBF249 conjugation region 2 (AMBF249 CR2), making it a VirB1-like protein [41]. In Agrobacterium tumefaciens, the VirB1 protein provides localized lysis of the peptidoglycan cell wall to allow insertion of the T4SS [42]. A similar domain, also known to harbor peptidoglycan lytic activity, was found in L. mudanjiangensis DSM28402 CR1 (orthogroup OG0002812). Finally, another conserved domain, annotated as T4SS CagC, was found in all five gene regions and clustered in orthogroup OG0003012; it was shown to be a VirB2 homolog. VirB2 is the major pilus component of the type IV secretion system of A. tumefaciens and the main building block for extension and retraction of the conjugative pilus [43–45]. Taken together, these results showed the presence of pili in three L. mudanjiangensis strains (AMBF209, AMBF249 and DSM 28402T), which after genomic analysis were hypothesized to be part of a conjugation system.
Finally, genome analysis of all other L. plantarum group members showed that, contrary to initial expectations, the presence of a complete conjugation system was not unique to L. mudanjiangensis (S1 Table). All three necessary genes were also found in 58 of 275 L. plantarum strains, two of seven L. paraplantarum strains and four of twelve L. pentosus strains. In contrast, the system was completely absent in clade 5a, L. herbarum, L. xiangfangensis and L. fabifermentans.
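The notion of a complete conjugation system (relaxase, T4CP and T4SS all present) and the proportions reported above can be made concrete with a short Python sketch. This is not the CONJScan workflow itself: the component labels, the helper functions and the per-strain detections are hypothetical, and only the counts per species (58 of 275, two of seven, four of twelve) are taken from the text.

```python
# The three mandatory components of a conjugation system named in the text.
MANDATORY_COMPONENTS = frozenset({"relaxase", "T4CP", "T4SS"})


def has_complete_system(detected):
    """True if all three mandatory components were detected."""
    return MANDATORY_COMPONENTS.issubset(detected)


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Hypothetical per-strain detections (illustration only).
detected_by_strain = {
    "strain_A": {"relaxase", "T4CP", "T4SS"},
    "strain_B": {"relaxase", "T4SS"},  # coupling protein missing
    "strain_C": set(),
}
complete_strains = sorted(
    strain for strain, detected in detected_by_strain.items()
    if has_complete_system(detected)
)

# Counts reported in the text: (strains with complete system, strains analyzed).
reported_counts = {
    "L. plantarum": (58, 275),
    "L. paraplantarum": (2, 7),
    "L. pentosus": (4, 12),
}
reported_percentages = {
    species: percent(n, total) for species, (n, total) in reported_counts.items()
}
print(complete_strains, reported_percentages)
```

Expressed as proportions, the reported counts correspond to roughly one fifth of the L. plantarum strains and between a quarter and a third of the L. paraplantarum and L. pentosus strains, although the latter two rest on very few genomes.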
Plasmid reconstruction from genome data
Many conjugation systems are encoded on plasmids [27]. Therefore, all four L. mudanjiangensis genomes were screened for plasmid presence using Recycler [46]. Plasmids were found in only two of the four genome assemblies, namely L. mudanjiangensis AMBF209 and AMBF249 (Fig 5). Both strains harbored a plasmid of 27.3 kb with 33 predicted genes, and pairwise alignment showed that the two plasmids were identical. Subsequently, the presence of a conjugation system on these plasmids was confirmed using CONJScan. Further examination showed that the plasmid exactly matched the above-described AMBF209 CR2 and AMBF249 CR1 gene regions (Fig 4B). Regarding annotated genes, 13 of 33 gene products were predicted as hypothetical proteins by Prokka. Further annotation using the EggNOG database revealed that most genes were mapped to category S (function unknown), followed by category L (replication, recombination and repair) and category C (energy production and conversion). Finally, a BLAST search was performed against the NCBI nt database to explore whether a similar plasmid had already been described in the literature. This resulted in a best matching hit, showing 97% identity and 76% query coverage, with plasmid pKLC4 of Leuconostoc carnosum JB16 (GenBank accession number CP003855). This plasmid was found in a strain isolated from kimchi and has a length of 36.6 kb [47], indicating that some deletions occurred in the L. mudanjiangensis plasmids. Altogether, these results showed that L. mudanjiangensis strains AMBF209 and AMBF249 carried the same conjugative plasmid, of which the encoded gene functions are poorly characterized.
Fig 5. Plasmid map of the predicted conjugative plasmid of Lactobacillus mudanjiangensis AMBF209 and AMBF249. Genes are colored according to their annotation, as defined in Fig 4.
Since only two of five conjugation regions (Fig 4) were plasmid-encoded, an additional analysis was performed to assess whether the other three conjugation systems could be part of an ICE. For this, all four L. mudanjiangensis genomes were analyzed following an approach similar to a recently published method [26]. However, ICE regions usually contain repeats, such as transposases, which lead to fragmentation of these regions when short-read sequencing technology is used [26]. Therefore, these analysis methods usually require a complete genome for proper ICE identification. The draft assembly state of the four genomes thus made it hard to interpret the results obtained correctly.
Discussion
In this study, the genome sequence of the L. mudanjiangensis type strain DSM 28402T
was presented together with the genomes of three new L. mudanjiangensis strains,
AMBF197, AMBF209 and AMBF249, which were isolated from three different
spontaneous carrot juice fermentations [11]. Since previous phylogenetic analysis of this
species, using the 16S rRNA, pheS, and rpoA genes, showed a close genetic relatedness
with the members of the L. plantarum group [10], it was decided to study these
genomes in relation to the closely related members of the L. plantarum group with a
comparative genomics approach. A maximum likelihood phylogenetic tree confirmed
that L. mudanjiangensis was closely related to all other L. plantarum group members.
Furthermore, pairwise ANI analysis confirmed that the three strains isolated from
carrot juice fermentations were indeed members of the L. mudanjiangensis species. Yet,
this analysis also revealed that several genomes annotated as L. pentosus and L.
plantarum showed intraclade ANI values below the commonly used cut-off value of 95%
identity [34]. Especially for L. pentosus, low intraclade ANI values (a maximum of
92.3%) were found, meaning that if the 95% cut-off is strictly applied, two genome
assemblies could not be seen as members of the L. pentosus species. One of them
represented the genome sequence of the vaginal isolate L. pentosus KCA1 [48], the other
a genome sequence that has been annotated as L. plantarum EGD-AQ4, isolated from a
fermented bamboo shoot product [49]. The fact that these two genomes showed lower
intraclade ANI values could mean that these bacteria are undergoing a speciation
process or that this strict ANI value cut-off should be revised, as it potentially separates
hokkaidonensis [57], L. plantarum [58] and Lactobacillus reuteri [59]. Genes on these
plasmids often code for proteins involved in detoxification, virulence, antibiotic
resistance and ecological interactions [22], which could give the host a fitness advantage in
certain environments. Here, apart from the conjugation-related genes, many genes on the
conjugative plasmid were annotated as hypothetical proteins. However, since this
plasmid showed great similarity with a plasmid from a Leuconostoc strain, which was
isolated from kimchi [47], it could potentially harbor genes that are beneficial
for survival on plants or in a fermented vegetable environment.
In conclusion, in this study, the genome sequences of four L. mudanjiangensis strains
were studied in relation to the closely related members of the L. plantarum group.
Comparative genome analysis of this phylogenetic group identified two misannotated
genome assemblies and intraclade ANI values below the commonly used species
delimitation threshold for L. plantarum and L. pentosus. Furthermore, L.
mudanjiangensis harbored one of the largest genomes and one of the highest gene counts of the
L. plantarum group. Together with its broad repertoire of GHs and its potential
capability to degrade cellulose, this suggests either a nomadic or a plant-adapted lifestyle
for L. mudanjiangensis. Finally, three of the four L. mudanjiangensis strains
studied showed the presence of pili on SEM images, which were linked to conjugative
gene regions. For two strains, L. mudanjiangensis AMBF209 and AMBF249, these
regions were plasmid-associated. Further experimental studies, such as phenotypic
growth curve-based screenings, conjugation experiments and the creation of knock-out
mutants, are necessary to characterize the plasmid found and to confirm the link
between the pili observed and this conjugation gene region.
Materials and methods
Sequencing of the Lactobacillus mudanjiangensis type strain and downloading of publicly available assemblies
The type strain of L. mudanjiangensis [L. mudanjiangensis DSM 28402T (= LMG
27194T = CCUG 62991T)] was purchased from a public microorganism collection
(BCCM-LMG, Ghent, Belgium). The strain was grown overnight in de
Man-Rogosa-Sharpe (MRS) medium (Carl Roth, Karlsruhe, Germany) and DNA was
extracted using the NucleoSpin 96 tissue kit (Macherey-Nagel, Düren, Germany), with
an extra cell lysis step using 20 mg/mL of lysozyme (Sigma-Aldrich, St. Louis, MO,
USA) and 100 U/mL of mutanolysin (Sigma-Aldrich). Whole-genome sequencing was
performed using the Nextera XT DNA Sample Preparation kit (Illumina, San Diego,
CA, USA) and the Illumina MiSeq platform, using 2 × 250 cycles at the Laboratory of
Medical Microbiology (University of Antwerp, Antwerp, Belgium) for
strains AMBF197, AMBF249 and DSM 28402T, or 2 × 300 cycles at the Center of
Medical Genetics Antwerp of the University of Antwerp for strain AMBF209. Assembly
of the genome sequences was performed using SPAdes v3.12.0 [60]. In addition, all
genome sequences annotated as L. fabifermentans, L. herbarum, L. paraplantarum, L.
pentosus, L. plantarum and L. xiangfangensis were downloaded from the National
Center for Biotechnology Information (NCBI) Assembly database on 24/07/2018, using
in-house scripts. In total, 310 genomes were used as an input for quality control.
Quality control and annotation
Basic genome characteristics, including genome size, GC content and the N50 value,
were estimated using Quast 4.6.3 [61]. The quality of the genome assemblies was
evaluated using the Quast output. After visualization of several quality control
parameters using ggplot2 [62], genomes with an N50 value < 25,000 bp and a number of
undefined nucleotides (N) per 100,000 bases > 500 were discarded. An overview of all
genome sequences and strains that passed this quality control (304 assemblies) can be
found in S1 Table. Finally, Prokka 1.12 [63] was used to predict and annotate genes for
all genome sequences. In addition to its internal databases, a customized genus-specific
BLAST database was used for higher quality annotation with Prokka’s --usegenus
option. This database was created using BLAST [64,65] and all complete Lactobacillus
genomes found in the NCBI Assembly database. Genes encoding glycosyl hydrolases
(GHs) were detected by scanning all genomes against hidden Markov model (HMM)
profiles of the CAZyme families [66]. The profiles were downloaded from the dbCAN
webserver [67] and queried using HMMSCAN [68]. An E-value of 1 × 10⁻¹⁵ and a
coverage of 0.35 were used as cut-off, similar to what has been described before [69].
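The two numeric filters described above can be sketched in Python as follows. This is an illustration only, not the pipeline used in the study, in which Quast and HMMSCAN report these values directly. The function names are hypothetical, the "and" joining the two assembly criteria is read here as "either criterion is enough to discard" by default, and the E-value and coverage cut-offs are treated as exclusive bounds; all of these are assumptions.

```python
def n50(contig_lengths):
    """Largest contig length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def is_discarded(n50_bp, n_per_100kb, either=True):
    """Apply the cut-offs from the text: N50 < 25,000 bp, N per 100,000 bases > 500.

    either=True (assumption): one failed criterion is enough to discard.
    either=False: both criteria must fail, the literal reading of "and".
    """
    low_n50 = n50_bp < 25_000
    many_ns = n_per_100kb > 500
    return (low_n50 or many_ns) if either else (low_n50 and many_ns)


def keep_cazyme_hit(evalue, coverage, max_evalue=1e-15, min_coverage=0.35):
    """Keep an HMM hit that passes both cut-offs (bounds assumed exclusive)."""
    return evalue < max_evalue and coverage > min_coverage
```

For example, `is_discarded(n50([400_000, 300_000, 200_000, 100_000]), n_per_100kb=12.0)` evaluates the made-up assembly against both thresholds, and `keep_cazyme_hit(1e-20, 0.60)` checks a single hypothetical HMM hit.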
Defining the pangenomes of all Lactobacillus plantarum group species
To define the pangenome, all genes were clustered into orthogroups using OrthoFinder
2.2.6 [70] and further analyzed in R [71]. Here, a core orthogroup is defined as an
orthogroup present in more than 95% of a set of genomes. All other orthogroups are
defined as accessory orthogroups. An UpSet plot was created using the R package
UpSetR [72]. Unique orthogroups belonging to L. mudanjiangensis were further
annotated using EggNOG-mapper [73] and visualized using ggplot2 [62].
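The core/accessory split defined above can be sketched as follows. This is an illustrative Python example rather than the R code used in the study; the input layout (a mapping from orthogroup identifier to the set of genomes that contain it) and the function name are assumptions.

```python
def classify_orthogroups(presence, n_genomes, core_fraction=0.95):
    """Split orthogroups into core and accessory sets.

    presence: dict mapping an orthogroup id to the set of genome ids containing it
              (hypothetical input layout).
    An orthogroup is core when it occurs in MORE than core_fraction of the genomes.
    """
    core, accessory = set(), set()
    for orthogroup, genomes in presence.items():
        if len(genomes) / n_genomes > core_fraction:
            core.add(orthogroup)
        else:
            accessory.add(orthogroup)
    return core, accessory
```

With 20 genomes, an orthogroup found in all 20 is classified as core, whereas one found in only 18 (90%) is classified as accessory.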
Phylogenetic tree construction
Single-copy core orthogroups found by OrthoFinder were used as input for the
construction of a phylogenetic tree. Lactobacillus algidus DSM 15638 (NCBI Assembly
accession number GCA_001434695) served as an outgroup, as it is the species most
closely related to the L. plantarum group [1]. The first protein sequence of each fasta
file of the single-copy core orthogroups was compared with a BLAST database of all
proteins of the outgroup’s genome sequence. All hits with a coverage > 75%
and a percentage similarity > 50% were added to the alignment of each orthogroup.
These alignments, at the amino acid level, were concatenated into a supermatrix that was
used in RAxML 8.2.9 [74] to build a maximum likelihood phylogenetic tree with the -f a
option, which combines a rapid bootstrap algorithm with an extensive search of the tree
space, starting from multiple different starting trees. The tree and subtrees were plotted
with the R package ggtree [75].
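The outgroup hit filter and the concatenation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names and the in-memory data layout are hypothetical, and filling a taxon that is missing from an orthogroup alignment with gap characters is an assumption about how such cases would be handled.

```python
def keep_outgroup_hit(coverage_pct, similarity_pct):
    """Thresholds from the text: coverage > 75% and similarity > 50%."""
    return coverage_pct > 75 and similarity_pct > 50


def concatenate_alignments(alignments, taxa, gap="-"):
    """Concatenate per-orthogroup alignments into one supermatrix.

    alignments: list of dicts mapping taxon -> aligned sequence; all sequences
                within one dict must have the same length.
    taxa:       ordered list of all taxa that should appear in the supermatrix.
    A taxon absent from an alignment is filled with gap characters (assumption).
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

Every row of the resulting supermatrix has the same length, which is what a tree-building program expects from a concatenated alignment.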
Average nucleotide identity
All pairwise ANI values were calculated with the Python pyani package [76], using a
BLASTN approach [64,65] based on the methodology described by Goris et al. [77].
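To make the BLASTN-based ANI calculation more concrete, the sketch below averages the identity of fragment hits that pass two filters. It is not pyani's implementation. The fragment length of 1,020 nt and the filters (more than 30% identity recalculated over the whole fragment, and an aligned region of at least 70% of the fragment) are not stated in the text; they are assumptions based on the procedure commonly attributed to Goris et al. [77]. The function names and input layout are hypothetical.

```python
def ani_from_hits(hits, fragment_length=1020, min_identity=0.30, min_aligned=0.70):
    """Mean percent identity of the fragment hits that pass both filters.

    hits: iterable of (percent_identity, alignment_length), one best BLASTN hit
          per query fragment (hypothetical input layout).
    Returns None when no hit passes the filters.
    """
    kept = []
    for percent_identity, alignment_length in hits:
        aligned_fraction = alignment_length / fragment_length
        # identity recalculated over the whole fragment, not only the aligned part
        overall_identity = (percent_identity / 100.0) * aligned_fraction
        if overall_identity > min_identity and aligned_fraction >= min_aligned:
            kept.append(percent_identity)
    return sum(kept) / len(kept) if kept else None


def below_species_cutoff(ani, cutoff=95.0):
    """True when an ANI value falls below the commonly used 95% species cut-off."""
    return ani < cutoff
```

Applied to the value reported in the Discussion, `below_species_cutoff(92.3)` returns `True`, which is the situation described there for L. pentosus.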
Scanning electron microscopy
To assess the presence or absence of pili or fimbriae on the cell surface of L.
mudanjiangensis strains AMBF197, AMBF209, AMBF249 and DSM 28402T, SEM was
performed. To this end, the bacterial strains were grown overnight (MRS medium,
37 °C), gently washed with phosphate-buffered saline (per liter: 56 g of NaCl, 1.4 g of
KCl, 10.48 g of Na2HPO4, 1.68 g of KH2PO4; pH 7.4) and spotted on a gold-coated
membrane (approximately 5 × 10⁷ colony-forming units [CFU] per membrane).
Bacterial spots were fixed with 2.5% (m/v) glutaraldehyde in 0.1 M sodium cacodylate
buffer (2.5% glutaraldehyde, 0.1 M sodium cacodylate, 0.05% CaCl2·2H2O; pH 7.4) by
gently shaking the membrane for 1 h at room temperature, followed by a further
overnight fixation at 4 °C. After fixation, the membranes were washed three times for 20
min with cacodylate buffer (containing 7.5% [m/v] saccharose). Subsequently, the
bacteria were dehydrated in an ascending series of ethanol (50%, 70%, 90% and 95%,
each for 30 min at room temperature and 100% for 2 × 1 h and 1 × 30 min) and dried in
a Leica EM CPD030 (Leica Microsystems Belgium, Diegem, Belgium). The membranes
were mounted on a stub and coated with 5 nm of carbon
in a Leica EM Ace 600 coater (Leica Microsystems Belgium). SEM imaging was
performed using a Quanta FEG250 SEM system (Thermo Fisher, Asse, Belgium) at the
Antwerp Centre for Advanced Microscopy (ACAM, University of Antwerp) and the Electron
Microscopy for Material Science group (EMAT, University of Antwerp).
Detection of genomic clusters encoding pili or fimbriae
To screen for the presence of the spaCBA gene cluster, the gene cluster that is
responsible for expression of the fimbriae in L. rhamnosus GG [38,39], a BLAST
search [64,65] at the protein level was performed against a BLAST database constructed for
each genome separately. The gene sequences of spaA (NCBI GenBank accession number
BAI40953.1), spaB (BAI40954.1) and spaC (BAI40955.1) were used as queries.
Furthermore, the genomes were screened for genes encoding pili-related protein
secretion systems, using the predicted amino acid sequences as query and the TXSScan
definitions and profile models [25] as references in MacSyFinder v1.0.5 [78]. As only
genes related to conjugation systems were found, all protein sequences of all genomes
were scanned again with MacSyFinder, this time using the CONJScan definitions and
profile models [23,26]. In brief, a conjugation region was only considered if
the conjugation genes were separated by less than 31 genes, except for genes encoding
relaxases, which can be separated by at most 60 genes. The region was considered
conjugative when it contained genes coding for (i) a VirB4/TraU homolog, (ii) a
relaxase, (iii) a type 4 coupling protein (T4CP) and (iv) a minimum number of
mating-pair formation (MPF) type-specific genes [26]. For both scans, hits with
alignments covering > 50% of the protein profile and with an independent E-value <
10⁻³ were kept for further analysis (default parameters) in R [71]. Conserved domain
analysis of genes of interest was performed using the NCBI Conserved Domain web
interface [79]. The gene regions were visualized using the R package gggenes (available
at https://github.com/wilkox/gggenes).
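The spacing and completeness rules for conjugation regions can be sketched in Python as follows. This is a simplified illustration and not MacSyFinder's algorithm: the role labels, the function names and the rule that the wider relaxase spacing applies whenever either neighbouring hit is a relaxase are assumptions, and the minimum number of MPF genes is passed in by the caller because the text describes it only as type-specific.

```python
def keep_hit(profile_coverage, i_evalue):
    """Hit filter from the text: > 50% profile coverage, independent E-value < 1e-3."""
    return profile_coverage > 0.5 and i_evalue < 1e-3


def group_hits(hits, max_gap=30, relaxase_gap=60):
    """Group hits on one contig into candidate conjugation regions.

    hits: list of (gene_rank, role), where gene_rank is the position of the gene
          along the contig and role is one of "virb4", "relaxase", "t4cp", "mpf"
          (hypothetical labels).
    Two consecutive hits stay in one region when fewer than 31 genes lie between
    them, or at most 60 when either hit is a relaxase (assumption).
    """
    regions = []
    for rank, role in sorted(hits):
        if regions:
            prev_rank, prev_role = regions[-1][-1]
            genes_between = rank - prev_rank - 1
            limit = relaxase_gap if "relaxase" in (role, prev_role) else max_gap
            if genes_between <= limit:
                regions[-1].append((rank, role))
                continue
        regions.append([(rank, role)])
    return regions


def is_conjugative(region, min_mpf):
    """A region needs VirB4, a relaxase, a T4CP and at least min_mpf MPF genes."""
    roles = [role for _, role in region]
    return {"virb4", "relaxase", "t4cp"}.issubset(roles) and roles.count("mpf") >= min_mpf
```

A region that lacks any one of the three mandatory components is rejected regardless of how many MPF genes it contains.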
Plasmid identification
Detection and reconstruction of plasmids in the different L. mudanjiangensis strains
was performed using Recycler v0.7 [46], with the original fastq files and SPAdes
assembly graphs as input. The assembled plasmids were annotated with Prokka and
further characterized by scanning against the EggNOG database, as described above.
The presence of a conjugation system was confirmed with CONJScan, as described
above. The percentage identity between the different plasmids found was assessed using
BLAST [64,65]. The similarity with any previously described plasmid was checked by
performing a BLAST search [64,65] against the NCBI nucleotide (nt) database. A
plasmid map was created using Geneious v8 [80].
Delimitation of integrative and conjugative elements
The presence of ICEs was explored with an approach similar to the pipeline described
previously [26]. Briefly, all strict core genes, i.e. genes present in all strains of L.
mudanjiangensis, were identified using the OrthoFinder output (see above). Next, the
flanking core genes of each conjugative region were identified. Since within one species
an ICE is expected to be found between the same core orthogroups, the flanking core
genes of each conjugative region found were evaluated to determine whether or not it
could be defined as an ICE.
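The search for flanking core genes can be sketched in Python as follows. This is an illustration of the idea only, not the published pipeline [26]: the function names are hypothetical, and representing a contig as an ordered list of orthogroup identifiers is an assumption.

```python
def flanking_core_genes(contig_genes, region_start, region_end, core_orthogroups):
    """Return the nearest core orthogroup on each side of a conjugative region.

    contig_genes: orthogroup ids of all genes on one contig, in genomic order.
    region_start, region_end: inclusive indices of the region within contig_genes.
    A side is None when the contig ends before any core gene is reached.
    """
    upstream = reversed(contig_genes[:region_start])
    downstream = contig_genes[region_end + 1:]
    left = next((og for og in upstream if og in core_orthogroups), None)
    right = next((og for og in downstream if og in core_orthogroups), None)
    return left, right


def same_flanks(flanks_a, flanks_b):
    """True when two regions sit between the same pair of core orthogroups,
    ignoring orientation; regions with a missing flank cannot be compared."""
    if None in flanks_a or None in flanks_b:
        return False
    return set(flanks_a) == set(flanks_b)
```

A missing flank (`None`) is the typical outcome when a conjugative region reaches the end of a contig, which illustrates why fragmented draft assemblies make ICE delimitation difficult.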
Accession number(s) and data availability
Sequencing data and genome assemblies are available at the European Nucleotide
Archive under the accession number ERP111972. The complete pipeline can be found
on GitHub (https://github.com/swuyts/mudAnalysis).
Acknowledgments
We thank Eline Oerlemans, Ilke De Boeck, Ines Tuyaerts, and all other members of the
ENdEMIC group for their general assistance and/or fruitful discussions. In addition, we
thank Charlotte Claes and Arvid Suls (Centre of Medical Genetics, University of
Antwerp, Antwerp, Belgium) for their valuable input regarding whole-genome
sequencing. Furthermore, we would like to thank the Antwerp Centre for Advanced
Microscopy (ACAM, University of Antwerp) for processing and imaging of the SEM
samples and the Electron Microscopy for Material Science group (EMAT, University of
Antwerp) for the use of the environmental SEM (Quanta 250 FEG).
References
1. Sun Z, Harris HMB, McCann A, Guo C, Argimon S, Zhang W, et al. Expanding the biotechnology potential of lactobacilli through comparative genomics of 213 strains and associated genera. Nature Communications. 2015;6(1):8322. doi:10.1038/ncomms9322.
2. Claesson MJ, van Sinderen D, O’Toole PW. Lactobacillus phylogenomics - Towards a reclassification of the genus. International Journal of Systematic and Evolutionary Microbiology. 2008;58(12):2945–2954. doi:10.1099/ijs.0.65848-0.
3. Salvetti E, Torriani S, Felis GE. The genus Lactobacillus: A taxonomic update. Probiotics and Antimicrobial Proteins. 2012;4(4):217–226. doi:10.1007/s12602-012-9117-8.
4. Salvetti E, Harris HMB, Felis GE, O’Toole PW. Comparative genomics reveals robust phylogroups in the genus Lactobacillus as the basis for reclassification. Applied and Environmental Microbiology. 2018;84(17):00993–18. doi:10.1128/AEM.00993-18.
5. Duar RM, Lin XB, Zheng J, Martino ME, Grenier T, Perez-Munoz ME, et al. Lifestyles in transition: evolution and natural history of the genus Lactobacillus. FEMS Microbiology Reviews. 2017;41(Supp 1):S27–S48. doi:10.1093/femsre/fux030.
6. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology. 2018;36(August):996–1004. doi:10.1038/nbt.4229.
7. Zheng J, Ruan L, Sun M, Ganzle M. A genomic view of Lactobacilli and Pediococci demonstrates that phylogeny matches ecology and physiology. Applied and Environmental Microbiology. 2015;81(20):7233–7243. doi:10.1128/AEM.02116-15.
8. Mao Y, Chen M, Horvath P. Lactobacillus herbarum sp. nov., a species related to Lactobacillus plantarum. International Journal of Systematic and Evolutionary Microbiology. 2015;65(12):4682–4688. doi:10.1099/ijsem.0.000636.
9. Miyashita M, Yukphan P, Chaipitakchonlatarn W, Malimas T, Sugimoto M, Yoshino M, et al. Lactobacillus plajomi sp. nov. and Lactobacillus modestisalitolerans sp. nov., isolated from traditional fermented foods. International Journal of Systematic and Evolutionary Microbiology. 2015;65(8):2485–2490. doi:10.1099/ijs.0.000290.
10. Gu CT, Li CY, Yang LJ, Huo GC. Lactobacillus mudanjiangensis sp. nov., Lactobacillus songhuajiangensis sp. nov. and Lactobacillus nenjiangensis sp. nov., isolated from Chinese traditional pickle and sourdough. International Journal of Systematic and Evolutionary Microbiology. 2013;63(Pt 12):4698–4706. doi:10.1099/ijs.0.054296-0.
11. Wuyts S, Van Beeck W, Oerlemans EFM, Wittouck S, Claes IJJ, De Boeck I, et al. Carrot juice fermentations as man-made microbial ecosystems dominated by lactic acid bacteria. Applied and Environmental Microbiology. 2018;(April):00134–18. doi:10.1128/AEM.00134-18.
12. Lebeer S, Verhoeven TLa, Francius G, Schoofs G, Lambrichts I, Dufrene Y, et al. Identification of a gene cluster for the biosynthesis of a long, galactose-rich exopolysaccharide in Lactobacillus rhamnosus GG and functional analysis of the priming glycosyltransferase. Applied and Environmental Microbiology. 2009;75(11):3554–3563. doi:10.1128/AEM.02919-08.
13. Kankainen M, Paulin L, Tynkkynen S, von Ossowski I, Reunanen J, Partanen P, et al. Comparative genomic analysis of Lactobacillus rhamnosus GG reveals pili containing a human-mucus binding protein. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(40):17193–17198. doi:10.1073/pnas.0908876106.
14. Douillard FP, Ribbera A, Kant R, Pietila TE, Jarvinen HM, Messing M, et al. Comparative genomic and functional analysis of 100 Lactobacillus rhamnosus strains and their comparison with strain GG. PLoS Genetics. 2013;9(8):e1003683. doi:10.1371/journal.pgen.1003683.
15. Douillard FP, Mora D, Eijlander RT, Wels M, de Vos WM. Comparative genomic analysis of the multispecies probiotic-marketed product VSL3. PLoS ONE. 2018;13(2):e0192452. doi:10.1371/journal.pone.0192452.
16. Yu X, Jaatinen A, Rintahaka J, Hynonen U, Lyytinen O, Kant R, et al. Human gut-commensalic Lactobacillus ruminis ATCC 25644 displays sortase-assembled surface piliation: Phenotypic characterization of its fimbrial operon through in silico predictive analysis and recombinant expression in Lactococcus lactis. PLoS ONE. 2015;10(12):1–31. doi:10.1371/journal.pone.0145718.
17. Kant R, Palva A, Von Ossowski I. An in silico pan-genomic probe for the molecular traits behind Lactobacillus ruminis gut autochthony. PLoS ONE. 2017;12(4):1–26. doi:10.1371/journal.pone.0175541.
18. Harris HMB, Bourin MJB, Claesson MJ, O’Toole PW. Phylogenomics and comparative genomics of Lactobacillus salivarius, a mammalian gut commensal. Microbial Genomics. 2017;3(8). doi:10.1099/mgen.0.000115.
19. Mandlik A, Swierczynski A, Das A, Ton-That H. Pili in Gram-positive bacteria: assembly, involvement in colonization and biofilm development. Trends in Microbiology. 2008;16(1):33–40. doi:10.1016/j.tim.2007.10.010.
20. Filloux A. A variety of bacterial pili involved in horizontal gene transfer. Journal of Bacteriology. 2010;192(13):3243–3245. doi:10.1128/JB.00424-10.
21. Muschiol S, Balaban M, Normark S, Henriques-Normark B. Uptake of extracellular DNA: Competence-induced pili in natural transformation of Streptococcus pneumoniae. BioEssays. 2015;37(4):426–435. doi:10.1002/bies.201400125.
22. Smillie C, Garcillan-Barcia MP, Francia MV, Rocha EPC, de la Cruz F. Mobility of plasmids. Microbiology and Molecular Biology Reviews. 2010;74(3):434–452. doi:10.1128/MMBR.00020-10.
23. Guglielmini J, Quintais L, Garcillan-Barcia MP, de la Cruz F, Rocha EPC. The repertoire of ICE in prokaryotes underscores the unity, diversity, and ubiquity of conjugation. PLoS Genetics. 2011;7(8). doi:10.1371/journal.pgen.1002222.
24. Guglielmini J, Neron B, Abby SS, Garcillan-Barcia MP, La Cruz DF, Rocha EPC. Key components of the eight classes of type IV secretion systems involved in bacterial conjugation or protein secretion. Nucleic Acids Research. 2014;42(9):5715–5727. doi:10.1093/nar/gku194.
25. Abby SS, Cury J, Guglielmini J, Neron B, Touchon M, Rocha EPC. Identification of protein secretion systems in bacterial genomes. Scientific Reports. 2016;6(1):23080. doi:10.1038/srep23080.
26. Cury J, Touchon M, Rocha EPC. Integrative and conjugative elements and their hosts: composition, distribution and organization. Nucleic Acids Research. 2017;45(15):8943–8956. doi:10.1093/nar/gkx607.
27. Johnson CM, Grossman AD. Integrative and conjugative elements (ICEs): what they do and how they work. Annual Review of Genetics. 2015;49(1):577–601. doi:10.1146/annurev-genet-112414-055018.
28. Delavat F, Miyazaki R, Carraro N, Pradervand N, van der Meer JR. The hidden life of integrative and conjugative elements. FEMS Microbiology Reviews. 2017;41(4):512–537. doi:10.1093/femsre/fux008.
29. De Bruyne K, Camu N, De Vuyst L, Vandamme P. Lactobacillus fabifermentans sp. nov. and Lactobacillus cacaonum sp. nov., isolated from Ghanaian cocoa fermentations. International Journal of Systematic and Evolutionary Microbiology. 2009;59(1):7–12. doi:10.1099/ijs.0.001172-0.
30. Curk MC, Hubert JC, Bringel F. Lactobacillus paraplantarum sp. nov., a new species related to Lactobacillus plantarum. International Journal of Systematic Bacteriology. 1996;46(2):595–598. doi:10.1099/00207713-46-2-595.
31. Zanoni P, Farrow JAE, Phillips BA, Collins MD. Lactobacillus pentosus (Fred, Peterson, and Anderson) sp. nov., nom. rev. International Journal of Systematic and Evolutionary Microbiology. 1987;37(4):339–341.
32. Pederson CS. A study of the species Lactobacillus plantarum (Orla-Jensen) Bergey et al. Journal of Bacteriology. 1936;31(3):217.
33. Gu CT, Wang F, Li CY, Liu F, Huo GC. Lactobacillus xiangfangensis sp. nov., isolated from Chinese pickle. International Journal of Systematic and Evolutionary Microbiology. 2012;62(Pt 4):860–863. doi:10.1099/ijs.0.031468-0.
34. Richter M, Rossello-Mora R. Shifting the genomic gold standard for the prokaryotic species definition. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(45):19126–19131. doi:10.1073/pnas.0906412106.
35. Yennamalli RM, Rader AJ, Kenny AJ, Wolt JD, Sen TZ. Endoglucanases: insights into thermostability for biofuel applications. Biotechnology for Biofuels. 2013;6(1):136. doi:10.1186/1754-6834-6-136.
36. Koeck DE, Hahnke S, Zverlov VV. Herbinix luporum sp. nov., a thermophilic cellulose-degrading bacterium isolated from a thermophilic biogas reactor. International Journal of Systematic and Evolutionary Microbiology. 2016;66(10):4132–4137. doi:10.1099/ijsem.0.001324.
37. Aspeborg H, Coutinho PM, Wang Y, Brumer H, Henrissat B. Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evolutionary Biology. 2012;12(1):186. doi:10.1186/1471-2148-12-186.
38. Lebeer S, Claes I, Tytgat HLP, Verhoeven TLa, Marien E, von Ossowski I, et al. Functional analysis of Lactobacillus rhamnosus GG pili in relation to adhesion and immunomodulatory interactions with intestinal epithelial cells. Applied and Environmental Microbiology. 2012;78(1):185–193. doi:10.1128/AEM.06192-11.
39. Lebeer S, Bron PA, Marco ML, Van Pijkeren JP, O’Connell Motherway M, Hill C, et al. Identification of probiotic effector molecules: present state and future perspectives. Current Opinion in Biotechnology. 2018;49:217–223. doi:10.1016/j.copbio.2017.10.007.
40. Giltner CL, Nguyen Y, Burrows LL. Type IV pilin proteins: versatile molecular modules. Microbiology and Molecular Biology Reviews. 2012;76(4):740–772. doi:10.1128/MMBR.00035-12.
41. Guglielmetti S, Balzaretti S, Taverniti V, Miriani M, Milani C, Scarafoni A, et al. TgaA, a VirB1-like component belonging to a putative type IV secretion system of Bifidobacterium bifidum MIMBb75. Applied and Environmental Microbiology. 2014;80(17):5161–5169. doi:10.1128/AEM.01413-14.
42. Zupan J, Hackworth CA, Aguilar J, Ward D, Zambryski P. VirB1 promotes T-pilus formation in the Vir-type IV secretion system of Agrobacterium tumefaciens. Journal of Bacteriology. 2007;189(18):6551–6563. doi:10.1128/JB.00480-07.
43. Schroder G, Lanka E. The mating pair formation system of conjugative plasmids - A versatile secretion machinery for transfer of proteins and DNA. Plasmid. 2005;54(1):1–25. doi:10.1016/j.plasmid.2005.02.001.
44. Alvarez-Martinez CE, Christie PJ. Biological diversity of prokaryotic type IV secretion systems. Microbiology and Molecular Biology Reviews. 2009;73(4):775–808. doi:10.1128/MMBR.00023-09.
45. Shariq M, Kumar N, Kumari R, Kumar A, Subbarao N, Mukhopadhyay G. Biochemical analysis of CagE: A VirB4 homologue of Helicobacter pylori Cag-T4SS. PLoS ONE. 2015;10(11):e0142606. doi:10.1371/journal.pone.0142606.
46. Rozov R, Brown Kav A, Bogumil D, Shterzer N, Halperin E, Mizrahi I, et al. Recycler: an algorithm for detecting plasmids from de novo assembly graphs. Bioinformatics. 2016;53(9):btw651. doi:10.1093/bioinformatics/btw651.
47. Jung JY, Lee SH, Jeon CO. Complete genome sequence of Leuconostoc carnosum strain JB16, isolated from kimchi. Journal of Bacteriology. 2012;194(23):6672–6673. doi:10.1128/JB.01805-12.
48. Anukam KC, Macklaim JM, Gloor GB, Reid G, Boekhorst J, Renckens B, et al. Genome sequence of Lactobacillus pentosus KCA1: vaginal isolate from a healthy premenopausal woman. PLoS ONE. 2013;8(3):e59239. doi:10.1371/journal.pone.0059239.
49. Qureshi A, Itankar Y, Ojha R, Mandal M, Khardenavis A, Kapley A, et al. Genome sequence of Lactobacillus plantarum EGD-AQ4, isolated from fermented product of Northeast India. Genome Announcements. 2014;2(1):4–5. doi:10.1128/genomeA.01122-13.
50. Martino ME, Bayjanov JR, Caffrey BE, Wels M, Joncour P, Hughes S, et al. Nomadic lifestyle of Lactobacillus plantarum revealed by comparative genomics of 54 strains isolated from different habitats. Environmental Microbiology. 2016;00(2016). doi:10.1111/1462-2920.13455.
51. Abriouel H, Perez Montoro B, Casado Munoz MDC, Knapp CW, Galvez A, Benomar N. In silico genomic insights into aspects of food safety and defense mechanisms of a potentially probiotic Lactobacillus pentosus MP-10 isolated from brines of naturally fermented Alorena green table olives. PLoS ONE. 2017;12(6):e0176801. doi:10.1371/journal.pone.0176801.
52. Klemm D, Heublein B, Fink HP, Bohn A. Cellulose: Fascinating biopolymer and sustainable raw material. Angewandte Chemie International Edition. 2005;44(22):3358–3393. doi:10.1002/anie.200460587.
53. Sharma KD, Karki S, Thakur NS, Attri S. Chemical composition, functional properties and processing of carrot: a review. Journal of Food Science and Technology. 2012;49(1):22–32. doi:10.1007/s13197-011-0310-7.
54. Fukao M, Oshima K, Morita H, Toh H, Suda W, Kim SW, et al. Genomic analysis by deep sequencing of the probiotic Lactobacillus brevis KB290 harboring nine plasmids reveals genomic stability. PLoS ONE. 2013;8(3). doi:10.1371/journal.pone.0060521.
55. Zhang W, Yu D, Sun Z, Chen X, Bao Q, Meng H, et al. Complete nucleotide sequence of plasmid plca36 isolated from Lactobacillus casei Zhang. Plasmid. 2008;60(2):131–135. doi:10.1016/j.plasmid.2008.06.003.
56. Ito Y, Kawai Y, Arakawa K, Honme Y, Sasaki T, Saito T. Conjugative plasmid from Lactobacillus gasseri LA39 that carries genes for production of and immunity to the circular bacteriocin gassericin A. Applied and Environmental Microbiology. 2009;75(19):6340–6351. doi:10.1128/AEM.00195-09.
57. Tanizawa Y, Tohno M, Kaminuma E, Nakamura Y, Arita M. Complete genome sequence and analysis of Lactobacillus hokkaidonensis LOOC260T, a psychrotrophic lactic acid bacterium isolated from silage. BMC Genomics. 2015;16(1):1–11. doi:10.1186/s12864-015-1435-2.
58. van Kranenburg R, Golic N, Bongers R, Leer RJ, de Vos WM, Siezen RJ, et al. Functional analysis of three plasmids from Lactobacillus plantarum. Applied and Environmental Microbiology. 2005;71(3):1223–1230. doi:10.1128/AEM.71.3.1223-1230.2005.
59. Kim DH, Jeon YJ, Chung MJ, Seo JG, Ro YT. Complete sequence and gene analysis of a cryptic plasmid pLU4 in Lactobacillus reuteri strain LU4 (KCTC 12397BP). Applied Biological Chemistry. 2017;60(2):145–153. doi:10.1007/s13765-017-0264-1.
60. Bankevich A, Nurk S, Antipov D, Gurevich Aa, Dvorkin M, Kulikov AS, et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology. 2012;19(5):455–477. doi:10.1089/cmb.2012.0021.
61. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–1075. doi:10.1093/bioinformatics/btt086.
62. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2009. Available from: http://ggplot2.org.
63. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–2069. doi:10.1093/bioinformatics/btu153.
64. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. 1990;215(3):403–410. doi:10.1016/S0022-2836(05)80360-2.
65. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10(1):421. doi:10.1186/1471-2105-10-421.
66. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research. 2014;42(D1):D490–D495. doi:10.1093/nar/gkt1178.
67. Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y. dbCAN: A web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Research. 2012;40(W1):445–451. doi:10.1093/nar/gks479.
68. Finn RD, Clements J, Eddy SR. HMMER web server: Interactive sequence similarity searching. Nucleic Acids Research. 2011;39(SUPPL. 2):29–37. doi:10.1093/nar/gkr367.
69. Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Research. 2018;46(W1):W95–W101. doi:10.1093/nar/gky418.
70. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biology. 2015;16(1):157. doi:10.1186/s13059-015-0721-2.
71. R Core Team. R: A language and environment for statistical computing; 2015. Available from: https://www.r-project.org/.
72. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics. 2017;33(18):2938–2940. doi:10.1093/bioinformatics/btx364.
73. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, Von Mering C, et al. Fast genome-wide functional annotation through orthology assignment by EggNOG-mapper. Molecular Biology and Evolution. 2017;34(8):2115–2122. doi:10.1093/molbev/msx148.
74. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313. doi:10.1093/bioinformatics/btu033.
75. Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017;8(1):28–36. doi:10.1111/2041-210X.12628.
77. Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. International Journal of Systematic and Evolutionary Microbiology. 2007;57(Pt 1):81–91. doi:10.1099/ijs.0.64483-0.
78. Abby SS, Neron B, Menager H, Touchon M, Rocha EPC. MacSyFinder: A program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS ONE. 2014;9(10):e110726. doi:10.1371/journal.pone.0110726.
79. Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Research. 2017;45(D1):D200–D203. doi:10.1093/nar/gkw1129.
80. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–1649. doi:10.1093/bioinformatics/bts199.
Supporting information
S1 Fig. Maximum likelihood phylogenetic tree of the whole Lactobacillus plantarum group constructed using 304 genome assemblies and the amino acid sequences of their 612 single-copy marker genes with Lactobacillus algidus DSM 15638 as outgroup. The branch length of the outgroup was shortened for better visualization. The colors represent the species as annotated in the National Center for Biotechnology Information (NCBI) Assembly database. Type strains of each species are annotated with a triangle (NCBI) or a square (sequenced in-house).
S2 Fig. Alternative visualization of all pairwise average nucleotide identity (ANI) comparisons for each Lactobacillus plantarum group species, as defined by the clades in Fig 1. Inter-clade comparisons are shown in green and intra-clade comparisons in orange. For Lactobacillus xiangfangensis and Lactobacillus herbarum, no intra-clade comparisons could be performed, as only one genomic assembly was available for each of these species.
S1 Table. Overview of all of the publicly available Lactobacillus plantarum group genomes used in this study. If available, the strain name is shown in the second column. The clade name is in accordance with the phylogenetic tree of Fig 1. The last column indicates whether a conjugation system was found or not.
S2 Table. Number of genomes, core orthogroups, accessory orthogroups, average orthogroups per genome and average genes per genome of the Lactobacillus plantarum group as a whole and all of its members separately.
S3 Table. An overview of all genes found in the Lactobacillus mudanjiangensis conjugation operons. The CONJscan column contains the annotation resulting from running the CONJscan tool.