3 Biotech (2021) 11:35
https://doi.org/10.1007/s13205-020-02583-w
ORIGINAL ARTICLE
Implications of genome simple sequence repeats signature
in 98 Polyomaviridae species
Rezwanuzzaman Laskar1 ·
Md Gulam Jilani1 · Safdar Ali1
Received: 25 September 2020 / Accepted: 2 November 2020 /
Published online: 6 January 2021 © King Abdulaziz City for Science
and Technology 2021
Abstract
The analysis of simple sequence repeats (SSRs) in 98 genomes across
four genera of the family Polyomaviridae was performed. The genome
size ranged from 3962 (BM87) to 7369 bp (BM85), but most genomes
were in the range of 5–5.5 kb. The GC% had an average of 42% and
ranged between 34.69 (BM95) and 52.35 (BM81). A total of 3036 SSRs
and 223 cSSRs were extracted using IMEx, with per-genome incidence
ranging from 18 to 56 and 0 to 7, respectively. The most prevalent
mono-nucleotide repeat motif was “T” (48.95%) followed by “A”
(33.48%). “AT/TA” was the most prevalent di-nucleotide motif,
closely followed by “CT/TC”. As expected, most SSRs (77.6%) were in
the coding region, and nearly half of these were in the Large T
Antigen (LTA) gene. Notably, most viruses with humans, apes and
related species as hosts exhibited mono-nucleotide repeats
exclusively in the AT region, a proposed predictive marker for
humans as a host in the course of the virus's evolution. Each genome
has a unique SSR signature, which is pivotal for viral evolution,
particularly in terms of host divergence.
Keywords Simple sequence repeats · Polyomaviridae ·
Prevalence · Distribution · Virus host ·
Evolution
Introduction
The genome of any organism is the key to understanding its
functionality and evolutionary significance. Besides the sequence
per se, each genome has features that provide crucial information.
One example is repeat sequences, or satellite sequences, which are
classified on the basis of the length of the repeat motif. Simple
sequence repeats (SSRs), also known as microsatellites, are the
smallest of the satellite sequences. SSRs are ubiquitously present
across the genomes of all organisms, albeit with different
incidence, complexity and iterations. Ever since the identification
of these repeats in multiple species, across coding and non-coding
regions, their functional relevance has been explored at different
levels (Gur-Arie et al. 2000; Kofler et al. 2008; Chen et al. 2012).
Clinical relevance of SSRs in humans has also been reported. For
instance, the expansion of these repeats through copy number
alterations has been associated with enhancer amplification near
oncogenes in cancer as well as with neuronal degradation in multiple
neuropathies (Burguete et al. 2015; Hung et al. 2019). Based on
iterations and intervening sequences, tandemly repeated SSRs may be
classified as interrupted, pure, compound, interrupted compound,
complex or interrupted complex (Chambers and MacAvoy 2000).
Amongst various organisms, viruses are a unique platform to study
SSRs owing to their small but rapidly evolving genomes. Further, the
dependence of viruses on the host cell for survival makes them
convenient for studying genome features and evolution. SSRs have
been reported to play a role in genome evolution (Bennetzen 2000)
and host range in viruses (Alam et al. 2019).
The present study focuses on extraction and analysis of
microsatellites from genomes of 98 species of Polyomaviridae,
which is a family of small, non-enveloped viruses that
Supplementary Information The online version contains
supplementary material available at
https://doi.org/10.1007/s13205-020-02583-w.
* Safdar Ali [email protected]; [email protected]
Rezwanuzzaman Laskar [email protected]
Md Gulam Jilani [email protected]
1 Clinical and Applied Genomics (CAG) Laboratory,
Department of Biological Sciences, Aliah University, IIA/27,
Newtown, Kolkata 700160, India
derives its name “Polyoma” from its ability to induce multiple
tumors in its host. These viruses normally have mammals, birds and
fish as their hosts (Ahsan and Shah 2006). The circular/linear
genome generally encodes two types of proteins. The first are the
early regulatory proteins, which include large tumour antigen
(LTAg), small tumour antigen (STAg), middle tumour antigen (MTAg),
alternative tumour antigen (ATAg) and putative alternative large
tumour antigen (PAL-TAg). These are pivotal for replication,
transcription and maturation of the virus during infection. The
second category comprises the late structural proteins, which
include the major capsid protein, viral protein 1 (VP1), and the
minor capsid proteins, VP2 and VP3. As the name suggests, these are
important for capsid formation (Moens et al. 2011; Meijden et al.
2015).
In this analysis, we extracted SSRs from genomes of polyomaviruses
and studied their incidence, distribution and complexity to
understand the genome SSR signature. Further, the role of SSRs in
viral evolution and the genome regions contributing to it have been
studied. This understanding of viral genomics holds the key to
combating viral pathogenesis and host divergence.
Materials and methods
Genome sequences
Whole-genome sequences of 98 species of the family Polyomaviridae
across four genera, as listed by ICTV
(https://talk.ictvonline.org/ictv-reports/ictv_online_report/dsdna-viruses/w/polyomaviridae),
were retrieved from NCBI (http://www.ncbi.nlm.nih.gov/). These
include Alphapolyomavirus (43 species), Betapolyomavirus (33
species), Gammapolyomavirus (9 species) and Deltapolyomavirus (4
species). The study also included 9 species yet to be assigned to a
genus. The details of all the species included in the study (genome
type, genus, genome size, GC%, host, accession number) are
summarized in Supplementary file 1. All the genomes were
double-stranded DNA, mostly circular except for 10 linear genomes.
The information on all the known hosts of these viruses was obtained
from the Virus-Host Database
(https://www.genome.jp/virushostdb/note.html).
Microsatellite extraction
We used the Imperfect Microsatellite Extractor (IMEx) to extract
SSRs. IMEx uncovers mono- to hexa-nucleotide repeat motifs, allows
imperfect microsatellites, and reports compound microsatellites
(cSSRs: multiple SSRs separated by a distance less than or equal to
dMAX) for dMAX values in the range of 10–50. The results therefore
need to be assessed within these parameters.
Microsatellite extraction was carried out using the
‘Advance-Mode’ of IMEx with the parameters reported for HIV
(Mudunuri and Nagarajaram 2007; Chen et al. 2012) and as used for
Mycobacteriophages (Alam et al. 2019). Briefly, the minimum repeat
size was set as follows: 6 (mono-), 3 (di-), 3 (tri-), 3 (tetra-),
3 (penta-), 3 (hexa-). Two SSRs separated by a distance of less than
or equal to dMAX are treated as a single cSSR. In other words, dMAX
is the maximum distance allowed between any two SSRs; it was set at
10 initially and subsequently varied to 20, 30, 40 and 50. All
corresponding changes in cSSR incidence were recorded. It should be
noted that the maximum permissible dMAX value in IMEx is 50, because
beyond that distance the fate of each microsatellite is
individualistic and grouping them as a cSSR becomes irrelevant.
Other parameters were set to the defaults.
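The thresholds and the dMAX rule described above can be illustrated with a minimal Python sketch. It is not IMEx and makes several simplifying assumptions: only perfect repeats are detected (IMEx also allows imperfect ones), coordinates are 0-based and end-exclusive, and the function names `find_perfect_ssrs` and `count_cssrs` are hypothetical helpers introduced here for illustration.

```python
import re

# Minimum repeat counts per motif length, as stated in the text
# (6 for mono-, 3 for di- to hexa-nucleotide motifs).
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect repeats.

    Assumption: perfect repeats only; this is a simplification of IMEx.
    """
    seq = seq.upper()
    hits = []
    for size, copies in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least copies-1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit,
            # e.g. "AA" is a mono-nucleotide repeat, not a di-nucleotide one.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), m.end(), motif,
                         (m.end() - m.start()) // size))
    return sorted(hits)


def count_cssrs(ssrs, dmax):
    """Count clusters of two or more SSRs whose gaps are <= dmax bases."""
    clusters, members, prev_end = 0, 0, None
    for start, end, _motif, _copies in ssrs:
        if prev_end is not None and start - prev_end <= dmax:
            members += 1
            prev_end = max(prev_end, end)
        else:
            clusters += int(members >= 2)
            members, prev_end = 1, end
    return clusters + int(members >= 2)


# Toy sequence (hypothetical, not from any studied genome).
toy_ssrs = find_perfect_ssrs("TTGAAAAAACGTCATATATGG")
for dmax in (10, 20, 30, 40, 50):
    print(dmax, count_cssrs(toy_ssrs, dmax))
```

Running `count_cssrs` over the same SSR list at each dMAX value mirrors how the change in cSSR incidence with dMAX was recorded; real analyses should rely on the IMEx output itself.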
Statistical analysis
All statistical analyses were performed in a spreadsheet using the
Data Analysis ToolPak of MS Office Suite v2016. Linear regression
was used to reveal the correlation of the relative abundance and
relative density of microsatellites with genome size and GC%.
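For readers without the spreadsheet add-in, the same kind of calculation can be sketched in plain Python. This is an illustrative equivalent under stated assumptions, not the authors' workflow: the function names are hypothetical, the input values below are made up, and no P value is computed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values: GC% of four genomes and their relative abundance.
gc_percent = [38.0, 41.5, 44.0, 49.0]
rel_abundance = [5.1, 5.9, 6.4, 7.8]
print(pearson_r(gc_percent, rel_abundance))
print(linear_fit(gc_percent, rel_abundance))
```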
Dot plot analysis for host specificity
Dot plot analysis of two nucleic acid/protein sequences using
Genome Pair Rapid Dotter (GEPARD) highlights the presence of SSRs
within the genomes (Krumsiek et al. 2007; Alam et al. 2019) and
helps ascertain their evolutionary relationships in the context of
repeats, reverse matches and conserved domains. We used GEPARD
v1.40 (Krumsiek et al. 2007) to perform dot plot analysis
between genomes on the basis of hosts.
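The idea behind a dot plot can be shown with a small word-match sketch in Python. This is a generic exact-match dot matrix, not GEPARD's own algorithm; the function name `dot_matrix` and the word length of 4 are assumptions made for illustration.

```python
def dot_matrix(seq_a, seq_b, word=4):
    """Return the set of (i, j) where seq_a[i:i+word] == seq_b[j:j+word]."""
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up every word of seq_a and record the matching coordinates.
    dots = set()
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], ()):
            dots.add((i, j))
    return dots


# Comparing a sequence with itself: dots off the main diagonal
# (i != j) indicate repeated words within the sequence.
self_dots = dot_matrix("AAAAAGC", "AAAAAGC")
off_diagonal = sorted(d for d in self_dots if d[0] != d[1])
print(off_diagonal)
```

In such a matrix, repeat-rich sequences produce more dots, which is the property the text uses when comparing genomes grouped by host.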
Evolutionary relationship
The phylogenetic tree was constructed by aligning the nucleotide
sequences with the default specifications of MAFFT v6.861b (Katoh
and Standley 2013), and the alignment was pruned with the gappyout
algorithm of trimAl v1.4.rev6 (Capella-Gutierrez et al. 2009) using
the ETE3 v3.1.1 “build” function as implemented on GenomeNet
(https://www.genome.jp/tools/ete/). To identify the evolutionary
model that best fits the alignment, we used pmodeltest v1.4 to
compare the JC, K80, TrNef, TPM1, TPM2, TPM3, TIM1ef, TIM2ef,
TIM3ef, TVMef, SYM, F81, HKY, TrN, TPM1uf, TPM2uf, TPM3uf, TIM1,
TIM2, TIM3, TVM and GTR models for inferring the ML tree. Using
RAxML v8.1.20 with the GTRGAMMAI model and default
parameters (Stamatakis 2014), the maximum-likelihood (ML) tree was
inferred with 100 bootstrap replicates. The final tree for
visualization was constructed using the web tool Interactive Tree
Of Life (Letunic and Bork 2019).
Results
Genome features
The genome size ranged from 3962 (BM87) to 7369 bp (BM85), but most
genomes were in the range of 5–5.5 kb. The GC%, with an average of
42%, ranged between 34.69 (BM95) and 52.35 (BM81) and exhibited much
more diversity than genome size (Fig. 1a, Supplementary file 1). In
essence, the Polyomaviridae genomes are mostly of similar size, but
their composition in terms of GC% is much more variable. If SSR
incidence had an equal chance across the whole genome, irrespective
of composition, the same should be reflected in the motifs of the
SSRs present. However, as discussed later, this is not the case:
several species have mono-nucleotide motifs exclusively in the AT
region.
The correlation of genome size and GC content with various SSR
features was ascertained. SSR incidence was found to be
significantly correlated (r = 0.19, P < 0.05) with
[Fig. 1 panels. A: SSR incidence, cSSR incidence, genome size and GC% per genome (BM identifiers). B: RA (SSR), RA (cSSR), RD (SSR) and RD (cSSR) per genome.]
Fig. 1 a Genome features and SSR/cSSR incidence of Polyomaviridae
genomes. Genome size is predominantly around 5–5.5 kb, as evident
from the fairly constant level of the red bars, whereas the
corresponding GC variations (superimposed black bars) have a much
broader range. In addition, note the diversity in SSR incidence in
genomes of similar length. Furthermore, higher SSR incidence does
not necessarily translate to more cSSRs. b Relative abundance (RA)
and relative density (RD) of SSRs and cSSRs. RA is the number of
microsatellites present per kb of the genome, whereas RD is the
sequence space occupied by microsatellites per kb of the genome.
The varying peaks signify the presence of a unique SSR signature
for each genome
genome size and GC content (r = 0.08, P < 0.05). Though relative
density and relative abundance were not significantly correlated
with genome size (r = 0.01, P > 0.05; r = 0.005, P > 0.05),
significant correlation was observed with GC content (r = 0.20,
P < 0.05; and r = 0.23, P < 0.05), respectively.
Further, cSSR incidence is significantly correlated with genome
size (r = 0.06, P < 0.05), but its corresponding relative density
(r = 0.0038, P > 0.05) and relative abundance (r = 0.004, P > 0.05)
show no significant correlation with it. GC content is also
significantly correlated with cSSR incidence (r = 0.06, P < 0.05),
relative density (r = 0.11, P < 0.05) and relative abundance
(r = 0.08, P < 0.05).
Incidence of SSRs and cSSRs
A total of 3036 SSRs and 223 cSSRs were extracted from the 98
species of Polyomaviridae (Supplementary files 2–4). Among the four
assigned genera, the average number of SSRs and cSSRs per genome
varied from 23 and 1.3 (Gammapolyomavirus) to 33 and 2.9
(Betapolyomavirus), respectively. Their distribution across genera
has been summarized in Table 1.
A maximum of 56 SSRs was present in BM85, whereas a minimum of 18
was present in BM80 and BM21. cSSR incidence ranged from 0 in seven
species (BM99, BM82, BM76, BM59, BM24, BM21, BM14) to 7 in two
species (BM85 and BM84) (Fig. 1a). Two interesting but contrasting
observations can be made from these data. First, BM85 and BM84,
each with 7 cSSRs, have 56 and 31 SSRs in genomes of 7369 and
4697 bp, respectively (Supplementary file 2). This means that
although a longer genome would ideally be expected to harbour more
SSRs, the eventual clustering of SSRs, reflected as cSSR incidence,
can remain the same. Thus, the SSR-rich regions of the genome are
independent of genome size. Second, the above observation is not
the norm, as is evident from the cSSR range of zero to seven.
Multiple genomes of Polyomaviridae with varying numbers of SSRs
have the same number of cSSRs. This is highlighted by 29 species
having 2 cSSRs (Fig. 1a, Supplementary files 2–4), suggesting a
unique genome SSR signature.
To further highlight the regularity of this anomaly, we looked
into cSSR%, which is the percentage of SSRs present as cSSRs in a
particular genome. Note that the variations in cSSR% occur not only
across different genera but also within them, thereby negating the
clustering of SSRs in a genus-specific manner (Fig. 2a). These are
reflective of specific yet variable localization and clustering of
SSRs in a particular genome.
Relative abundance (RA) and relative density (RD)
of SSRs and cSSRs
RA is the number of microsatellites present per kb of the genome,
whereas RD is the sequence space occupied by microsatellites per kb
of the genome. These values are therefore reflective of the number
of iterations of the SSRs present. If the SSRs have a conserved
tendency to be iterated, then higher incidence should correspond to
elevated RD values. Moreover, a higher RA value should correspond
to a higher RD value. As observed, BM65 has the highest RA and RD
values for SSRs, 9.32 and 80.4, respectively, which means that since
more SSRs are present per kb of the genome, more of the genome is
composed of SSRs. The corresponding lowest values for RA and RD
were 3.39 (BM21) and 26.5 (BM80), respectively (Fig. 1b,
Supplementary files 2–4).
Similarly, the cSSR relative abundance (cRA) and relative density
(cRD) were studied. Since seven species had no cSSRs (Fig. 1a), the
minimum cRA and cRD values were zero for these species. The highest
values for cRA and cRD were 1.490 (BM84) and 33.93 (BM95),
respectively (Fig. 1b, Supplementary files 2–4). This difference
may be due to the differential composition of the cSSRs.
dMAX and cSSR
cSSR incidence is dependent on the allowed distance (dMAX)
between two SSRs for it to be treated as one cSSR. Since cSSR is
reflective of clustering of SSRs, and IMEx allows for dMAX values
till 50, we analyzed cSSR incidence of Polyomaviridae genomes by
varying the dMAX values
Table 1 SSR and cSSR incidence across the different genera of Polyomaviridae

S. No.  Genera              No. of species  SSR incidence  Average SSR per species  cSSR incidence  Average cSSR per species
1       Alphapolyomavirus   43              1315           30.58                    80              1.86
2       Betapolyomavirus    33              1090           33.03                    96              2.9
3       Deltapolyomavirus   04              108            27                       6               1.5
4       Gammapolyomavirus   09              208            23.11                    12              1.33
5       Unassigned Species  09              315            35                       29              3.22
        Total               98              3036                                    223
[Fig. 2 panels. A: cSSR% per genome (BM identifiers), coloured by genus (Alpha PV, Beta PV, Delta PV, Gamma PV, Unassigned Species). B: percentage increase in cSSR incidence with varying dMAX (dMAX20, dMAX30, dMAX40, dMAX50) per genome.]
Fig. 2 a cSSR% in the studied Polyomaviridae genomes. cSSR% is the
percentage of individual SSRs that are part of cSSRs. The data for
the genera are differentially coloured. There is diversity not only
across the genera but also among the genomes of the same genus.
Interestingly, BM84, which has the highest cSSR%, is yet to be
classified into any genus. b Percentage increase in cSSR incidence
with increasing dMAX (10–50). Note the non-linearity of the
increase. Negative bars represent a decrease in cSSR incidence when
two cSSRs merge into one with increasing dMAX
from an initial value of 10 to 20, 30, 40 and 50. Subsequently, the
percentage increase was calculated using the following formula:

$$\%\,\text{increase} = \left[\frac{\text{cSSR incidence at } dMAX_{n} - \text{cSSR incidence at } dMAX_{(n-10)}}{\text{cSSR incidence at } dMAX_{(n-10)}}\right] \times 100$$

This percentage increase was then plotted. Though the maximum
increase is observed for most species when dMAX increased from 10 to
20, as evident from the predominant black bars, it does not conform
to a pattern per se (Fig. 2b). This means that even in species of
the same family, SSRs chart their own path in terms of localization
in each genome.
SSR motif types and their prevalence
First, the contribution of different repeat motifs (mono- to
hexa-nucleotide) to the overall SSR incidence was ascertained. The
data were analysed separately for each genus. Moreover, the analysis
was done in percentages and not absolute numbers to account for the
variable number of species across genera. Note that the data from
species with unassigned genera were not included here. The
contribution of mono-nucleotide repeat motifs ranged from 36%
(Gammapolyomavirus) to 47% (Betapolyomavirus). Deltapolyomavirus
had no penta- or hexa-nucleotide repeats, whereas Gammapolyomavirus
lacked hexa-nucleotide repeats. This can be attributed to the fewer
species in these genera. Gammapolyomavirus had the highest
contribution from di-nucleotide repeats (39.42%) and was the only
genus to have more di-nucleotide than mono-nucleotide repeats
(Fig. 3a, Supplementary files 2–3).
We then looked into the motif composition of mono- and
di-nucleotide repeats for their prevalence across the different
genera of Polyomaviridae. For the mono-nucleotides, in the overall
data, the most prevalent repeat motif is “T” (48.95%) followed by
“A” (33.48%). “T” also remains the most prevalent mono-nucleotide
motif for Alpha-, Beta- and Deltapolyomavirus (47, 52 and 71%,
respectively). However, Gammapolyomavirus has its highest
contribution from “C” (34.67%) followed by “T” (33.33%) (Fig. 3b,
Supplementary files 2–3). Interestingly, the same Gammapolyomavirus
has its highest di-nucleotide repeat motif contribution from
“AT/TA” (29.27%), while Alphapolyomavirus has its largest
contribution from “CT/TC” (29.37%). Overall, “AT/TA” was the most
prevalent di-nucleotide repeat motif, closely followed by “CT/TC”
(Fig. 3c). PV: polyomavirus.
SSRs in coding regions
The assessment of SSR distribution across the genomes revealed that
the non-coding region accounted for 679 SSRs (22.4%), whereas the
coding region, comprising 32 proteins/putative genes/ORFs, housed
2357 SSRs (77.6%) (Supplementary file 2).
Subsequently, we analyzed the SSR prevalence across different genes
of the studied genomes. Six genes accounted for
ferent genes of the studied genomes. Six genes accounted for
over 92% of SSRs. Overall, the LTAg gene alone accounted for over
47% of total SSRs with VP1 gene a distant second at around 16%
(Fig. 3d). Thereafter, we dissected the data across different
genera. Interestingly, though LTAg gene takes the pole position in
the housing of SSRs across genera, its contribution varied. In
Betapolyomavirus, it was account-ing for one in every two SSR
(49.54%) while in Gammapoly-omavirus, approximately one in every
three SSR was housed in LTAg gene (35%). This difference permeates
to all the genes, albeit to a lesser extent (Fig. 3e,
Supplementary files 2–3).
SSRs (mono‑nucleotide) specificity and host range
exclusivity
The compilation of the contributions of different SSRs to overall
incidence revealed an interesting observation. In eighteen species,
one hundred percent of the mono-nucleotide SSRs comprised A/T.
Further, the majority of these viruses had humans or members of the
ape family as their hosts. To elucidate a possible pattern and its
significance, we arranged all the studied species in decreasing
order of the A/T contribution to their mono-nucleotide SSRs (Fig. 4,
Supplementary files 1–2). Notably, viruses with humans, apes and
related species as hosts have a much higher A/T mono-nucleotide SSR
composition than those with birds and fishes as hosts (Fig. 4).
Using representative species (9 each), we then investigated
whether the A/T composition of SSRs and the hosts reflect a
pattern. Dot plot analysis was performed for nine species with
humans, apes and related species as hosts (Fig. 5a) and nine
species with birds, fishes and other species as hosts (Fig. 5b).
Interestingly, even though three species in Fig. 4 have 100%
mono-nucleotide SSR contribution by A/T (same as Fig. 5a), the
overall number of dots (reflective of repeat sequences) is higher
for all the genomes of Fig. 5a,
representing humans and related species as hosts.
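The ranking criterion used above, the share of mono-nucleotide SSRs whose motif is A or T, can be computed as in the following Python sketch. The function name and the example motif lists are hypothetical; real input would be the motif column of the SSR extraction output for each genome.

```python
def at_mono_percent(ssr_motifs):
    """Percentage of mono-nucleotide SSRs whose motif is A or T.

    Returns None when the genome has no mono-nucleotide SSRs.
    """
    mono = [m.upper() for m in ssr_motifs if len(m) == 1]
    if not mono:
        return None
    at_count = sum(1 for m in mono if m in ("A", "T"))
    return 100 * at_count / len(mono)


# Hypothetical motif lists for two genomes.
genomes = {
    "genome_1": ["A", "T", "T", "AT", "CT"],
    "genome_2": ["A", "T", "T", "C", "AT"],
}
# Arrange genomes in decreasing order of A/T mono-nucleotide share.
ranked = sorted(genomes, key=lambda g: at_mono_percent(genomes[g]),
                reverse=True)
print(ranked)
```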
Phylogenetic tree of Polyomaviridae
Subsequently, we constructed the phylogenetic tree of the 98
Polyomaviridae genomes and observed that the viruses have not all
evolved together according to their hosts. However, hosts do
Fig. 3 a SSR incidence and motif length. An increase in repeat motif
resulted in lesser incidence, inverse proportionality, which is
expected. However, two observations should be noted. First,
Gammapolyomavirus is the only genera where the highest incidence is
of di-nucleotide repeat motifs. All others have mono-nucleotide
motif as most represented along expected lines. Second, the fall in
incidence from mono- to di-nucleotide motif SSRs is the least in
Deltapolyomavirus. b Mono-nucleotide motif composition. In spite of
varying GC percentage (Fig. 1), the mono-nucleotide motif
composition is very much biased towards A/T across all genera. Total
represents overall data. c Di-nucleotide motif composition. Though
AT/TA is the most represented di-nucleotide repeat motif overall, it
does not enjoy the same status across all genera, with
Alphapolyomavirus being the exception. Here, CT/TC has the highest
incidence closely followed by AT/TA. d Distribution of SSRs (%)
across different proteins. Overall, LTAg accounted for over 47% of
all SSRs in the coding region with VP1 coming a distant second at
around 16%. Only the 6 proteins which accounted for the highest SSRs
were included, the rest have been collectively taken as “Others”.
e SSRs contribution (%) by proteins across different genera. Herein,
subtle variations are visible. Though LTAg gene accounts for maximum
SSRs in the coding genome across all the genera but the contributing
percentage varies from 35% in Gammapolyomavirus to almost 50% in
Betapolyomavirus
reflect in the tree. Clustering of viruses with the same or related
hosts can be observed at multiple places (Fig. 6). The fact that not
all viruses with human or shared hosts follow the pattern is
indicative of other players in genome evolution besides hosts.
We then superimposed the data for percentage mono-nucleotide SSR
contribution by the AT region, the phylogenetic analysis and the
known hosts. For the sake of clarity, hosts of only those species
with > 90% mono-nucleotide SSR contribution from the AT region are
shown as illustrations here, though the complete information is
provided in Fig. 4. We hypothesize that the presence of
mono-nucleotide repeats in the AT region somehow provides for viral
host flexibility and interchangeability.
Discussion
Owing to the variable nature of the A/T and G/C regions of DNA,
these sequences often exhibit specific attributes. The significance
of AT repeats in strand slippage and copy number polymorphism is
well documented (Katti et al. 2001). Though this implies that GC
content is an important aspect of SSR studies, this is not
necessarily the case, for two reasons. First, the uneven
distribution of SSRs across any genome, as observed herein and
reported for other genomes, is not determined by the GC content
(Chen et al. 2012; Alam et al. 2013, 2019). For instance, there are
18 species herein in which all the mono-nucleotide SSRs are
localized to the A/T region. The fact that these genomes have a
maximum GC content of 52% supports the argument, with as little as
48% of the genome housing one hundred percent of the
mono-nucleotide repeats. We believe that this unevenness in
distribution is not random but serves a purpose, most probably host
range, as discussed later. Second, the prevalence of repeats is
dependent on the size of the repeat motif: what is applicable to
mono-nucleotides does not hold true for di-nucleotides, and it also
varies from one genus to another. However, two points, both
concerning Gammapolyomavirus, deserve mention. First, it is the only
genus to have “C” as its most frequent mono-nucleotide SSR motif.
This is a deviation from the AT region being the hub for shorter
repeat motifs. In contrast, it returns to expected lines with
“AT/TA” being the most represented di-nucleotide repeat motif.
Second, we should bear in mind that this genus has fewer species
(nine), but that may be viewed from multiple perspectives. Either we
consider the fewer species as the reason for the aberrant
observation, or we can assume that this uniqueness is the reason for
fewer species in Gammapolyomavirus. We believe in the latter.
[Fig. 4 graphic: genomes (BM identifiers) labelled with their hosts, coloured on a gradient from 25.49 to 100.00% A/T mono-nucleotide repeat motif.]
Fig. 4 Genomes in decreasing order of the percentage of A/T
mono-nucleotide repeat motifs. Though not perfect, the similar
values for humans and related species suggest a dependency of host
range on SSR distribution across AT genome regions. The higher the
contribution of mono-nucleotide repeat motifs from the AT region,
the greater the chance that the virus will have humans or related
species as its host. The color gradient represents the percentage
of A/T mono-nucleotide repeat motif
Fig. 5 Dot plot analysis of Polyomaviridae genomes with a human,
apes or related species as hosts with mono-nucleotide repeat motif
contribution of 100% from the AT region and b divergent hosts with
varying mono-nucleotide repeats in the AT region
The study of cSSRs has always been relevant alongside SSRs owing to
their involvement in functional aspects such as regulation of gene
expression (Kashi and King 2006; Chen et al. 2011). Essentially,
cSSRs reflect the accumulation of SSRs in the genome. Higher cSSR
incidence means that SSRs are present in close proximity to each
other, and since these are sources of variation and genome
evolution (Kim et al. 2008; Madsen et al. 2008), we further looked
at cSSRs in terms of cSSR% and by varying dMAX. An increase in cSSR
incidence with increasing dMAX is expected and observed
[Figure: phylogenetic tree of the Polyomaviridae genomes (BM2 to
BM100; tree scale 0.1), annotated per genome with Mono Nucleotide
Repeat (SSR) AT% and Mono Nucleotide Repeat (SSR) GC%. Host symbols:
Human, Ape, Monkey, Bat, Alpaca, Racoon, Rodent, Fish, Dolphin, Seal,
Bird; a separate marker denotes a value of 100%.]
as well (Fig. 2b). However, the increase does not conform to any
pattern, as is visible from the different lengths of the differently
coloured lines, which is indicative of each genome's uniqueness. The
few instances wherein a negative percentage is observed are owing to
the merging of two independent cSSRs into one with increasing dMAX,
thus leading to a decrease in cSSR incidence. Moreover, the cSSR%
varies not only across the genera of Polyomaviridae but also among
species of the same genus (Fig. 2a). In spite of these variations, of
all the reported cSSRs, only 17 are composed of three SSRs and 3 of
four SSRs; all the rest are composed of two SSRs only. Only one
species, BM97, has two cSSRs of more than 3 SSRs each; the other
genomes have a single representation only. All the above figures are
for a dMAX of 10 (Supplementary file 4).
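The effect of dMAX on cSSR incidence, including the occasional
decrease caused by merging, can be illustrated with a minimal sketch.
It is not the IMEx implementation: the function name, the toy
positions and the gap convention (bases lying between two adjacent
SSRs, with 1-based inclusive coordinates) are assumptions made here
for illustration only.

```python
def compound_ssrs(ssrs, dmax):
    """Group SSRs into compound SSRs (cSSRs).

    ssrs: list of (start, end) positions, 1-based and inclusive.
    dmax: largest gap (in bases) allowed between adjacent SSRs.
    Returns only the groups that contain at least two SSRs.
    """
    groups = []
    current = []
    for start, end in sorted(ssrs):
        # Gap = number of bases between the previous SSR and this one.
        if current and start - current[-1][1] - 1 <= dmax:
            current.append((start, end))
        else:
            if len(current) >= 2:
                groups.append(current)
            current = [(start, end)]
    if len(current) >= 2:
        groups.append(current)
    return groups


# Hypothetical SSR positions separated by gaps of 3, 15 and 3 bases.
toy_ssrs = [(1, 8), (12, 19), (35, 42), (46, 53)]

for dmax in (0, 10, 20):
    found = compound_ssrs(toy_ssrs, dmax)
    print(dmax, len(found), [len(g) for g in found])
```

With these toy positions, no cSSR is found at a dMAX of 0, two cSSRs
of two SSRs each are found at a dMAX of 10, and at a dMAX of 20 the
two merge into a single cSSR of four SSRs, so the cSSR count drops
even though dMAX has increased.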
The prevalence of SSRs in the coding regions of viral genomes
conforms to earlier reports (Alam et al. 2014, 2019). The
distribution of around 78% of SSRs across coding regions is in
accordance with other viral genomes, though the gene-specific data
(Fig. 3d–e) exhibit features unique to Polyomaviridae genomes. The
overlap of genes is reflected by the LTAg/STAg or VP2/VP3
representation. The presence of SSRs in these overlapping regions can
be influential, since an alteration there would have an impact on two
genes simultaneously. The cSSR constitution ranged from two to four
SSRs, albeit with divergent motifs as mentioned above. The
distribution of SSRs failed to conform to a pattern. Thus, we can
affirm that the genome-specific clustering of SSRs is not only unique
but regulated as well. This may be an attempt of the genome to shield
itself from changes, as clustering of SSRs will lead to the
development of hot-spots for mutations.
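How the coding-region share and the SSRs lying in overlapping genes
might be tallied is sketched below. This is only an illustration under
stated assumptions: the gene coordinates and SSR positions are
invented, the function names are hypothetical, and genes are treated
as simple linear intervals (no wrap-around on the circular genome, no
splicing).

```python
def genes_containing(ssr, genes):
    """Return the names of genes whose interval overlaps the SSR."""
    start, end = ssr
    return [name for name, (g_start, g_end) in genes.items()
            if start <= g_end and end >= g_start]


def coding_summary(ssrs, genes):
    """Return (percent of SSRs in any gene, SSRs shared by >= 2 genes)."""
    hits = {ssr: genes_containing(ssr, genes) for ssr in ssrs}
    coding = [ssr for ssr, names in hits.items() if names]
    shared = {ssr: names for ssr, names in hits.items() if len(names) >= 2}
    percent = 100.0 * len(coding) / len(ssrs) if ssrs else 0.0
    return percent, shared


# Hypothetical coordinates; VP3 lies inside VP2, as overlapping genes do.
toy_genes = {"VP2": (100, 1100), "VP3": (460, 1100), "LTAg": (2600, 4700)}
toy_positions = [(50, 57), (150, 157), (500, 507), (3000, 3011)]

percent, shared = coding_summary(toy_positions, toy_genes)
print(percent, shared)
```

In this toy case three of the four SSRs fall in a gene, and the SSR at
500 to 507 lies in both VP2 and VP3, which is the situation in which a
single alteration would affect two genes at once.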
Though the overall evolution of viruses is guided by multiple factors
such as host range and genome features, the number and composition of
mono-nucleotide SSRs showed a correlation with the hosts, and we
believe the data provide a foundation for predicting the future hosts
of any viral species. Our hypothesis stems from the fact that there
were eighteen genomes in which mono-nucleotide repeats were
exclusively restricted to the AT region. A closer analysis (Fig. 4)
revealed a pattern suggesting humans or related hosts in those
genomes. On widening our analysis, we can say with confidence that the
contribution of mono-nucleotide SSRs from the AT region is pivotal for
host range determination. Viruses are constantly expanding their
hosts, as is supported by HIV, which had its origins in monkeys, and
Coronavirus, which originally had bats as hosts (19). Both monkeys and
bats are hosts for Polyomavirus genomes having an exclusive or
near-exclusive contribution of mono-SSRs from the AT region.
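The AT contribution of mono-nucleotide SSRs referred to here can be
computed per genome along the following lines. This is a minimal
sketch, assuming each extracted SSR is available as its repeat motif
string; the function name, genome labels and motif lists are
hypothetical and are not taken from the supplementary data.

```python
def mono_at_percent(motifs):
    """Percentage of mono-nucleotide SSRs whose motif is A or T.

    motifs: repeat motifs of all SSRs in one genome (e.g. "A", "AT").
    Returns None when the genome has no mono-nucleotide SSRs.
    """
    mono = [m.upper() for m in motifs if len(m) == 1]
    if not mono:
        return None
    at_count = sum(1 for m in mono if m in ("A", "T"))
    return 100.0 * at_count / len(mono)


# Hypothetical motif lists for two genomes.
toy_genomes = {
    "genome_1": ["A", "T", "T", "A", "AT"],
    "genome_2": ["A", "T", "G", "C"],
}

# Genomes whose mono-nucleotide repeats come exclusively from A/T.
exclusive = [name for name, motifs in toy_genomes.items()
             if mono_at_percent(motifs) == 100.0]
print(exclusive)
```

Here genome_1 has a 100% AT contribution (the dinucleotide motif is
ignored), whereas genome_2 has 50%, so only genome_1 would be flagged
as AT-exclusive.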
Earlier studies on the evolution of Polyomavirus have suggested gene
duplications and inversions as sources of variation in genome size and
also predicted their prior existence in invertebrate hosts, indicating
a virus family that is evolving in terms of host (Buck et al. 2016).
This becomes all the more relevant when we look at the organisms
suggested by this study to share a common/interchangeable host range
for viruses. These include monkeys (HIV) and bats (Coronavirus)
(Parrish et al. 2008). We accept that the correlation between
mono-nucleotide repeats from the AT region and host is not universal,
suggesting other influencing factors, but its presence in species
across genera demands further authentication of the idea.
To conclude, the incidence and distribution of SSRs in the
Polyomaviridae genomes suggest a unique genome SSR signature which is
defined by multiple factors. These include GC content, evolutionary
relation and coding/non-coding regions. We also propose the
mono-nucleotide distribution in the A/T region of the genome as a key
parameter of host divergence to humans and related species. This needs
to be ascertained in all the known human-infecting viruses.
Author contributions RL performed all the analysis of extracted
SSRs and prepared all the figures and tables. MGJ carried out the
extraction of microsatellites from IMEx. SA supervised the whole
study and prepared the manuscript.
Funding Not applicable.
Compliance with ethical standards
Conflict of interest The authors declare that they have no
conflict of interest.
Availability of data and material All data have been provided as
supplementary material.
References
Ahsan N, Shah KV (2006) Polyomaviruses and human diseases. Adv
Exp Med Biol 577:1–18. https://doi.org/10.1007/0-387-32957-9_1
Alam CM, Singh AK, Sharfuddin C, Ali S (2013) In-silico analysis of
simple and imperfect microsatellites in diverse tobamovirus genomes.
Gene 530:193–200. https://doi.org/10.1016/j.gene.2013.08.046
Alam CM, Singh AK, Sharfuddin C, Ali S (2014) Incidence, complexity
and diversity of simple sequence repeats across potexvirus genomes.
Gene 537:189–196. https://doi.org/10.1016/j.gene.2014.01.007
Alam CM, Iqbal A, Sharma A et al (2019) Microsatellite diversity,
complexity, and host range of mycobacteriophage genomes of the
Siphoviridae family. Front Genetics.
https://doi.org/10.3389/fgene.2019.00207
Bennetzen JL (2000) Transposable element contributions to plant
gene and genome evolution. Plant Mol Biol 42:251–269
Buck CB, Doorslaer KV, Peretti A et al (2016) The ancient
evolutionary history of polyomaviruses. PLoS Pathog 12:e1005574.
https://doi.org/10.1371/journal.ppat.1005574
Burguete AS, Almeida S, Gao F-B et al (2015) GGGGCC microsatellite
RNA is neuritically localized, induces branching defects, and
perturbs transport granule function. eLife 4:e08881.
https://doi.org/10.7554/eLife.08881
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T (2009) trimAl: a
tool for automated alignment trimming in large-scale phylogenetic
analyses. Bioinformatics 25:1972–1973.
https://doi.org/10.1093/bioinformatics/btp348
Chambers GK, MacAvoy ES (2000) Microsatellites: consensus and
controversy. Comp Biochem Physiol B Biochem Mol Biol 126:455–476
Chen M, Zeng G, Tan Z et al (2011) Compound microsatellites in
complete Escherichia coli genomes. FEBS Lett 585:1072–1076.
https://doi.org/10.1016/j.febslet.2011.03.005
Chen M, Tan Z, Zeng G, Zeng Z (2012) Differential distribution of
compound microsatellites in various Human Immunodeficiency Virus
Type 1 complete genomes. Infect Genet Evol 12:1452–1457.
https://doi.org/10.1016/j.meegid.2012.05.006
Gur-Arie R, Cohen CJ, Eitan Y et al (2000) Simple sequence repeats
in Escherichia coli: abundance, distribution, composition, and
polymorphism. Genome Res 10:62–71
Hung S, Saiakhova A, Faber ZJ et al (2019) Mismatch repair-signature
mutations activate gene enhancers across human colorectal cancer
epigenomes. eLife 8:e40760. https://doi.org/10.7554/eLife.40760
Kashi Y, King DG (2006) Simple sequence repeats as advantageous
mutators in evolution. Trends Genet 22:253–259.
https://doi.org/10.1016/j.tig.2006.03.005
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment
software version 7: improvements in performance and usability. Mol
Biol Evol 30:772–780. https://doi.org/10.1093/molbev/mst010
Katti MV, Ranjekar PK, Gupta VS (2001) Differential distribution of
simple sequence repeats in eukaryotic genome sequences. Mol Biol
Evol 18:1161–1167.
https://doi.org/10.1093/oxfordjournals.molbev.a003903
Kim T-S, Booth JG, Gauch HG et al (2008) Simple sequence repeats in
Neurospora crassa: distribution, polymorphism and evolutionary
inference. BMC Genomics 9:31.
https://doi.org/10.1186/1471-2164-9-31
Kofler R, Schlötterer C, Luschützky E, Lelley T (2008) Survey of
microsatellite clustering in eight fully sequenced species sheds
light on the origin of compound microsatellites. BMC Genomics
9:612. https://doi.org/10.1186/1471-2164-9-612
Krumsiek J, Arnold R, Rattei T (2007) Gepard: a rapid and sensitive
tool for creating dotplots on genome scale. Bioinformatics
23:1026–1028. https://doi.org/10.1093/bioinformatics/btm039
Letunic I, Bork P (2019) Interactive Tree Of Life (iTOL) v4: recent
updates and new developments. Nucleic Acids Res 47:W256–W259.
https://doi.org/10.1093/nar/gkz239
Madsen BE, Villesen P, Wiuf C (2008) Short tandem repeats in human
exons: a target for disease mutations. BMC Genomics 9:410.
https://doi.org/10.1186/1471-2164-9-410
Moens U, Ludvigsen M, Van Ghelue M (2011) Human polyomaviruses in
skin diseases. In: Pathology research international.
https://www.hindawi.com/journals/pri/2011/123491/. Accessed 3 May
2020
Mudunuri SB, Nagarajaram HA (2007) IMEx: imperfect microsatellite
extractor. Bioinformatics 23:1181–1187.
https://doi.org/10.1093/bioinformatics/btm097
Parrish CR, Holmes EC, Morens DM et al (2008) Cross-species virus
transmission and the emergence of new epidemic diseases. Microbiol
Mol Biol Rev 72:457–470. https://doi.org/10.1128/MMBR.00004-08
Stamatakis A (2014) RAxML version 8: a tool for phylogenetic
analysis and post-analysis of large phylogenies. Bioinformatics
30:1312–1313. https://doi.org/10.1093/bioinformatics/btu033
van der Meijden E, Kazem S, Dargel CA et al (2015) Characterization
of T antigens, including middle T and alternative T, expressed by
the human polyomavirus associated with trichodysplasia spinulosa.
J Virol 89:9427–9439. https://doi.org/10.1128/JVI.00911-15