
3 Biotech (2021) 11:35 https://doi.org/10.1007/s13205-020-02583-w

ORIGINAL ARTICLE

Implications of genome simple sequence repeats signature in 98 Polyomaviridae species

Rezwanuzzaman Laskar1 · Md Gulam Jilani1 · Safdar Ali1

Received: 25 September 2020 / Accepted: 2 November 2020 / Published online: 6 January 2021 © King Abdulaziz City for Science and Technology 2021

Abstract

The analysis of simple sequence repeats (SSRs) in 98 genomes across four genera of the family Polyomaviridae was performed. The genome size ranged from 3962 bp (BM87) to 7369 bp (BM85), but most genomes were in the range of 5–5.5 kb. The GC% averaged 42% and ranged between 34.69 (BM95) and 52.35 (BM81). A total of 3036 SSRs and 223 cSSRs were extracted using IMEx, with incidence per genome ranging from 18 to 56 and from 0 to 7, respectively. The most prevalent mono-nucleotide repeat motif was “T” (48.95%) followed by “A” (33.48%). “AT/TA” was the most prevalent dinucleotide motif, closely followed by “CT/TC”. As expected, the distribution was higher in the coding region, with 77.6% of SSRs, of which nearly half were in the Large T Antigen (LTA) gene. Notably, most viruses with humans, apes and related species as hosts exhibited exclusivity of mono-nucleotide repeats in the AT region, a proposed predictive marker for the acquisition of humans as host by a virus in the course of its evolution. Each genome has a unique SSR signature, which is pivotal for viral evolution, particularly in terms of host divergence.

Keywords Simple sequence repeats · Polyomaviridae · Prevalence · Distribution · Virus host · Evolution

Introduction

The genome of any organism is the key to understanding its functionality and evolutionary significance. Besides the sequence per se, each genome has features that provide crucial information, for instance the repeat or satellite sequences, which are classified on the basis of the length of the repeat motif. Simple sequence repeats (SSRs), also known as microsatellites, are the smallest of the satellite sequences. SSRs are ubiquitously present across the genomes of all organisms, albeit with different incidence, complexity and iterations. Ever since the identification of these repeats in multiple species, across coding and non-coding regions, their functional relevance has been explored at different levels (Gur-Arie et al. 2000; Kofler et al. 2008; Chen et al. 2012). Clinical relevance of SSRs in humans has also been reported. For instance, the expansion of these repeats through copy number alterations has been associated with enhancer amplification near oncogenes in cancer as well as with neuronal degradation in multiple neuropathies (Burguete et al. 2015; Hung et al. 2019). Based on iterations and intervening sequences, tandemly repeated SSRs may be classified as interrupted, pure, compound, interrupted compound, complex or interrupted complex (Chambers and MacAvoy 2000).

Amongst various organisms, viruses are a unique platform to study SSRs owing to their small but rapidly evolving genomes. Further, the dependence of viruses on the host cell for survival makes them convenient to study in terms of genome features and evolution. SSRs have been reported to play a role in genome evolution (Bennetzen 2000) and host range in viruses (Alam et al. 2019).

The present study focuses on the extraction and analysis of microsatellites from the genomes of 98 species of Polyomaviridae, a family of small, non-enveloped viruses that derives its name “Polyoma” from its ability to induce multiple tumors in its host. These viruses normally have mammals, birds and fish as their hosts (Ahsan and Shah 2006). The circular/linear genome generally encodes two types of proteins. The first are the early regulatory proteins, which include large tumour antigen (LTAg), small tumour antigen (STAg), middle tumour antigen (MTAg), alternative tumour antigen (ATAg) and putative alternative large tumour antigen (PAL-TAg). These are pivotal for replication, transcription and maturation of the virus during infection. The second category comprises the late structural proteins, which include the major capsid protein, viral protein 1 (VP1), and the minor capsid proteins, VP2 and VP3. As the name suggests, these are important for capsid formation (Moens et al. 2011; Meijden et al. 2015).

In this analysis, we extracted SSRs from Polyomavirus genomes and studied their incidence, distribution and complexity to understand the genome SSR signature. Further, the role of SSRs in viral evolution and the genome regions contributing to it were studied. This understanding of viral genomics holds the key to combating viral pathogenesis and host divergence.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s13205-020-02583-w.

* Safdar Ali

Rezwanuzzaman Laskar

Md Gulam Jilani

1 Clinical and Applied Genomics (CAG) Laboratory, Department of Biological Sciences, Aliah University, IIA/27, Newtown, Kolkata 700160, India

http://orcid.org/0000-0003-3298-9282

Materials and methods

Genome sequences

Whole-genome sequences of 98 species of the family Polyomaviridae across 4 different genera, as listed by ICTV (https://talk.ictvonline.org/ictv-reports/ictv_online_report/dsdna-viruses/w/polyomaviridae), were extracted from NCBI (http://www.ncbi.nlm.nih.gov/). These include Alphapolyomavirus (43 species), Betapolyomavirus (33 species), Gammapolyomavirus (9 species) and Deltapolyomavirus (4 species). The study also included 9 species yet to be assigned to a genus. The details of all the species included in the study (genome type, genus, genome size, GC%, host, accession number) are summarized in Supplementary file 1. All the genomes were double-stranded DNA, mostly circular except for 10 linear genomes. The information on all the known hosts of these viruses was obtained from the Virus-Host Database (https://www.genome.jp/virushostdb/note.html).

Microsatellite extraction

We used Imperfect Microsatellite Extractor (IMEx) for extracting SSRs, wherein mono- to hexa-nucleotide repeat motifs are uncovered, imperfect microsatellites are allowed and compound microsatellites (cSSR: multiple SSRs separated by a distance of less than or equal to dMAX) have a dMAX range of 10–50. The results therefore need to be assessed within these parameters.

Microsatellite extraction was carried out using the ‘Advance-Mode’ of IMEx with the parameters reported for HIV (Mudunuri and Nagarajaram 2007; Chen et al. 2012) and as used for Mycobacteriophages (Alam et al. 2019). Briefly, the minimum repeat size was set as follows: 6 (mono-), 3 (di-), 3 (tri-), 3 (tetra-), 3 (penta-), 3 (hexa-). Two SSRs separated by a distance of less than or equal to dMAX are treated as a single cSSR. In other words, dMAX is the maximum distance allowed between any two SSRs; it was set at 10 initially and subsequently varied to 20, 30, 40 and 50. All corresponding changes in cSSR incidence were recorded. It should be noted that the maximum permissible dMAX value in IMEx is 50, because beyond that the fate of microsatellites is individualistic and clubbing them as a cSSR becomes irrelevant. Other parameters were set to the defaults.
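The dMAX rule described above can be illustrated with a short sketch. This is not IMEx itself: it assumes that SSRs have already been extracted and are supplied as (start, end) coordinates (1-based, inclusive), which is a hypothetical input format, and it takes the distance between two SSRs to be the number of bases lying between them. The function names and the toy coordinates are illustrative only.

```python
from typing import List, Tuple

SSR = Tuple[int, int]  # (start, end), 1-based inclusive coordinates (assumed format)


def find_cssrs(ssrs: List[SSR], dmax: int) -> List[List[SSR]]:
    """Group SSRs into compound SSRs (cSSRs).

    Consecutive SSRs whose gap is <= dmax are chained together;
    only chains of two or more SSRs count as a cSSR.
    """
    clusters: List[List[SSR]] = []
    current: List[SSR] = []
    for ssr in sorted(ssrs):
        if current:
            gap = ssr[0] - current[-1][1] - 1  # bases between the two repeats
            if gap <= dmax:
                current.append(ssr)
                continue
            if len(current) > 1:
                clusters.append(current)
        current = [ssr]
    if len(current) > 1:
        clusters.append(current)
    return clusters


def percent_increase(previous: int, current: int) -> float:
    """Percentage change in cSSR incidence between two successive dMAX values."""
    if previous == 0:
        raise ValueError("undefined when the previous cSSR incidence is 0")
    return (current - previous) / previous * 100


# Made-up coordinates, for illustration only.
toy_ssrs = [(10, 15), (20, 25), (60, 68), (95, 100), (300, 305)]
for dmax in (10, 20, 30, 40, 50):
    print(dmax, len(find_cssrs(toy_ssrs, dmax)))
```

With these toy coordinates the cSSR count rises from one to two at dMAX 30 and falls back to one at dMAX 40, because two neighbouring clusters merge into a single cSSR. A larger dMAX can therefore lower the cSSR incidence as well as raise it.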

Statistical analysis

All statistical analyses were performed in a spreadsheet using the Data Analysis ToolPak of MS Office Suite v2016. Linear regression was used to assess the correlation of the relative abundance and relative density of microsatellites with genome size and GC%.
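For readers who prefer a script to a spreadsheet, the regression step could be sketched as below. This is an illustrative stand-in for the ToolPak, not the authors' workflow: it computes the least-squares slope, the intercept and Pearson's r using only the Python standard library, and the function name and input values are assumptions. A P-value would additionally require a t-distribution, for example through `scipy.stats.linregress`.

```python
from math import sqrt
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept; also returns Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical genome sizes (bp) and SSR counts, for illustration only.
sizes = [4000.0, 5000.0, 5200.0, 5400.0, 7000.0]
counts = [20.0, 31.0, 28.0, 35.0, 50.0]
slope, intercept, r = linear_fit(sizes, counts)
print(f"slope={slope:.4f} intercept={intercept:.2f} r={r:.3f}")
```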

Dot plot analysis for host specificity

Dot plot analysis of two nucleic acid/protein sequences using Genome Pair Rapid Dotter (GEPARD) highlights the presence of SSRs within the genomes (Krumsiek et al. 2007; Alam et al. 2019) and helps ascertain their evolutionary relationships in the context of repeats, reverse matches and conserved domains. We used GEPARD v1.40 (Krumsiek et al. 2007) to perform dot plot analysis between genomes grouped on the basis of hosts.
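The idea behind a dot plot can be shown with a naive word-match sketch. It does not reproduce GEPARD, which relies on a more efficient algorithm and scoring; it only records the positions at which two sequences share an identical word of a chosen length, so that repeated sequence appears as additional dots. The function name, the default word length and the toy sequences are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def dot_plot(seq_a: str, seq_b: str, word: int = 10) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where seq_a[i:i+word] == seq_b[j:j+word] (0-based)."""
    # Index every word of seq_b by its start position.
    index: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(seq_b) - word + 1):
        index[seq_b[j:j + word]].append(j)
    # Look up every word of seq_a in that index.
    dots: List[Tuple[int, int]] = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], ()):
            dots.append((i, j))
    return dots


# A sequence compared with itself: the mono-nucleotide run adds
# off-diagonal dots on top of the main diagonal.
toy = "ACGTAAAAAAAAGTCA"
print(len(dot_plot(toy, toy, word=6)))
```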

Evolutionary relationship

The phylogenetic tree was constructed by aligning the nucleotide sequences with the default specifications of MAFFT v6.861b (Katoh and Standley 2013); the alignment was pruned with the gappyout algorithm of trimAl v1.4.rev6 (Capella-Gutierrez et al. 2009) using the ETE3 v3.1.1 “build” function as implemented on GenomeNet (https://www.genome.jp/tools/ete/). To select the evolutionary model that best fits the alignment, we used pmodeltest v1.4 among the JC, K80, TrNef, TPM1, TPM2, TPM3, TIM1ef, TIM2ef, TIM3ef, TVMef, SYM, F81, HKY, TrN, TPM1uf, TPM2uf, TPM3uf, TIM1, TIM2, TIM3, TVM and GTR models to infer the ML tree. The Maximum-Likelihood (ML) tree was inferred using RAxML v8.1.20 with the GTRGAMMAI model and default parameters (Stamatakis 2014), with 100 bootstrap replicates. The final tree for visualization was constructed using the web tool Interactive Tree Of Life (Letunic and Bork 2019).

Results

Genome features

The genome size ranged from 3962 bp (BM87) to 7369 bp (BM85), but most genomes were in the range of 5–5.5 kb. The GC%, with an average of 42%, ranged between 34.69 (BM95) and 52.35 (BM81) and exhibited much more diversity than genome size (Fig. 1a, Supplementary file 1). In essence, the Polyomaviridae genomes are mostly of similar size, but their composition in terms of GC% is much more variable. If we hypothesize that SSR incidence has an equal chance across the whole genome, irrespective of composition, then the same should be reflected in the motifs of the SSRs present. However, as discussed later, this is not the case: several species have mono-nucleotide motifs exclusively in the AT region.

The correlation of genome size and GC content with various SSR features was ascertained. SSR incidence was found to be significantly correlated with genome size (r = 0.19, P < 0.05) and GC content (r = 0.08, P < 0.05). Though relative density and relative abundance were not significantly correlated with genome size (r = 0.01, P > 0.05; r = 0.005, P > 0.05, respectively), significant correlation was observed with GC content (r = 0.20, P < 0.05; and r = 0.23, P < 0.05, respectively).

Fig. 1 a Genome features and SSR/cSSR incidence of Polyomaviridae genomes. Genome size is predominantly around 5–5.5 kb, as evident from the fairly constant level of the red bars, whereas the corresponding GC variations (superimposed black bars) have a much broader range. In addition, note the diversity in SSR incidence in genomes of similar length. Furthermore, higher SSR incidence does not necessarily translate to more cSSRs. b Relative abundance (RA) and relative density (RD) of SSRs and cSSRs. RA is the number of microsatellites present per kb of the genome, whereas RD is the sequence space composed of microsatellites per kb of the genome. The varying peaks signify the presence of a unique SSR signature for each genome

Further, cSSR incidence is significantly correlated with genome size (r = 0.06, P < 0.05), but the corresponding relative density (r = 0.0038, P > 0.05) and relative abundance (r = 0.004, P > 0.05) show no significant correlation with it. GC content is also significantly correlated with cSSR incidence (r = 0.06, P < 0.05), relative density (r = 0.11, P < 0.05) and relative abundance (r = 0.08, P < 0.05).

Incidence of SSRs and cSSRs

A total of 3036 SSRs and 223 cSSRs were extracted from the 98 species of Polyomaviridae (Supplementary files 2–4). The average distribution of SSRs and cSSRs per genome varied from 23 and 1.3 (Gammapolyomavirus) to 33 and 2.9 (Betapolyomavirus), respectively. Their distribution across genera has been summarized in Table 1.

A maximum of 56 SSRs were present in BM85, whereas a minimum of 18 were present in BM80 and BM21. cSSR incidence ranged from 0 in seven species (BM99, BM82, BM76, BM59, BM24, BM21, BM14) to 7 in two species (BM85 and BM84) (Fig. 1a). Two interesting but contrasting observations can be made from these data. First, BM85 and BM84, each with 7 cSSRs, have 56 and 31 SSRs in genomes of 7369 and 4697 bp, respectively (Supplementary file 2). This essentially means that although a longer genome should ideally account for more SSRs, the eventual clustering of SSRs, reflected as cSSR incidence, remains the same. Thus, the SSR-rich regions of the genome are independent of genome size. The second aspect is that the above observation is not the norm, as is evident from the cSSR range of zero to seven. Multiple genomes of Polyomaviridae with varying numbers of SSRs have the same number of cSSRs. This is highlighted by 29 species having 2 cSSRs (Fig. 1a, Supplementary files 2–4), suggesting a unique genome SSR signature.

To further highlight the regularity of this anomaly, we looked into cSSR%, which is the percentage of SSRs present as part of cSSRs in a particular genome. Note that the variations in cSSR% occur not only across different genera but also within them, thereby negating the clustering of SSRs in a genus-specific manner (Fig. 2a). These reflect specific yet variable localization and clustering of SSRs in a particular genome.

Relative abundance (RA) and relative density (RD) of SSRs and cSSRs

RA is the number of microsatellites present per kb of the genome, whereas RD is the sequence space composed of microsatellites per kb of the genome. These values therefore reflect the number of iterations of the SSRs present. If the SSRs have a conserved tendency to be iterated, then higher incidence should correspond to elevated RD values. Moreover, a higher RA value should correspond to a higher RD value. As observed, BM65 has the highest RA and RD values of 9.32 and 80.4, respectively, for SSRs, which means that since more SSRs are present per kb of the genome, more of the genome is composed of SSRs. The corresponding lowest values for RA and RD were 3.39 (BM21) and 26.5 (BM80), respectively (Fig. 1b, Supplementary files 2–4).

Similarly, the cSSR relative abundance (cRA) and relative density (cRD) were also studied. Since there were 7 species with no cSSR (Fig. 1a), the minimum cRA and cRD values were zero for these species. The highest values for cRA and cRD were 1.490 (BM84) and 33.93 (BM95), respectively (Fig. 1b, Supplementary files 2–4). This difference may be due to the differential composition of the cSSRs.
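The RA and RD definitions, together with the cSSR% defined earlier, reduce to simple ratios, sketched here. The function names are illustrative, and the arguments (repeat count, total repeat length in bp, genome length in bp) are assumed to come from the extraction output. As a check against the reported values, the 7 cSSRs of BM84 in a 4697 bp genome give a cRA of about 1.490.

```python
def relative_abundance(count: int, genome_bp: int) -> float:
    """Number of microsatellites per kb of genome."""
    return count / (genome_bp / 1000)


def relative_density(repeat_bp: int, genome_bp: int) -> float:
    """Bases covered by microsatellites per kb of genome."""
    return repeat_bp / (genome_bp / 1000)


def cssr_percent(ssrs_in_cssrs: int, total_ssrs: int) -> float:
    """Percentage of a genome's SSRs that are part of a cSSR."""
    if total_ssrs == 0:
        raise ValueError("genome has no SSRs")
    return ssrs_in_cssrs / total_ssrs * 100


# BM84: 7 cSSRs in a 4697 bp genome (values taken from the text).
print(round(relative_abundance(7, 4697), 3))
```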

dMAX and cSSR

cSSR incidence depends on the distance (dMAX) allowed between two SSRs for them to be treated as one cSSR. Since cSSRs reflect the clustering of SSRs, and IMEx allows dMAX values up to 50, we analyzed the cSSR incidence of Polyomaviridae genomes by varying the dMAX value from the initial value of 10 to 20, 30, 40 and 50. Subsequently, the percentage increase was calculated using the following formula:

\[
\%\ \text{increase} = \frac{\text{cSSR incidence at } \mathrm{dMAX}_{n} - \text{cSSR incidence at } \mathrm{dMAX}_{(n-10)}}{\text{cSSR incidence at } \mathrm{dMAX}_{(n-10)}} \times 100
\]

This percentage increase was then plotted. Though the maximum increase is observed for most species when dMAX is increased from 10 to 20, as evident from the predominant black bar, it does not conform to a pattern per se (Fig. 2b). This means that even in species of the same family, SSRs chart their own path in terms of localization in each genome.

Table 1 SSR and cSSR incidence across the different genera of Polyomaviridae

S. No.  Genera              No. of species  SSR incidence  Average SSR per species  cSSR incidence  Average cSSR per species
1       Alphapolyomavirus   43              1315           30.58                    80              1.86
2       Betapolyomavirus    33              1090           33.03                    96              2.9
3       Deltapolyomavirus   4               108            27                       6               1.5
4       Gammapolyomavirus   9               208            23.11                    12              1.33
5       Unassigned species  9               315            35                       29              3.22
        Total               98              3036                                    223

Fig. 2 a cSSR% in the studied Polyomaviridae genomes. cSSR% is the percentage of individual SSRs that are part of cSSRs. The data for the different genera are differentially coloured. There is diversity not only across the genera but also among genomes of the same genus. Interestingly, BM84, which has the highest cSSR%, is yet to be classified into any genus. b Percentage increase in cSSR incidence with increasing dMAX (10–50). Note the non-linearity of the increase. Negative bars represent a decrease in cSSR incidence when two cSSRs merge into one with increasing dMAX. PV: polyomavirus

SSR motif types and their prevalence

First, the contribution of the different repeat motifs (mono- to hexa-nucleotide) to the overall SSR incidence was ascertained. The data were analysed separately for each of the genera. Moreover, the analysis was done in percentages and not absolute numbers to account for the variable number of species across genera. Note that the data from species with unassigned genera were not included herein. The contribution of mono-nucleotide repeat motifs ranged from 36% (Gammapolyomavirus) to 47% (Betapolyomavirus). Deltapolyomavirus had no incidence of penta- and hexa-nucleotide repeats, whereas Gammapolyomavirus lacked hexa-nucleotide repeats. This can be attributed to the fewer species in these genera. Gammapolyomavirus had the highest contribution from di-nucleotide repeats (39.42%) and was the only genus to have more di-nucleotide repeats than mono-nucleotide repeats (Fig. 3a, Supplementary files 2–3).

We then looked into the motif composition of mono- and di-nucleotide repeats for their prevalence across the different genera of Polyomaviridae. For the mono-nucleotides, if we look at the overall data, the most prevalent repeat motif is “T” (48.95%) followed by “A” (33.48%). “T” also remains the most prevalent mono-nucleotide motif for Alpha-, Beta- and Delta-polyomavirus (47, 52 and 71 percent, respectively). However, Gammapolyomavirus has its highest contribution from “C” (34.67%) followed by “T” (33.33%) (Fig. 3b, Supplementary files 2–3). Interestingly, the same Gammapolyomavirus has its highest di-nucleotide repeat motif contribution from the “AT/TA” motif (29.27%), while Alphapolyomavirus has its largest contribution from “CT/TC” (29.37%). Overall, “AT/TA” was the most prevalent dinucleotide repeat motif, closely followed by “CT/TC” (Fig. 3c).

SSRs in coding regions

The assessment of SSR distribution across the genome revealed that the non-coding region accounted for 679 SSRs (22.4%), whereas the coding region, comprising 32 proteins/putative genes/ORFs, housed 2357 SSRs (77.6%) (Supplementary file 2).

Subsequently, we analyzed the SSR prevalence across different genes of the studied genomes. Six genes accounted for over 92% of SSRs. Overall, the LTAg gene alone accounted for over 47% of total SSRs, with the VP1 gene a distant second at around 16% (Fig. 3d). Thereafter, we dissected the data across different genera. Interestingly, though the LTAg gene takes the pole position in the housing of SSRs across genera, its contribution varied. In Betapolyomavirus, it accounted for one in every two SSRs (49.54%), while in Gammapolyomavirus approximately one in every three SSRs was housed in the LTAg gene (35%). This difference permeates to all the genes, albeit to a lesser extent (Fig. 3e, Supplementary files 2–3).

SSRs (mono‑nucleotide) specificity and host range exclusivity

The compilation of the contribution of different SSRs to the overall incidence revealed an interesting observation. In eighteen species, one hundred percent of the mono-nucleotide SSRs were composed of A/T. Further, the majority of these viruses had humans or members of the ape family as their hosts. To elucidate a possible pattern and its significance, we arranged all the studied species in decreasing order of their mono-nucleotide SSR contribution by A/T (Fig. 4, Supplementary files 1–2). Notably, viruses with humans, apes and related species as hosts have a much higher A/T mono-nucleotide SSR composition than those with birds and fishes as hosts (Fig. 4).
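The ordering step described above can be sketched as follows. The input structure (a mapping from genome identifier to the list of SSR motifs found in that genome) and the genome names are hypothetical; only motifs of length one are counted as mono-nucleotide repeats.

```python
from typing import Dict, List, Tuple


def at_mono_percent(motifs: List[str]) -> float:
    """Percentage of mono-nucleotide SSRs whose motif is A or T."""
    mono = [m.upper() for m in motifs if len(m) == 1]
    if not mono:
        raise ValueError("no mono-nucleotide SSRs in this genome")
    return sum(m in ("A", "T") for m in mono) / len(mono) * 100


def rank_by_at(genomes: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Genomes in decreasing order of A/T mono-nucleotide SSR contribution."""
    scores = [(name, at_mono_percent(motifs)) for name, motifs in genomes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up motif lists, for illustration only.
toy = {
    "genome_1": ["A", "C", "G", "T"],
    "genome_2": ["A", "T", "T", "AT"],
    "genome_3": ["C", "C", "T", "CT"],
}
for name, pct in rank_by_at(toy):
    print(name, round(pct, 2))
```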

Using representative species (9 each), we then investigated whether the SSR composition by A/T and the hosts reflect a pattern. Dot plot analysis was performed for nine species with humans, apes and related species as hosts (Fig. 5a) and nine species with birds, fishes and other species as hosts (Fig. 5b). Interestingly, even though three species in Fig. 4 have 100% mono-nucleotide SSR contribution by A/T (same as Fig. 5a), the overall number of dots (reflective of repeat sequences) is higher for all the genomes of Fig. 5a, representing humans and related species as hosts.

Phylogenetic tree of Polyomaviridae

Subsequently, we constructed the phylogenetic tree of the 98 Polyomaviridae genomes and observed that the viruses have not all evolved together according to their hosts. However, hosts are reflected in the tree: clustering of viruses with the same or related hosts can be observed at multiple places (Fig. 6). The fact that not all viruses with human or the same hosts follow the pattern only indicates that there are other players in genome evolution besides hosts.

Fig. 3 a SSR incidence and motif length. An increase in repeat motif length resulted in lesser incidence (inverse proportionality), which is expected. However, two observations should be noted. First, Gammapolyomavirus is the only genus where the highest incidence is of di-nucleotide repeat motifs. All others have the mono-nucleotide motif as the most represented, along expected lines. Second, the fall in incidence from mono- to di-nucleotide motif SSRs is the least in Deltapolyomavirus. b Mono-nucleotide motif composition. In spite of varying GC percentage (Fig. 1), the mono-nucleotide motif composition is very much biased towards A/T across all genera. Total represents overall data. c Di-nucleotide motif composition. Though AT/TA is the most represented di-nucleotide repeat motif overall, it does not enjoy the same status across all genera, with Alphapolyomavirus being the exception. Here, CT/TC has the highest incidence, closely followed by AT/TA. d Distribution of SSRs (%) across different proteins. Overall, LTAg accounted for over 47% of all SSRs in the coding region, with VP1 coming a distant second at around 16%. Only the 6 proteins which accounted for the highest SSRs were included; the rest have been collectively taken as “Others”. e SSR contribution (%) by proteins across different genera. Herein, subtle variations are visible. Though the LTAg gene accounts for the maximum SSRs in the coding genome across all the genera, the contributing percentage varies from 35% in Gammapolyomavirus to almost 50% in Betapolyomavirus

We then superimposed the data for percentage mono-nucleotide SSR contribution by the AT region, the phylogenetic analysis and the known hosts. For the sake of clarity, hosts of only those species with > 90% mono-nucleotide SSR contribution from the AT region are shown as illustrations here, though the complete information is provided in Fig. 4. We hypothesize that the presence of mono-nucleotide repeats in the AT region somehow provides for viral host flexibility and interchangeability.

Discussion

Owing to the variable nature of the A/T and G/C regions of DNA, these sequences often exhibit specific attributes. The significance of AT repeats in strand slippage and copy number polymorphism is well documented (Katti et al. 2001). Though this implies that GC content is an important aspect of SSR studies, this is not necessarily the case, primarily for two reasons. First, the uneven distribution of SSRs across any genome, as observed herein and reported for other genomes, is not determined by GC content (Chen et al. 2012; Alam et al. 2013, 2019). For instance, there are 18 species herein in which all the mono-nucleotide SSRs are localized to the A/T region. The fact that these genomes have a maximum GC content of 52% supports the argument, with 48% of the genome housing one hundred percent of the mono-nucleotide repeats. We believe that this unevenness in distribution is not random but serves a purpose, most probably host range, as discussed later. Second, the prevalence of repeats depends on the size of the repeat motif: what applies to mono-nucleotides does not hold for di-nucleotides, and it also varies from one genus to another. However, two points, both concerning Gammapolyomavirus, deserve mention. First, it is the only genus to have “C” as the most prevalent mono-nucleotide SSR motif. This is a deviation from the AT region being the hub for shorter repeat motifs. In contrast, it returns to expected lines with “AT/TA” being the most represented di-nucleotide repeat motif. Second, we should bear in mind that this genus has fewer species (nine), which may be viewed from multiple perspectives. Either we consider the fewer species as the reason for the aberrant observation, or we assume that this uniqueness is the reason for fewer species in Gammapolyomavirus. We believe in the latter.

Fig. 4 Genomes arranged by decreasing % of A/T mono-nucleotide repeat motif, each labelled with its host. Though not perfect, the similar values for humans and related species suggest host range dependency on SSR distribution across AT genome regions. The higher the contribution of mono-nucleotide repeat motifs from the AT region, the greater the chances that the virus will have humans or related species as its host. The color gradient represents the percentage of A/T mono-nucleotide repeat motif (scale 25.49 to 100.00)

Fig. 5 Dot plot analysis of Polyomaviridae genomes with a humans, apes or related species as hosts, with mono-nucleotide repeat motif contribution of 100% from the AT region, and b divergent hosts, with varying mono-nucleotide repeats in the AT region

The study of cSSRs has always been relevant alongside SSRs owing to their involvement in functional aspects such as regulation of gene expression (Kashi and King 2006; Chen et al. 2011). Essentially, cSSRs reflect the accumulation of SSRs in the genome. Higher cSSR incidence refers to SSRs present in close proximity to each other, and with these being sources of variation and genome evolution (Kim et al. 2008; Madsen et al. 2008), we further looked at cSSRs in terms of cSSR% and by varying dMAX. An increase in cSSR incidence with increasing dMAX is expected and observed


0.063 0.063

0.105

0.105

0.1280.128

0.1530.153

0498

0.498

0.194

0.194

0.0420.042

0.074 0.074

0.0350.035

0213

0.213

0.09

70.

097

0083 0.083

0.234

0.234

0.0760.076

0288

0.288

0.3570.357

0.2150.215

0.090.09

00

00.111212

0.0360.036

0.153

0.153

2.437 2.437

0252 0.252

0.20.21111

0.239

0.239

0379

0.379

0.1060.106

0.155 0.155

0.093

0.093

0.0960.096

0.2310.231

0.038

0.038

0.0620.062

00.111616

164 1.64

0.0390.039

00.111717

0.055 0.055

0.0640.064

0.277 0.277

0.04

0.04

0.1760.176

0.449 0.449

0.14

70.

147

0.312

0.312

0169 0.169

0.610.61

0.2490.249

0079

0.079

0.0750.075

0.046

0.046

0.0350.035

0.1920.192

00.111515

0.0550.055

0.1080.108

0.233 0.233

0.09

10.

091

0.0260.026

0.0510.051

0.7910.791

0164 0.164

0.0010.001

0.059

0.059

0.234

0.234

0.0730.073

0.32 0.32

0.340.34

0096

0.096

334422.

00

0.083

0.083

990044.

00

0051

0.051

0.10

10.

101

0.1640.164

7700.00

0.415

0.415

0.1520.152

0155

0.155

0.3130.313

0.085 0.085

015 0.15

0.05

20.

052

0.1610.161

0.581

0.581

0.03

0.03

0.3060.306

0.2530.253

0.2020.2020.064

0.064

0.055 0.055

0.2180.218

0.197 0.197

0039

0.039

0.15

50.

155

0.568

0.568

0438

0.438

0.29 0.29

0.0630.063

0.3180.318

557755.

00

0.040.04

0.1050.105

0.0370.037

00

0.035

0.035

0.38 0.38

0.0720.072

0.262

0.262

0.2340.234

0.08

70.

087

0.25

0.25

1.301 1.301

0.0620.062

0.0180.018

0174

0.174

0.1430.143

0152 0.152

0.0910.091

0.219 0.219

0.352 0.352

0.151

0.151

0.0930.093

0.0680.068

0.0710.071

0343

0.343

0.1490.149

0.1050.105

0.0810.081

0.524 0.524

0.2380.238

Mono Nucleotide Repeat (SSR) AT%

Mono Nucleotide Repeat (SSR) GC%

Tree scale: 0.1

Human :

Ape :Monkey :

Bat :Alpaca :

Racoon :Rodent :

Fish :Dolphin :

Seal :Bird :

Host Symbol

=100%

3 Biotech (2021) 11:35

1 3

Page 11 of 12 35

as well (Fig. 2b). However, the increase not conforming to any pattern as visible by the different lengths of differently coloured lines is indicative of each genomes’ uniqueness. The few instances wherein negative percentage is observed is owing to merging of two independent cSSRs into one with increasing dMAX, thus leading to a decrease in cSSR incidence. Moreover, the cSSR% varies not only across the genera of Polyomaviridae but also within the species of same genera (Fig. 2a). In spite of these variations, of all the reported cSSRs, only 17 are composed of three SSRs and 3 of four SSRs. Rest all are of two SSRs only. There is only one species BM97 which has two cSSRs of more than 3 SSRs each. Other genomes have a single representation only. All the above figures are for dMAX of 10 (Supplementary file 4).
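The merging effect described above can be illustrated with a minimal Python sketch. It assumes a hypothetical data model in which each SSR is a (start, end) coordinate pair, and it treats neighbouring SSRs as part of one cSSR when the gap between them is at most dMAX. The function name, the gap convention and the toy coordinates are illustrative assumptions, not IMEx output or the procedure used in this study.

```python
from typing import List, Tuple

# Hypothetical data model: an SSR is a (start, end) pair, 1-based and inclusive.
SSR = Tuple[int, int]


def compound_ssrs(ssrs: List[SSR], dmax: int) -> List[List[SSR]]:
    """Group consecutive SSRs separated by at most `dmax` bases.

    Only groups holding two or more SSRs are returned as cSSRs.
    The gap is measured as next.start - previous.end - 1 (an assumption).
    """
    clusters: List[List[SSR]] = []
    current: List[SSR] = []
    for ssr in sorted(ssrs):
        if current and ssr[0] - current[-1][1] - 1 <= dmax:
            current.append(ssr)          # close enough: extend the open group
        else:
            if len(current) >= 2:
                clusters.append(current)  # keep the finished group if compound
            current = [ssr]               # start a new group
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Toy coordinates (not from any Polyomaviridae genome): gaps of 4, 15 and 5 bases.
toy_ssrs = [(1, 10), (15, 24), (40, 49), (55, 64)]

# Number of cSSRs at three dMAX settings.
counts = {dmax: len(compound_ssrs(toy_ssrs, dmax)) for dmax in (3, 10, 20)}
print(counts)
```

With these toy coordinates the cSSR count rises from 0 to 2 and then drops to 1 as dMAX goes from 3 to 10 to 20, because the two separate cSSRs fuse into a single one once the 15-base gap between them falls within dMAX.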

The prevalence of SSRs in the coding regions of viral genomes conforms to earlier reports (Alam et al. 2014, 2019). The distribution of around 78% of SSRs across coding regions is in accordance with other viral genomes, though the gene-specific data (Fig. 3d–e) exhibit features unique to Polyomaviridae genomes. The overlap of genes is reflected by the LTAg/STAg or VP2/VP3 representation. The presence of SSRs in these overlapping regions can be influential, as an alteration there would affect two genes simultaneously. The cSSR constitution ranged from two to four SSRs, albeit with divergent motifs, as mentioned above. The distribution of SSRs failed to conform to a pattern. Thus, we can affirm that the genome-specific clustering of SSRs is not only unique but regulated as well. This may be an attempt of the genome to shield itself from changes, as clustering of SSRs would lead to the development of hot-spots for mutations.
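As a rough illustration of how SSRs can be assigned to coding regions, including regions shared by overlapping genes, the following Python sketch may help. The gene coordinates, SSR positions and function names are hypothetical and chosen only for demonstration; they are not taken from any annotated Polyomaviridae genome, and circular genomes or spliced genes are not handled.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def genes_hit(ssr: Interval, genes: Dict[str, Interval]) -> List[str]:
    """Return the names of all genes whose interval overlaps the SSR."""
    start, end = ssr
    return sorted(
        name for name, (g_start, g_end) in genes.items()
        if start <= g_end and g_start <= end
    )


def coding_summary(ssrs: List[Interval], genes: Dict[str, Interval]):
    """Return (% of SSRs in coding regions, number of SSRs hitting 2+ genes)."""
    hits = [genes_hit(ssr, genes) for ssr in ssrs]
    coding = sum(1 for h in hits if h)           # SSRs inside at least one gene
    shared = sum(1 for h in hits if len(h) > 1)  # SSRs inside an overlap region
    percent = 100.0 * coding / len(ssrs) if ssrs else 0.0
    return percent, shared


# Hypothetical annotation: VP3 lies entirely within VP2, as a stand-in for overlap.
toy_genes = {"VP2": (100, 500), "VP3": (300, 500), "LTAg": (600, 900)}
toy_ssrs = [(50, 57), (120, 127), (350, 357), (700, 707)]

percent_coding, shared_count = coding_summary(toy_ssrs, toy_genes)
print(percent_coding, shared_count)
print(genes_hit((350, 357), toy_genes))
```

In this toy case three of the four SSRs fall in coding regions, and the SSR at 350 to 357 is reported for both VP2 and VP3, which is the situation in which a single alteration would affect two genes.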

Though the overall evolution of viruses is guided by multiple factors such as host range and genome features, the number and composition of mono-nucleotide SSRs showed a correlation with the hosts, and we believe the data provide a foundation for predicting the future hosts of any viral species. Our hypothesis stems from the fact that eighteen genomes exhibited mono-nucleotide repeats restricted exclusively to the AT region. A closer analysis (Fig. 4) revealed a pattern suggesting humans or related hosts in those genomes. On widening our analysis, we can say with confidence that the contribution of mono-nucleotide SSRs from the AT region is pivotal for host range determination. Viruses are constantly expanding their host range, as is supported by HIV, which originated in monkeys, and Coronavirus, which originally had bats as hosts (19). Both monkeys and bats are hosts for Polyomavirus genomes having an exclusive or near-exclusive contribution of mono-SSRs from the AT region.
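The quantity underlying this hypothesis, the share of mono-nucleotide repeats contributed by A or T, can be sketched in a few lines of Python. This is a simplified stand-in that scans for perfect runs with a regular expression; the minimum run length of 6 and the toy sequences are assumptions made for illustration, and the study itself extracted repeats with IMEx rather than with this code.

```python
import re


def mono_at_percent(seq: str, min_len: int = 6) -> float:
    """Percentage of mono-nucleotide repeats (runs >= min_len) that are A or T.

    `min_len` is an assumed threshold, not a value stated in the text.
    Returns NaN when the sequence holds no qualifying run.
    """
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    runs = [m.group(1) for m in pattern.finditer(seq.upper())]
    if not runs:
        return float("nan")
    at_runs = sum(1 for base in runs if base in "AT")
    return 100.0 * at_runs / len(runs)


# Toy sequences (hypothetical, not viral genomes).
mixed = "GC" + "A" * 7 + "GC" + "T" * 6 + "G" + "C" * 7 + "GA"   # runs: A, T, C
at_only = "GC" + "A" * 8 + "GCG" + "T" * 7 + "CG"                # runs: A, T

print(round(mono_at_percent(mixed), 2))
print(mono_at_percent(at_only))
```

A genome scoring 100% in such a calculation corresponds to the "exclusive" AT contribution discussed above, whereas the first toy sequence, with one C run among three runs, scores about 66.67%.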

Earlier studies on the evolution of Polyomavirus have suggested gene duplications and inversions as sources of variation in genome size and also predicted their prior existence in invertebrate hosts, indicating a virus family that is evolving in terms of host (Buck et al. 2016). This becomes all the more relevant when we look at the organisms suggested by this study to share a common/interchangeable host range for viruses. These include monkeys (HIV) and bats (Coronavirus) (Parrish et al. 2008). We accept that the correlation between mono-nucleotide repeats from the AT region and host is not universal, suggesting other influencing factors, but its presence in species across genera demands further authentication of the idea.

To conclude, the incidence and distribution of SSRs in the Polyomaviridae genomes suggest a unique genome SSR signature which is defined by multiple factors. These include GC content, evolutionary relationship and coding/non-coding regions. We also propose the mono-nucleotide repeat distribution in the A/T region of the genome as a key parameter for host divergence to humans and related species. This needs to be ascertained in all known human-infecting viruses.

Author contributions RL performed all the analysis of extracted SSRs and prepared all the figures and tables. MGJ carried out the extraction of microsatellites using IMEx. SA supervised the whole study and prepared the manuscript.

Funding Not applicable.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Availability of data and material All data have been provided as supplementary material.

References

Ahsan N, Shah KV (2006) Polyomaviruses and human diseases. Adv Exp Med Biol 577:1–18. https://doi.org/10.1007/0-387-32957-9_1

Alam CM, Singh AK, Sharfuddin C, Ali S (2013) In-silico analysis of simple and imperfect microsatellites in diverse tobamovirus genomes. Gene 530:193–200. https://doi.org/10.1016/j.gene.2013.08.046

Alam CM, Singh AK, Sharfuddin C, Ali S (2014) Incidence, complexity and diversity of simple sequence repeats across potexvirus genomes. Gene 537:189–196. https://doi.org/10.1016/j.gene.2014.01.007

Alam CM, Iqbal A, Sharma A et al (2019) Microsatellite diversity, complexity, and host range of mycobacteriophage genomes of the Siphoviridae family. Front Genetics. https://doi.org/10.3389/fgene.2019.00207

Bennetzen JL (2000) Transposable element contributions to plant gene and genome evolution. Plant Mol Biol 42:251–269

Buck CB, Doorslaer KV, Peretti A et al (2016) The ancient evolutionary history of polyomaviruses. PLoS Pathog 12:e1005574. https://doi.org/10.1371/journal.ppat.1005574

Burguete AS, Almeida S, Gao F-B et al (2015) GGGGCC microsatellite RNA is neuritically localized, induces branching defects, and perturbs transport granule function. eLife 4:e08881. https://doi.org/10.7554/eLife.08881

Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. https://doi.org/10.1093/bioinformatics/btp348

Chambers GK, MacAvoy ES (2000) Microsatellites: consensus and controversy. Comp Biochem Physiol B Biochem Mol Biol 126:455–476

Chen M, Zeng G, Tan Z et al (2011) Compound microsatellites in complete Escherichia coli genomes. FEBS Lett 585:1072–1076. https://doi.org/10.1016/j.febslet.2011.03.005

Chen M, Tan Z, Zeng G, Zeng Z (2012) Differential distribution of compound microsatellites in various Human Immunodeficiency Virus Type 1 complete genomes. Infect Genet Evol 12:1452–1457. https://doi.org/10.1016/j.meegid.2012.05.006

Gur-Arie R, Cohen CJ, Eitan Y et al (2000) Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res 10:62–71

Hung S, Saiakhova A, Faber ZJ et al (2019) Mismatch repair-signature mutations activate gene enhancers across human colorectal cancer epigenomes. eLife 8:e40760. https://doi.org/10.7554/eLife.40760

Kashi Y, King DG (2006) Simple sequence repeats as advantageous mutators in evolution. Trends Genet 22:253–259. https://doi.org/10.1016/j.tig.2006.03.005

Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780. https://doi.org/10.1093/molbev/mst010

Katti MV, Ranjekar PK, Gupta VS (2001) Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol 18:1161–1167. https://doi.org/10.1093/oxfordjournals.molbev.a003903

Kim T-S, Booth JG, Gauch HG et al (2008) Simple sequence repeats in Neurospora crassa: distribution, polymorphism and evolutionary inference. BMC Genomics 9:31. https://doi.org/10.1186/1471-2164-9-31

Kofler R, Schlötterer C, Luschützky E, Lelley T (2008) Survey of microsatellite clustering in eight fully sequenced species sheds light on the origin of compound microsatellites. BMC Genomics 9:612. https://doi.org/10.1186/1471-2164-9-612

Krumsiek J, Arnold R, Rattei T (2007) Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23:1026–1028. https://doi.org/10.1093/bioinformatics/btm039

Letunic I, Bork P (2019) Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 47:W256–W259. https://doi.org/10.1093/nar/gkz239

Madsen BE, Villesen P, Wiuf C (2008) Short tandem repeats in human exons: a target for disease mutations. BMC Genomics 9:410. https://doi.org/10.1186/1471-2164-9-410

Moens U, Ludvigsen M, Van Ghelue M (2011) Human polyomaviruses in skin diseases. In: Pathology research international. https://www.hindawi.com/journals/pri/2011/123491/. Accessed 3 May 2020

Mudunuri SB, Nagarajaram HA (2007) IMEx: imperfect microsatellite extractor. Bioinformatics 23:1181–1187. https://doi.org/10.1093/bioinformatics/btm097

Parrish CR, Holmes EC, Morens DM et al (2008) Cross-species virus transmission and the emergence of new epidemic diseases. Microbiol Mol Biol Rev 72:457–470. https://doi.org/10.1128/MMBR.00004-08

Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. https://doi.org/10.1093/bioinformatics/btu033

van der Meijden E, Kazem S, Dargel CA et al (2015) Characterization of T antigens, including middle T and alternative T, expressed by the human polyomavirus associated with trichodysplasia spinulosa. J Virol 89:9427–9439. https://doi.org/10.1128/JVI.00911-15
