PROTOCOL FOR THE ANALYSIS AND CLASSIFICATION OF VIRAL PROTEINS USING
DOMAIN-ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS (DAIO)
July 16, 2019

Publication describing approach: Zmasek CM, Knipe DM, Pellett PE, Scheuermann RH (2019) “Classification of human Herpesviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO)” Virology, 529, 29-42 [https://doi.org/10.1016/j.virol.2019.01.005]

1. Background

We used a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO, see: https://sites.google.com/site/cmzmasek/home/software/forester/daio) to analyze genomes by combining phylogenetic and protein domain-architecture information. We performed a systematic phylogenetic and protein domain architecture-based study encompassing the entire proteomes of Herpesviridae, Poxviridae, and Coronaviridae to define Strict Ortholog Groups (SOGs). Besides assessing the taxonomic distribution of each protein, we computationally inferred gene duplications and performed a protein domain architecture analysis for every protein family. These results allowed us to develop a novel classification system for viral proteins that clusters proteins into SOGs and allows the taxonomic distribution of any protein to be inferred quickly.

Phylogenomics. Orthologs were defined by Fitch in 1970 as homologous genes in different species that diverged from a common ancestral gene by speciation. Genes that diverged by a gene duplication, whether in the same or in different species, are termed paralogs (Fitch, 2000, 1970). While the terms ortholog and paralog have no functional implications (Jensen, 2001), orthologs are often considered to be functionally more similar than paralogs at the same level of sequence divergence. This has been termed the “ortholog conjecture” and remains a topic of active research (Altenhoff et al., 2012; Chen and Zhang, 2012; Nehrt et al., 2011; Rogozin et al., 2014), both because of its importance for computational functional analysis of sequences (Eisen, 1998; Zmasek and Eddy, 2002) and because gene duplications are essential to biological evolution, supplying its raw genetic material [(Zhang, 2003) and references therein]. Orthologs (or groups/clusters of orthologs) are often inferred by indirect methods based on (reciprocal) pairwise highest similarities [e.g. (Remm et al., 2001; Tatusov et al., 1997)]. In contrast, we inferred orthology by explicit phylogenetic inference followed by comparison to a trusted species tree, as this approach is likely to yield more accurate results, albeit at a considerably higher computational cost (Zmasek and Eddy, 2002, 2001).

Protein domains and domain architectures. Many eukaryotic proteins and, by extension, proteins of eukaryotic viruses are composed of multiple domains, units with their own evolutionary history and, often, specific and conserved functions. The ordered arrangement of all domains in a given protein constitutes its architecture. Many domains can combine with different partner domains and, as a result, form a wide variety of domain combinations, often even within the same species (Moore et al., 2008). Bringing together multiple domains in one protein creates a distinct entity that combines the functions of its constituents. The emergence of proteins with novel domain combinations is considered to be a major mechanism for the evolution of new functionality in eukaryotic genomes (Itoh et al., 2007; Peisajovich et al., 2010). It is especially important in the evolution of pathways, where physical proximity of domains in multidomain proteins links different elements of the pathway; thus, the emergence of a new domain combination may rearrange pathways or processes in the cell (Peisajovich et al., 2010). The modular structure of eukaryotic proteins provides a mechanism that promotes differentiation and variation of protein functions despite the existence of only a limited number of domains. Proteins can gain (or lose) domains in genome rearrangements, creating (or removing) domain combinations (Patthy, 2003; Ye and Godzik, 2004).

Here we provide a systematic description and classification of the entire set of human Herpesviridae, Poxviridae, and Coronaviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO). We classify proteins not only by their evolutionary history but equally by their domain architectures. In particular, our goal is to classify proteins into groups, which we call “Strict Ortholog Groups” (SOGs), in which all proteins exhibit the same domain architecture and are orthologous to each other (related by speciation events). Furthermore, we attempt to provide an informative name for each of these groups: a name that not only includes information about the protein’s function (if known), but also has a suffix that indicates the taxonomic distribution of the protein.
Uppercase suffixes mean that corresponding SOG members are found in each species of
a taxonomic unit, lowercase suffixes mean that corresponding SOG members are found
in some species of a taxonomic unit.
Numbers in square brackets are used to differentiate Coronaviridae SOGs with the
same (automatically inferred) name but different domain architectures.
Coronaviridae taxonomic tree used in this work.
3. Method Description

Sequence retrieval. Individual protein sequences were downloaded from the ViPR database (Pickett et al., 2012), while entire proteomes were downloaded from UniProtKB (Bateman et al., 2017).

Multiple sequence alignments. Multiple sequence alignments were calculated using MAFFT version 7.313 (with “localpair” and “maxiterate 1000” options) (Katoh and Standley, 2013; Kuraku et al., 2013). Prior to phylogenetic inference, multiple sequence alignment columns with more than 50% gaps were deleted; for comparison, we also performed the analyses on alignments from which we deleted only columns with more than 90% gaps.

Protein domain analysis. Protein domains were analyzed using HMMER v3.1b2 (Eddy, 2011) and the Pfam 31.0 database (Finn et al., 2016).

Phylogenetic analyses. Phylogenetic trees were calculated for individual domain architectures (not full-length sequences), except for US22 domain proteins, because US22 domain alignments lack sufficient phylogenetic signal. Distance-based minimum evolution trees were inferred by FastME 2.0 (Desper and Gascuel, 2002) (with balanced tree swapping and “GME” initial tree options) based on pairwise distances calculated by TREE-PUZZLE 5.2 (Schmidt et al., 2002) (using the WAG substitution model (Whelan and Goldman, 2001), a uniform model of rate heterogeneity, estimation of amino acid frequencies from the dataset, and approximate parameter estimation using a Neighbor-Joining tree). For maximum likelihood approaches, we employed RAxML version 8.2.9 (Stamatakis et al., 2005) (using 100 bootstrapped data sets and the WAG substitution model). Gene duplication inferences were performed using the SDI and RIO methods (Zmasek and Eddy, 2002, 2001).
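The gap-based column filtering described under “Multiple sequence alignments” can be illustrated with a minimal Python sketch. It assumes the alignment is held in memory as equal-length strings with “-” as the gap character; the function name `drop_gappy_columns` and the toy sequences are hypothetical, and the sketch is not the tooling used in this work.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap_char="-"):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences (strings).
    A column is deleted only if MORE than max_gap_fraction of its
    characters are gaps, mirroring the "more than 50% gaps" rule
    (use 0.9 for the more lenient comparison setting).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(seq[col] == gap_char for seq in alignment) / n_seqs
        <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment (hypothetical data): columns 3 and 5 are 75% gaps.
toy = ["MK-L-A", "MR-L--", "M--LTA", "MKQL--"]
strict = drop_gappy_columns(toy, max_gap_fraction=0.5)
lenient = drop_gappy_columns(toy, max_gap_fraction=0.9)
print(strict)   # columns 3 and 5 removed
print(lenient)  # nothing removed at the 90% setting
```

A column with exactly 50% gaps is retained here, because the text specifies deletion only for columns with more than 50% gaps.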
Automated genome-wide domain composition analysis was performed using a specialized software tool, Surfacing version 2.002 [Zmasek CM (2012), a tool for the functional analysis of domainome/genome evolution; available at https://sites.google.com/site/cmzmasek/home/software/forester/surfacing].

Phylogenomic analyses and development of novel naming schema using strict ortholog groups. The processes for defining and naming strict ortholog groups were formalized into a set of “rules” and then implemented in a semi-automatic domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture (DA) (Zmasek and Godzik, 2012, 2011). In Herpesviridae, most proteins have DAs consisting of only a single domain. For example, the UDG domain of Uracil DNA glycosylase is a single-domain DA, whereas the combination of N-terminal DNA_pol_B_exo1 and C-terminal DNA_pol_B (denoted as DNA_pol_B_exo1--DNA_pol_B) of DNA polymerases is a DA with two domains. In this analysis, we consider a given DA “present” in a given Herpesviridae species S if the DA is present, under a set of thresholds, in at least one strain of the species S. The
rationale for this is that a DA can be missed in a genome because of incomplete or erroneous sequences, erroneous assembly and gene prediction (false negatives), or even recent, actual gene loss. The opposite error (a false positive), on the other hand, is far less likely. For this work, we used two thresholds: a minimal domain length of 40% of the length set forth in the Pfam database (domain fragments are unlikely to be functionally equivalent to full-length domains) and a BLAST E-value cutoff of E = 10^-6.

For every domain architecture, a set of bootstrap resampled phylogenetic trees (gene trees) was calculated by RAxML (Stamatakis et al., 2005) using protein sequences from one representative for each of the nine human Herpesviridae species. For comparison and validation, we also calculated phylogenetic trees that included non-human hosted Herpesviridae. For illustration, gene duplications were inferred by comparing the consensus gene trees to the species tree (Fig. 1) for Herpesviridae using the SDI (Speciation Duplication Inference) algorithm (Zmasek and Eddy, 2001). To obtain confidence values on orthology assignments (bootstrap support values), we employed the RIO approach (Resampled Inference of Orthologs) to compare sets of bootstrap resampled phylogenetic trees with the species tree for Herpesviridae (Zmasek and Eddy, 2002).

In this work, we define a strict ortholog group (SOG) as sequences related by speciation events and exhibiting the same domain architecture (based on Pfam domains from Pfam 31.0, a length threshold of 40%, and an E-value cutoff of E = 10^-6). Based on this approach for defining SOGs, we developed the following naming syntax.
For protein families such as Uracil DNA glycosylase, which exhibit the same DA in all nine human Herpesviridae and which are related by speciation events only, we use the “Recommended name” (under “Protein names”) from the UniProtKB database (Bateman et al., 2017) as the base name and add a case-sensitive suffix that indicates the taxonomic distribution: “ABG” in this case, since Uracil DNA glycosylase appears in each human Alpha-, Beta-, and Gammaherpesvirinae species. Therefore, the full name is “ABG/Uracil DNA glycosylase”. To indicate presence in some, but not all, members of a subfamily, we use lower-case suffixes. “Ab/Replication origin-binding protein” implies that members of this SOG are present in all human Alphaherpesvirinae species (“A”) and in some (but not all) Betaherpesvirinae (“b”).

While most of the human Herpesviridae protein families fall into these basic cases, families that have one or more domains in common but differ in their DA are more difficult to name rationally. An example of such a family is Glycoprotein B, described above. Because members of this family have different DAs, namely “Glycoprotein_B” and “HCMVantigenic_N--Glycoprotein_B”, it is composed of two SOGs (named “ABG.AbG/Glycoprotein B” and “ABG.b/Glycoprotein B”). In such cases, we split the suffix into two parts, separated by a period. The first part (“ABG”) indicates the overall presence of the common domain(s) for all members of this SOG, Glycoprotein_B in this case. The second part (after the period) relates to entire DAs. “.AbG” of “ABG.AbG/Glycoprotein B” means that the Glycoprotein_B DA is present in all human Alpha- and Gamma-, and some Betaherpesvirinae. “.b” of “ABG.b/Glycoprotein B” implies that the “HCMVantigenic_N--Glycoprotein_B” DA is present in some Betaherpesvirinae.
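The suffix rules above can be illustrated with a short Python sketch. It assumes that presence is summarized per subfamily as a pair (species with the DA, total species). The function names, the dictionary layout, and the per-subfamily counts are hypothetical: the 3/4/2 split of the nine human species and the “some” counts are illustrative assumptions, and only the resulting names come from the text. This is not the pipeline's actual implementation.

```python
SUBFAMILIES = ("A", "B", "G")  # Alpha-, Beta-, Gammaherpesvirinae


def suffix(presence):
    """Build a case-sensitive suffix from {letter: (n_present, n_total)}."""
    parts = []
    for letter in SUBFAMILIES:
        present, total = presence[letter]
        if total > 0 and present == total:
            parts.append(letter)          # found in every species
        elif present > 0:
            parts.append(letter.lower())  # found in some, but not all
        # subfamilies without any member contribute nothing
    return "".join(parts)


def sog_name(base_name, da_presence, shared_presence=None):
    """Return 'SUFFIX/base name'.

    If shared_presence is given (families whose members share domains but
    differ in DA), return the split form 'SHARED.DA/base name'.
    """
    da_suffix = suffix(da_presence)
    if shared_presence is None:
        return f"{da_suffix}/{base_name}"
    return f"{suffix(shared_presence)}.{da_suffix}/{base_name}"


# Assumed species counts per subfamily (illustrative only).
ALL = {"A": (3, 3), "B": (4, 4), "G": (2, 2)}

udg = sog_name("Uracil DNA glycosylase", ALL)
obp = sog_name("Replication origin-binding protein",
               {"A": (3, 3), "B": (3, 4), "G": (0, 2)})
gb_single = sog_name("Glycoprotein B",
                     {"A": (3, 3), "B": (3, 4), "G": (2, 2)},
                     shared_presence=ALL)
gb_fused = sog_name("Glycoprotein B",
                    {"A": (0, 3), "B": (1, 4), "G": (0, 2)},
                    shared_presence=ALL)
print(udg, obp, gb_single, gb_fused, sep="\n")
```

Keeping the shared-domain presence separate from the DA presence is what lets the two Glycoprotein B SOGs carry the same first part (“ABG”) while differing after the period.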
References

Altenhoff, A.M., Studer, R.A., Robinson-Rechavi, M., Dessimoz, C., 2012. Resolving the ortholog conjecture: Orthologs tend to be weakly, but significantly, more similar in function than paralogs. PLoS Comput. Biol. 8, e1002514. https://doi.org/10.1371/journal.pcbi.1002514
Bateman, A., Martin, M.J., O’Donovan, C., Magrane, M., Alpi, E., Antunes, R., Bely, B., Bingley, M., Bonilla, C., Britto, R., Bursteinas, B., Bye-A-Jee, H., Cowley, A., Da Silva, A., De Giorgi, M., Dogan, T., Fazzini, F., Castro, L.G., Figueira, L., Garmiri, P., Georghiou, G., Gonzalez, D., Hatton-Ellis, E., Li, W., Liu, W., Lopez, R., Luo, J., Lussi, Y., MacDougall, A., Nightingale, A., Palka, B., Pichler, K., Poggioli, D., Pundir, S., Pureza, L., Qi, G., Rosanoff, S., Saidi, R., Sawford, T., Shypitsyna, A., Speretta, E., Turner, E., Tyagi, N., Volynkin, V., Wardell, T., Warner, K., Watkins, X., Zaru, R., Zellner, H., Xenarios, I., Bougueleret, L., Bridge, A., Poux, S., Redaschi, N., Aimo, L., Argoud-Puy, G., Auchincloss, A., Axelsen, K., Bansal, P., Baratin, D., Blatter, M.C., Boeckmann, B., Bolleman, J., Boutet, E., Breuza, L., Casal-Casas, C., De Castro, E., Coudert, E., Cuche, B., Doche, M., Dornevil, D., Duvaud, S., Estreicher, A., Famiglietti, L., Feuermann, M., Gasteiger, E., Gehant, S., Gerritsen, V., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Jungo, F., Keller, G., Lara, V., Lemercier, P., Lieberherr, D., Lombardot, T., Martin, X., Masson, P., Morgat, A., Neto, T., Nouspikel, N., Paesano, S., Pedruzzi, I., Pilbout, S., Pozzato, M., Pruess, M., Rivoire, C., Roechert, B., Schneider, M., Sigrist, C., Sonesson, K., Staehli, S., Stutz, A., Sundaram, S., Tognolli, M., Verbregue, L., Veuthey, A.L., Wu, C.H., Arighi, C.N., Arminski, L., Chen, C., Chen, Y., Garavelli, J.S., Huang, H., Laiho, K., McGarvey, P., Natale, D.A., Ross, K., Vinayaka, C.R., Wang, Q., Wang, Y., Yeh, L.S., Zhang, J., 2017. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169. https://doi.org/10.1093/nar/gkw1099
Chen, X., Zhang, J., 2012. The Ortholog Conjecture Is Untestable by the Current Gene Ontology but Is Supported by RNA Sequencing Data. PLoS Comput. Biol. 8, e1002784. https://doi.org/10.1371/journal.pcbi.1002784
Desper, R., Gascuel, O., 2002. Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle. J. Comput. Biol. 9, 687–705.
Eisen, J.A., 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8, 163–7. https://doi.org/10.1101/gr.8.3.163
Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A., Salazar, G.A., Tate, J., Bateman, A., 2016. The Pfam protein families database: Towards a more sustainable future. Nucleic Acids Res. 44, D279–D285. https://doi.org/10.1093/nar/gkv1344
99–113. https://doi.org/10.2307/2412448
Itoh, M., Nacher, J.C., Kuma, K., Goto, S., Kanehisa, M., 2007. Evolutionary history and functional implications of protein domains and their combinations in eukaryotes. Genome Biol. 8, R121. https://doi.org/10.1186/gb-2007-8-6-r121
Jensen, R.A., 2001. Orthologs and paralogs - we need to get it right. Genome Biol. 2, INTERACTIONS1002.
Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–80. https://doi.org/10.1093/molbev/mst010
Kuraku, S., Zmasek, C.M., Nishimura, O., Katoh, K., 2013. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity. Nucleic Acids Res. 41, W22–W28. https://doi.org/10.1093/nar/gkt389
Moore, A.D., Björklund, Å.K., Ekman, D., Bornberg-Bauer, E., Elofsson, A., 2008. Arrangements in the modular evolution of proteins. Trends Biochem. Sci. 33, 444–451. https://doi.org/10.1016/j.tibs.2008.05.008
Nehrt, N.L., Clark, W.T., Radivojac, P., Hahn, M.W., 2011. Testing the ortholog conjecture with comparative functional genomic data from mammals. PLoS Comput. Biol. 7. https://doi.org/10.1371/journal.pcbi.1002073
Patthy, L., 2003. Modular assembly of genes and the evolution of new functions. Genetica 118, 217–31.
Peisajovich, S.G., Garbarino, J.E., Wei, P., Lim, W.A., 2010. Rapid diversification of cell signaling phenotypes by modular domain recombination. Science 328, 368–72. https://doi.org/10.1126/science.1182376
Pickett, B.E., Greer, D.S., Zhang, Y., Stewart, L., Zhou, L., Sun, G., Gu, Z., Kumar, S., Zaremba, S., Larsen, C.N., Jen, W., Klem, E.B., Scheuermann, R.H., 2012. Virus pathogen Database and Analysis Resource (ViPR): A comprehensive bioinformatics Database and Analysis Resource for the Coronavirus research community. Viruses 4, 3209–3226. https://doi.org/10.3390/v4113209
Remm, M., Storm, C.E.V., Sonnhammer, E.L.L., 2001. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol. 314, 1041–1052. https://doi.org/10.1006/jmbi.2000.5197
Rogozin, I.B., Managadze, D., Shabalina, S.A., Koonin, E.V., 2014. Gene family level comparative analysis of gene expression in mammals validates the ortholog conjecture. Genome Biol. Evol. 6, 754–762. https://doi.org/10.1093/gbe/evu051
Schmidt, H.A., Strimmer, K., Vingron, M., von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–4.
Stamatakis, A., Ludwig, T., Meier, H., 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–63. https://doi.org/10.1093/bioinformatics/bti191
Tatusov, R.L., Koonin, E.V., Lipman, D.J., 1997. A genomic perspective on protein
Whelan, S., Goldman, N., 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–9.
Ye, Y., Godzik, A., 2004. Comparative analysis of protein domain organization. Genome Res. 14, 343–53. https://doi.org/10.1101/gr.1610504
Zhang, J., 2003. Evolution by gene duplication: An update. Trends Ecol. Evol. 18, 292–298. https://doi.org/10.1016/S0169-5347(03)00033-8
Zmasek, C.M., Eddy, S.R., 2002. RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics 3, 14.
Zmasek, C.M., Eddy, S.R., 2001. A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17, 821–828. https://doi.org/10.1093/bioinformatics/17.9.821
Zmasek, C.M., Godzik, A., 2012. This Déjà Vu Feeling—Analysis of Multidomain Protein Evolution in Eukaryotic Genomes. PLoS Comput. Biol. 8, e1002701. https://doi.org/10.1371/journal.pcbi.1002701
Zmasek, C.M., Godzik, A., 2011. Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires. Genome Biol. 12, R4. https://doi.org/10.1186/gb-2011-12-1-r4
Appendix: Taxonomic distribution of SOGs

The columns are:
1. Taxonomy (see above)
2. Taxonomic coverage
3. Domain Architecture (Pfam domains)
4. SOG name base

Herpesviridae

ALPHA [28/28] Fusion_gly_K Envelope glycoprotein K
ALPHA [28/28] Herpes_UL14 Tegument protein UL14 [2]
ALPHA [28/28] Herpes_UL20 Envelope protein UL20
ALPHA [28/28] Herpes_UL4 Nuclear protein UL4
ALPHA [28/28] Herpes_UL46 Tegument protein VP11/12
ALPHA [28/28] Herpes_UL49_2 Tegument protein VP22
ALPHA [28/28] Herpes_gE Envelope glycoprotein E
ALPHA [28/28] Marek_A Envelope glycoprotein C
BETA [17/17] Cytomega_gL Envelope glycoprotein L
BETA [17/17] DUF587--Herpes_UL87 Protein UL87
BETA [17/17] Herpes_U59 Tegument protein UL88
BETA [17/17] UL97 Tegument serine/threonine protein kinase
CMV [6/6] Bax1-I Membrane protein US20
CMV [6/6] Cytomega_UL84 Protein UL84
CMV [6/6] Gp_UL130 Envelope glycoprotein UL130
CMV [6/6] Herpes_IE1 Regulatory protein IE1
CMV [6/6] UL141 Membrane protein UL14
GAMMA [16/16] DUF832 ORF 48 protein
GAMMA [16/16] Herpes_BLRF2 ORF 52 protein
GAMMA [16/16] Herpes_BMRF2 ORF 58 protein
GAMMA [16/16] Herpes_BTRF1 ORF 23 protein
GAMMA [16/16] Herpes_DNAp_acc BMRF1
GAMMA [16/16] Herpes_ORF11 ORF 10 protein
GAMMA [16/16] Phage_glycop_gL Envelope glycoprotein L [4]
GAMMA [16/16] Tegument_dsDNA--AIRS_C--GATase_5 BNRF1
H6 [3/3] Rep_N--Parvo_NS1 Protein U94
HERPESVIRIDAE [61/61] Herpes_Helicase DNA replication helicase
HERPESVIRIDAE [61/61] Herpes_MCP Major capsid protein
HERPESVIRIDAE [61/61] Herpes_UL16 Cytoplasmic envelopment protein 2
HERPESVIRIDAE [61/61] Herpes_UL17 Capsid vertex component 1
HERPESVIRIDAE [61/61] Herpes_UL24 Nuclear protein UL24
HERPESVIRIDAE [61/61] Herpes_UL25 Capsid vertex component 2
HERPESVIRIDAE [61/61] Herpes_UL52 DNA primase
HERPESVIRIDAE [61/61] Herpes_UL6 Portal protein
HERPESVIRIDAE [61/61] Herpes_V23 Triplex capsid protein 2
HERPESVIRIDAE [61/61] Herpes_alk_exo Alkaline nuclease
HERPESVIRIDAE [61/61] Herpes_glycop Envelope glycoprotein M
HERPESVIRIDAE [61/61] Herpes_glycop_H Envelope glycoprotein H
HERPESVIRIDAE [61/61] PRTP Tripartite terminase subunit 1
HERPESVIRIDAE [61/61] Peptidase_S21 Capsid scaffolding protein
HERPESVIRIDAE [61/61] UDG Uracil-DNA glycosylase
HERPESVIRIDAE [61/61] Viral_DNA_bp Major DNA-binding protein
LYMPHOCRYPTOVIRUS [4/4] EBV-NA1 EBNA-1
MACAVIRUS [4/4] AIRS_C--GATase_5 ORF 3 protein
MARDIVIRUS [5/5] Marek_SORF3 SORF3
MUROMEGALOVIRUS [4/4] M157 M151
PERCAVIRUS [2/2] CARD Apoptosis regulator E10
ROSEOLOVIRUS [4/4] HHV6-IE U90
ROSEOLOVIRUS [4/4] Herpes_U15 U15
ROSEOLOVIRUS [4/4] Herpes_U26 Protein U26
ROSEOLOVIRUS [4/4] Herpes_U47 U47 protein
ROSEOLOVIRUS [4/4] Herpes_U55 U55 protein
alpha [11/28] DNA_pol_B_exo1--DNA_pol_B--DNAPolymera_Pol DNA polymerase [2]
alpha [11/28] Herpes_UL42 DNA polymerase processivity subunit [2]
alpha [12/28] Herpes_UL1 Envelope glycoprotein L [2]
alpha [13/28] DUF1314 Myristylated tegument protein CIRC
alpha [13/28] Herpes_UL1--GlyL_C Envelope glycoprotein L [3]
alpha [15/28] Herpes_UL42--Herpes_UL42 DNA polymerase processivity subunit [3]
alpha [18/28] Herpes_UL49_5 Envelope glycoprotein N
alpha [19/28] Alpha_TIF Transactivating tegument protein VP16
alpha [19/28] Herpes_US9 Membrane protein US9
alpha [21/28] Gene66 Virion protein US10
alpha [21/28] Herpes_UL43 Envelope protein UL43
alpha [22/28] UL45 Membrane protein UL45
alpha [23/28] Herpes_UL55 Nuclear protein UL55
alpha [23/28] US2 Virion protein US2
alpha [24/28] UL11 Cytoplasmic envelopment protein 3 [2]
alpha [25/28] Herpes_ICP4_N--Herpes_ICP4_C Transcriptional regulator ICP4 [2]
alpha [25/28] Herpes_glycop_D Envelope glycoprotein D
alpha [26/28] Herpes_IE68 Regulatory protein ICP22
alpha [26/28] Herpes_UL35 Small capsomere-interacting protein [2]
alpha [27/28] Herpes_UL21 Tegument protein UL21
alpha [27/28] Herpes_UL3 Nuclear protein UL3
alpha [27/28] Herpes_UL37_1 Inner tegument protein [2]
alpha [27/28] Herpes_UL47 Tegument protein VP13/14
alpha [27/28] Herpes_UL51 Tegument protein UL51 [2]
alpha [27/28] Herpes_gI Envelope glycoprotein I
alpha [27/28] Herpes_teg_N--Herpes_UL36 Large tegument protein deneddylase [2]
alpha [3/28] Herpes_ICP4_N Immediate-early transactivator protein
alpha [5/28] Herpes_gp2 Envelope glycoprotein
alpha [7/28] XPG_I UL41
alpha [8/28] zf-RING_2 Ubiquitin E3 ligase ICP0 [2]
beta [10/17] Herpes_UL74 Envelope glycoprotein O
beta [12/17] DUF2664 Tegument protein UL14
beta [12/17] HV_small_capsid Small capsomere-interacting protein
beta [13/17] Herpes_U5 Protein UL27
beta [14/17] DUF570 Protein UL31
beta [14/17] Herpes_IE2_3 Protein UL117
beta [14/17] Herpes_PAP DNA polymerase processivity subunit
beta [14/17] Herpes_UL37_2 Envelope glycoprotein UL37
beta [14/17] Herpes_UL82_83 Tegument protein pp71
beta [14/17] Herpes_pp85 Tegument protein UL35
beta [14/17] U79_P34 Protein UL112
beta [14/17] US22--US22 Tegument protein UL24
beta [16/17] Ribonuc_red_lgC Ribonucleoside-diphosphate reductase large subunit-like protein [2]
beta [3/17] ig--C2-set_2 Membrane protein EE22A
beta [5/17] Herpes_UL32 Tegument protein pp150
beta [5/17] Herpes_env--Herpes_env Packaging protein UL32 [2]
cmv [2/6] Adeno_E3_CR1 Membrane RL1 protein3
cmv [2/6] An_peroxidase Rh10
cmv [2/6] C1-set Membrane protein A13
cmv [2/6] Cytomega_TRL10 Envelope glycoprotein RL10
cmv [2/6] Cytomega_UL20A Glycoprotein UL22A
cmv [2/6] HCMV_UL139 Membrane glycoprotein UL139
cmv [2/6] HHV-5_US34A Protein US34A
cmv [2/6] MHC_I Membrane glycoprotein UL18
cmv [2/6] MHC_I--C1-set Membrane protein A18
cmv [2/6] TNFR_c6 Membrane glycoprotein UL144
cmv [2/6] UL2 Protein UL2
cmv [2/6] UL40 Membrane glycoprotein UL40
cmv [3/6] US22--US22--US22--US22 Protein UL29 [2]
cmv [4/6] Cytomega_US3 Membrane glycoprotein US2
cmv [4/6] DUF2677 Membrane protein UL121
cmv [4/6] US22--US22--US22 Protein UL29
cmv [4/6] gpUL132 Envelope glycoprotein UL132
cmv [5/6] CMV_US Membrane glycoprotein US11
gamma [12/16] DUF717 ORF 30 protein
gamma [14/16] DUF848 ORF 35 protein
gamma [14/16] Herpes_BBRF1 BRRF1
gamma [14/16] Herpes_TAF50 ORF 50 protein
gamma [15/16] DUF2733 Cytoplasmic envelopment protein 3
gamma [15/16] Herpes_TK--Herpes_TK_C Thymidine kinase [2]
gamma [15/16] Herpes_capsid Small capsomere-interacting protein [3]
gamma [4/16] DED FLICE inhibitory protein
gamma [6/16] BALF1 BALF1
gamma [8/16] Herpes_HEPA--Herpes_heli_pri Helicase-primase subunit
gamma [8/16] Herpes_heli_pri ORF 41 protein
herpesviridae [12/61] IL10 Interleukin-10
herpesviridae [15/61] DNA_pack_N Packaging protein
herpesviridae [15/61] US22 Tegument protein UL26
herpesviridae [16/61] DNA_pack_C DNA packaging terminase subunit 1
herpesviridae [17/61] Herpes_UL87 ORF 24 protein
herpesviridae [2/61] ig Membrane protein EE22
herpesviridae [22/61] 7tm_1 Envelope glycoprotein UL33
herpesviridae [29/61] Pkinase Serine/threonine protein kinase US3
herpesviridae [3/61] AAA_31 Plasmid-partitioning protein SopA
herpesviridae [3/61] CAT Chloramphenicol acetyltransferase
herpesviridae [3/61] ParBc Plasmid partition protein B
herpesviridae [3/61] Phage_integrase Cre
herpesviridae [3/61] Rep_3 Replication initiation protein RepE
herpesviridae [3/61] ig--ig Membrane protein EE51
herpesviridae [30/61] Herpes_U30 Inner tegument protein
herpesviridae [31/61] Herpes_UL79 Protein UL79
herpesviridae [32/61] Herpes_TK Thymidine kinase
herpesviridae [32/61] Herpes_U44 Tegument protein UL51
herpesviridae [33/61] Herpes_UL49_1 Protein UL49
herpesviridae [33/61] Herpes_UL92 Protein UL92
herpesviridae [33/61] Herpes_UL95 Protein UL95
herpesviridae [33/61] Herpes_teg_N Large tegument protein deneddylase
herpesviridae [34/61] Herpes_UL73 Envelope glycoprotein N [2]
herpesviridae [35/61] Herpes_ori_bp DNA replication origin-binding helicase
herpesviridae [4/61] Branch Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase
herpesviridae [4/61] Sema Membrane protein TE7
herpesviridae [43/61] Ribonuc_red_lgN--Ribonuc_red_lgC Ribonucleoside-diphosphate reductase large subunit
herpesviridae [46/61] Ribonuc_red_sm Ribonucleoside-diphosphate reductase small subunit
herpesviridae [5/61] IL8 Chemokine vCXCL6
herpesviridae [5/61] Lectin_C B168.5
herpesviridae [5/61] V-set OX-2 membrane glycoprotein homolog
herpesviridae [50/61] DNA_pol_B_exo1--DNA_pol_B DNA polymerase
herpesviridae [52/61] dUTPase Deoxyuridine 5'-triphosphate nucleotidohydrolase
herpesviridae [54/61] DNA_pack_N--DNA_pack_C Tripartite terminase subunit 3
herpesviridae [55/61] Herpes_HEPA DNA helicase/primase complex-associated protein
herpesviridae [56/61] Herpes_UL69 Multifunctional expression regulator [2]
herpesviridae [57/61] Herpes_UL7 Cytoplasmic envelopment protein 1
herpesviridae [57/61] Herpes_env Packaging protein UL32
herpesviridae [58/61] Herpes_UL33 Tripartite terminase subunit 2
herpesviridae [6/61] Bcl-2 Apoptosis regulator BHRF1
herpesviridae [60/61] Glycoprotein_B Envelope glycoprotein B
herpesviridae [60/61] Herpes_U34 Nuclear egress protein 2
herpesviridae [60/61] Herpes_UL31 Nuclear egress protein 1
herpesviridae [60/61] Herpes_VP19C Triplex capsid protein 1
herpesviridae [9/61] Thymidylat_synt Thymidylate synthase
lymphocryptovirus [3/4] EBV-NA3 EBNA-3C
lymphocryptovirus [3/4] Herpes_BLLF1 Envelope glycoprotein gp350
lymphocryptovirus [3/4] Herpes_LMP2 C7
macavirus [3/4] bZIP_2 A2
mardivirus [3/5] Herpes_pp38 Pp38
mardivirus [4/5] DUF1509 ORF 437 protein
rhadinovirus [2/6] Cyclin_N--Herp-Cyclin V-cyclin
rhadinovirus [2/6] Cyclin_N--K-cyclin_vir_C ORF 72 protein
rhadinovirus [2/6] IRF--IRF-3 K9
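
Each appendix entry follows the four-column layout listed at the top of the appendix, so entries can be parsed mechanically. The Python sketch below assumes one entry per line with whitespace-separated fields, `--` between Pfam domains, and an optional trailing bracketed number. The `SogRow` type and the `parse_row` and `case_matches_coverage` functions are hypothetical names introduced for illustration only.

```python
import re
from typing import NamedTuple, Optional


class SogRow(NamedTuple):
    """One appendix entry (hypothetical record layout)."""
    taxon: str             # column 1, e.g. "ALPHA" or "alpha"
    present: int           # column 2 numerator
    total: int             # column 2 denominator
    domains: tuple         # column 3, Pfam domains in N- to C-terminal order
    name: str              # column 4, SOG name base
    index: Optional[int]   # trailing "[n]" disambiguator, if any


ROW = re.compile(
    r"^(?P<taxon>\S+)\s+\[(?P<present>\d+)/(?P<total>\d+)\]\s+"
    r"(?P<da>\S+)\s+(?P<name>.+?)(?:\s+\[(?P<index>\d+)\])?$"
)


def parse_row(line):
    """Parse 'TAXON [n/m] DA name [k]' into a SogRow."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized appendix row: {line!r}")
    index = int(m["index"]) if m["index"] else None
    return SogRow(m["taxon"], int(m["present"]), int(m["total"]),
                  tuple(m["da"].split("--")), m["name"], index)


def case_matches_coverage(row):
    """Upper-case taxon label should coincide with full coverage."""
    return row.taxon.isupper() == (row.present == row.total)


row = parse_row("alpha [25/28] Herpes_ICP4_N--Herpes_ICP4_C "
                "Transcriptional regulator ICP4 [2]")
full = parse_row("ALPHA [28/28] Fusion_gly_K Envelope glycoprotein K")
print(row)
print(case_matches_coverage(row), case_matches_coverage(full))
```

The `case_matches_coverage` check reflects the legend's convention (upper case for all species of a taxonomic unit, lower case for some) and can serve as a quick consistency test on transcribed entries.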