Genomic sequence analysis tools and a genotype-phenotype association platform in the Virus Pathogen Resource

Yun Zhang 1, Brett Pickett 1, Eva Rab 1, Jyothi Noronha 1, R. Burke Squires 1, Victoria Hunt 1, Mengya Liu 2, Liwei Zhou 3, Chris Larson 4, Jonathan Dietrich 3, Edward B. Klem 3, Richard H. Scheuermann 1,5

1 Department of Pathology, 5 Division of Biomedical Informatics, Univ. of Texas Southwestern Medical Center, Dallas, TX; 2 Southern Methodist Univ., Dallas, TX; 3 Northrop Grumman Health Solutions, Rockville, MD; 4 Vecna Technologies, Greenbelt, MD.

Introduction

Figure 2: A screenshot of the Sequence Feature Details page. The details page displays strain information, Sequence Feature information, available 3D protein structures, and a table containing all Variant Types for the selected Sequence Feature.

1 The UniProt Consortium (2011) Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Research, 39, D214-219.
2 Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Federhen, S. et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 39, D38-51.
3 Vita, R., Zarebski, L., Greenbaum, J.A., Emami, H., Hoof, I., Salimi, N., Damle, R., Sette, A. and Peters, B. (2010) The immune epitope database 2.0. Nucleic Acids Research, 38, D854-862.
4 Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113.
5 Waterhouse, A.M., Procter, J.B., Martin, D.M., Clamp, M. and Barton, G.J. (2009) Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25, 1189-1191.
6 Zmasek, C.M. and Eddy, S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.
7 Hanson, R. (2010) Jmol - a paradigm shift in crystallographic visualization. Journal of Applied Crystallography, 43, 1250-1260.
We would like to thank Elliot Lefkowitz, Carla Kuiken, Bernard Moss, and R. Pad Padmanabhan for reviewing and validating the SFVT definitions, as well as the primary data providers for the sequence data used throughout this study. We also recognize the scientific and technical personnel responsible for supporting and developing ViPR, which has been wholly supported with federal funds from the NIAID, NIH, Department of Health and Human Services (N01AI2008038 to R.H.S.).

Figure 4: 3D Protein Structure Viewer in the Virus Pathogen Database and Analysis Resource (ViPR). An example sequence feature highlighted on the 3D structure of the Hepatitis C Virus NS5b protein (PDB ID: 1CSJ).

ViPR combines the strength of a relational database with a suite of integrated bioinformatics tools to support everything from basic sequence and structural analyses to more advanced genotype-phenotype studies. The uniqueness of ViPR lies in:
• integrating data from various sources
• encouraging the analysis of the comprehensive data contained within the system
• combining the available tools to quickly perform complex analytical workflows
• facilitating rapid hypothesis generation using bioinformatics methods for subsequent experimental testing
• allowing data sharing and storage with collaborators in personal workbenches

Figure 1: A screenshot of the ViPR homepage. The ViPR homepage is the portal used to access the various types of data and advanced functionality within the system.

The Virus Pathogen Database and Analysis Resource (ViPR, www.viprbrc.org), sponsored by the National Institute of Allergy and Infectious Diseases, serves as a single publicly accessible repository of integrated datasets and analysis tools for 14 different virus families. It supports wet-bench virology researchers focusing on the development of diagnostics, prophylactics, vaccines, and treatments for these pathogens.
ViPR Supports 14 Virus Families
Arenaviridae, Bunyaviridae, Caliciviridae, Coronaviridae, Filoviridae, Flaviviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Poxviridae, Reoviridae, Rhabdoviridae, and Togaviridae.

ViPR Integrates Data from Many Sources
• GenBank sequence records, gene annotations, and strain metadata
• Gene Ontology (GO) classifications
• UniProtKB protein annotations
• Protein Data Bank (PDB) 3D protein structures
• Immune epitopes from the Immune Epitope Database (IEDB)
• Clinical data
• Additional data derived from computational algorithms
• Host-pathogen interactions (coming soon)

ViPR Provides Analysis and Visualization Tools
• Genome Annotator
• BLAST Sequence Similarity Search
• Multiple Sequence Alignment
• Phylogenetic Tree Construction
• 3D Protein Structures with Sequence Feature or Epitope Highlights
• Sequence Feature Variant Type (SFVT) Analysis
• Metadata-driven Comparative Genomics Analysis
• SNP Analysis

ViPR enables you to store and share data and results through the ViPR Workbench.

Figure 3: Analytical tools available for SFVT data. (A) A multiple sequence alignment calculated with MUSCLE [4] and visualized with JalView [5] in ViPR. (B) A metadata-driven comparative genomics analysis tool to identify individual positions that correlate with a metadata attribute. (C) A phylogenetic tree showing the relationship between HCV-1 genomes, automatically colored according to country of isolation using the Archaeopteryx [6] tool.

Homepage announcement (Figure 1 screenshot): Clinical Data for Human Dengue Virus Isolates. In this release of ViPR we have added extensive clinical data for ~2600 Dengue Virus isolates.
Includes patient demographics and measures of disease severity, immune response, evolution, etc. Isolates come from both the Eastern and Western Hemispheres, and all are linked to complete genome sequences. Try the Metadata Genome Compare tool for custom comparative analysis. Search, or download the complete dataset.

Homepage panels (Figure 1 screenshot):
• Search our comprehensive database for: genomes; genes and proteins; immune epitopes; 3D protein structures
• Analyze data online: identify similar sequences (BLAST); align sequences (MSA); identify short peptides in proteins; visualize aligned sequences
• Use your workbench to: store data in working sets for future analysis; integrate ViPR data with your laboratory data; store analysis results; share results and data with collaborators
• Virus families by genome type: single-stranded positive-sense RNA (Caliciviridae, Coronaviridae, Flaviviridae, Hepeviridae, Picornaviridae, Togaviridae); single-stranded negative-sense RNA (Arenaviridae, Bunyaviridae, Filoviridae, Paramyxoviridae, Rhabdoviridae); double-stranded RNA (Reoviridae); double-stranded DNA (Herpesviridae, Poxviridae)
• Featured viruses: Dengue; Hepatitis C virus

ViPR Workbench: The ViPR Workbench allows users to save 'working sets' of sequences, searches, and analysis results between web sessions in their own private workspace. Users can share working sets or analysis results with collaborators.
3D Protein Structure Viewer (text recovered from the Figure 4 screenshot)
Structure shown: CRYSTAL STRUCTURE OF THE RNA-DEPENDENT RNA POLYMERASE OF HEPATITIS C VIRUS (PDB Link: 1CSJ).
• Jmol command line: advanced users can enter a Jmol command directly using the Jmol Interactive Script.
• Save/Restore: the state of the viewer (highlighting, zooming, etc.) can be saved at any time and then retrieved later. Choose the view you want to save, then restore it later when you are ready.
• Display options: these options control the general appearance of the protein structure in the viewer.
• Highlight ligands: highlight ligands in the structure.
• Highlight epitopes: first select an epitope type from the list, then check the epitopes to highlight on the structure.
• Highlight by Swiss-Prot position: highlight the residues in this structure corresponding to defined SwissProt positions. Enter one or more comma-delimited positions, or a range, then click Highlight.
• Highlight/label features: these options color, highlight, or label certain features of the structure.

Metadata Genome Comparison Analysis Tool (text recovered from the Figure 3B screenshot)
The Metadata Genome Comparison Analysis Tool consists of three parts: a multiple sequence alignment (using MUSCLE), a chi-square goodness of fit test to identify positions (columns) of the multiple sequence alignment that significantly differ from the expected (random) distribution of residues between all metadata groups, and a Pearson's chi-square test to identify the specific pairs of metadata groups that contribute to the observed statistical difference. When 3 or more groups are included in the analysis, the p-value from the Goodness of Fit test will identify columns having significant variation between all groups, while the Pearson's test will identify the specific pair(s) of groups that make the column significant (i.e. if those groups were not included in the analysis, the column would no longer be identified as significant).
Note: when only two metadata groups are compared, the p-values for both statistical tests will be identical, since all significant columns from the Goodness of Fit test can only differ between groups.

Chi-square Goodness of Fit Test Result
There are 2 positions that have a significant non-random distribution between the specified groups.

Position | Chi-square Value | P-value | Degrees of Freedom
3522 | 8.327 | 0.03971 | 3
8352 | 10.767 | 0.01306 | 3

Pearson's Chi-square Pairwise Comparison Report
There are 2 positions which are significantly different between the groups.

Position | Multiple Comparison P-value | Subsets Contributing to Significance
3522 | 0.03971 | 1-2
8352 | 0.01306 | 1-2
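The two chi-square stages of the metadata-driven comparative genomics analysis can be illustrated with a small Python sketch. This is not ViPR's implementation: the function names (`chi2_sf`, `column_chi2`, `significant_columns`, `contributing_pairs`), the 0.05 threshold, and the toy alignment are all assumptions made for illustration, and each alignment column is tested as a contingency table of metadata group by residue. The pairwise step simply repeats the same test on each pair of groups.

```python
import math
from collections import Counter
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite closed-form sum.
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: complementary error function plus a finite sum.
    extra = sum(half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra


def column_chi2(residues, groups):
    """Chi-square test on the group-by-residue table of one alignment column.

    Returns (statistic, df, p_value), or None when the column is invariant
    or only one group is present (df == 0).
    """
    observed = Counter(zip(groups, residues))
    row_totals = Counter(groups)
    col_totals = Counter(residues)
    n = len(residues)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 0:
        return None
    stat = 0.0
    for g, row_total in row_totals.items():
        for r, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(g, r)] - expected) ** 2 / expected
    return stat, df, chi2_sf(stat, df)


def significant_columns(alignment, groups, alpha=0.05):
    """Stage 1: 1-based alignment positions that differ across all groups."""
    hits = []
    for pos in range(len(alignment[0])):
        result = column_chi2([seq[pos] for seq in alignment], groups)
        if result is not None and result[2] < alpha:
            hits.append((pos + 1, *result))
    return hits


def contributing_pairs(residues, groups, alpha=0.05):
    """Stage 2: pairs of groups that are significant when tested on their own."""
    pairs = []
    for a, b in combinations(sorted(set(groups)), 2):
        keep = [i for i, g in enumerate(groups) if g in (a, b)]
        result = column_chi2([residues[i] for i in keep], [groups[i] for i in keep])
        if result is not None and result[2] < alpha:
            pairs.append((a, b))
    return pairs


# Toy data (invented): four sequences per metadata group.
alignment = ["ACGT", "ACGT", "ACGT", "ACGA", "ATGT", "ATGT", "ATGT", "ATGT"]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

for position, stat, df, p in significant_columns(alignment, groups):
    column = [seq[position - 1] for seq in alignment]
    print(position, round(stat, 3), df, round(p, 5), contributing_pairs(column, groups))
```

With only two groups the pairwise stage repeats the overall test, which matches the note that both p-values are identical in that case. Real datasets with small expected counts would call for more care than this sketch takes.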
Poster panel titles: Multiple Sequence Alignment, Phylogenetic Tree and Meta-CATS; 3D Protein Structure Viewer; Summary; Acknowledgements; References.

Sequence Feature Variant Type (SFVT)
• Sequence Features (SFs): characterized structural, functional, immune epitope, or sequence alteration regions of a protein, manually curated from UniProt [1], GenBank [2], and the Immune Epitope Database [3] and then validated by expert researchers.
• Variant Types (VTs): polymorphisms in each Sequence Feature are identified as "Variant Types" of that Sequence Feature.
• Available in ViPR for hepatitis C virus subtype 1a, Dengue virus type 2, and Orthopoxvirus (Vaccinia).
• Enables researchers to quickly query and analyze the genotypic changes, across all sequence records, that could be associated with a given phenotype.
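The relationship between a Sequence Feature and its Variant Types can be sketched in a few lines of Python. This is a minimal illustration, not ViPR's curation or numbering scheme: the function name `assign_variant_types`, the rule that the reference strain's sequence becomes VT-1 with the remaining variants ordered by frequency, and the toy sequences and phenotype labels are all assumptions.

```python
from collections import Counter


def assign_variant_types(aligned, reference_id, start, end):
    """Group strains by their sequence within one Sequence Feature.

    aligned      -- {strain_id: aligned protein sequence}, all the same length
    reference_id -- strain whose feature sequence is labelled VT-1 (assumed rule)
    start, end   -- 1-based, inclusive alignment coordinates of the feature
    Returns ({strain_id: VT label}, {VT label: feature sequence}).
    """
    feature = {sid: seq[start - 1:end] for sid, seq in aligned.items()}
    ref_seq = feature[reference_id]
    freq = Counter(feature.values())
    # Non-reference variants: most frequent first, ties broken alphabetically.
    others = sorted((s for s in freq if s != ref_seq), key=lambda s: (-freq[s], s))
    vt_of_seq = {s: f"VT-{i}" for i, s in enumerate([ref_seq] + others, start=1)}
    strain_vt = {sid: vt_of_seq[s] for sid, s in feature.items()}
    vt_seq = {vt: s for s, vt in vt_of_seq.items()}
    return strain_vt, vt_seq


# Toy aligned sequences and phenotype labels (invented for illustration).
aligned = {
    "ref": "MSTNPKPQRK",
    "s1": "MSTNPKPQRK",
    "s2": "MSTLPKPQRK",
    "s3": "MSTLPKPQRK",
    "s4": "MSTNAKPQRK",
}
phenotype = {"ref": "group_A", "s1": "group_A", "s2": "group_B", "s3": "group_B", "s4": "group_A"}

strain_vt, vt_seq = assign_variant_types(aligned, "ref", start=3, end=6)

# Cross-tabulate Variant Type against phenotype as a first look at association.
crosstab = Counter((strain_vt[sid], phenotype[sid]) for sid in aligned)
for (vt, label), count in sorted(crosstab.items()):
    print(vt, vt_seq[vt], label, count)
```

A table like `crosstab` is the kind of input a genotype-phenotype association test would start from; the counts here come from the invented toy data only.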