Using Apollo at the i5k Workspace@NAL
NAL, USDA-ARS
https://i5k.nal.usda.gov
August 29th, 2017
Feb 10, 2020
Agenda
• Manual annotation general overview
• i5k Workspace tools for manual annotation
  – BLAST, Clustal, HMMER
  – Apollo
• Manual annotation example: preparation
• Manual annotation live example
Other resources
• Monica Munoz-Torres from the Apollo group has a number of comprehensive tutorials:
  – https://www.slideshare.net/MonicaMunozTorres/presentations
• I recommend these slides if you need more background:
  – https://www.slideshare.net/MonicaMunozTorres/apollo-workshop-at-ksu-2015
• Note: there are two versions of Apollo. The i5k Workspace still uses the older version, which has a slightly different interface
  – If you are new to Apollo, or need a refresher, we highly recommend that you review one of her presentations
• The official Apollo annotation guide:
  – http://genomearchitect.org/users-guide/
• Other manual curation tutorials:
  – https://i5k.nal.usda.gov/manual-curation-example
  – http://genomecuration.github.io/genometrain/d-feature-curation-crossing/
Manual annotation general overview
What is manual annotation?
• Manual review and improvement of an existing gene prediction
• Often, but not always: drawing on external evidence (e.g. RNA-Seq, cDNA, genes from other species) to improve a computationally predicted gene model
  – Structural annotation: defining the gene structure (e.g. exon boundaries)
  – Functional annotation: describing the gene function (e.g. its name)
Why manually annotate?
• “Incorrect annotations poison every experiment that makes use of them”
• “Worse still, the poison spreads because incorrect annotations from one organism are often unknowingly used by other projects to help annotate their own genomes.”
  – Yandell and Ence 2012, doi:10.1038/nrg3174
General process of manual annotation
1. Select a chromosomal region of interest (e.g. scaffold)
   1. E.g. find a sequence of interest from one or several other species, and align it against proteins or genome sequence from your species
2. Select appropriate evidence (tracks in Apollo, or your own files)
3. Determine whether a feature in your evidence provides a reasonable starting gene model
   1. If yes: select and drag the feature to the ‘user-created annotations’ area, creating an initial gene model. If necessary, use editing functions to adjust the model.
   2. If not: get in touch with us!
4. Edit the model if necessary
5. Check your edited gene model for integrity and accuracy by comparing it with available homologs
   1. Verify that the gene model is the best representation of the underlying biology
6. Repeat steps 1 through 5 as needed to refine the model
7. Add annotation details in the “Information Editor”
   1. Replaced model, name, symbol, other comments
Adapted from https://www.slideshare.net/MonicaMunozTorres/apollo-workshop-at-ksu-2015
i5k Workspace ‘Etiquette’
1. Use Apollo to improve a gene model in an i5k Workspace assembly.
   1. If you just want to practice, use one of our training instances.
      1. https://i5k.nal.usda.gov/jbrowseapollo-training
   2. If you just want to view the data, you can probably get what you want without using Apollo. All of the data that we host is public.
2. Your annotation work is a community effort.
   1. If you notice that someone else is working on your model of choice, get in touch with them (or us) and collaborate. Don’t make a second model or delete the other model.
   2. Keep in mind that your work will be used by the scientific community once you’re done.
3. If you publish any of your work generated in the i5k Workspace:
   1. Get in touch with the genome contact first (you can find the contact info on the organism page; https://i5k.nal.usda.gov/species);
   2. Please cite the i5k Workspace paper! This helps us continue to exist.
      1. https://doi.org/10.1093/nar/gku983
Manual annotation: i5k Workspace tools
First, some conventions
• HSP: high-scoring pair in BLAST/BLAT alignments
  – The ‘hits’ in an alignment result set
  – A subsection of a pair of sequences with sufficient score
  – HSPs can change based on the alignment parameters
• Five prime end and three prime end
  – Based on direction of transcription
  – Initiation site is at the five prime end
  – Stop codon is at the three prime end
• In the genome browser, arrowheads indicate direction
  – [Figure: arrowhead glyphs labeled 3’ 5’ and 5’ 3’]
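The five prime / three prime convention can be made concrete with a small sketch. It assumes 1-based, inclusive genomic coordinates with start <= end and a "+" or "-" strand, as in GFF-style files; the `Feature` class and function names are hypothetical and are not part of Apollo or JBrowse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical genomic feature: 1-based inclusive coordinates, start <= end."""
    start: int
    end: int
    strand: str  # "+" for the forward strand, "-" for the reverse strand


def five_prime(feature: Feature) -> int:
    """Genomic coordinate at the five prime end (where transcription begins)."""
    return feature.start if feature.strand == "+" else feature.end


def three_prime(feature: Feature) -> int:
    """Genomic coordinate at the three prime end (where transcription finishes)."""
    return feature.end if feature.strand == "+" else feature.start


forward = Feature(start=100, end=500, strand="+")
reverse = Feature(start=100, end=500, strand="-")
print(five_prime(forward), three_prime(forward))  # 100 500
print(five_prime(reverse), three_prime(reverse))  # 500 100
```

For a reverse-strand model, the five prime end is therefore the larger genomic coordinate, which is why the arrowhead points to the left in the browser.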
JBrowse and Apollo
[Figure: annotated screenshot of the JBrowse/Apollo interface, with these labels]
• Track selector
• Bookmark/share URL
• User-created annotations track
• Login/out
• File: Add your own files
• View: Change coloring scheme
• Tools: Search using BLAT
• Find information about tracks
• Locate where you are on the scaffold
• Search for a gene or location
• Zoom in/out
• Turn tracks on/off
JBrowse is a web-based genome browser
• Visualize features that are mapped to a genome
• These features are displayed as tracks
• Many different types of data may be displayed
Apollo adds editing functions to JBrowse
• Manual gene curation
• Changes automatically saved back to server
• Edits are visible to other annotators in real time
• Editing history is tracked
i5k Workspace BLAST: one way to access Apollo
URL: https://i5k.nal.usda.gov/webapp/blast/
BLAST against the genome assembly to view HSPs in JBrowse
[Figure: BLAST search form, with these labels]
• Select organism
• Select organism-specific database
• Paste or upload query sequence(s)
• Program is automatically selected
[Figure: BLAST result page with 4 panels]
• Click on the blue blastdb icon next to your favorite HSP
• BLAST results are displayed in Apollo
HMMER and Clustal
• Use HMMER to detect remote protein homologs
  – https://i5k.nal.usda.gov/webapp/hmmer/
• Use Clustal to perform multiple sequence alignments
  – https://i5k.nal.usda.gov/webapp/clustal/
Tips and Tricks
• The i5k Workspace BLAST results persist for one week
  – You can bookmark and share searches
  – BLAST HSPs are ‘draggable’ and can be used in annotations
• JBrowse/Apollo URLs can be shared
  – They let you share the exact view (including active tracks) with others
  – Great for troubleshooting with collaborators
• In Apollo, you can “walk” feature boundaries
  – Square brackets walk exon boundaries: [ and ]
  – Curly brackets walk gene boundaries: { and }
• In Apollo, you can pin tracks to the top
• If you know the name or ID of the gene that you’d like to annotate, you can paste it into the search box in Apollo to navigate to it
Manual annotation example: preparation
Annotation Example
• Phosphoenolpyruvate carboxykinase (pepck) in the copepod Eurytemora affinis
• Pepck catalyzes the conversion of oxaloacetate (OAA) to phosphoenolpyruvate (PEP).
• More information about the copepod: https://i5k.nal.usda.gov/Eurytemora_affinis
• Apollo URL: https://apollo.nal.usda.gov/euraff/jbrowse/
  – Note: There are no demo accounts for this species
Notes on E. affinis genome/browser
• Big advantage for annotation: lots of RNA-Seq and transcriptome data are available to use as contributing evidence for your gene models
  – Includes strand-specific RNA-Seq
• Disadvantage: no close reference genomes, so it may be harder to find homologs for your genes of interest to inform your annotations.
Available tracks for E. affinis
• Baylor Maker annotations:
  – Primary Gene Set:
    • EAFF_v0.5.3-Models
  – Other tracks that were used to generate the primary gene set
• Transcriptome/RNA-Seq
  – Transcriptome assemblies
  – Coverage plots, mapped RNA-Seq data, splice junctions
  – Some of the RNA-Seq libraries are stranded
Choosing reference proteins: D. melanogaster pepck in UniProt
• Catalyzes the conversion of oxaloacetate (OAA) to phosphoenolpyruvate (PEP). Source: http://www.uniprot.org/uniprot/P20007
[Figure: UniProt entry P20007, with these labels]
• Annotation score is a heuristic for annotation quality
• FlyBase is another great resource
• Feature viewer gives graphical view of domains and sites
Choosing reference proteins: Daphnia pulex Pepck
• GenBank record: https://www.ncbi.nlm.nih.gov/protein/EFX80236.1
• Treat with caution!!!
Manual annotation live example
BLAST D. melanogaster (dmel) and D. pulex (dpul) proteins against E. affinis proteins
https://i5k.nal.usda.gov/webapp/blast/
Result URL: https://i5k.nal.usda.gov/webapp/blast/c577723ffdb04de7921d768d2a1080b6
• Copy the protein ‘basename’ EAFF006514 for searching in Apollo
• Results are filtered by e-value; only one protein in the E. affinis dataset has a significant match
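The e-value filtering and ‘basename’ steps can be sketched offline as follows. This assumes you have BLAST hits in the standard 12-column BLAST+ tabular format (`-outfmt 6`); whether the i5k Workspace BLAST page exports that format is an assumption. The function names, the 1e-5 cutoff, and the example rows (placeholder IDs and made-up scores) are for illustration only.

```python
import csv
import io

# Column order of the default BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, max_evalue=1e-5):
    """Return the rows whose e-value is at or below max_evalue (assumed cutoff)."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    return [row for row in reader if float(row["evalue"]) <= max_evalue]


def basename(protein_id):
    """Strip a trailing suffix such as -PA or -RA: 'EAFF006514-PA' -> 'EAFF006514'."""
    return protein_id.rsplit("-", 1)[0]


# Made-up rows for illustration only; IDs and scores are placeholders.
EXAMPLE = (
    "query1\tsubjectA-PA\t62.5\t600\t210\t5\t1\t598\t3\t600\t1e-150\t540\n"
    "query1\tsubjectB-PA\t28.0\t90\t60\t2\t10\t95\t40\t128\t0.5\t30.1\n"
)

hits = significant_hits(EXAMPLE)
print([basename(hit["sseqid"]) for hit in hits])  # ['subjectA']
```

The basename is what you paste into the Apollo search box; the -PA (protein) and -RA (mRNA) suffixes refer to the same model.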
Modify E. affinis model sequence in Apollo
• Go to Apollo URL: https://apollo.nal.usda.gov/euraff/jbrowse/
  – Find the mRNA of EAFF006514-PA in the genome browser by pasting EAFF006514 into the search box and selecting EAFF006514-RA
• Log in to Apollo
• Drag EAFF006514-RA into the yellow annotation track
• Check available evidence for the model
Another approach: BLAST against the genome
https://i5k.nal.usda.gov/webapp/blast/
• Click on the blue blastdb button next to your favorite HSP to view it in JBrowse
• BLAST results are displayed as glyphs in the browser; they can be used as annotation starting points if the alignment is high quality
Create annotation in user-created annotations track
• Drag model EaffTmpM006514-RA to the User-created Annotations track
• Log in with your Apollo credentials
Modify E. affinis model sequence in Apollo
• Questions:
  – What evidence do you choose to check the integrity of the model?
  – Do you need additional evidence?
  – How do you evaluate whether the protein sequence is as complete as it can be?
  – Should you add/modify UTRs?
View available evidence
• RNA-Seq and transcriptome tracks suggest that one exon is missing
• Model is on the reverse strand, so we can take advantage of the stranded RNA-Seq available for this species
Add an exon to the model
• Drag exon from transcriptome track into new gene model
Adjust exon boundary
• CDS sequence is now UTR; zoom in to investigate
• The CDS color, which encodes the reading frame, has changed from purple to green; we need to fix this
• RNA-Seq suggests we need to adjust exon boundary
• Drag exon boundary to match RNA-Seq and transcriptome tracks
• Fixed both reading frame and exon boundary
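The frame shift seen above has a simple arithmetic basis, sketched here under stated assumptions: CDS segments are given as 1-based inclusive (start, end) pairs, and phase follows the GFF3 convention (the number of bases to skip at the start of a segment to reach the next complete codon). The coordinates are invented for illustration and do not come from EAFF006514. A total CDS length divisible by 3 is necessary for an intact reading frame, but not sufficient.

```python
def cds_phases(segments, strand):
    """Return (start, end, phase) for each CDS segment in transcription order."""
    # On the reverse strand, transcription runs from high to low coordinates.
    ordered = sorted(segments, reverse=(strand == "-"))
    result = []
    consumed = 0  # coding bases seen in upstream segments
    for start, end in ordered:
        phase = (3 - consumed % 3) % 3
        result.append((start, end, phase))
        consumed += end - start + 1
    return result


def cds_length_in_frame(segments):
    """True when the total CDS length is a multiple of 3."""
    return sum(end - start + 1 for start, end in segments) % 3 == 0


# Hypothetical reverse-strand model before and after moving one exon boundary.
before = [(100, 159), (300, 349)]
after = [(100, 159), (300, 350)]

print(cds_length_in_frame(before))  # False
print(cds_length_in_frame(after))   # True
print(cds_phases(before, "-"))      # [(300, 349, 0), (100, 159, 1)]
```

In the "before" model, the upstream segment is 50 bases long, so the downstream segment starts mid-codon; moving the boundary by one base restores the frame.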
Evaluate new protein sequence
• BLAST the modified EAFF006514-PA sequence against NCBI’s nr database
  – Make sure it doesn’t match a potential contaminant
  – Get an idea whether you have the right sequence
  – Blastp home:
    • https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
  – Result URL:
    • https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=U8EJ44A701R (expires end of day 8/29)
• Once contamination is ruled out, it’s better to align your sequence against a smaller set of high-quality proteins
• If you notice that parts of the protein are missing, check the ‘Gaps in assembly’ track in the browser
Evaluate new protein sequence (continued)
• Get E. affinis pepck protein sequence from old model and new model
• Align new and old sequence to dmel and dmag protein sequences
  – Clustal (https://i5k.nal.usda.gov/webapp/clustal/)
  – Can also use NCBI BLAST
• Check alignment extent, %ID
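Alignment extent and %ID can also be computed from two rows of an alignment. This is a minimal sketch, assuming both rows have equal length and use "-" for gaps. Percent identity has several definitions; here it is identical residues divided by columns where both sequences have a residue, and extent is the fraction of the first sequence covered by such columns. The sequences are invented.

```python
def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Return (percent_identity, percent_of_a_aligned) for two aligned rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    aligned_columns = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        aligned_columns += 1
        if a.upper() == b.upper():
            identical += 1
    residues_a = sum(1 for a in aligned_a if a != gap)
    percent_identity = 100.0 * identical / aligned_columns if aligned_columns else 0.0
    extent_a = 100.0 * aligned_columns / residues_a if residues_a else 0.0
    return percent_identity, extent_a


# Invented toy alignment: 5 gap-free columns, 4 of them identical.
pid, extent = alignment_stats("MKT-LLA", "MRTALL-")
print(pid)               # 80.0
print(round(extent, 1))  # 83.3
```

Comparing these numbers for the old and new model against the same reference protein gives a quick indication of whether an edit improved the model.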
Clustal Results
[Figure: Clustal alignment, with these labels]
• New exon added
• Another exon might be missing (we’re not going to handle this today)
• Clustal result URL: https://i5k.nal.usda.gov/webapp/clustal/105850a3594e4234a21b07d93cbbed71
• Scroll to bottom of page and click ‘colorful’ to see color-coded alignment
Using the Information Editor
• Select the model in Apollo, then right-click, and select ‘Edit Information’ from the drop-down menu
  – Use the ‘mRNA’ section
  – Name: We recommend the INSDC naming guidelines:
    • http://www.uniprot.org/docs/nameprot
    • If a naming convention exists, use it (e.g. for gene families)
    • Name should be unique and attributed to all orthologs (as far as possible)
    • Use the name from an orthologous protein if you are sure that your gene model is an ortholog.
    • Document your justification for the name in the Comments field (e.g. “88% sequence similarity via blastp to D. melanogaster pepck P20007”)
  – Comments: Document what changes you performed, and your justification for the name. These notes will be visible in the OGS, so make sure that others understand them
  – Replaced Models field: the Maker model (EAFF_v0.5.3) that your new model will replace in the OGS
Using the Information Editor (continued)
The Replaced Models field
• We use the information in this field to generate a merged, non-redundant gene set from the manually curated models and the official or primary gene set
• Your official or primary gene set is listed in the category field of the track selector
• If you don’t know what your project’s gene set is, contact us!
https://i5k.nal.usda.gov/apollo-replaced-models-field-explanations-and-examples
[Figure: Replaced Models field]
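To see why the Replaced Models field matters, here is a toy sketch of a merge. It is not the i5k Workspace pipeline, whose details are not described here; it only illustrates the idea that primary models named in a Replaced Models field are dropped and the curated models take their place. The function name and the neighboring model IDs are hypothetical.

```python
def merge_gene_sets(primary_ids, curated):
    """Toy merge of a primary gene set with manually curated models.

    primary_ids: list of model IDs in the primary gene set.
    curated: dict mapping each curated model name to the list of primary
             model IDs entered in its Replaced Models field.
    """
    replaced = {model_id for ids in curated.values() for model_id in ids}
    kept = [model_id for model_id in primary_ids if model_id not in replaced]
    return kept + list(curated)


# Hypothetical primary set; only EAFF006514-RA appears in this walkthrough.
primary = ["EAFF006513-RA", "EAFF006514-RA", "EAFF006515-RA"]
curated = {"pepck": ["EAFF006514-RA"]}

print(merge_gene_sets(primary, curated))
# ['EAFF006513-RA', 'EAFF006515-RA', 'pepck']
```

If the field is left empty, a merge like this has no way to know which primary model the curated model supersedes, and both could end up in the final set.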
Checklist for accuracy and integrity
• Check start, stop and exon boundaries (splice sites)
  – Try to fix non-canonical splice sites if possible
• Check if you can annotate UTRs (e.g. using RNA-Seq data)
• Check for gaps in the genome
• If you change the genome sequence, add a justification comment to the corresponding gene model
• Use BLAST or a multiple sequence aligner
  – To look at completeness of model
  – To verify the appropriateness of the gene name
• In the Information Editor mRNA field
  – Fill in the Replaced Model for the Maker gene (EAFF_v0.5.3-Models)
  – Update the Name if appropriate
  – Add comments that describe
    • Your evidence for the annotation
    • Modifications that you made to the gene model
cf. https://www.slideshare.net/MonicaMunozTorres/editing-functionality-apollo-workshop
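The splice-site item in the checklist can be illustrated with a short sketch. It assumes the common canonical intron pattern, GT at the five prime (donor) end and AG at the three prime (acceptor) end, read on the strand of the gene. Coordinates are 1-based inclusive on the forward strand of the scaffold; the function names and the toy sequence are invented. Apollo flags non-canonical splice sites itself, so this only shows what is being checked.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return sequence.translate(COMPLEMENT)[::-1]


def intron_is_canonical(scaffold, intron_start, intron_end, strand):
    """True if the intron reads GT...AG on the strand of the gene."""
    intron = scaffold[intron_start - 1:intron_end]
    if strand == "-":
        # Read reverse-strand introns in their own 5' to 3' direction.
        intron = reverse_complement(intron)
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Toy scaffold: exon (1-6), intron (7-18), exon (19-24).
scaffold = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(intron_is_canonical(scaffold, 7, 18, "+"))  # True
print(intron_is_canonical(scaffold, 7, 17, "+"))  # False
```

A boundary that is off by a single base, as in the second call, is enough to turn a canonical site into a non-canonical one, which is why walking exon boundaries at high zoom is worthwhile.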
What happens to my annotation when I’m done?
• This depends on the genome project that you’re working on.
• If the genome coordinator has asked us to generate an OGS (Official Gene Set), we will do so
  – We are still working on this process, so if you ask us to do this, 1) it will take some time, and 2) we will probably ask you for co-authorship if you publish a paper on the OGS.
  – We are working on a pipeline to submit Official Gene Sets to GenBank, where they will be archived/accessioned
• Otherwise, don’t assume that your annotation will be archived.
  – If you need it to be, get in touch with us and we’ll figure out what to do.
• Get in touch with us and the genome project coordinator if you’re not sure about the status of a genome project.
• https://i5k.nal.usda.gov/data-management-policy
Upcoming webinars (tentative schedule)
• October: Apollo manual annotation Q&A
• December: Manual annotation with Apollo
• February: i5k Workspace roadmap and Q&A
• April: Orientation and resources for project coordinators
• June: Overview of i5k Workspace resources
• We will post slides; recordings will be available on request
Thank you!
The NAL Team
• Yu-yu Lin
• Chaitanya Gutta
• Li-Mei Chiang
• Yi Hsiao
• Gary Moore
• Susan McCarthy
i5k Workspace alumni
• Chien-Yueh Lee
• Han Lin
• Jun-Wei Lin
• Vijaya Tsavatapalli
• Mei-Ju Chen
i5k Workspace@NAL advisory committee
• i5k Coordinating Committee
• i5k Pilot Project
• Apollo & JBrowse Development Teams
  – Monica Munoz-Torres, Nathan Dunn
• GMOD/Tripal community
• All of our users and contributors!