Using Apollo at the i5k Workspace@NAL
NAL USDA-ARS
https://i5k.nal.usda.gov
August 29th, 2017

Agenda

• Manual annotation general overview
• i5k Workspace tools for manual annotation
  – BLAST, Clustal, HMMER
  – Apollo
• Manual annotation example: preparation
• Manual annotation live example


Other resources
• Monica Munoz-Torres from the Apollo group has a number of comprehensive tutorials:
  – https://www.slideshare.net/MonicaMunozTorres/presentations
  – If you are new to Apollo, or need a refresher, we highly recommend that you review one of her presentations
• I recommend these slides if you need more background:
  – https://www.slideshare.net/MonicaMunozTorres/apollo-workshop-at-ksu-2015
• Note: there are two versions of Apollo. The i5k Workspace still uses the older version, which has a slightly different interface
• The official Apollo annotation guide:
  – http://genomearchitect.org/users-guide/
• Other manual curation tutorials:
  – https://i5k.nal.usda.gov/manual-curation-example
  – http://genomecuration.github.io/genometrain/d-feature-curation-crossing/


Manual annotation general overview


What is manual annotation?
• Manual review and improvement of an existing gene prediction
• Often, but not always: drawing on external evidence (e.g. RNA-Seq, cDNA, genes from other species) to improve a computationally predicted gene model
  – Structural annotation: defining the gene structure (e.g. exon boundaries)
  – Functional annotation: describing the gene function (e.g. its name)


Why manually annotate?
• “Incorrect annotations poison every experiment that makes use of them”
• “Worse still, the poison spreads because incorrect annotations from one organism are often unknowingly used by other projects to help annotate their own genomes.”
  – Yandell and Ence 2012, doi:10.1038/nrg3174


General process of manual annotation
1. Select a chromosomal region of interest (e.g. scaffold)
   1. E.g. find sequence of interest from one or several other species, and align against proteins or genome sequence from your species
2. Select appropriate evidence (tracks in Apollo, or your own files)
3. Determine whether a feature in your evidence provides a reasonable starting gene model
   1. If yes: select and drag the feature to the ‘user-created annotations’ area, creating an initial gene model. If necessary, use editing functions to adjust the model.
   2. If not: get in touch with us!
4. Edit model if necessary
5. Check your edited gene model for integrity and accuracy by comparing it with available homologs
   1. Verify that the gene model is the best representation of the underlying biology
6. Repeat steps 1 through 5 as needed to refine model
7. Add annotation details in the “Information Editor”
   1. Replaced model, name, symbol, other comments

Adapted from https://www.slideshare.net/MonicaMunozTorres/apollo-workshop-at-ksu-2015


i5k Workspace ‘Etiquette’
1. Use Apollo to improve a gene model in an i5k Workspace assembly.
   1. If you just want to practice, use one of our training instances.
      1. https://i5k.nal.usda.gov/jbrowseapollo-training
   2. If you just want to view the data, you can probably get what you want without using Apollo. All of the data that we host is public.
2. Your annotation work is a community effort.
   1. If you notice that someone else is working on your model of choice, get in touch with them (or us) and collaborate. Don’t make a 2nd model or delete the other model.
   2. Keep in mind that your work will be used by the scientific community once you’re done.
3. If you publish any of your work generated in the i5k Workspace:
   1. Get in touch with the genome contact first (you can find the contact info on the organism page; https://i5k.nal.usda.gov/species);
   2. Please cite the i5k Workspace paper! This helps us continue to exist.
      1. https://doi.org/10.1093/nar/gku983


Manual annotation: i5k Workspace tools


First, some conventions
• HSP: high-scoring pair in BLAST/BLAT alignments
  – The ‘Hits’ in an alignment result set
  – A subsection of a pair of sequences with sufficient score
  – HSPs can change based on the alignment parameters
• Five prime end and three prime end
  – Based on direction of transcription
  – Initiation site is at the five prime end
  – Stop codon is at the three prime end
• In the genome browser, arrowheads indicate direction

[Figure: two arrow glyphs with their ends labeled 3’ 5’ and 5’ 3’]
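The five prime / three prime convention can be made concrete with a small sketch. It assumes a feature is described by scaffold coordinates plus a strand sign; the `Feature` class and both helper functions are hypothetical names used only for illustration, not part of Apollo or JBrowse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal feature record (not an Apollo data structure)."""
    start: int   # leftmost scaffold coordinate
    end: int     # rightmost scaffold coordinate
    strand: str  # "+" for forward, "-" for reverse


def five_prime(feature):
    """Return the coordinate where transcription begins."""
    if feature.strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the reverse strand, transcription starts at the rightmost coordinate.
    return feature.start if feature.strand == "+" else feature.end


def three_prime(feature):
    """Return the coordinate where transcription ends."""
    if feature.strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return feature.end if feature.strand == "+" else feature.start


forward = Feature(start=100, end=500, strand="+")
reverse = Feature(start=100, end=500, strand="-")
print(five_prime(forward), three_prime(forward))  # 100 500
print(five_prime(reverse), three_prime(reverse))  # 500 100
```

For a reverse-strand model, the initiation site therefore sits at the right-hand end of the glyph in the browser.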


JBrowse and Apollo

[Screenshot of the Apollo interface, with callouts:]
• Track selector
• Bookmark/share URL
• User-created annotations track
• Login/out
• File: Add your own files
• View: Change coloring scheme
• Tools: Search using BLAT
• Find information about tracks
• Locate where you are on the scaffold
• Search for a gene or location
• Zoom in/out
• Turn tracks on/off

JBrowse is a web-based genome browser
• Visualize features that are mapped to a genome
• These features are displayed as tracks
• Many different types of data may be displayed

Apollo adds editing functions to JBrowse
• Manual gene curation
• Changes automatically saved back to server
• Edits are visible to other annotators in real time
• Editing history is tracked


i5k Workspace BLAST: one way to access Apollo

URL: https://i5k.nal.usda.gov/webapp/blast/

BLAST against the genome assembly to view HSPs in JBrowse

[Screenshot of the BLAST form, with callouts:]
• Select organism
• Select organism-specific database
• Paste or upload query sequence(s)
• Program is automatically selected


i5k Workspace BLAST: one way to access Apollo

BLAST result page with 4 panels

Click on the blue blastdb icon next to your favorite HSP

BLAST results are displayed in Apollo


HMMER and Clustal
• Use HMMER to detect remote protein homologs
  – https://i5k.nal.usda.gov/webapp/hmmer/
• Use Clustal to perform multiple sequence alignments
  – https://i5k.nal.usda.gov/webapp/clustal/


Tips and Tricks
• The i5k Workspace BLAST results persist for one week
  – You can bookmark and share searches
  – BLAST HSPs are ‘draggable’ and can be used in annotations
• JBrowse/Apollo URLs can be shared
  – Allow you to share the exact view (including active tracks) with others
  – Great for troubleshooting with collaborators
• In Apollo, “walk” feature boundaries
  – Square brackets walk exon boundaries: [ and ]
  – Curly brackets walk gene boundaries: { and }
• In Apollo, you can pin tracks to the top
• If you know the name or ID of the gene that you’d like to annotate, you can paste it into the search box in Apollo to navigate to it
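A shared browser URL encodes the visible location and the active tracks. The sketch below builds such a link, assuming the instance accepts the standard JBrowse 1 query parameters `loc` and `tracks`; the scaffold name and track labels are placeholders, and the simplest way to get a correct link is still to copy it from the browser's share button.

```python
from urllib.parse import urlencode


def build_view_url(base_url, scaffold, start, end, track_labels):
    """Build a link to a region with a chosen set of tracks switched on.

    Assumes JBrowse 1 style parameters: loc=<scaffold>:<start>..<end>
    and tracks=<comma-separated track labels>.
    """
    params = {
        "loc": f"{scaffold}:{start}..{end}",
        "tracks": ",".join(track_labels),
    }
    # Keep ':' and ',' unescaped so the link stays readable.
    return base_url + "?" + urlencode(params, safe=":,")


url = build_view_url(
    "https://apollo.nal.usda.gov/euraff/jbrowse/",
    "Scaffold1",            # placeholder scaffold name
    1000,
    2000,
    ["TrackA", "TrackB"],   # placeholder track labels
)
print(url)
# https://apollo.nal.usda.gov/euraff/jbrowse/?loc=Scaffold1:1000..2000&tracks=TrackA,TrackB
```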


Manual annotation example: preparation


Annotation Example
• Phosphoenolpyruvate carboxykinase (pepck) in the copepod Eurytemora affinis
• Pepck catalyzes the conversion of oxaloacetate (OAA) to phosphoenolpyruvate (PEP).
• More information about the copepod: https://i5k.nal.usda.gov/Eurytemora_affinis
• Apollo URL: https://apollo.nal.usda.gov/euraff/jbrowse/
  – Note: There are no demo accounts for this species


Notes on E. affinis genome/browser
• Big advantage for annotation: lots of RNA-Seq and transcriptome data are available to use as contributing evidence for your gene models
  – Includes strand-specific RNA-Seq
• Disadvantage: No close reference genomes, so it may be harder to find homologs for your genes of interest to inform your annotations.


Available tracks for E. affinis
• Baylor Maker annotations:
  – Primary Gene Set:
    • EAFF_v0.5.3-Models
  – Other tracks that were used to generate the primary gene set
• Transcriptome/RNA-Seq
  – Transcriptome assemblies
  – Coverage plots, Mapped RNA-Seq data, Splice junctions
  – Some of the RNA-Seq libraries are stranded


Choosing reference proteins: D. melanogaster pepck in UniProt

Catalyzes the conversion of oxaloacetate (OAA) to phosphoenolpyruvate (PEP). Source: http://www.uniprot.org/uniprot/P20007

Annotation score is a heuristic for annotation quality

FlyBase is another great resource

Feature viewer gives graphical view of domains and sites


Choosing reference proteins: Daphnia pulex Pepck

• GenBank record: https://www.ncbi.nlm.nih.gov/protein/EFX80236.1

Treat with caution!!!


Manual annotation live example


BLAST dmel, dpul proteins against E. affinis proteins

https://i5k.nal.usda.gov/webapp/blast/

Result URL: https://i5k.nal.usda.gov/webapp/blast/c577723ffdb04de7921d768d2a1080b6

Copy the protein ‘basename’ EAFF006514 for searching in Apollo

Results are filtered by e-value; only one protein in the E. affinis dataset has a significant match
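The same kind of e-value filter can be applied offline to a saved result table. This is a minimal sketch assuming the results are in the standard 12-column tab-delimited BLAST format, where the e-value is the 11th column; whether the web app exports exactly this layout is an assumption, the cutoff is arbitrary, and the rows below are invented for illustration.

```python
def significant_hits(tabular_text, max_evalue=1e-5):
    """Return (query, subject, evalue) for rows at or below the cutoff.

    Assumes standard 12-column tabular BLAST output:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits


# Invented rows for illustration only; these are not real search results.
example = (
    "query1\tEAFF006514-PA\t65.0\t600\t200\t5\t1\t600\t10\t610\t1e-150\t900\n"
    "query1\tsubjectX\t30.0\t50\t35\t0\t1\t50\t1\t50\t2.5\t20\n"
)
for query, subject, evalue in significant_hits(example):
    print(query, subject, evalue)
```

Only the first invented row passes the cutoff, which mirrors the situation on the slide where a single protein has a significant match.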


Modify E. affinis model sequence in Apollo

• Go to Apollo URL: https://apollo.nal.usda.gov/euraff/jbrowse/
  – Find mRNA of EAFF006514-PA in genome browser by pasting EAFF006514 into search box, selecting EAFF006514-RA
• Log in to Apollo
• Drag EAFF006514-RA into the yellow annotation track
• Check available evidence for model


Another approach: BLAST against the genome

https://i5k.nal.usda.gov/webapp/blast/

Click on the blue blastdb button next to your favorite HSP to view it in JBrowse


Another approach: BLAST against the genome

BLAST results are displayed as glyphs in the browser; they can be used as annotation starting points if the alignment is high quality


Create annotation in user-created annotations track

Drag model EaffTmpM006514-RA to User-created Annotations track

Log in with your Apollo credentials


Modify E. affinis model sequence in Apollo

• Questions:
  – What evidence do you choose to check the integrity of the model?
  – Do you need additional evidence?
  – How do you evaluate whether the protein sequence is as complete as it can be?
  – Should you add/modify UTRs?


View available evidence

RNA-Seq and transcriptome tracks suggest that one exon is missing

Model is on the reverse strand, so we can take advantage of the stranded RNA-Seq available for this species


Add an exon to the model

Drag exon from transcriptome track into new gene model


Adjust exon boundary

CDS sequence is now UTR: zoom in to investigate

CDS frame has changed from purple to green: we need to fix this

RNA-Seq suggests we need to adjust exon boundary


Adjust exon boundary

Drag exon boundary to match RNA-Seq and transcriptome tracks

Fixed both reading frame and exon boundary
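Why moving one exon boundary can change the frame of everything downstream comes down to arithmetic modulo 3. The sketch below computes a GFF3-style phase for each coding segment from the segment lengths alone; the function name and the lengths are made up for illustration, and this is not how Apollo itself colors frames.

```python
def cds_phases(segment_lengths):
    """Return the GFF3-style phase of each coding segment.

    Segments are listed in 5' to 3' order. Phase is the number of bases
    to skip at the start of a segment to reach the next complete codon.
    """
    phases = []
    consumed = 0  # coding bases used by all earlier segments
    for length in segment_lengths:
        phases.append((3 - consumed % 3) % 3)
        consumed += length
    return phases


original = [100, 50, 30]   # made-up coding segment lengths
shifted = [100, 51, 30]    # second segment extended by a single base

print(cds_phases(original))  # [0, 2, 0]
print(cds_phases(shifted))   # [0, 2, 2]
print(sum(original) % 3, sum(shifted) % 3)  # 0 1
```

Extending the second segment by one base changes the phase of the third segment and leaves a total coding length that is no longer a multiple of three, which is the kind of frame change that has to be fixed by adjusting the boundary to match the evidence.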


Evaluate new protein sequence
• BLAST modified EAFF006514-PA sequence against NCBI’s nr database
  – Make sure it doesn’t match a potential contaminant
  – Get an idea whether you have the right sequence
  – Blastp home:
    • https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
  – Result URL:
    • https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=U8EJ44A701R (expires end of day 8/29)
• Once contamination is ruled out, it’s better to align your sequence against a smaller set of high-quality proteins
• If you notice that parts of the protein are missing, check the ‘Gaps in assembly’ track in the browser


Evaluate new protein sequence

• Get E. affinis pepck protein sequence from old model and new model
• Align new and old sequence to dmel and dmag protein sequences
  – Clustal (https://i5k.nal.usda.gov/webapp/clustal/)
  – Can also use NCBI BLAST
• Check alignment extent, %ID
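Alignment extent and %ID can be read off the alignment viewer, but a small calculation makes the two quantities explicit. The sketch below takes two rows of an existing alignment (gaps written as `-`) and reports identity over the columns where both sequences have a residue, plus how much of the reference those columns cover. Tools define percent identity in slightly different ways, so treat this as one reasonable definition; the sequences are toy strings, not real pepck proteins.

```python
def alignment_stats(aligned_model, aligned_reference):
    """Return (percent_identity, percent_reference_covered).

    Both arguments are rows from the same alignment, so they must have
    equal length, with '-' marking gaps.
    """
    if len(aligned_model) != len(aligned_reference):
        raise ValueError("aligned rows must have the same length")
    aligned_columns = 0  # columns with a residue in both rows
    identical = 0
    for m, r in zip(aligned_model, aligned_reference):
        if m == "-" or r == "-":
            continue
        aligned_columns += 1
        if m == r:
            identical += 1
    reference_length = sum(1 for r in aligned_reference if r != "-")
    identity = 100.0 * identical / aligned_columns if aligned_columns else 0.0
    covered = 100.0 * aligned_columns / reference_length if reference_length else 0.0
    return identity, covered


# Toy rows: the model lacks the last two reference residues.
identity, covered = alignment_stats("MKSALL--", "MKTALLVA")
print(f"{identity:.1f}% identity over {covered:.1f}% of the reference")
# 83.3% identity over 75.0% of the reference
```

A drop in reference coverage between the old and new model, or a block of gap columns, is a hint that an exon may still be missing.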


Clustal Results

New exon added

Another exon might be missing (we’re not going to handle this today)

- Clustal result URL: https://i5k.nal.usda.gov/webapp/clustal/105850a3594e4234a21b07d93cbbed71
- Scroll to bottom of page and click ‘colorful’ to see color-coded alignment


Using the Information Editor
• Select the model in Apollo, then right-click, and select ‘Edit Information’ from the drop-down menu
  – Use the ‘mRNA’ section
  – Name: We recommend the INSDC naming guidelines:
    • http://www.uniprot.org/docs/nameprot
    • If a naming convention exists, use it (e.g. for gene families)
    • Name should be unique and attributed to all orthologs (as far as possible)
    • Use name from an orthologous protein if you are sure that your gene model is an ortholog.
    • Document your justification for the name in the Comments field (e.g. “88% sequence similarity via blastp to D. melanogaster pepck P20007”)
  – Comments: Document what changes you performed, and your justification for the name. These notes will be visible in the OGS, so make sure that others understand them
  – Replaced Models field: the Maker model (EAFF_v0.5.3) that your new model will replace in the OGS


Using the Information Editor


The Replaced Models field
• We use the information in this field to generate a merged, non-redundant gene set from the manually curated models and the official or primary gene set
• Your official or primary gene set is listed in the category field of the track selector
• If you don’t know what your project’s gene set is, contact us!

https://i5k.nal.usda.gov/apollo-replaced-models-field-explanations-and-examples

[Screenshot callout: Replaced Models field]
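To see why the Replaced Models field matters, it helps to picture the merge as a set operation: primary models named in the field are dropped, and the curated models take their place. The sketch below is only an illustration of that idea and is not the i5k Workspace pipeline; the function, the input layout, and the neighboring model IDs are assumptions chosen for the example.

```python
def merge_gene_sets(primary_ids, curated):
    """Combine a primary gene set with manually curated models.

    primary_ids: list of model IDs in the primary gene set.
    curated: dict mapping each curated model name to the list of
             primary model IDs it replaces (may be empty for new genes).
    """
    replaced = set()
    for replaced_ids in curated.values():
        replaced.update(replaced_ids)

    # A replaced ID that is not in the primary set is probably a typo.
    unknown = replaced - set(primary_ids)
    if unknown:
        raise ValueError(f"unknown replaced model(s): {sorted(unknown)}")

    kept = [model_id for model_id in primary_ids if model_id not in replaced]
    return kept + list(curated)


# IDs other than EAFF006514-RA are chosen purely for illustration.
primary = ["EAFF006513-RA", "EAFF006514-RA", "EAFF006515-RA"]
curated = {"pepck": ["EAFF006514-RA"]}
print(merge_gene_sets(primary, curated))
# ['EAFF006513-RA', 'EAFF006515-RA', 'pepck']
```

If the field is left empty for a model that actually supersedes a primary model, a merge like this would keep both, which is the redundancy the field is meant to prevent.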


Checklist for accuracy and integrity
• Check start, stop and exon boundaries (splice sites)
  – Try to fix non-canonical splice sites if possible
• Check if you can annotate UTRs (e.g. using RNA-Seq data)
• Check for gaps in the genome
• If you change the genome sequence, add a justification comment to the corresponding gene model
• Use BLAST or a multiple sequence aligner
  – To look at completeness of model
  – To verify the appropriateness of the gene name
• In the Information Editor mRNA field
  – Fill in the Replaced Model for the Maker gene (EAFF_v0.5.3-Models)
  – Update the Name if appropriate
  – Add comments that describe
    • Your evidence for the annotation
    • Modifications that you made to the gene model

cf. https://www.slideshare.net/MonicaMunozTorres/editing-functionality-apollo-workshop
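The splice-site item on the checklist can be illustrated with a short check. Canonical introns start with GT and end with AG when read on the transcribed strand. The sketch below flags introns that break that rule, assuming a forward-strand model with exons given as 0-based, half-open intervals; a reverse-strand model would first need its sequence reverse-complemented. The sequence and coordinates are invented.

```python
def noncanonical_introns(sequence, exons):
    """Return (intron_start, intron_end) for introns that are not GT...AG.

    sequence: forward-strand genomic sequence.
    exons: list of (start, end) tuples, 0-based and half-open.
    """
    ordered = sorted(exons)
    flagged = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        intron = sequence[left_end:right_start].upper()
        if not (intron.startswith("GT") and intron.endswith("AG")):
            flagged.append((left_end, right_start))
    return flagged


# Invented toy locus: two 6-base exons around a 12-base GT...AG intron.
genome = "ATGAAA" + "GTAAGTTTTCAG" + "CCCTGA"

print(noncanonical_introns(genome, [(0, 6), (18, 24)]))  # []
print(noncanonical_introns(genome, [(0, 7), (18, 24)]))  # [(7, 18)]
```

In the second call the first exon runs one base too far, so the intron no longer begins with GT, which is the sort of boundary worth revisiting against the RNA-Seq evidence.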


What happens to my annotation when I’m done?

• This depends on the genome project that you’re working on.
• If the genome coordinator has asked us to generate an OGS (Official Gene Set), we will do so
  – We are still working on this process, so if you ask us to do this, 1) it will take some time, and 2) we will probably ask you for co-authorship if you publish a paper on the OGS.
  – We are working on a pipeline to submit Official Gene Sets to GenBank, where they will be archived/accessioned
• Otherwise, don’t assume that your annotation will be archived.
  – If you need it to be, get in touch with us and we’ll figure out what to do.
• Get in touch with us and the genome project coordinator if you’re not sure about the status of a genome project.
• https://i5k.nal.usda.gov/data-management-policy


Upcoming webinars (tentative schedule)

• October: Apollo manual annotation Q&A
• December: Manual annotation with Apollo
• February: i5k Workspace roadmap and Q&A
• April: Orientation and resources for project coordinators
• June: Overview of i5k Workspace resources
• We will post slides; recordings will be available on request


Thank you!

The NAL Team
• Yu-yu Lin
• Chaitanya Gutta
• Li-Mei Chiang
• Yi Hsiao
• Gary Moore
• Susan McCarthy

i5k Workspace alumni
• Chien-Yueh Lee
• Han Lin
• Jun-Wei Lin
• Vijaya Tsavatapalli
• Mei-Ju Chen

i5k Workspace@NAL advisory committee
• i5k Coordinating Committee
• i5k Pilot Project
• Apollo & JBrowse Development Teams
  o Monica Munoz-Torres, Nathan Dunn
• GMOD/Tripal community
• All of our users and contributors!