
Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

May 10, 2015


Conference: Sept 24 - 26, 2008 at the JCVI Rockville, MD Campus
Presenters: Ramana Madupu, Lauren Brinkac, Derek Harkins
Page 1: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


A Guide to MANATEE

Lauren Brinkac, Ramana Madupu

The J. Craig Venter Institute, 2008

logo by Connie Shiau

Page 2: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Table of Contents (for the most popular topics)
topic (page #s)

1. Getting started (3-6)
2. “Welcome to Manatee” page and links (7-11, 21, 23, 26-28)
3. “Genome Summary” page and links (11-20)
   - Annotation Notebook (15, 37)
   - Genome Calculations (13)
   - Role Category Breakdown (14)
4. “Annotation Tools” page and links (28-38)
   - Gene List (34-38)
   - coordinate range (29)
   - overlaps (30)
   - InterEvidence (31)
5. Gene Curation Page (39-86)
   - BER section (43-47)
   - HMM section (55-57)
   - GO section (71-75, 81)
6. Gene Ontology (21-22, 71-81)
   - edit Gene Ontology (22)
   - search Gene Ontology (22, 76-80)
   - Gene Ontology on the Gene Curation Page (71-75, 81)
7. Genome Properties (23-25, 57-60)
8. Genome Viewer (26, 87-91)
9. TIGR role categories (35-36, 38, 82)
   - Role notes (38)
   - TIGR role entry on Gene Curation Page (82)
10. Edit starts (90)
11. Annotation Checklist (92)

Page 3: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


What Manatee Is

• Manatee is a web-based manual annotation tool for accessing and editing annotation data

• Manatee draws information from an underlying database for its displays

• Manatee sends information entered by annotators to the underlying database for storage

• Manatee depends on JCVI’s database structure (more on this later)

• Multiple users can access the same database from different computers when Manatee is run on a server (as it is at JCVI)

Page 4: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Getting started with Manatee

• When logging into Manatee, you must enter a user name, a password, and the name of the database on which you wish to work.

• JCVI database names tend to be 3-5 letter codes:
  – During this tutorial and subsequent exercises we will be using the Shewanella oneidensis (formerly Shewanella putrefaciens) database.
  – We will be working with two versions of the Shewanella database:
    • the production database, which stores the published annotation (gsp)
    • the training database, which stores the training annotation (tgsp)

Page 5: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Finding Manatee (working at JCVI)

Go to manatee.tigr.org and select “Prokaryotic Manatee”.

Page 6: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Fill in the fields with the required information.

user name and password

database = “gsp” for the tutorial portion

Clicking on “Prokaryotic Manatee” from www.manatee.org takes you to the Manatee Login Page.

Page 7: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


“Welcome to Manatee”

After logging in to Manatee, you come to the “Welcome to Manatee” page.

Here you will find several menu options and a couple of search options to choose from.

I will discuss each in more detail in following slides.

NOTE: In the upper right-hand corner of every Manatee page you will see something like this:

The “Home” link takes you back to the “Welcome to Manatee” page from wherever you are within the Manatee tool.

This area also shows you which database you are logged into, and who is logged in. Clicking on the login name will take you back to the login page.

Page 8: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee Page
“Access Gene Curation Page” option

We will look at the options in the Access Listings section in subsequent slides. First we will look at the 3 options on the bottom of this page:

Access Gene Curation Page: This option will take you directly to a page containing gene-specific information called the “Gene Curation Page”, or “GCP” for short. The GCP displays most of what we know about a given protein - you will be seeing this page in much more detail later. For now just know that you can reach this page by entering either a feat_name or a locus id into this box and then clicking “submit”.

A feat_name is an internal identifier given to each gene in a genome; feat_names are not used publicly. They are initially assigned by Glimmer and are generally numbered sequentially from the beginning of the DNA sequence given to Glimmer. They have the format ORF#####, where ORF stands for “open reading frame” and ##### is a 5-digit zero-padded number. (For more on this see the overview document.)

Locus ids (loci) are assigned to proteins at the end of the annotation process. They are numbered sequentially from the origin of replication of the genome (if it can be identified). Loci are unique accessions and are used for public release and display of the proteins.
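To make the ORF##### format concrete, here is a minimal Python sketch that builds and checks identifiers of that shape. The function names are hypothetical and are not part of Manatee; the only thing taken from the tutorial is the "ORF plus 5-digit zero-padded number" pattern.

```python
import re

# Pattern described in the tutorial: "ORF" followed by exactly 5 digits.
FEAT_NAME_PATTERN = re.compile(r"ORF(\d{5})")


def make_feat_name(number: int) -> str:
    """Build a feat_name such as ORF00042 from a sequential gene number."""
    if not 0 <= number <= 99999:
        raise ValueError("feat_name numbers must fit in 5 digits")
    return f"ORF{number:05d}"


def parse_feat_name(feat_name: str) -> int:
    """Return the numeric part of a feat_name, or raise ValueError."""
    match = FEAT_NAME_PATTERN.fullmatch(feat_name)
    if match is None:
        raise ValueError(f"not a feat_name: {feat_name!r}")
    return int(match.group(1))


print(make_feat_name(42))
print(parse_feat_name("ORF00042"))
```

A check like this could be used to tell a feat_name apart from a locus id before looking either one up, although how Manatee itself distinguishes them is not described here.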

Page 9: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee Page
“Search Genes By Gene Name” option

This is a keyword-based search of the common names that have been given to the genes/proteins. (We have a tendency to use the terms gene and protein somewhat interchangeably; however, what we are really annotating are the protein translations of the predicted genes.)

Whatever keyword you enter will be treated as though it has wildcards flanking it. This means that you will get results that include names containing your keyword as an individual word and names that contain words that contain your keyword.

For example, if you search with “kinase” you could get these:
“adenylate kinase”
“protein kinase”
“sensor histidine kinase”

as well as these:
“glutamate 5-kinase”
“phosphoenolpyruvate carboxykinase”
“ribose-phosphate pyrophosphokinase”

The results will be in the form of a table containing additional information and links to other pages - this table format will be described later.
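The wildcard behavior described above amounts to a substring match. The following Python sketch illustrates the idea using the example names from this slide. It is not Manatee's actual search code, and the case-insensitive comparison is an assumption.

```python
def search_gene_names(keyword, names):
    """Return every name that contains the keyword anywhere in it,
    as though the keyword had wildcards on both sides.
    Case-insensitive matching is an assumption, not documented behavior."""
    needle = keyword.lower()
    return [name for name in names if needle in name.lower()]


common_names = [
    "adenylate kinase",
    "protein kinase",
    "sensor histidine kinase",
    "glutamate 5-kinase",
    "phosphoenolpyruvate carboxykinase",
    "ribose-phosphate pyrophosphokinase",
    "biotin synthase",  # no "kinase" anywhere, so it is not returned
]

for name in search_gene_names("kinase", common_names):
    print(name)
```

Because the match is on substrings, a short keyword can return a very long list; a more specific keyword narrows the results.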


Page 10: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee Page
“Change Organism Database” option

To change from one database to another, you do not need to log in again; simply type in the name of the database you wish to go to and click submit.


Page 11: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee Page
Options under “Access Listings”: “Genome Summary”

The “Annotation Tools” option is one of the most used and will be described in detail in later slides. The “Gene Ontology”, “Genome Properties”, and “Genome Viewer” sections are accessible here as well as elsewhere within Manatee. (There are many routes to view the various pieces of information within Manatee.) They will be described briefly as links from “Access Listings” and then in more detail as they are viewed from the Gene Curation Page (GCP) and elsewhere.

First we will look in more detail at the options under “Genome Summary”.

Clicking on “Genome Summary” takes one to a new page with additional menu options (on next slide).

Page 12: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The “Genome Summary” page

Clicking on an item in the list of options takes you to a page with the information or more options. Following slides will describe each of these.

These are tools that allow one to view the data based on various types of annotation. Following slides will describe the use and output of each of these.

Page 13: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Links from the Genome Summary Page: “Genome Calculations”

This page shows the various calculable and countable features of the genome. This information is newly generated each time the page is accessed so that all information is current.

Page 14: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Links from the Genome Summary Page: “Role Category Breakdown”

This page shows a summary of the genes found in various broad categories based on TIGR roles and then a breakdown by TIGR sub category. Each blue role id number or “main” is a link to a table containing a list of all the genes in that category.

Page 15: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Links from the Genome Summary Page: “Annotation Notebook”

The annotation notebook is a set of text fields associated with each TIGR role category. Annotators use these to store information about the annotation which they feel the PIs of the project should know for purposes of writing the manuscript. Generally the entries consist of items of particular biological interest, often involving the presence or absence of particular pathways, genes, gene order, etc. These entries are entered and edited with the “Edit Annotation Notebook” page, linked from the gene list; see pages 34 and 37 of this tutorial.

Page 16: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Other links from the Genome Summary Page:

“Project Administration” - Clicking this link takes one to a page that displays the administrative information for the project: things like PI, grant #, etc.

“Frameshift Status” - Currently this tool is not available for people running Manatee locally outside of TIGR. For TIGR users, there is (will be) a separate section of the tutorial governing all things involving frameshifts (this part of our SOP is currently undergoing change). In brief, this link displays a page listing all of the genes in the genome which needed to be reviewed for the presence of a frameshift or in-frame stop codon, as well as the status of each.

“Annotation Progress Report” - This links to a page that lists all of the processes that must be carried out during the annotation of a genome and provides fields in which to enter when each process was done and who did it. There is also a link to a page listing all the TIGR mainrole categories and fields for individual annotators to sign up for each category.

“by InterPro Domain” - This links to a list of genes according to membership in an InterPro domain.

“Genome Properties” - Another link to this tool set, which will be described in detail elsewhere.

Page 17: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Searches on the Genome Summary page: “Attributes”

One can choose to view genes based on one of several “attributes” they might have. Here I have shown a selection for “MW”, which stands for molecular weight. Once you choose an attribute to search by, you can then choose various ordering and display options. The above choices will show the proteins in the genome according to calculated molecular weight with the heaviest ones first (see below).

This is just the top of a very long list containing all of the proteins in the genome. One can click on any of the blue gene id links to go to the Gene Curation Page (GCP) for the gene. (The GCP will be described in detail shortly.)

One can jump to different pages in the list by clicking on the blue numbers in the boxes above the list.

One can change the order of the list by clicking on the arrows in blue circles.

Page 18: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Searches on the Genome Summary page: “Evidence”

One can choose to view the genes based on one of several types of clustering evidence that has been found for them, some as the result of InterPro searches and some as a result of separate searches we perform. Here, I have selected “HMM2” (which will include both the TIGRFAM and Pfam HMM sets) and I will view the output ordered by the number of hits in the genome; the HMMs with the most hits will be listed first.

By clicking on the blue numbers in each row, one will get a list of genes that hit that HMM. Clicking on the blue accession number will take one to an info page for the HMM in question.

One can reorder the list by count or accession by clicking on the blue column headers.

Numbered boxes at the top will take one to a desired page in the output.
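The "most hits first" ordering can be pictured as a simple tally of hits per HMM accession. The Python sketch below uses made-up (feat_name, accession) pairs purely for illustration. The accessions are placeholders in TIGRFAM/Pfam style, not real search results, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (feat_name, HMM accession) hits; not real search results.
hmm_hits = [
    ("ORF00012", "PF00005"),
    ("ORF00047", "PF00005"),
    ("ORF00101", "TIGR00001"),
    ("ORF00230", "PF00005"),
    ("ORF00230", "TIGR00001"),
    ("ORF00311", "PF00072"),
]


def hits_per_hmm(hits):
    """Return (accession, hit count) pairs with the most-hit HMM first."""
    return Counter(accession for _, accession in hits).most_common()


def genes_for_hmm(hits, accession):
    """Return the genes that hit one HMM (the list behind the blue count)."""
    return [feat_name for feat_name, acc in hits if acc == accession]


for accession, count in hits_per_hmm(hmm_hits):
    print(accession, count, genes_for_hmm(hmm_hits, accession))
```

Note that one gene can appear under several HMMs, as ORF00230 does in this toy data.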

Page 19: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Searches on the Genome Summary page: “Paralogous Families”

One can choose to view the genes based on membership in paralogous families, ordering either by number of family members or by family name.

- Paralogous families are built by first searching all of the proteins within a genome against themselves and against the HMM db. If a paralogous family matches an HMM, the family will be named based on the HMM. Then further searches are done to group the proteins based on regions of sequence that did not match an HMM. Those families are given numerical names and do not have descriptions.
- Output shows you the number of members in each family, the name of the family, and a description of the family (if the family is based on an HMM).
- You can view a list of the proteins in each family by clicking on the family name. You can view information about the HMM on which the family is based by clicking on the description.

Page 20: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Searches on the Genome Summary page: “Membrane proteins”

One can choose to view the proteins based on predicted location in a membrane. You can choose particular SignalP cutoff values, number of predicted transmembrane regions, proteins that have an OMP signal, or lipid attachment site. You can also sort the output by several different options.

Output shows a table of the genes with the chosen parameters. You can reorder them using the pull-down menu and the “sort” button. The table displays all of the parameters available for each protein. Clicking on the blue gene id takes you to the Gene Curation Page (GCP) for the gene.

Page 21: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee Page
Options under “Access Listings”: “Gene Ontology”

This link will open a page that offers options for using the Gene Ontology (GO) system. (For more information on the Gene Ontology system, see the Annotation Overview document, or the Gene Ontology web site, www.geneontology.org)

In brief, the GO offers a controlled vocabulary for the description of aspects of gene products. Currently, TIGR assigns both TIGR role categories and GO terms to all of our genes. Manatee has many built-in features for the suggestion and entry of GO terms and associated information. These features will be detailed in later slides. The next slide shows a brief description of the links available here and of the “edit GO” options.

When Manatee refers to “editing” GO, we mean the creation of “TI” or TIGR terms. These are temporary terms created for use in-house at TIGR until corresponding terms are created at GO. When a need for a new term is found, we (usually Michelle) submit a request to the GO via their SourceForge tracking site that the new term be created. If a TIGR annotator needs the new term right away, they can create a TI term to use within our db. Later, when the official GO term is made, the TI term id will be replaced with the new GO term id.

Page 22: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee page, links from Access Listings: “Gene Ontology”

Choose to search or edit GO.

Search options will be discussed in detail later.

When we refer to editing the GO in TIGR’s db, we are referring to the creation of TI terms. (see overview for more information)

These are links to pages that allow you to enter, update, and add parents to a TI term.

This links to a page that displays all TI terms and their status.

Page 23: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee Page
Options under “Access Listings”: “Genome Properties”

The Genome Properties system allows one to view annotation from the context of the whole genome. It predicts and/or captures information on the presence/absence of pathways, cellular structures and other features of the organism. (see overview for more details)

Clicking on the Genome Properties link from the “Welcome to Manatee” page displays a table of all of the properties and their states for the organism you are working on. The state is “yes” if the property is present, “no” if the property is absent, and will have other intermediate values such as “some evidence” or “not supported” depending on the amount of evidence for a given property. Details on what is known about each property for the genome you are working on can be obtained by clicking on the blue property name. (see next slide)

Page 24: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee page, links from Access Listings: “Genome Properties”

Update the status of or information about a property in this organism.

Search for a property in this genome.

Click on the blue name of a property to learn more about the steps/requirements for the property and to see background information and references regarding the property. You can also see the genes that are involved in the property in the context of their neighbors in the genome. These pages will be shown in detail later in the tutorial but are quickly shown on the next slide.

Page 25: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Genome Property information Page (in brief, more detail will be shown later in the tutorial)

Information on the property.

Information on the genes identified to be a part of the property.

Page 26: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee page, links from Access Listings: “Genome Viewer”

Genome Viewer is a tool which allows one to view the genes in context with their neighboring genes in the genome. It displays a graphic showing the 6-frame translation of a region of DNA sequence, where each horizontal bar is a different frame. Arrows representing the genes are color coded according to TIGR mainrole assignment. There are many viewing and editing options available from this page. These will be discussed in detail later in the tutorial.

Page 27: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee page, links from Access Listings: “Multi Genome Annotation Tool” (MGAT)

The MGAT tool allows the annotation of orthologous genes from several genomes at one time. It is linked into Manatee at several points. MGAT is still undergoing development and is not currently available for public use. A separate tutorial for this tool is under construction.

Page 28: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


The Welcome to Manatee Page: Options under “Annotation Tools”

Type a custom SQL query here to see a list of genes with criteria not available on gene list options.

Links to a document with fairly detailed info on TIGR’s protein naming guidelines.

This is the same option as the one on the “Genome Summary” page.

Links to the “Genome Summary” page.

A few of the links and tools on this page are described here in the red boxes; following slides will detail the other options/tools on this page.

Page 29: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


“Annotation Tools”: “Coordinate Range”

Input a coordinate range and you will get a list of genes whose coordinates fall anywhere in that range.
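As a rough illustration of what such a lookup involves, here is a minimal Python sketch. It assumes that "fall anywhere in that range" means any overlap with the range, and that a gene's end5 may be larger than its end3 when it lies on the reverse strand. Neither assumption is stated on this slide, and the function name and sample genes are hypothetical.

```python
def genes_in_range(genes, range_start, range_end):
    """Return feat_names of genes that overlap the coordinate range at all.

    genes is a list of (feat_name, end5, end3) tuples. Coordinates are
    sorted first so reverse-strand genes (end5 > end3) are handled too.
    """
    lo, hi = sorted((range_start, range_end))
    hits = []
    for feat_name, end5, end3 in genes:
        gene_lo, gene_hi = sorted((end5, end3))
        if gene_lo <= hi and gene_hi >= lo:
            hits.append(feat_name)
    return hits


# Made-up genes for illustration only.
genes = [
    ("ORF00001", 100, 400),
    ("ORF00002", 900, 500),    # reverse strand under the stated assumption
    ("ORF00003", 1200, 1500),
]
print(genes_in_range(genes, 350, 600))
```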


Page 30: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


“Annotation Tools”: “Overlap Analysis”

We work on the premise that genes do not generally overlap in prokaryotic genomes. We look for overlapping genes predicted by Glimmer and, where we can, remove genes suspected of being false calls by Glimmer. Often overlap between two genes can be resolved by the curation of the start site of one or both, or by the removal of a “hypothetical protein” (one that has no similarity to anything) when it overlaps a protein with very clear similarity to other proteins. For more on overlap analysis see the Annotation Overview document.

This display shows the pairs of overlapping genes as indicated by the background color shifts from blue to white to blue to white. Clicking on the feature id number takes you to a Gene Curation Page (GCP) for that gene. Also displayed are the percent of overlap, name of the protein, and notes from Glimmer regarding the protein in question.
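The tutorial does not say how the percent of overlap is calculated. One plausible reading, sketched below in Python, is the number of shared bases divided by the length of the shorter gene. Treat that definition, the 1-based inclusive coordinates, and the function names as assumptions made for illustration.

```python
def overlap_length(gene_a, gene_b):
    """Number of bases shared by two genes given as (end5, end3) pairs.
    Coordinates are treated as 1-based and inclusive (an assumption)."""
    a_lo, a_hi = sorted(gene_a)
    b_lo, b_hi = sorted(gene_b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


def percent_overlap(gene_a, gene_b):
    """Shared bases as a percentage of the shorter gene (assumed definition)."""
    shorter = min(abs(gene_a[0] - gene_a[1]), abs(gene_b[0] - gene_b[1])) + 1
    return 100.0 * overlap_length(gene_a, gene_b) / shorter


print(overlap_length((100, 400), (391, 700)))
print(percent_overlap((100, 199), (150, 249)))
```

A small overlap of a few bases is often resolved by moving a start site, while a large percentage for a hypothetical protein is a hint that it may be a false call.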

Page 31: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


“Annotation Tools”: “Interevidence Analysis”

Glimmer is known to sometimes miss identifying a few real genes. This is especially true for areas of the genome that have been laterally transferred. To find genes Glimmer might have missed, we run an analysis called “interevidence”. This tool takes the nucleotide sequence between genes, the sequence of hypothetical proteins (those that have similarity to nothing), and any regions of proteins that have similarity to nothing, does a 6-frame translation, and then searches those translations against niaa (our in-house protein db). Any possible areas of similarity are then reviewed by annotators and missed genes are entered into the db.
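The 6-frame translation step can be sketched in a few lines of Python. This is only an illustration of the idea using the standard genetic code; it is not the interevidence tool itself, it does not perform the search against niaa, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return translations for frames +1..+3 and -1..-3, stops shown as *."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

In practice each translated frame would then be compared against a protein database, which is the part of the analysis that produces the candidate regions annotators review.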

Page 32: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Data consistency checks: Clicking this generates a list of possible errors or consistency problems in the annotation. For example, if two proteins have the same common name but different TIGR role assignments, they would be listed in the consistency check section for review.

Frameshift Reports: Similar to the “Frameshift status” link that was described earlier for the “Genome Summary” page - basically a list of genes with frameshift reports to be resolved.

Hypothetical protein list: A list of hypothetical proteins (those with insufficient evidence to make any functional assignment) for which there is any shred of information which might lead to annotation other than “hypothetical protein”. This list is generated automatically after AutoAnnotate has made its initial assignment. Those “hypothetical proteins” called by AutoAnnotate that have any BER or HMM evidence are put on this list for manual review.

Annotation status: The same page as was described from the “Genome Summary” page - a list of the steps in annotation and a list of role categories, with the status of completion and the annotator who did the work noted.

Phage Region Viewer: A tool that lists any identified prophage regions in the genome and the genes within them.

PubMed Organism Search: Automatically takes you to the NCBI PubMed site and gives results for a PubMed search using the organism name as keywords. Useful for finding literature on the organism you are working on.

“Annotation Tools”: “Other Tools” section
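The example given for the data consistency checks (same common name, different TIGR role assignments) can be sketched as a small grouping step. The Python below is only an illustration: the gene records are made up, the function name is hypothetical, and the real tool may check many other things.

```python
from collections import defaultdict


def names_with_conflicting_roles(genes):
    """Group genes by common name and report names assigned more than one
    TIGR role id. genes is a list of (feat_name, common_name, role_id)."""
    roles_by_name = defaultdict(set)
    for _feat_name, common_name, role_id in genes:
        roles_by_name[common_name].add(role_id)
    return {
        name: sorted(roles)
        for name, roles in roles_by_name.items()
        if len(roles) > 1
    }


# Made-up annotation records; role ids are taken from the role category
# listing in this tutorial, but the assignments are invented.
genes = [
    ("ORF00010", "adenylate kinase", 124),
    ("ORF00250", "adenylate kinase", 125),
    ("ORF00300", "protein kinase", 140),
]
print(names_with_conflicting_roles(genes))
```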

Page 33: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


“Annotation Tools”: “Access Gene Lists” section

Although all of the tools described so far in this tutorial are quite useful, the bulk of annotator time is spent viewing and editing information that is displayed on gene lists and Gene Curation Pages that are accessed through the “Access Gene Lists” section.

This tool will create a table of genes chosen according to the options in the red box at right. As mentioned in the overview, at TIGR we organize our annotation efforts around TIGR role categories. This tool allows us to view the genes within each TIGR role category.

The first option to select in this section is which molecule you wish to annotate. Some genomes consist of just one chromosome and nothing else, while others can have multiple chromosomes or chromosome(s) and one or more plasmids. If multiple DNA molecules exist for the genome in question, the pull-down menu at the top of this section will list them along with their id number. The default selection is “All molecules” as the team usually annotates all molecules at once; however, to choose just one of the molecules, simply select it from the pull-down menu.

Then use the toggle buttons to choose one of the 3 options for which role categories you want to see genes from: first, you can choose all role categories; second, you can choose one particular main role category; and third, you can choose one particular sub-role category. All of the mainrole categories are listed in the pull-down menu in the main role category selection; to choose one, simply highlight it. In order to select a particular sub-role category you must enter the id number of the sub-role category into the box next to “single role category”. There is a listing of all of the TIGR role categories and their id numbers on pages 35 and 36 of this tutorial.

Once you have chosen your desired options, click submit to see a list of the genes that fit your selections.
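The three role category options (all categories, one main role, one sub-role id) can be pictured with the following Python sketch. The role names and id numbers come from the role category listing in this tutorial, but only two main roles are included, the gene assignments are invented, and the function is a hypothetical illustration rather than Manatee's own logic.

```python
# A small excerpt of the TIGR role categories: main role -> {sub-role id: name}.
ROLE_CATEGORIES = {
    "Amino acid biosynthesis": {
        70: "Aromatic amino acid family",
        71: "Aspartate family",
    },
    "Signal transduction": {
        699: "Two-component systems",
        700: "PTS",
        710: "Other",
    },
}


def select_genes(genes, main_role=None, sub_role_id=None):
    """Mimic the three choices: all role categories (no arguments),
    one main role category, or one sub-role category id.
    genes is a list of (feat_name, role_id) tuples."""
    if sub_role_id is not None:
        wanted = {sub_role_id}
    elif main_role is not None:
        wanted = set(ROLE_CATEGORIES[main_role])
    else:
        return [feat_name for feat_name, _ in genes]
    return [feat_name for feat_name, role_id in genes if role_id in wanted]


# Invented gene-to-role assignments for illustration.
genes = [("ORF00001", 70), ("ORF00002", 699), ("ORF00003", 71)]
print(select_genes(genes))
print(select_genes(genes, main_role="Amino acid biosynthesis"))
print(select_genes(genes, sub_role_id=699))
```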

Page 34: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Gene List: The results of your selection from the “Access Gene Lists” tool are displayed in a gene list containing gene id number, locus (if available), coordinates of the gene (end5, end3), common name of the gene/protein, gene_sym, EC number, and other roles for the protein. Not all of these fields will be populated for every gene. The genes are organized by role category (if your selection included more than one). There are many features of the gene list, and much information displayed - text describing a feature is boxed in the same color as the feature itself.

Click on the gene_id (feat_name) link to see the Gene Curation Page for each gene. Click on “GV” for Genome Viewer.

A green dot in the “A” column indicates this ORF was given a high quality assignment by AutoAnnotate. (The only type of evidence that will currently trigger this is an above-trusted-cutoff hit to an equivalog HMM.) A pink dot will appear in the “C” column once an annotator has finished annotation for the gene and marked it complete.

The ORFs can be ordered according to any of the blue headers by clicking on that header.

This links to a text entry field to store info of interest to the project that is found during annotation.

Link to role notes for this category.

View list of Genome Properties found for this role category.

Clicking on the blue name of any main role category takes you to a gene list for that category.

Page 35: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


TIGR Role Categories - Page 1

Unclassified (the automated program was unable to assign a role to these)
  185 Role category not yet assigned

Amino acid biosynthesis
  70 Aromatic amino acid family
  71 Aspartate family
  73 Glutamate family
  74 Pyruvate family
  75 Serine family
  161 Histidine family
  69 Other

Purines, pyrimidines, nucleosides, and nucleotides
  123 2'-Deoxyribonucleotide metabolism
  124 Nucleotide and nucleoside interconversions
  125 Purine ribonucleotide biosynthesis
  126 Pyrimidine ribonucleotide biosynthesis
  127 Salvage of nucleosides and nucleotides
  128 Sugar-nucleotide biosynthesis and conversions
  122 Other

Fatty acid and phospholipid metabolism
  176 Biosynthesis
  177 Degradation
  121 Other

Biosynthesis of cofactors, prosthetic groups, and carriers
  77 Biotin
  78 Folic acid
  79 Heme, porphyrin, and cobalamin
  80 Lipoate
  81 Menaquinone and ubiquinone
  82 Molybdopterin
  83 Pantothenate and coenzyme A
  84 Pyridoxine
  85 Riboflavin, FMN, and FAD
  86 Glutathione
  162 Thiamine
  163 Pyridine nucleotides
  191 Chlorophyll
  707 Siderophores
  76 Other

Central intermediary metabolism
  100 Amino sugars
  698 One-carbon metabolism
  103 Phosphorus compounds
  104 Polyamine biosynthesis
  106 Sulfur metabolism
  179 Nitrogen fixation
  160 Nitrogen metabolism
  709 Electron carrier regeneration
  102 Other

Energy metabolism
  108 Aerobic
  109 Amino acids and amines
  110 Anaerobic
  111 ATP-proton motive force interconversion
  112 Electron transport
  113 Entner-Doudoroff
  114 Fermentation
  116 Glycolysis/gluconeogenesis
  117 Pentose phosphate pathway
  118 Pyruvate dehydrogenase
  119 Sugars
  120 TCA cycle
  159 Methanogenesis
  105 Biosynthesis and degradation of polysaccharides
  164 Photosynthesis
  180 Chemoautotrophy
  184 Other

Transport and binding proteins
  142 Amino acids, peptides and amines
  143 Anions
  144 Carbohydrates, organic alcohols, and acids
  145 Cations and iron carrying compounds
  146 Nucleosides, purines and pyrimidines
  182 Porins
  147 Other
  141 Unknown substrate

Page 36: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


TIGR Role Categories - Page 2

DNA metabolism
  132 DNA replication, recombination, and repair
  183 Restriction/modification
  131 Degradation of DNA
  170 Chromosome-associated proteins
  130 Other

Transcription
  134 Degradation of RNA
  135 DNA-dependent RNA polymerase
  165 Transcription factors
  166 RNA processing
  133 Other

Protein synthesis
  137 tRNA aminoacylation
  158 Ribosomal proteins: synthesis and modification
  168 tRNA and rRNA base modification
  169 Translation factors
  136 Other

Protein fate
  97 Protein and peptide secretion and trafficking
  140 Protein modification and repair
  95 Protein folding and stabilization
  138 Degradation of proteins, peptides, and glycopeptides
  189 Other

Regulatory functions
  261 DNA interactions
  262 RNA interactions
  263 Protein interactions
  264 Small molecule interactions
  129 Other

Signal transduction
  699 Two-component systems
  700 PTS
  710 Other

Cell envelope
  91 Surface structures
  89 Biosynthesis of murein sacculus and peptidoglycan
  90 Biosynthesis and degradation of surface polysaccharides and lipopolysaccharides
  88 Other

Cellular processes
  93 Cell division
  188 Chemotaxis and motility
  701 Cell adhesion
  702 Conjugation
  96 Detoxification
  98 DNA Transformation
  705 Sporulation and Germination
  94 Toxin production and resistance
  187 Pathogenesis
  149 Adaptations to atypical conditions
  706 Biosynthesis of natural products
  92 Other

Mobile and extrachromosomal element functions
  186 Plasmid functions
  152 Prophage functions
  154 Transposon functions
  708 Other

Unknown
  703 Enzymes of unknown specificity
  157 General

Hypothetical
  156 Conserved
  704 Domain

Disrupted reading frame
  270 NULL

Page 37: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Gene list link: Edit Annotation Notebook

Clicking on the “Edit Annotation Notebook” link on the gene list page will take you to a page where you can enter or edit annotation notes for a particular role category. It is in this text field that we store information that we think will be useful for the PI of the project in the analysis of the genome or in the preparation of the manuscript, such as the presence of an unexpected pathway, or the fact that a key step in another pathway is missing. Once the text is as you want it, click “submit” to store the information in the db.

Page 38: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Gene list link: Role information page:

TIGR annotators expert in particular role categories have written “role notes” to aid new annotators and annotators unfamiliar with the category in the annotation process. These notes contain information on what genes belong in the category and what genes don’t, on the pathways found in particular categories, and on the TIGR naming conventions for proteins within the category.

Any TIGR annotator can update or add text to the note field by typing it in and then clicking submit.

There is also a link to the role notes pages from the Gene Curation Page (GCP) which will be shown in the GCP section.

Page 39: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Gene Curation Page

The Gene Curation Page (GCP) is likely the most important page within Manatee; it is certainly the one that annotators spend the bulk of their time looking at and working with.

This page can be accessed within Manatee from many places: any gene list, the “Access Gene Curation Page” option on the Genome Summary/Annotation Tools pages, Genome Viewer, …. and more.

The GCP is a very complex page so we will look at it in sections. I will try to organize the descriptions of each section in roughly the same order that the concepts behind each section were reviewed in the Annotation Overview.

Page 40: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE


Gene Curation Page
Gene Curation Information

This section contains basic identifying information about the gene and some search and display options.

The feat_name of the gene is listed at the top of the page; this number is called the “gene id” in gene lists in Manatee. The feat_name is followed in parentheses by the locus name (final loci are assigned to genes at the end of a project, once annotation is complete, but they may get temporary loci during the course of the project).

The blue link under these names is a link to a file containing the BER search results for this gene (see later slide). There is another link to this page further down the ORF info page (will be seen in a later slide).

To the right of the ORF names is a box containingcoordinates, length, and molecular weight. “end5” is the 5’coordinate for the beginning of the coding sequence, “end3” isthe 3’ coordinate for the end of the coding sequence.

Finally on the extreme right is a box allowing you to move toanother ORF info page by typing in the feat_name or locus inthe box and clicking “new gene”. One can also change to anorf in a different genome by changing the database in thedatabase box, typing in the new orf number and clicking “newgene”.

If you want to reload theGCP, use the “Reload Page” link inthis section. Do not use the browser’s reload button as thiscan cause things to be sent to the db in error.

To generate new HMM and BER searches click “RefreshSearches” and enter your unix password.

Page 41: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

41

Gene Curation PageGene Identification

Initial information for this section comes from AutoAnnotate. The manual annotation then confirms or changes the information.

-Common name: the descriptive name given to the protein.
-Gene sym: the gene symbol for the protein (in this case bioB). We default to E. coli gene symbols when possible and B. subtilis for Gram + specific things.
-EC#: If the protein is an enzyme, we store the Enzyme Commission number. See later slides for info on EC and on GO term suggestions.
-private comment: a field for annotators to note information for later reference by themselves or other annotators. A good place to keep notes.
-public comment: comments meant to go out with our public accessions.
-auto_comment: A link to information from the AutoAnnotate program indicating what information was used to make the preliminary annotation assignments (see next slide).
-nt_comment: For non-TIGR comments. This is the place that collaborators can put comments to help the team in annotation.

Page 42: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

42

Gene Curation Page - Auto Comment

Clicking on “auto_comment” pops up a text box with information on where AutoAnnotate got the information it used for the preliminary annotation.

Page 43: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

43

The characterized match section is where we enter the accession of a match gene whose function has been characterized in the lab (as opposed to having received its name based on sequence similarity). This is stored as a piece of annotation evidence. This accession will pop into the GO with_ev field in the proper format if you click on “Add to GO Evidence” (more on GO data later).

The BTAB SKIM section shows the top hits from the BER search file (see the Annotation Overview presentation for more information on BER searches). The first column is the accession of the match protein (from various databases), the second is the percent similarity of the match, the third is the length of the match (in nucleotides), the fourth is the name of the match protein, and the fifth is the P score from the BLAST search.

The color of the background for each entry in the skim indicates whether it is in the characterized table and at what confidence level:
-green = high confidence
-red = automated process
-sky blue = partial characterization
-olive = trusted, used when multiple extremely good lines of evidence exist for function but no experiment has been done
-blue-green = fragment/domain has been characterized
-fuzzy gray = void, used to indicate that something that was originally thought to be characterized really is not
-gray = omnium only

Clicking on the blue accession number will automatically populate the “Add accession” field in the characterized match section with that accession. Clicking on the blue names of the proteins in the skim will take you to a page with just the alignment to that protein.

The blue “View BER searches” link at the top of the skim section will take you to a file of all of the pairwise alignments from the BER search (see later slide).

The tree icon takes you to a phylogenetic tree of the genome protein with the top hits of the skim; the Belvu icon takes you to a multiple alignment of the genome protein with the top hits of the skim. (See later slides.)

Gene Curation Page - BER Skim and Characterized Match

Page 44: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

44

Links from the Gene Curation Page - The BER alignment file

This page is accessible by clicking on the “View BER searches” link at the top of the Info page or at the top of the BTAB skim section.

Here you will find multiple pairwise alignments of the genome protein to hits found in the BER search.

In the header of each alignment will be listed the accessions and names for this protein from every database where it is found. These accessions are clickable objects and will take you to the page for the match protein in the database in question.

The background color of the header will be gold if the protein is found in the characterized table, with the confidence level indicated by the color of the text for the accession found in the characterized table. (This is seen for the SP accession in this alignment.)

Names in the Skim are taken from the first entry in the header and are not necessarily the name you want to use. Check the role notes for TIGR naming standards, check the IUBMB EC site for official enzyme names, and look in the header for SwissProt as a model for the name if the previous two guides are not available.

The background color in the Skim may be assigned to an entry in the header different than the one named in the Skim.

Links to info pages for the match protein in the source db.

Link back to Gene Curation page for this ORF

Page 45: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

45

-The background color of this box will be gold if the protein is in the characterized table and grey if it is not.

-The top bar lists the percent identity/similarity and the organism from which the protein comes (if available).

-The bottom section lists all of the accession numbers and names for all the instances of the match protein from the source databases (used in building NIAA for the searches).

-The accession numbers are links to pages for the match protein in the source databases.

-A particular entry in the list will have colored text (the color corresponding to its characterized status) if that is the accession that is entered into the characterized table. This tells the annotators which link they should follow to find experimental characterization information. Only one accession for the match protein need be in the characterized table for the header to turn gold.

-There are links at the end of each line to enter the accession into the characterized table or to edit an already existing entry in the characterized table.

BER Alignment detail: Boxed Header

Page 46: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

46

-It is most important to look at the range over which the alignment stretches and the percent identity.

-The top line shows the amino acid coordinates over which the match extends for our protein.

-The second line shows the amino acid coordinates over which the match extends for the match protein, along with the name and accession of the match protein.

-The last line indicates the number of amino acids in the alignment found in each forward frame for the sequence as defined by the coordinates of the gene. The primary frame is the one starting with nucleotide one of the gene. If all is well with the protein, all of the matching amino acids should be in frame 1.

-If there is a frameshift in the alignment (see overview) the phrase “Frame Shifts = #” will flash and indicate how many frameshifts there are.

BER Alignment detail: alignment header

Page 47: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

47

-In these alignments the codons of the DNA sequence read down in columns with the corresponding amino acid underneath.

-The numbers refer to amino acid position. Position 1 is the first amino acid of the protein. The first nucleotide of the codon coding for amino acid 1 is nucleotide 1 of the coding sequence. Negative amino acid numbers indicate positions upstream of the predicted start of the protein.

-Vertical lines between amino acids of our protein and the match protein (bottom line) indicate exact matches, dotted lines (colons) indicate similar amino acids.

-Start sites are color coded: ATG is green, GTG is blue, TTG is red/orange.

-Stop codons are represented as asterisks in the amino acid sequence. An open reading frame goes from an upstream stop codon to the stop at the end of the protein, while the gene starts at the chosen start codon.
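
The numbering rule above (amino acid 1 is coded by nucleotides 1 to 3 of the coding sequence) can be sketched in a few lines of Python. This is only an illustration, not Manatee code: the function names are hypothetical, and only positions inside the protein (1 and above) are handled, since the slides do not say how the upstream negative positions are numbered.

```python
# The three start codons that the alignment display color codes.
START_CODONS = ("ATG", "GTG", "TTG")


def codon_coords(aa_position):
    """Return (first, last) nucleotide coordinates of the codon
    for a 1-based amino acid position within the protein."""
    if aa_position < 1:
        raise ValueError("only positions within the protein (>= 1) are handled")
    first = 3 * (aa_position - 1) + 1
    return first, first + 2


def in_frame_starts(coding_seq):
    """List (amino acid position, codon) for every possible start codon
    found in the primary frame of a coding sequence."""
    seq = coding_seq.upper()
    hits = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            hits.append((i // 3 + 1, codon))
    return hits
```

For example, `codon_coords(10)` gives the nucleotide range of the tenth codon, which is the kind of conversion needed when moving between the amino acid coordinates in an alignment and the nucleotide coordinates of the gene.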

BER Alignment detail: alignment of amino acids

Page 48: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

48

Swiss-Prot entry - slide #1 - top of page

SwissProt is an incredibly useful database for manual annotation. All of the genes in SwissProt have been manually annotated by an experienced, knowledgeable staff. In addition, along with each protein’s annotation is stored additional information on references that describe the protein, cross-referenced databases in which the protein can be found, motifs which the protein contains, and coordinates of any known features in the protein (and much more).

accession and version information

name, EC#, gene_symbol, taxonomy

references with links to abstracts (click on NCBI to see a PubMed abstract of the paper)

Link to Enzyme Commission page (see later slide)

Page 49: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

49

useful functional information

links to other dbs where the protein is found or to motif clusters or protein families which this protein is a member of

Swiss-Prot entry - slide #2 - middle of page

Page 50: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

50

Swiss-Prot entry - slide #3 - bottom of page

keywords and sequence features with coordinates

sequence features

Page 51: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

51

View of EC number info page from Swiss Institute of Bioinformatics site

Link to official Enzyme Commission site

Page 52: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

52

View of information page for an EC number at IUBMB site

The Enzyme Commission (EC) is part of the IUBMB and is charged with maintaining the database of enzyme classifications. In the EC system, each reaction is assigned a 4 part accession number with each part consisting of an integer, where the numbers are separated by periods. As one moves from the first number to the second to the third to the fourth the nature of the reaction becomes more specific. For example: EC 2.-.-.- = “transferase”, 2.8.-.- = “transferase, transferring sulfur-containing groups”, 2.8.1.- = “sulfurtransferases”, and finally 2.8.1.6 = “biotin synthase” (a specific sulfurtransferase, which is a specific class of transferases that transfer sulfur-containing groups). One can see the breakdown of all of the classes within each EC first number (they only go up to 6) by clicking on the home page for each number (see below).
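
To make the nesting of the four parts concrete, here is a small Python sketch that expands an EC number into its chain of increasingly specific classes, as in the biotin synthase example above. The function name `ec_hierarchy` is hypothetical and is not part of Manatee or of any EC tool.

```python
def ec_hierarchy(ec_number):
    """Expand an EC number such as '2.8.1.6' into the list of classes
    it belongs to, from the most general to the most specific."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("an EC number has four parts separated by periods")
    levels = []
    for depth in range(1, 5):
        # A dash means the number is not specified beyond this level.
        if parts[depth - 1] == "-":
            break
        levels.append(".".join(parts[:depth] + ["-"] * (4 - depth)))
    return levels


for level in ec_hierarchy("2.8.1.6"):
    print(level)
```

Run on 2.8.1.6, this prints the four levels named in the text: 2.-.-.-, 2.8.-.-, 2.8.1.- and 2.8.1.6.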

Click here to see all the classifications within EC #2 (the transferases).

Page 53: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

53

Links from the Gene Curation Page - Tree (may not work on laptops)

Page 54: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

54

Links from the Gene Curation Page - BER multiple alignment (will not work on laptops)

Page 55: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

55

Gene Curation page - HMM hits scoring above noise

(Text describing the features of the HMM section is boxed in the same color as each feature.)

The blue id numbers for each HMM link to an info page for that HMM.

Key information is the isology type and the “total” and “cutoff” scores.

The “Add To GO Evidence” link automatically fills the HMM information into the “with” field in the GO term entry box.

GO terms assigned to each HMM are listed under the HMM (if any). Clicking on the “Add” button here adds not only the GO term id, but also the HMM evidence.

The “Add To Annotation” link will automatically copy the annotation from the HMM to the protein.

Click to see hits below noise

This section described on later slide

Page 56: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

56

HMM report page

At the top is information about the HMM including HMM name, associated annotation (gene symbol, EC#, TIGR role, etc.) and comments from the authors.

Below is a list of all genes in the organism which hit the HMM and the scores they received. The row with the gold background is the protein of interest. Rows with a green background have scores below the trusted cutoff, rows with a purple background have scores below the noise cutoff.

- to get to this page click on an HMM accession number almost anywhere in Manatee

Page 57: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

57

Genome Properties - linked from the Gene Curation Page in the HMM section

If an HMM is part of a genome property, there will be a link here and an indication of the state of the property - in this case “YES” indicating that the organism has an intact biotin biosynthesis pathway. Clicking on the name of the property takes one to a property report page.

If you want to use the Genome Property as evidence for GO annotation, click the “GO” link under the “add GO evidence” section. (more on GO data later)

The “Run Rules.spl” link

Page 58: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

58

Genome Property info page (part 1): biotin biosynthesis

This has general information about the property, GO terms assigned to the property, and a place for curators to put comments regarding this property in this organism.

Page 59: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

59

Genome Property info page (part 2): biotin biosynthesis

This section of the page shows the steps for the property, which steps are required and which steps are not, and the genes from the genome that have been identified for each step.

One can link to the GCP for each gene or to the HMM info page for the HMMs named by clicking on the gene id or HMM accession, respectively.
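
One way to read the steps table is to ask which required steps still have no gene assigned. The Python sketch below does only that. It is an assumption-laden illustration: the slides do not describe how Manatee actually computes the state of a property, and the function name, the record layout, and the step and gene names in the example are all hypothetical placeholders.

```python
def missing_required_steps(steps):
    """Return the names of required steps that have no gene assigned.

    Each step is a dict with hypothetical keys:
      'name'     - label of the step
      'required' - True if the step is required for the property
      'genes'    - list of gene ids identified for the step
    """
    return [s["name"] for s in steps if s["required"] and not s["genes"]]


# Placeholder data, not taken from any real genome.
example_steps = [
    {"name": "step A", "required": True, "genes": ["gene_1"]},
    {"name": "step B", "required": True, "genes": []},
    {"name": "step C", "required": False, "genes": []},
]

print(missing_required_steps(example_steps))
```

An empty result would mean that every required step has at least one gene, which is the situation one would expect for a property reported as intact.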

Page 60: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

60

Genome Property info page (part 3): biotin biosynthesis

This section has reference information and a graphic showing the cluster of genes in the organism involved in the property. One can click on the arrows in the graphic to get a GCP for that gene.

Page 61: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

61

Gene Curation Page - Evidence Picture - ORF04813

All of the evidence stored for an ORF is displayed in this graphic. The black bar represents the ORF in question. HMM bars are colored by score: green indicates a score above the trusted cutoff, orange indicates above noise but below trusted, and red indicates below noise. Red bars are generally not shown unless an annotator has decided that the HMM should be included as evidence by toggling the curation box. The pink bar represents the characterized match to this ORF. Characterized matches are shown in different colors that at this time have no meaning. Also shown here is a secondary structure prediction (not run on all genomes). Clicking on the colored bars in the graphic opens windows with additional information on that piece of evidence. To get additional cog info, you must click on the very skinny bar all the way to the left of the cog row. The evidence picture for ORF04813 does not contain all of the possible evidence types, so later slides will show some evidence pictures from other genes.
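
The HMM bar colors follow directly from comparing a hit's score with the HMM's trusted and noise cutoffs. The Python sketch below shows that comparison. It is an illustration rather than Manatee's code: the function name is hypothetical, and treating a score exactly equal to a cutoff as passing that cutoff is an assumption.

```python
def hmm_bar_color(score, trusted_cutoff, noise_cutoff):
    """Pick the evidence-picture color for an HMM hit.

    Assumption: a score equal to a cutoff counts as at or above it.
    """
    if score >= trusted_cutoff:
        return "green"   # above trusted
    if score >= noise_cutoff:
        return "orange"  # above noise but below trusted
    return "red"         # below noise


# Made-up scores and cutoffs, for illustration only.
print(hmm_bar_color(score=350.0, trusted_cutoff=200.0, noise_cutoff=100.0))
print(hmm_bar_color(score=150.0, trusted_cutoff=200.0, noise_cutoff=100.0))
print(hmm_bar_color(score=40.0, trusted_cutoff=200.0, noise_cutoff=100.0))
```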

Page 62: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

62

Secondary structure prediction

Page 63: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

63

The biotin synthase does not have all of the evidence types that are possible; therefore, the following screen shots will show some evidence pictures from other genes displaying additional evidence types.

Following the evidence pictures will be the evidence detail pages linked to from the evidence pictures.

After all of the evidence types have been represented, the tutorial will resume with ORF04813.

Page 64: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

64

Additional evidence types shown here are:

TmHMM - an HMM specific for transmembrane regions, built by the Center for Biological Sequence Analysis, Denmark.

Paralogous Family membership - if a protein is a member of a paralogous family it will be represented with a blue bar; clicking on the bar takes you to a page listing all the family members. Paralogous families are built by searching the protein set for a genome against itself. First, families are built according to shared hits to HMMs, then regions not matching HMMs are searched against each other to find additional families. The families corresponding to HMMs are given names with the HMM accession number, others are given numbers.

NOTE: this display is from ORF03779

Gene Curation Page - Evidence Picture (ORF03779)

Page 65: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

65

NOTE: this display is for ORF03779

Page 66: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

66

Paralogous Family display NOTE: this display is for ORF03779

Page 67: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

67

Evidence picture from ORF01166

NOTE: this display is for ORF01166

Additional evidence types shown here are signal P, lipoprotein predictions, and PROSITE hits. Signal P and PROSITE information are displayed both in the Evidence Picture and in sections of their own on the Gene Curation Page (next slide). Clicking on the bars in the graphic opens windows with additional information.

Lipoprotein predictions are based on one particular PROSITE motif, so clicking on the red lipoprotein bar will take you to the PROSITE page for the lipoprotein signature (not shown in tutorial).

Page 68: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

68

Gene Curation Page - PROSITE and Signal P sections on the GCP

Click here to see info on PROSITE motif.

Click here to see output in graphical form.

NOTE: this display is for ORF01166

Page 69: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

69

Signal P Graphical output

NOTE: this display is for ORF01166

Page 70: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

70

PROSITE page at ExPASy

NOTE: this display is for ORF01166

Page 71: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

71

Current GO term assignments are listed in table.
-Click id # to see term in tree.
-Click box for GO term to be deleted.
-Click “add” to add additional evidence rows (or click delete and add to completely redo evidence).
-Click “edit” to edit evidence.
-“Make ISS” (not seen in this example) can be used when the GO term and evidence assigned by AutoAnnotate are correct. Clicking this button marks the old association for deletion and automatically puts in the new info for insertion.

These pull downs have commonly used GO terms. If you choose the unknown terms from any pull-down, the evidence will automatically fill in (since it is always the same).

Fill in the fields in this section to add or change GO term assignments. All entries must have “ev_code”, “reference”, and “with”. (more on this in a minute…..)

Gene Curation Page (ORF04813) - Gene Ontology Display

Link to GO search tool

Link to GO suggestions

Page 72: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

72

Overview of steps in GO annotation:

-Review the GO terms assigned to the gene by AutoAnnotate (if any). If they are correct and sufficient use the “Make ISS” button. (not seen here)

-Look for any other needed GO terms in the various suggestion areas on the page: EC#s, HMMs, GO suggestions (see suggestion slide for more info).

-If correct GO terms are unavailable on the Gene Curation Page go to the GO search pages and find the GO term you need. You get there by clicking the search link in the upper right corner of the GO section.

-GO terms must be added in the bottom part of the Gene Ontology section. The GO term id goes in the “add go id” column.

-The ev_code column has a pull-down for choosing the ev_code you want; the default is “ISS”.

-Next are the “reference”, “with”, and “qualifier” columns. Additional slides following this one detail the search for and insertion of GO terms and evidence.

-See the “overview” presentation for more info on GO.

Page 73: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

73

Gene Curation Page - GO suggestions and Auto-fill-ins

GO term suggestions and auto-fill-in buttons are located in several places on the Gene Curation Page:

-GO terms assigned to HMMs are listed under HMM hits (if any have been assigned - see the HMM slide for how these look). These are often excellent sources for GO terms. Clicking the “Add” button next to a GO term under an HMM adds both the term id and the evidence to the appropriate fields in the GO entry section. Clicking the “Add to GO evidence” button adds just the HMM accession into the “with” field in the GO entry section.

-GO terms corresponding to EC numbers are listed next to the EC box (for enzymes). Clicking the “add” button will put the GO term id into the “add go id” fields in the GO entry section.

-GO terms assigned manually to other bacterial genomes (V. cholerae and B. anthracis, giving both a Gram + and a Gram - representative), InterPro hits, and Genome Properties are listed both at the bottom of the page and in a pop-up window accessed by the link in the upper right corner of the GO section. Clicking on “add” in this section puts the GO id into the “add go id” fields in the GO entry section.

-“Add to GO evidence” buttons are also available for Prosite hits; this populates the “with” field with the Prosite accession. Available when a protein has matches to Prosite.

-“Add to GO evidence” is also available for the characterized match accession; this will put the accession of the characterized matching protein into the “with” field entry box.

-“Add to GO evidence” is also available for Genome Properties; clicking on the “GO” link under the “add to GO evidence” column in the Genome Properties section will enter the GenProp accession in the “with” field.

See next page for screen shots.

Page 74: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

74

GO terms and evidence Auto Fill-ins

Follow the arrows to see which fields are filled in by clicking the various GO “evidence” and “add” buttons around the GCP.

(only appears when there are TMHMM hits for a protein; this pull-down will be to the right of the other term pull-downs for process, component, and function)

Page 75: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

75

Manatee’s GO ontology and annotation search tool:

In many cases the GCP will not have a suggested GO term that meets an annotator’s needs. In that situation the annotator will turn to Manatee’s built-in GO ontology and annotation browser. There are several available functions in the search tool:

–GO term id search of ontologies - this returns a tree view of the search term in the ontology.
–GO term name keyword search of ontologies - this returns a table of terms where the name, or a synonym of the name, contains the keyword or contains a word that includes the keyword.
–protein name keyword search of annotations - this is a search of the annotations and returns proteins whose names match the keyword and the GO terms that were assigned to those proteins.
–GO id search of annotations - search GO annotations with a GO id and see a list of proteins that have been annotated to that GO term.
–GO correlations in annotations - a particular function term will often be assigned with a particular process term (for example: “biotin synthase” will almost always be assigned in conjunction with “biotin biosynthesis”). When one needs help finding a process or function one can search for these relationships with the correlations tool.
–EC search - uses the ec2go mapping file provided on the GO web site to look up GO terms that correspond to EC numbers.
–GO BLAST - search a protein sequence against a database of proteins that have been annotated to GO, then link to the GO terms that were assigned to them. This is the only GO search tool not accessible on the GO search page - this one is found in the “Select Function” pull-down menu at the top of the Gene Curation Page.
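
As an illustration of the EC search, the Python sketch below reads lines in the layout used by the GO consortium's ec2go mapping file, as far as I know that layout: comment lines begin with “!” and each mapping line looks like `EC:<number> > GO:<term name> ; GO:<id>`. The function name is hypothetical, the sample lines are typed in by hand rather than taken from the real file, and this is not how Manatee itself performs the lookup.

```python
def parse_ec2go(lines):
    """Build a dict mapping EC number -> list of (GO id, GO term name)."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        left, _, right = line.partition(" > ")
        if not right:
            continue  # not a mapping line
        ec = left.strip()
        if ec.startswith("EC:"):
            ec = ec[3:]
        name_part, _, go_id = right.rpartition(" ; ")
        name = name_part.strip()
        if name.startswith("GO:"):
            name = name[3:]
        mapping.setdefault(ec, []).append((go_id.strip(), name))
    return mapping


# Hand-written sample in the assumed file layout.
sample = [
    "! illustrative sample, not the real file",
    "EC:2.8.1.6 > GO:biotin synthase activity ; GO:0004076",
]
ec_to_go = parse_ec2go(sample)
print(ec_to_go["2.8.1.6"])
```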

See next page for screen shots.

Page 76: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

76

Input a GO term here. This results in a GO term information page.

Input a search string here using the checkboxes to select some or all of the ontologies to search in. If you want to restrict your results to only terms which share all of your input text, click “Exact match”. This searches the name of the GO term.

Links from the Gene Curation page - GO Search Tool

Click on the “search” link in the title bar of the GO data section

Input text and see GO terms assigned to genes from other databases whose common names contain the input text.

Search for GO terms that most frequently are assigned to a protein along with the input GO term. Check the box to restrict the search to TIGR prokaryotic data only.

Input an EC number and get the corresponding GO term

Input a GO id and see genes from other databases that have been annotated with that GO term

Page 77: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

77

GO Term information page and tree view.

This page is reached by clicking on GO id links or using GO id search.

Numbers next to the terms in the tree indicate the number of genes from this organism that are annotated to that term or a child of that term - clicking on the number gives you a table of those genes and relevant info. (missing in this screen shot)

If you reach this page by clicking on a GO term on a GCP, clicking the “add” button in the tree will place that GO term in the “add” field on the GCP.

Page 78: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

78

The first part of the table shows results from the GO term names.

The second part of the table shows results from GO term synonyms.

Note that areas of the text which matched the keyword are highlighted in red by Manatee.

Terms which are “obsolete” or “secondary” to another term will have that indicated in column one.

Click any GO term id number for a view of the term in the GO tree.

Search results for GO term keyword: “biotin”

Page 79: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

79

GO correlations search

Search results from query with GO:0004076 “biotin synthase activity”

Searches data set stored in our database of all associations to genes available on GO web site.

First table shows percentages of occurrences of the query term with other terms.

Second table shows details of all instances of query term assigned to a gene in the data set.
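
The percentages in the first table can be thought of as simple co-occurrence counts. The Python sketch below computes, for every gene annotated with the query term, how often each other term was assigned alongside it. It is a guess at the idea, not the query Manatee runs: the function name and the gene names are hypothetical, and the tiny data set is made up for illustration.

```python
from collections import Counter


def go_correlations(annotations, query_term):
    """Return {other GO id: percent of genes carrying query_term
    that also carry that other GO id}.

    annotations maps a gene name to the set of GO ids assigned to it.
    """
    with_query = [terms for terms in annotations.values() if query_term in terms]
    if not with_query:
        return {}
    counts = Counter()
    for terms in with_query:
        counts.update(t for t in set(terms) if t != query_term)
    total = len(with_query)
    return {term: 100.0 * n / total for term, n in counts.items()}


# Made-up annotations for three hypothetical genes.
example = {
    "geneA": {"GO:0004076", "GO:0009102", "GO:0005737"},
    "geneB": {"GO:0004076", "GO:0009102"},
    "geneC": {"GO:0005737"},
}
result = go_correlations(example, "GO:0004076")
print(result)
```

In this toy data set, GO:0009102 accompanies the query term in both genes that carry it, while GO:0005737 accompanies it in one of the two.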

Page 80: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

80

Output from GO search for protein common name keyword: biotin synthase

Page 81: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

81

Adding GO Evidence

Step 1. Pick an evidence code. Most genes in bacterial genome sequencing projects will get an ev_code of “ISS”. This stands for “Inferred from sequence similarity.” If a gene from the sequenced organism has had experimental characterization, then choose an appropriate experimental ev_code. All “unknown” GO terms get “ND” as ev_code. To see all ev_codes, click the “ev_code” link.

Step 2. Fill in “reference” information. For ISS terms prior to publication use “TIGR_CMR:annotation”, after publication use the PMID of the genome paper. For “unknown” terms use “GO_REF:nd”. For terms with experimental evidence codes use the PMID of the paper describing the characterization.

Step 3. Fill in the “with” field. For all ISS entries you must fill in the accession of the HMM, characterized match protein, or Genome Property that led to the annotation.

Auto Fill-ins: Both the GO ids and associated evidence can be filled in automatically by clicking the “Add” buttons next to GO suggestions and the “Add to GO evidence” buttons. All info for the “unknown” terms is filled in automatically by choosing the “unknown” terms in the pull-down menus. All information for GO terms assigned to HMMs is filled in with the “Add” buttons next to GO terms under HMMs.

Qualifier should be set to “contributes_to” when annotating the function of a complex to the proteins that make up the complex. (see the overview for more information on all of these fields)
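
The rules in these steps lend themselves to a quick self-check before submitting. The Python sketch below encodes only the rules stated on this slide: every entry needs an ev_code and a reference, ISS entries need a “with” accession, and “unknown” (ND) terms use the reference “GO_REF:nd”. The function name and the placeholder accession are hypothetical, and Manatee may apply checks of its own that are not described here.

```python
def check_go_evidence(ev_code, reference, with_field):
    """Return a list of problems found in one GO evidence entry.
    An empty list means the entry passes the rules sketched here."""
    problems = []
    if not ev_code:
        problems.append("missing ev_code")
    if not reference:
        problems.append("missing reference")
    if ev_code == "ISS" and not with_field:
        problems.append("ISS entries need an accession in the with field")
    if ev_code == "ND" and reference != "GO_REF:nd":
        problems.append("unknown (ND) terms use the reference GO_REF:nd")
    return problems


# "HMM_ACCESSION" is a placeholder, not a real accession.
print(check_go_evidence("ISS", "TIGR_CMR:annotation", "HMM_ACCESSION"))
print(check_go_evidence("ISS", "TIGR_CMR:annotation", ""))
```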

Page 82: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

82

Gene Curation Page - TIGR roles

Click here to view/edit role notes

Click here to enter this role into the “Delete” box

Click on the name of the main role or sub role to take you to a page with the gene list for that main/sub role.

Add or delete role ids with these boxes.

Click here for a list of TIGR roles.

Page 83: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

83

Gene Curation Page - How to get the data into the database: The “Submit” buttons

Clicking this button indicates that you have reviewed the start site and either found it to be fine or edited it to the correct (or at least what we hope is correct) position.

Click this button when you have completed annotation for this gene. With this toggle we know that this gene is finished.

This button resets the page to the state it was when originally opened.

Click here to submit your entries to the database. You can also do this by clicking on any of the “submit” buttons in the upper right of each section on the page. Clicking “submit” anywhere on the page submits data from all fields (not just the section from which you clicked the button).

Page 84: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

84

Gene Curation Page - The pull down menus

If you click on the select pull down menus you will get a selection of options. Each of these when selected will generate a new page with the desired information. (Later slides show examples of some of these.)

Page 85: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

85

Links from the Gene Curation Page - View sequence

This page shows the length (nucleotide and protein), coordinates, MW, and pI of the protein.

Also shown, in fasta format, are the nucleotide and protein sequences.

Page 86: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

86

Links from the Gene Curation Page - Third position GC skew

In organisms whose DNA has a high GC content it can sometimes be helpful to look at third position GC skew to help resolve overlaps.

Due to the nature of the genetic code, the third position is the least constrained position of a codon and is therefore able to reflect the higher GC content of the overall genome. One should therefore see a markedly higher GC content in the third position of the correct frame.
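
A rough way to see this effect outside of Manatee is to compute the GC fraction of the third codon position in each of the six reading frames. The Python sketch below does that for a plain DNA string. It is a minimal illustration, not the calculation behind Manatee's display: the function names are hypothetical, and the frame numbering (frames 1 to 3 on the given strand, frames 4 to 6 on the reverse complement) is an assumption that may not match the numbering in the screen shot.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def third_position_gc(seq):
    """Return {frame number: GC fraction at third codon positions}.

    Assumed numbering: frames 1-3 start at offsets 0, 1, 2 of the given
    strand; frames 4-6 start at offsets 0, 1, 2 of the reverse complement.
    """
    forward = seq.upper()
    result = {}
    for strand_index, strand in enumerate((forward, reverse_complement(forward))):
        for offset in range(3):
            # Every third base, starting at the third base of the first codon.
            thirds = strand[offset + 2::3]
            frame = strand_index * 3 + offset + 1
            gc = sum(1 for base in thirds if base in "GC")
            result[frame] = gc / len(thirds) if thirds else 0.0
    return result


gc3 = third_position_gc("ATGATGATG")
print(gc3)
```

On a real high-GC genome one would run this over the region containing the overlapping genes and look for the frame whose value stands well above the others.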

[Figure: the example sequence ATGATGATGTACTACTAC shown three times, labeled with reading frames 1 through 6.]

NOTE: this display is for another gene

Page 87: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

87

ORF Management in Manatee: Genome Viewer

Clicking on the “Genome Viewer” option on the “Welcome to Manatee” Page, selecting “GV” next to a gene id in a gene list, or selecting “Genome Region” in the “Select Display” pull-down on the GCP will take you to our Genome Viewer tool. Here you can view the genes from the whole genome in relation to each other, edit their starts, merge them, insert new genes, and delete genes. Mousing over the genes fills in the information boxes near the top of the display with coordinates, com_name, etc.

To get to a specific region of the genome, enter coordinates or feat_name in the search section at the bottom of every Genome Viewer page. One can also use the pull-down menu on the Gene Curation Page or the “GV” link on a gene list page.

Page 88: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

88

ORF Management in Manatee - 6-frame analysis

To analyze regions in the 6-frame translation (options boxed in pink), click on the button for the activity you wish and then click in the open reading frame.

“View sequence” gives you the nucleotide and amino acid sequence of the ORF.
“Blast” runs BLAST on the ORF.
“Insert Gene” inserts the ORF. (See a later slide for more on this.)
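
For readers who have not worked with a 6-frame translation before, the following Python sketch shows what one contains: the sequence translated in three offsets on each strand. It is an illustration, not Manatee's code. The function names and the frame numbering (1-3 forward, 4-6 reverse) are assumptions. The codon table is the standard one, which assigns the same amino acids as the bacterial table and differs only in allowed start codons.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate frames 1-3 (forward) and 4-6 (reverse), assumed numbering."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[offset + 4] = translate(rc[offset:])
    return frames


for frame, protein in sorted(six_frame_translation("ATGATGATGTACTACTAC").items()):
    print("frame", frame, protein)
```

An open reading frame is a stretch within one of these translations that runs from a start codon to the next stop (`*`) without interruption.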

Page 89: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

89

ORF Management in Manatee: gene adjustments

To make adjustments to existing genes in the database, click on the option you want, then click on the arrow for the gene of interest.

New pages will pop up with information specific to your request (see later slides).

Page 90: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

90

ORF Management in Manatee: start edits

This page can be reached from the “Gene Options” menu on the Genome Viewer page or from the Gene Curation Page “Select Function” pull-down by selecting “Edit Start Sites”.

Purple text represents the gene of interest. Blue text represents other genes in the region. It is important not to introduce overlap with other genes when changing a start site. Often editing a start site will remove overlap between two genes. Occasionally an annotator may want to extend a start into an upstream gene and will find that the upstream gene in question is a small hypothetical with no homology to anything. In such a case the annotator should consider deleting the short hypothetical, since it becomes likely that it is not a real gene.

To edit a start, click on the start you want in the 6-frame representation. The new coordinate for the selected start site will appear in the “New End5” box. To save the change to the database, click “Submit”.
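
The overlap check described above can be sketched as a small Python helper that reports how many bases a proposed start would share with neighboring genes. Manatee does not necessarily do it this way. The `Gene` class, the function names, and the example ids and coordinates are hypothetical. The fields `feat_name`, `end5`, and `end3` follow the names used in this guide, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    feat_name: str
    end5: int  # coordinate of the start (5' end)
    end3: int  # coordinate of the stop (3' end)

    @property
    def span(self):
        """Low and high coordinates, whichever strand the gene is on."""
        return min(self.end5, self.end3), max(self.end5, self.end3)


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of bases shared by two genes (0 if they do not overlap)."""
    a_low, a_high = a.span
    b_low, b_high = b.span
    return max(0, min(a_high, b_high) - max(a_low, b_low) + 1)


def overlaps_after_start_edit(gene: Gene, new_end5: int, neighbors) -> dict:
    """Map each neighbor's feat_name to its overlap with the edited gene."""
    edited = Gene(gene.feat_name, new_end5, gene.end3)
    result = {}
    for neighbor in neighbors:
        shared = overlap_length(edited, neighbor)
        if shared > 0:
            result[neighbor.feat_name] = shared
    return result


# Hypothetical example: extending a start toward an upstream gene.
upstream = Gene("ORF00001", 100, 450)
target = Gene("ORF00002", 500, 1400)
print(overlaps_after_start_edit(target, 430, [upstream]))
print(overlaps_after_start_edit(target, 460, [upstream]))
```

An empty result means the new start introduces no overlap. A non-empty one is the cue to look at the neighbor and decide whether the overlap is acceptable or whether the neighbor is a doubtful short hypothetical.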

Page 91: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

91

ORF Management in Manatee: other gene adjustments

Insert

Merge

Delete

In all cases you will be asked to confirm your request before it is carried out.

gene id

Page 92: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

92

Annotation Checklist

• Look for HMM hits
– evaluate what the HMMs are telling you - exact function? family membership? domain?
• Look at BER results
– look for proteins in the skim which are characterized (colored backgrounds)
– many proteins are characterized but not marked so in our tables - may need to check proteins with white backgrounds to see if they are characterized
– color coding does not indicate quality of match, only that the match protein has been experimentally characterized
– evaluate the alignment - what percent ID over what length? active sites? binding sites?
– fill in characterized match accession number (by clicking on the accession in left column)
• Look at TMHMM, SignalP, Prosite, region, etc.
• Use multiple alignment (belvu link) and tree (tree icon link) as needed to differentiate function.
• Decide what you think the protein should be named
• Fill in appropriate fields for common name, gene symbol, EC#, comment.
• Decide what GO terms you need
– find them on the page (HMMs, EC number, GO suggestions) or with the GO search tool
– change/remove any IEA GO annotations
– add GO evidence from HMMs, BER, Genome Properties, Prosite, etc.
• Review TIGR role and change as needed
• Check start site
– look in BER and at the BER-generated multiple alignment (belvu link)
– adjust if necessary - using “edit start” function in pull-down or in the Genome Viewer section
– check start site box when finished with curation
• Check “complete”, click “submit” and you’re done!
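
For the “what percent ID over what length?” question in the checklist, here is a minimal Python sketch of the arithmetic. It assumes one common definition (identical residues divided by aligned columns, ignoring any column with a gap); BER output may report identity differently. The function name and the two aligned strings are made up for illustration.

```python
def percent_identity(aligned_query: str, aligned_match: str):
    """Return (percent identity, aligned length) over ungapped columns."""
    if len(aligned_query) != len(aligned_match):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap character.
    columns = [
        (q, m)
        for q, m in zip(aligned_query.upper(), aligned_match.upper())
        if q != "-" and m != "-"
    ]
    if not columns:
        return 0.0, 0
    identical = sum(q == m for q, m in columns)
    return 100.0 * identical / len(columns), len(columns)


# Hypothetical aligned pair: one gap column, one mismatch.
identity, length = percent_identity("MKT-AYL", "MKSGAYL")
print(round(identity, 1), "percent identity over", length, "aligned positions")
```

Both numbers matter together: a high percent identity over a short stretch may only indicate a shared domain, while a moderate identity over the full length of both proteins is stronger evidence of shared function.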

Page 93: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

93

External Manatee’s limitations: things available in limited capacity

• Refresh Searches button - will not work, but you can submit sequences for re-searching. We hope to set up an automated pipeline for this, but the system is not in place yet. Currently there is an HMM search page on the CMR that can also be used.

• SignalP in the pull-down can work if you install it locally. Or you can go to the CBS site to run it on the fly: http://www.cbs.dtu.dk/services

• BER tree view - should work if you have Java on your machine, but may be tricky

• BER multiple alignment - you can view them with belvu if you are running on Linux (what we do), or you can try other multiple alignment tools. One possibility is:
– ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/

Page 94: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

94

External Manatee’s limitations: things not available

• consistency checks
• all frameshift scripts
• translation exceptions
• intergenic region analysis
• overlap analysis
• annotation status
• hypothetical protein list

Page 95: Pathema Burkholderia Annotation Jamboree: A Guide to MANATEE

95

Acknowledgements

Heading up the effort: Granger Sutton

Prokaryotic Annotation: Scott Durkin, Derek Harkins, Ramana Madupu, Sagar Kothari, Susmita Shrivastava, Yinong Sebastian

CMR: Tanja Davidsen, Erin Beck

HMMs/Genome Properties: Dan Haft, Jeremy Selengut

Bioinformatics Engineers: Alex Richter, Nikhat Zafar

And the many other TIGR employees present and past who have contributed to the development of these tools and to building the annotation protocols we use. Thanks also go to the funding agencies that support our work including NIH, NSF, and DOE.