
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

May 10, 2015


Ramy K. Aziz

Tools and methods developed under the PhAnToMe (http://www.phantome.org) project between 2009 and 2012 using the Subsystems Technology, the SEED (http://theseed.org) environment, and the RAST server (http://rast.nmpdr.org)

Third presentation at the Phage Genomics Workshop at the 20th Biennial Evergreen International Phage Meeting
Transcript
Page 1: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

The Opera of PhAnToMe

Ramy K. Aziz (Twitter: @azizrk)

Aug 04 2013

opus (Latin) = work (pl. opera)

The environment, the toolbox, and the community

Phage Genomics Workshop, Evergreen 2013

Page 2: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

08/04/2013

Past,

Phage Genomics - Evergreen 2013

NSF-funded, 3-year project (2009-2012) to develop Phage Annotation Tools and Methods

Four centers:
- SDSU, San Diego, CA
- VCU, Richmond, VA
- USF, St. Petersburg, FL
- UA, Tucson, AZ

Page 3: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

http://www.phantome.org

… present, …

Page 4: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

?TBA

… and future

Page 5: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Aims

• Direct
  – Discuss the concepts behind RAST
  – Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
  – Demonstrate online, community annotation using SEED

• Indirect {hidden agenda ;)}
  – PhAnToMe 2.0?
  – Establish community annotation efforts/crowdsourcing
  – Seek funding? Crowdfunding?

Page 6: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Outline

• The environment (the SEED)
  – The SEED and the ‘Subsystems Technology’

• The toolbox (PhAnToMe and sequels)
  – PHAST and RAST
  – PHACTS
  – PhiSpy
  – iVireons

• The community
  – Online annotation process
  – Annotation jamboree(s)
  – Course design

$$ Writing proposals, applying for grants

Page 7: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

I. THE ENVIRONMENT

The Opera of PhAnToMe

Page 8: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

I. The Environment: SEED

http://theseed.org

Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053


Page 10: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

SEED: Main concept

One genome

All genomes


“Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time”

Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053

Page 11: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

SEED: Main concept

• Protein-based database
  Jargon: PEG = protein-encoding gene

• The subsystems approach

and

• FIGfams: protein families based on
  – sequence similarity
  – chromosomal co-occurrence, gene order, synteny
  – human curation, evidence-based expert assertions

Page 12: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

RAST: automated annotation


Page 13: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

What is a subsystem?

• “A subset of functional roles studied across genomes”
• A spreadsheet where:
  – each row represents a genome
  – each column represents a functional role/feature/protein
  – different patterns = variants

            Function 1    Function 2    …    Function n
Genome a
Genome b
Genome z
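The spreadsheet view of a subsystem can be sketched in a few lines of Python. This is only an illustration: the genome names, functional roles, and PEG identifiers are hypothetical placeholders, and a "variant" here is simply the presence/absence pattern across roles, which is an assumption and not the SEED's actual variant coding.

```python
# Hypothetical subsystem: rows are genomes, columns are functional roles.
# Each cell lists the PEGs (protein-encoding genes) assigned to that role.
ROLES = ["Function 1", "Function 2", "Function n"]

subsystem = {
    "Genome a": {"Function 1": ["peg.1"], "Function 2": ["peg.7"], "Function n": ["peg.9"]},
    "Genome b": {"Function 1": ["peg.3"], "Function 2": [], "Function n": ["peg.4"]},
    "Genome z": {"Function 1": ["peg.2"], "Function 2": ["peg.5"], "Function n": ["peg.8"]},
}


def variant_pattern(row, roles=ROLES):
    """Return a presence/absence string such as '101' for one genome row."""
    return "".join("1" if row.get(role) else "0" for role in roles)


def group_by_variant(table, roles=ROLES):
    """Group genomes that share the same presence/absence pattern."""
    groups = {}
    for genome, row in table.items():
        groups.setdefault(variant_pattern(row, roles), []).append(genome)
    return groups


variants = group_by_variant(subsystem)
for pattern, genomes in sorted(variants.items()):
    print(pattern, genomes)
```

Reading down a column compares one functional role across all genomes; reading across a row gives one genome's pattern, which is what groups genomes into variants in this sketch.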

Page 14: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

What is a subsystem?

Page 15: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Advantages of subsystems

Subsystems-based annotation

Page 16: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

[Figure, two panels: "Annotation from genome" (Credit: Andrew Kropinski) and "Reconstruction from metagenome" (Credit: Bas Dutilh). Labels: incomplete, frameshift, faulty assembly; complete, accurate.]


Page 18: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

II. THE TOOLBOX

The Opera of PhAnToMe

Page 19: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

II. PhAnToMe ToolBox

http://www.phantome.org

Page 20: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

II. The ToolBox: RAST

• (At least) four ways to annotate a genome via RAST:
  – myRAST (local)
    • uses the server, but you can edit offline
  – RAST (http://rast.nmpdr.org)
    • annotates online, saves your genome on the server
  – “PhAST” (http://www.phantome.org/PhageSeed/Phage.cgi?page=phast)
    • optimized gene-calling
  – Use your favorite gene caller, then upload the gbk file to RAST

Page 21: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

http://rast.nmpdr.org


Page 22: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

phiRAST complaints

• ORF/gene calling
• tRNA
  – bug fixed, but still follow Andrew’s advice
• Too many hypotheticals, etc.
  – manual annotation, see later
  – need for expert annotations, community contribution
  – funding

Page 23: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

“PhAST”: some improvement?


Page 24: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

“PhAST”: some improvement?


Page 25: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

PHAST: Disambiguation


Page 26: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Other tools

• PHACTS:
  – classifies and predicts lifestyle
• PhiSpy:
  – finds prophages
• iVireons:
  – predicts phage structural proteins, holins, more to come

Page 27: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

II. The ToolBox: PHACTS

• PHAge Classification Tool Set

• Uses a novel similarity algorithm and a supervised Random Forest classifier to predict whether the lifestyle of a phage, described by its proteome, is virulent or temperate.

• The similarity algorithm creates a training set from phages with known lifestyles; this set, together with the lifestyle annotations, is used to train a Random Forest that classifies the lifestyle of a phage.

• PHACTS predictions have had a 99% precision rate.

Kate McNair

Page 28: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

PHACTS

• Out of the 227 phages with a known lifestyle, PHACTS confidently and correctly predicted the lifestyle of 197.

• Only 2 phages were confidently misclassified; both were virulent phages that contained a functional integrase.

Kate McNair
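The counts on this slide can be connected to the 99% precision figure with a little arithmetic. A minimal sketch, assuming that the precision refers to confident predictions only (confidently correct divided by all confident calls) and that the remaining phages were not classified with confidence; the slide states neither point explicitly, and the variable names are illustrative.

```python
# Counts taken from the slide.
total_known = 227          # phages with a known lifestyle
confident_correct = 197    # confident and correct predictions
confident_wrong = 2        # confident but incorrect predictions

# Assumption: precision is measured over confident calls only.
confident_calls = confident_correct + confident_wrong
precision = confident_correct / confident_calls

# Assumption: every other phage was not called with confidence.
not_confident = total_known - confident_calls

print(f"confident calls: {confident_calls} of {total_known}")
print(f"precision among confident calls: {precision:.1%}")
print(f"not confidently classified: {not_confident}")
```

Under these assumptions, 197 of 199 confident calls are correct, which rounds to the 99% quoted for PHACTS.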

Page 29: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

PHACTS

• http://www.phantome.org/PHACTS/

• Other applications
  – Host prediction: whether a phage infects a Gram-positive or a Gram-negative bacterium
  – Taxonomy prediction: a phage’s family

Kate McNair

Page 30: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

PHACTS

Kate McNair

Page 31: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

II. The ToolBox: PhiSpy

Calculate genomic characteristics
• Transcriptional strand orientation
• Customized AT skew
• Customized GC skew
• Protein length
• Abundance of phage words

Classify prophage region
• Random Forest
• Pre-calculated training genome
• Input bacterial genome
• Produce a rank for each gene

Evaluate predicted prophages
• Phage insertion points
• Similarity of phage proteins

Sajia Akhter
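The slide does not define PhiSpy's "customized" skews. As a point of reference only, the standard (uncustomized) definitions are GC skew = (G − C)/(G + C) and AT skew = (A − T)/(A + T), usually computed over windows along the genome. The sketch below implements those textbook definitions; the function names, window size, and toy sequence are hypothetical and are not taken from PhiSpy.

```python
def skew(seq, plus, minus):
    """Standard skew (plus - minus) / (plus + minus); 0.0 if neither base occurs."""
    seq = seq.upper()
    p, m = seq.count(plus), seq.count(minus)
    return (p - m) / (p + m) if (p + m) else 0.0


def windowed_skews(seq, window=1000, step=1000):
    """Yield (start, GC skew, AT skew) for consecutive windows of a sequence."""
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        yield start, skew(chunk, "G", "C"), skew(chunk, "A", "T")


# Toy sequence, not real data.
toy = "GGGGCCAATT" * 2
for start, gc, at in windowed_skews(toy, window=10, step=10):
    print(start, round(gc, 3), round(at, 3))
```

A per-window value like this is the kind of genomic characteristic that could be fed, alongside the other listed features, to a classifier.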

Page 32: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

PhiSpy

• Performance comparison in 50 complete bacterial genomes

Application     % Identified    % FN    % FP
Prophinder      89%             11%     12%
Phage_finder    82%             18%     1.33%
PhiSpy          94%             6%      0.66%

Sajia Akhter

Page 33: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

PhiSpy

• Download: PhiSpy – http://sourceforge.net/projects/phispy

• PhiSpy is on KBase – http://kbase.science.energy.gov

• Web version under final development

• Ran PhiSpy on 4,335 bacterial genomes

• Predicted 12,826 prophages in 3,203 genomes
  – 9,101 known prophages
  – 3,723 undefined prophages

Sajia Akhter
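The PhiSpy counts above can be put in proportion with a short calculation. This is a sketch using only the numbers on the slide; the variable names are illustrative. Note that the two sub-counts add up to 12,824, two fewer than the stated total of 12,826; the code reports that gap and does not try to explain it.

```python
# Counts taken from the slide.
genomes_run = 4335
genomes_with_prophage = 3203
prophages_total = 12826
known, undefined = 9101, 3723

# Share of analyzed genomes with at least one predicted prophage.
share = genomes_with_prophage / genomes_run

# Mean number of predicted prophages per prophage-containing genome.
per_positive_genome = prophages_total / genomes_with_prophage

# Consistency check on the two sub-counts.
subtotal = known + undefined
gap = prophages_total - subtotal

print(f"genomes with a predicted prophage: {share:.1%}")
print(f"prophages per positive genome: {per_positive_genome:.2f}")
print(f"known + undefined = {subtotal} (differs from total by {gap})")
```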

Page 34: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

iVIREONS – http://vdm.sdsu.edu/ivireons

Victor Seguritan

Page 35: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Application of Artificial Neural Networks (ANNs) to Viral Dark Matter

[Flowchart] Viral hypothetical protein sequences are searched against the Conserved Domain DB (rpsblast) and the Reference Sequence DB (tblastp). At each search: e-value ≤ 0.001 → known; no hit OR e-value > 0.001 → continue. Other steps shown: keep sequences ≥ 200 aa; remove ≥ 80% identical sequences; Artificial Neural Networks (ANNs).

Downstream: synthesize ANN-predicted hypothetical protein genes → clone in E. coli → purification by cobalt affinity → validation by TEM or X-ray crystallography.

Victor Seguritan
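The filtering logic of this flowchart can be sketched as a small triage function. Only the two thresholds (e-value 0.001 and 200 aa) come from the slide; the order in which the steps are applied, the function names, and the example records are assumptions made for illustration. The removal of ≥ 80% identical sequences is left out because it needs pairwise sequence comparison.

```python
E_CUTOFF = 0.001   # e-value threshold from the slide
MIN_LENGTH = 200   # minimum length in amino acids from the slide


def has_hit(evalue):
    """A search counts as a hit when an e-value exists and is <= the cutoff."""
    return evalue is not None and evalue <= E_CUTOFF


def triage(length, cdd_evalue, refseq_evalue):
    """Classify one hypothetical protein (assumed step order, see the note above).

    Use None for an e-value when the search returned no hit.
    """
    if has_hit(cdd_evalue) or has_hit(refseq_evalue):
        return "known"
    if length < MIN_LENGTH:
        return "too short"
    return "ANN candidate"


# Hypothetical records: (name, length in aa, CDD e-value, RefSeq e-value)
records = [
    ("protA", 350, 1e-20, None),
    ("protB", 150, None, 0.5),
    ("protC", 420, None, 0.02),
]
for name, length, cdd, refseq in records:
    print(name, triage(length, cdd, refseq))
```

Only sequences with no significant hit in either database and a length of at least 200 aa reach the ANN stage in this sketch.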

Page 36: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

“FAMILIES” OF ANNs

1) General structural proteins:
   • Trained with all types of proteins
   • Both phages & viruses

2) Phage major capsid proteins

3) Phage tail/tail fibers/collar etc.

4) Holins

5) Portals

Victor Seguritan

Page 37: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

1 iVIREONS – http://vdm.sdsu.edu/ivireons

2 Enter User Info (example entries on the slide: the name "VibrioPhage" and an email address)

3 Upload Sequences

Victor Seguritan

Page 38: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

iVIREONS – http://vdm.sdsu.edu/ivireons

4 View Results

5 Copy Results to a Spreadsheet

Stringencies Reported
- Structural 1:1
- MCP 1:1
- MCP 2:1
- MCP 3:1
- MCP 4:1
- MCP 7:1
- MCP 22:1 (lambda)
- Tail 1:1
- Tail 2:1
- Tail 4:1
- Tail 7:1
- Tail 6.6:1 (lambda)

Page 39: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

III. THE COMMUNITY

The Opera of PhAnToMe

Page 40: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

SEED allows continuous annotation

[Diagram labels: SEED, RAST, Genomes, Subsystems, SEED Viewer, New Genomes, Subsystems Editor]

Page 41: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

SEED allows community annotation


Page 42: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Later in the meeting,

• Who might be interested in putting together:
  a) an outline for an annotation jamboree/workshop with phage experts
  b) a syllabus/outline for a course to get undergraduate/graduate students to annotate specific subsystems
  c) a proposal to get funding for community annotation efforts
  d) all of the above

Page 43: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

POST SCRIPTUM

The Opera of PhAnToMe

Page 44: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Aims

• Direct
  – Discuss the concepts behind RAST
  – Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
  – Demonstrate online, community annotation using SEED

• Indirect {hidden agenda ;)}
  – PhAnToMe 2.0?
  – Establish community annotation efforts/crowdsourcing
  – Seek funding? Crowdfunding?

Page 45: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

If you use, please cite

• SEED, RAST, myRAST, phiRAST, PHAST:
  – RAST: BMC Genomics 2008; SEED servers: PLoS ONE 2012

• Other tools
  – PHACTS: McNair et al. PMID: 22238260
  – PhiSpy: Akhter et al. PMID: 22584627
  – iVireons: Seguritan et al. PMID: 22927809

• Letters of support

Page 46: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Acknowledgments

Robert A. Edwards, PhD

• PhiRAST development: Ross Overbeek, Robert Olson, Gordon Pusch, Terry Disz, Bruce Parrello

• Phage annotators (Phantomers): Bhakti Dwivedi, Mya Breitbart, et al.

• FIG and all SEED annotators: VeronikaV, SvetaG, OlgaV/Z, et al.

Sajia Akhter

$$ & NSF

Page 47: "The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Acknowledgments

• PHACTS: Katelyn McNair

• iVireons: Victor Seguritan

$$ & NSF