The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen Phage Meeting)

Post on 15-Aug-2015


The Opera of PhAnToMe 2.0

Ramy K. Aziz (@azizrk), Aug 02 2015

opus (Latin) = work (plural: opera)

SEED-based phage database (2009-2013-…)

Phage Genomics Workshop, Evergreen 2015

Phage Genomics - Evergreen 2015

As usual, slides will be made available

• Evergreen 2011 workshop
– http://slidesha.re/phantome1
– http://slidesha.re/phiRAST1
• Evergreen 2013 workshop
– http://bit.ly/phantome2
• This year’s workshop
– http://bit.ly/phantome3
• Hashtag for the meeting?
– #Evergreen15

08/02/2015

PRELUDE
The Opera of PhAnToMe 2.0

Aims
• Direct
– Discuss the theory behind RAST
– Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
– Demonstrate online community annotation using SEED
• Indirect
– PhAnToMe 2.0?
– Establish community annotation efforts / design courses / crowdsourcing
– Seek funding? Crowdfunding?

Outline
• Act I. The environment (the SEED)
– The SEED and the ‘Subsystems Technology’
• Act II. The toolbox (PhAnToMe and sequels)
– The RAST family
– PHACTS
– PhiSpy
– iVireons
• Act III. The community
– Online annotation process
– Annotation summit(s)
– Course design

$$

Writing proposals, applying for grants


History

NSF-funded, 3-year project (2009-2012) to develop Phage Annotation Tools and Methods (PhAnToMe)

Four centers:
- SDSU, San Diego, CA
- VCU, Richmond, VA
- USF, St. Petersburg, FL
- UA, Tucson, AZ

http://www.phantome.org

Two years ago…

MAJOR UPDATE

Current status

ACT I. THE ENVIRONMENT
The Opera of PhAnToMe 2.0

I. The Environment: SEED

http://theseed.org

Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053

SEED: Main concept

One genome

All genomes

“Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time”

SEED: Main concept
• Protein-based database
  Jargon: PEG = protein-encoding gene
• The subsystems approach
and
• FIGfams: protein families based on
– sequence similarity
– chromosomal co-occurrence, gene order, synteny
– human curation, evidence-based expert assertions

RAST: automated annotation


What is a subsystem?
• “A subset of functional roles studied across genomes”
• A spreadsheet where:
– each row represents a genome
– each column represents a functional role / feature / protein
– different patterns = variants

            Function 1   Function 2   …   Function n
Genome a
Genome b
Genome z
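The spreadsheet view of a subsystem can be sketched in a few lines of Python. This is only an illustration: the third role, the gene identifiers, and the rule that a "variant" is the pattern of roles present are assumptions made for the example, and the SEED's own data model is richer than this.

```python
from collections import defaultdict

# Hypothetical subsystem: rows are genomes, columns are functional roles.
# A cell lists the genes (PEGs) assigned to that role; an empty list means
# the role was not found in that genome. All identifiers are illustrative.
ROLES = ["Function 1", "Function 2", "Function 3"]

SUBSYSTEM = {
    "Genome a": {"Function 1": ["peg.1"], "Function 2": ["peg.7"], "Function 3": []},
    "Genome b": {"Function 1": ["peg.3"], "Function 2": ["peg.4"], "Function 3": []},
    "Genome z": {"Function 1": ["peg.2"], "Function 2": [], "Function 3": ["peg.9"]},
}


def presence_pattern(row, roles=ROLES):
    """Return a tuple of booleans: True where the genome has the role."""
    return tuple(bool(row.get(role)) for role in roles)


def group_by_variant(subsystem, roles=ROLES):
    """Group genomes that share the same presence/absence pattern."""
    variants = defaultdict(list)
    for genome, row in subsystem.items():
        variants[presence_pattern(row, roles)].append(genome)
    return dict(variants)


if __name__ == "__main__":
    for pattern, genomes in group_by_variant(SUBSYSTEM).items():
        marks = "".join("+" if present else "-" for present in pattern)
        print(marks, genomes)
```

Reading the table column by column (one role across all genomes) rather than row by row is what lets a curator annotate many genomes consistently at once.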

Advantages of subsystems

Subsystems-based annotation

Annotation (from genome) / Reconstruction (from metagenome)

[Figure labels: incomplete; frameshift; faulty assembly; complete; accurate]

Credit: Andrew Kropinski; Credit: Bas Dutilh

ACT II. THE TOOLBOX
The Opera of PhAnToMe 2.0

II. PhAnToMe ToolBox
http://www.phantome.org

The ToolBox: The RAST family
• (At least) five ways to annotate a genome via RAST:
– RAST (http://rast.nmpdr.org)
  • annotates online, saves your genome on the server
– myRAST (local)
  • uses the server, but you can edit offline
– “PhAST” (http://www.phantome.org/PhageSeed/Phage.cgi?page=phast)
  • optimized gene-calling
– Use your favorite gene caller, then upload the gbk file to RAST
– RASTtk (second-generation RAST)
  • modular
  • batch upload

New

http://rast.nmpdr.org


“PhAST”: phage-optimized RAST

http://www.phantome.org/PhageSeed/Phage.cgi?page=phast

RASTtk (RAST toolkit)

The RASTtk Microbial Annotation Pipeline
– FASTA QC
– FASTA to Genome TO
– Call rRNAs
– Call tRNAs
– Call CDSs: Prodigal
– Call CDSs: Glimmer3
– Annotate Proteins: K-mer v2
– Annotate Proteins: K-mer v1
– Call CRISPRs
– Call Phages (PhiSpy)
– Find Repeats
– Export: GenBank, GFF3, FASTA

• Green boxes are alternative pipeline steps
• Dashed boxes are optional pipeline steps
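The idea of a modular pipeline with alternative and optional steps can be sketched as follows. This is a toy illustration, not RASTtk's API: the step functions are placeholders that only record that they ran, and which steps are treated as alternative or optional, as well as their order, are assumptions made for the example.

```python
# A toy "genome object" is a dict that every step reads and extends.
# Step names echo the pipeline boxes; the bodies are placeholders and do
# NOT call RASTtk. Everything here is a hypothetical illustration.

def make_step(name):
    """Build a placeholder step that only records that it ran."""
    def step(genome):
        genome.setdefault("history", []).append(name)
        return genome
    step.__name__ = name
    return step


# Alternative steps: pick one implementation for the CDS-calling slot.
CDS_CALLERS = {
    "prodigal": make_step("call_cds_prodigal"),
    "glimmer3": make_step("call_cds_glimmer3"),
}

# Optional steps: included only when requested.
OPTIONAL = {
    "crisprs": make_step("call_crisprs"),
    "phages": make_step("call_phages_phispy"),
    "repeats": make_step("find_repeats"),
}


def build_pipeline(cds_caller="prodigal", optional=()):
    """Assemble an ordered list of steps from the chosen modules."""
    steps = [
        make_step("call_rrnas"),
        make_step("call_trnas"),
        CDS_CALLERS[cds_caller],
        make_step("annotate_proteins_kmer_v2"),
    ]
    steps += [OPTIONAL[name] for name in optional]
    steps.append(make_step("export"))
    return steps


def run_pipeline(genome, steps):
    """Pass the genome object through each step in turn."""
    for step in steps:
        genome = step(genome)
    return genome


if __name__ == "__main__":
    result = run_pipeline({"id": "example"},
                          build_pipeline("glimmer3", optional=["phages"]))
    print(result["history"])
```

Because every step takes and returns the same object, steps can be swapped or dropped without touching the others, which is the point of a modular design.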

In final development: phi-RASTtk
– FASTA QC
– FASTA to Genome TO
– Call rRNAs
– Call tRNAs
– Call CDSs: Prodigal
– Call CDSs: GeneMark
– Annotate Phage Proteins
– Annotate Proteins: K-mer v2
– Find Repeats
– Find Toxins
– Export: GenBank, GFF3, FASTA

• Green boxes are alternative pipeline steps
• Dashed boxes are optional pipeline steps

RASTtk command-line

RAST Video demos available
• Watch on your own:
– http://tutorial.theseed.org
• Possible tutorial on Tuesday at 3 PM + hands-on application

ACTIVITIES/EXERCISES
After this workshop (1 PM)

Phage Genomics - CeBio 2015

What do you need to annotate your genome?
• A sequenced genome
• Format: fasta or genbank (.gbk)
• A RAST username and password
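Before uploading, it can help to confirm that a file really looks like FASTA or GenBank. The helper below is a minimal sketch and not part of RAST: the function name is hypothetical, and it relies only on the conventions that a FASTA record starts with ">" and a GenBank flat file starts with a LOCUS line.

```python
def guess_sequence_format(text):
    """Guess 'fasta' or 'genbank' from the first non-blank line, else None."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("LOCUS"):
            return "genbank"
        return None  # first real line matches neither convention
    return None  # empty input


if __name__ == "__main__":
    print(guess_sequence_format(">my_phage\nATGCATGC\n"))
```

This checks only the first line; it does not validate the rest of the file.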

06/02/2015

1. Browse your favorite genome

2. Explore the protein page
• Annotation history
• Annotation clearinghouse
• Evidence
– similarities
– literature
• Find your favorite protein

3. Aligning proteins (in context)
• Evidence > Similarities > Align
• Compare region, advanced settings
• Phylogenetic trees

END OF ACTIVITY

Other tools
• PHACTS
– classifies and predicts lifestyle
• PhiSpy
– finds prophages
• iVireons
– predicts phage structural proteins, holins, more to come

The ToolBox: PHACTS
• PHAge Classification Tool Set
• Uses a novel similarity algorithm and a supervised Random Forest classifier to predict whether the lifestyle of a phage, described by its proteome, is virulent or temperate.
• The similarity algorithm creates a training set from phages with known lifestyles; this set, together with the lifestyle annotations, is used to train a Random Forest that classifies the lifestyle of a phage.
• PHACTS predictions have had a 99% precision rate.

Kate McNair
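One way to picture "a phage described by its proteome" is as a fixed-length vector of similarities to a set of reference proteins. The sketch below builds such a vector. It is not the PHACTS algorithm: the k-mer Jaccard score is a simple stand-in for a real sequence-similarity measure, and the function names and sequences are hypothetical. A Random Forest would then be trained on vectors like these plus the known lifestyle labels; that step is not shown.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets; a stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def proteome_features(proteome, reference_proteins, k=3):
    """One feature per reference protein: best similarity to any query protein."""
    return [
        max((similarity(ref, prot, k) for prot in proteome), default=0.0)
        for ref in reference_proteins
    ]


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    references = ["MKTAYIAK", "MSDNELLK"]
    query_proteome = ["MKTAYIAK", "GGGGGG"]
    print(proteome_features(query_proteome, references))
```

Every phage gets a vector of the same length regardless of how many proteins it encodes, which is what a supervised classifier needs as input.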

PHACTS
• http://www.phantome.org/PHACTS/
• Other applications
– Host prediction: whether a phage infects Gram-positive or Gram-negative bacteria
– Taxonomy prediction: a phage’s family

Kate McNair

The ToolBox: PhiSpy
• Calculate genomic characteristics
– Transcriptional strand orientation
– Customized AT skew
– Customized GC skew
– Protein length
– Abundance of phage words
• Classify prophage region
– Random Forest
– Pre-calculated training genome
– Input bacterial genome
– Produce a rank for each gene
• Evaluate predicted prophages
– Phage insertion points
– Similarity of phage proteins

Sajia Akhter
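Two of the genomic characteristics listed for PhiSpy are skews. The sketch below computes the textbook GC skew, (G − C) / (G + C), and AT skew, (A − T) / (A + T), over sliding windows. PhiSpy's "customized" versions are not specified on the slide, so this shows only the standard definitions; the function names, window size, and step are assumptions.

```python
def skew(seq, plus, minus):
    """(plus - minus) / (plus + minus) for two bases; 0.0 if neither occurs."""
    seq = seq.upper()
    p, m = seq.count(plus), seq.count(minus)
    return (p - m) / (p + m) if (p + m) else 0.0


def windowed_skews(seq, window=1000, step=500):
    """Yield (start, GC skew, AT skew) for sliding windows along a sequence."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        yield start, skew(chunk, "G", "C"), skew(chunk, "A", "T")


if __name__ == "__main__":
    # Tiny made-up sequence with a tiny window, for illustration only.
    for row in windowed_skews("GGGCAAAT", window=4, step=4):
        print(row)
```

Per-window values like these are the kind of numeric feature a classifier can combine with strand orientation and protein length.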

PhiSpy
• Performance comparison in 50 complete bacterial genomes

Application    %Identified   %FN   %FP
Prophinder     89%           11%   12%
Phage_finder   82%           18%   1.33%
PhiSpy         94%           6%    0.66%

Sajia Akhter

PhiSpy
• Download: PhiSpy – http://sourceforge.net/projects/phispy
• PhiSpy is available in RASTtk
• Ran PhiSpy on 4,335 bacterial genomes
• Predicted 12,826 prophages in 3,203 genomes
– 9,101 known prophages
– 3,723 undefined prophages

Sajia Akhter

iVIREONS – http://vdm.sdsu.edu/ivireons

“FAMILIES” OF ANNs
1) General structural proteins
2) Phage major capsid proteins
3) Phage tail/tail fibers/collar etc.
4) Holins
5) Portals

• Trained with all types of proteins
• Both phages & viruses

Victor Seguritan

iVIREONS – http://vdm.sdsu.edu/ivireons
1. http://vdm.sdsu.edu/ivireons
2. Enter User Info (entries shown: VibrioPhage, virus@microsoft.com, DHS)
3. Upload Sequences
4. View Results
5. Copy Results to a Spreadsheet

Victor Seguritan

iVIREONS – http://vdm.sdsu.edu/ivireons

Stringencies Reported
- Structural 1:1
- MCP 1:1
- MCP 2:1
- MCP 3:1
- MCP 4:1
- MCP 7:1
- MCP 22:1
(lambda)
- Tail 1:1
- Tail 2:1
- Tail 4:1
- Tail 7:1
- Tail 6.6:1
(lambda)

ACT III. THE COMMUNITY
The Opera of PhAnToMe 2.0

SEED allows continuous annotation

[Diagram labels: SEED; RAST; Genomes; Subsystems; SEED Viewer; New Genomes; Subsystems Editor]

SEED allows community annotation

Annotations will improve only if YOU help

Prospects
• Phage annotation “summits”
– First summit (Jan 2011) was at Biosphere 2, Tucson, AZ
– A second one?
  • On a summit? (e.g., Bogotá? Mount Sinai?)
  • Red Sea resort in Egypt?
• Pushing for community annotation
– Undergraduate students (I have about 20 in training)

FINALE
The Opera of PhAnToMe 2.0

Aims
• Direct
– Discuss the theory behind RAST
– Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
– Demonstrate online community annotation using SEED
• Indirect
– PhAnToMe 2.0?
– Establish community annotation efforts / design courses / crowdsourcing
– Seek funding? Crowdfunding?

Acknowledgments

Robert A. Edwards, PhD

• RASTtk and PhiRAST development: Ross Overbeek, Robert Olson, Jim Davis, Gordon Pusch, Terry Disz, Bruce Parrello
• Phage annotators (Phantomers): Bhakti Dwivedi, Mya Breitbart, et al.
• FIG and all SEED annotators: VeronikaV, SvetaG, OlgaV/Z, et al.

Sajia Akhter

$$ & NSF

Acknowledgments
• PHAST: Katelyn McNair
• iVireons: Victor Seguritan

If you use, please cite
• SEED, RAST, myRAST, phiRAST, PHAST:
– RAST: Aziz et al., BMC Genomics 2008
– SEED servers: Aziz RK, et al. (2012) PLoS ONE 7(10): e48053.
– Nucleic Acids Res. 2014 Jan;42(Database issue):D206-14
• Letters of support

Questions?
