
Dec 22, 2015


Why Manual Genome Annotation?

Even the best gene predictors and genome annotation pipelines rarely exceed accuracies of 80% at the exon level, meaning that most gene annotations contain at least one mis-annotated exon. (Yandell and Ence, 2012, Nature Reviews Genetics)

Automated annotation is often not good enough for genes you really care about!


Yandell and Ence, 2012, Nature Reviews Genetics: http://www.yandell-lab.org/publications/pdf/euk_genome_annotation_review.pdf


Different lines of evidence go into modern gene annotation pipelines:
1. Computational prediction (open reading frames, etc.)
2. Evidence-based prediction (ESTs, RNA-seq, etc.)
3. Homology-based prediction (BLAST, etc.)
These are synthesized into a consensus gene annotation, which may still be wrong!
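Computational prediction often starts from open reading frames. As a minimal sketch (forward strand only; the 100-codon minimum is an arbitrary assumption, and real gene predictors model introns, splice signals and codon statistics rather than bare ORFs), an ATG-to-stop scan across the three reading frames could look like this:

```python
# Stop codons of the standard genetic code
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Scan the forward strand for ATG...stop open reading frames.

    Returns (frame, start, end) tuples: frame is 1, 2 or 3; start/end are
    0-based, half-open coordinates that include the stop codon.
    min_codons (ATG through the last sense codon) is an arbitrary cut-off.
    """
    seq = dna.upper()
    orfs = []
    for offset in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((offset + 1, start, i + 3))
                start = None  # close this ORF and look for the next ATG
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAAGG", min_codons=2)` returns `[(3, 2, 14)]`: ATG AAA TTT followed by a TAA stop in frame 3. Keeping the first ATG before each stop reports the longest candidate for that stop; the true start may still be a downstream ATG, which is one reason starts need to be checked by hand.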


Bees (Order Hymenoptera, Family Apidae)

Western Honey Bee (Apis mellifera)

Common Eastern Bumble Bee (Bombus impatiens)

Buff-Tailed Bumble Bee (Bombus terrestris)

Dwarf Asian Honey Bee (Apis florea)


$\mathrm{NADPH} + \mathrm{H^+} + \mathrm{O_2} + \mathrm{R{-}H} \rightarrow \mathrm{NADP^+} + \mathrm{H_2O} + \mathrm{R{-}OH}$

cytochrome P450 monooxygenase enzymes

P450 classification, e.g. CYP3A4*15A-B:
CYP = cytochrome P450
3 = family (>40% amino acid sequence identity)
A = subfamily (>55% amino acid sequence identity)
4 = isoenzyme
*15A-B = allele
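The naming scheme and the identity cut-offs above can be expressed as a small helper. This is a hypothetical sketch: `parse_cyp_name` and `expected_relationship` are illustrative names, the regular expression assumes names of the form CYP<number><letters><number>[*allele], and the 40%/55% cut-offs are treated as rules of thumb rather than strict boundaries.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    family: str
    subfamily: str
    isoenzyme: str
    allele: Optional[str]


# Assumed name shape: CYP + family digits + subfamily letters + gene digits + optional *allele
CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)(\*\S+)?$")


def parse_cyp_name(name: str) -> CypName:
    """Split a CYP name into family, subfamily, isoenzyme and allele parts."""
    match = CYP_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised CYP name: {name!r}")
    family, subfamily, isoenzyme, allele = match.groups()
    return CypName(family, subfamily, isoenzyme, allele)


def expected_relationship(percent_identity: float) -> str:
    """Rule of thumb: >55% identity -> same subfamily, >40% -> same family."""
    if percent_identity > 55:
        return "same subfamily"
    if percent_identity > 40:
        return "same family"
    return "different family"
```

For instance, `parse_cyp_name("CYP6AS3")` gives family "6", subfamily "AS" and isoenzyme "3", and `expected_relationship(45)` returns "same family".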


Chemical signalling??? (pheromone synthesis and breakdown)

Detoxication (toxin and pesticide metabolism)

Hormone synthesis (highly conserved orthologs) + detoxication


Organism                    P450s   Food / environment
Nasonia vitripennis           92    fly pupae
Apis mellifera                46    nectar and pollen / homeostatic nest
Anopheles gambiae            106    blood and detritus / standing water
Drosophila melanogaster       85    rotting fruit
Tribolium castaneum          131    seeds


Organism                 Total P450s   Mito clan   CYP2 clan   CYP3 clan   CYP4 clan
Drosophila melanogaster       85           11           6          36          32
Apis mellifera                46            6           8          28           4
Nasonia vitripennis           87            6           7          45          29


Repeats


Intron splice sites are highly conserved


P450s: ~500 amino acids (~1,500 nucleotides of coding sequence)
Highly conserved heme-binding site (cysteine)
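These two facts suggest a quick sanity check on a candidate P450 protein. A minimal sketch: the 400 to 600 residue window is an arbitrary assumption around the ~500 aa figure, and the F-x-x-G-x-x-x-C-x-G pattern is an assumed, simplified consensus for the heme-binding region (the slide only says that a conserved cysteine binds the heme). Treat a miss as a reason to look closer, not as a rejection.

```python
import re

# Assumed simplified heme-binding consensus: F-x-x-G-x-x-x-C-x-G
HEME_MOTIF = re.compile(r"F..G...C.G")


def looks_like_p450(protein: str, min_len: int = 400, max_len: int = 600) -> bool:
    """True if the length is in the assumed window and the heme motif is present."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    has_length = min_len <= len(protein) <= max_len
    return has_length and HEME_MOTIF.search(protein) is not None


def expected_cds_length(n_amino_acids: int) -> int:
    """Coding sequence length in nucleotides: 3 per codon plus the 3-nt stop codon."""
    return 3 * n_amino_acids + 3
```

With this arithmetic a 500 aa protein needs 1,503 nt of coding sequence, matching the ~1,500 nt on the slide.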


Basic Annotation Rules

CDS start: amino acid M, nucleotide codon ATG

CDS stop: amino acid * (stop), nucleotide codons TAA/TAG/TGA

Translation Frames

Frame 1, Frame 2, Frame 3
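Reading a sequence in the three forward frames is easy to sketch with the standard genetic code (NCBI translation table 1). This is a minimal illustration assuming the forward strand only (the reverse complement gives three more frames); codons containing non-ACGT characters are translated as X, and trailing bases that do not fill a codon are dropped.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, frame: int = 1) -> str:
    """Translate the forward strand starting in frame 1, 2 or 3."""
    seq = dna.upper()
    codons = (seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(dna: str) -> dict[int, str]:
    """Translations of all three forward frames, keyed by frame number."""
    return {frame: translate(dna, frame) for frame in (1, 2, 3)}
```

For example, `three_frame_translation("ATGGAATAA")` gives `{1: "ME*", 2: "WN", 3: "GI"}`; only frame 1 starts with M and ends with a stop.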


http://en.wikipedia.org/wiki/File:Exon_and_Intron_classes.png

http://doc.goldenhelix.com/SVS/latest/_images/splice_site_diagram.png

Intron splice sites

GT-AG
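Once exon boundaries are chosen, the GT-AG rule can be checked mechanically. A minimal sketch, assuming hypothetical 0-based, half-open exon coordinates on the forward strand and an invented toy sequence. Rare non-canonical introns (e.g. GC-AG) exist, so a False result is a prompt to look again rather than proof of an error.

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon sequences; exons are 0-based, half-open (start, end) pairs."""
    return "".join(genomic[start:end] for start, end in exons)


def check_gt_ag(genomic: str, exons: list[tuple[int, int]]) -> list[bool]:
    """For each intron between consecutive exons, report whether it is GT...AG."""
    results = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[prev_end:next_start].upper()
        results.append(intron.startswith("GT") and intron.endswith("AG"))
    return results


# Toy gene: exon 1 "ATGG", intron "GTAAGTTTTCAG", exon 2 "AATAA"
genomic = "ATGG" + "GTAAGTTTTCAG" + "AATAA"
exons = [(0, 4), (16, 21)]
```

Here `check_gt_ag(genomic, exons)` returns `[True]` and `splice(genomic, exons)` returns "ATGGAATAA".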


Find: "(\w)"

Replace with: "\1 "
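The find/replace pair above captures each word character and writes it back followed by a space. The slide does not name a tool, so this is only an equivalent sketch using Python's `re` module; note that `\w` does not match the "*" stop symbol, so it is left unspaced.

```python
import re


def space_out(sequence: str) -> str:
    """Apply find "(\\w)" / replace "\\1 ": put a space after every word character."""
    return re.sub(r"(\w)", r"\1 ", sequence)
```

For example, `space_out("MEK")` returns "M E K ".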


‘GT’ intron donor site


‘AG’ intron acceptor site


‘GT’ intron donor site

1 nucleotide (“G”) for the next codon = phase 1 intron


‘AG’ intron acceptor site

2 nucleotides (“AA”) before the first full codon

Combine with the “G” left at the end of exon 1

Together they make the codon “GAA” for glutamic acid (E)
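Following the phase definition used here (how many bases of the interrupted codon sit before the intron), intron phases can be computed from coding-exon lengths and the split codon rebuilt. A minimal sketch with invented toy exons; note that the phase column of GFF3 CDS features uses a different convention (bases to skip at the start of a CDS segment), so the two numbers are not interchangeable.

```python
def intron_phases(cds_exon_lengths: list[int]) -> list[int]:
    """Phase of each intron = coding bases before it, modulo 3 (0, 1 or 2)."""
    phases = []
    cumulative = 0
    for length in cds_exon_lengths[:-1]:  # no intron after the last exon
        cumulative += length
        phases.append(cumulative % 3)
    return phases


def split_codon(exon_before: str, exon_after: str, phase: int) -> str:
    """Rebuild the codon interrupted by an intron of the given phase."""
    if phase == 0:
        return exon_after[:3]  # intron falls between codons
    return exon_before[-phase:] + exon_after[:3 - phase]
```

With a 4-nt first exon "ATGG", `intron_phases([4, 5])` returns `[1]`, and `split_codon("ATGG", "AATAA", 1)` returns "GAA", the glutamic acid codon from the slide.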


This start looks good!


Jamboree! Search for paralogs using one of these genes from Apis mellifera in the protein database on GenBank (e.g. CYP9R1 AND Apis mellifera)

CYP9R1, CYP6AS3, CYP6BD1, CYP6AQ1, CYP4G11

Use BLASTP to find predicted paralogs in the NCBI “nr” database. Select one of the following bees for the Organism:

Apis florea, Bombus impatiens, Bombus terrestris, Megachile rotundata

Copy and paste verified amino acid sequences (FASTA formatted) into a text file:


Add comments to the header and include a gene identifier
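If you prefer to assemble the file with a script, here is a minimal sketch. The identifier and comment values are hypothetical placeholders, and 60 residues per line is a common but arbitrary choice.

```python
import textwrap


def to_fasta(identifier: str, comment: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record: '>identifier comment' then the wrapped sequence."""
    header = f">{identifier} {comment}".rstrip()
    residues = "".join(sequence.split())  # strip spaces and line breaks from pasted text
    body = "\n".join(textwrap.wrap(residues, width))
    return f"{header}\n{body}\n"
```

For example, `to_fasta("gene1", "note", "ACDEFG", width=4)` returns ">gene1 note\nACDE\nFG\n". Records can then be joined and written to a text file with `open(path, "w")`.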

Send to me at: [email protected]

Thanks!!