Annotation of Sarcocystis neurona scaffolds
Nigel Austin
Turgay Ibrikci
Liliana Lopez Kleine
Marton Megyeri
Caribbean Training Programme on Bioinformatics January 2010
Sarcocystis neurona
• Genus: Sarcocystis - parasitic protozoa
• Occur as sarcocysts (tissue cysts) in the muscle of mammals, birds, and reptiles
• In humans – asymptomatic
• Sarcocystis neurona causes equine protozoal myeloencephalitis (EPM)
S. neurona & Related Apicomplexa
[Figure: Sarcocystis neurona shown alongside the related apicomplexans Eimeria, Neospora, and Toxoplasma]
Life Cycle of S. neurona
About Data
• Data kindly supplied by Dr. Jessica Kissinger, who had very recently acquired the genome sequence
• First analysis: 120,000 bp in 4 scaffolds
• Second analysis: 400,000 bp in 4 scaffolds
Objectives
• To annotate novel DNA sequences of S. neurona.
• Detection of coding sequences by comparison with other sequences in databases
• NB: No reference genome or other info was available since sequences were novel
Strategy for Scaffolds
• BLASTX in nr db: search of translated sequence in protein databases
• TBLASTX in est db: search of translated sequence in translated sequence databases
• Comparison in ACT with most closely related organisms (Toxoplasma gondii and Neospora caninum)
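Both BLASTX and TBLASTX compare the query at the protein level, which means the nucleotide scaffold is first translated in all six reading frames. A minimal sketch of that translation step, assuming the standard genetic code; the function names are illustrative and are not part of BLAST:

```python
# Illustrative sketch: six-frame translation of a DNA sequence.
# Assumes the standard genetic code; the names below are hypothetical helpers.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; ambiguous codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

BLASTX compares these six translations against protein sequences, while TBLASTX also translates every database entry in six frames, which is why it suits EST collections.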
Results – BLAST Search
BLAST    DB   Start  End    Similarity  E-value   Subject
BLASTX   nr   41446  42924  71          2.00E-16  Conserved hypothetical protein, Toxoplasma gondii
BLASTX   nr   41464  42942  42          2.00E-44  Conserved hypothetical protein, Plasmodium falciparum
BLASTX   nr   "      "      44          2.00E-42  Conserved hypothetical protein, Plasmodium vivax
BLASTX   nr   "      "      41          1.00E-37  Conserved hypothetical protein, Plasmodium berghei
BLASTX   nr   "      "      40          1.00E-37  Conserved hypothetical protein, Cryptosporidium muris
BLASTX   nr   "      "      43          1.00E-22  Conserved hypothetical protein, Cryptosporidium parvum
BLASTX   nr   10632  10992  69          6.00E-33  Putative lectin domain protein, Toxoplasma gondii
BLASTX   nr   32690  32968  66          7.00E-18  Transcript GF18541, Drosophila melanogaster
BLASTX   nr   "      "      66          6.00E-17  Putative acylphosphatase, Aedes aegypti
BLASTX   nr   "      "      69          4.00E-16  Putative acylphosphatase, Toxoplasma gondii
TBLASTX  est  1538   1840   45          5.00E-08  Xenopus mRNA (cDNA library)
TBLASTX  est  "      "      51          1.00E-07  Cyprinus carpio mRNA (cDNA library)
TBLASTX  est  10986  10967  82          5.00E-10  T. gondii mRNA (cDNA library)
TBLASTX  est  14716  14904  87          2.00E-33  T. gondii mRNA (cDNA library)
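Several hits in the table overlap, so it can help to collapse them into candidate regions before inspecting them in ACT. A minimal sketch of that merging step follows. It assumes, purely for illustration, that all coordinates refer to the same scaffold (the table does not say which scaffold each hit is on), and `merge_hits` is a hypothetical helper, not a BLAST function:

```python
# Illustrative sketch: merge overlapping BLAST hit coordinates into regions.
# ASSUMPTION: all hits lie on one scaffold; the table does not state this.
def merge_hits(hits):
    """Merge overlapping (start, end) pairs; reversed pairs are normalised."""
    ordered = sorted((min(s, e), max(s, e)) for s, e in hits)
    merged = []
    for start, end in ordered:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# Distinct coordinate pairs taken from the results table (ditto rows omitted).
HITS = [
    (41446, 42924), (41464, 42942), (10632, 10992), (32690, 32968),
    (1538, 1840), (10986, 10967), (14716, 14904),
]

if __name__ == "__main__":
    for start, end in merge_hits(HITS):
        print(start, end, end - start + 1)
```

Under that single-scaffold assumption the seven coordinate pairs collapse into five candidate regions.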
ACT Results
[Figure: ACT comparison of the scaffolds with Neospora caninum]
Match of region with a conserved gene in Neospora caninum and Toxoplasma gondii
Hmmm….
• No genes in 400,000 bp of DNA?
• And then…
• Expertise and experience
• An experienced annotator was able to locate a gene
Gene Discovered!
Match of region with a conserved gene in Neospora caninum
Discovered Gene - Gene1
• The discovered gene was extended at both the 5′ and 3′ ends
• Start and stop codons were identified
• Protein sequence was determined
• BLAST – hypothetical protein with high similarity to one found in Neospora and Toxoplasma
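Extending a matched region until an in-frame start and stop codon are found can be sketched as a simple open reading frame scan. This is only an illustration: it assumes a single uninterrupted coding region on the forward strand (no introns), and `find_orfs` is a hypothetical helper, not the procedure actually used for Gene1:

```python
# Illustrative sketch: locate ATG..stop open reading frames on one strand.
# ASSUMPTIONS: forward strand only, no introns, standard stop codons.
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return 0-based, half-open (start, end) spans of ATG..stop ORFs.

    min_codons counts the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"
    for start, end in find_orfs(toy):
        print(start, end, toy[start:end])
```

For a real scaffold the same scan would be run on the reverse complement as well, and the resulting protein would then be checked by BLAST as described in the last bullet.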
Gene Comparison
Match of region with a conserved gene in Neospora caninum and Toxoplasma gondii
Neighbouring genes are not present in the scaffold.
Results – UniProt Search
Performed with Gene1
Further Protein Info
• Characterize our protein product
– Membrane protein? Regions of high hydrophobicity
– Domains and motifs
– Secondary structures
Hydrophobicity Graph
No transmembrane motifs present
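A hydrophobicity graph of this kind is typically a sliding-window average over the protein sequence. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6 for candidate membrane-spanning segments; the slides do not say which tool or scale produced the graph, so these choices and the function names are assumptions:

```python
# Illustrative sketch: sliding-window hydropathy profile.
# ASSUMPTIONS: Kyte-Doolittle scale, window of 19, cutoff of 1.6.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of each window along the protein."""
    # Residues missing from the scale (e.g. 'X') are scored as 0.0.
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_membrane_candidate(protein, window=19, cutoff=1.6):
    """True if any window average exceeds the cutoff."""
    return any(v > cutoff for v in hydropathy_profile(protein, window))


if __name__ == "__main__":
    # Toy sequences, not Gene1: a poly-leucine stretch and a charged stretch.
    print(has_membrane_candidate("L" * 25))
    print(has_membrane_candidate("DEKR" * 7))
```

A protein with no window above the cutoff, as reported here for Gene1, would be consistent with the absence of transmembrane segments.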
Domains & Motifs
Conclusion
• Various BLAST searches may assist in locating orthologous genes in other genomes
• ACT is a very useful tool for gene discovery and annotation (along with experience & expertise)
• One gene (Gene1) was found in 400 kb of DNA; the scaffolds are perhaps in a gene-poor region of the genome
• Gene1 is perhaps orthologous with a gene in Toxoplasma and Neospora
• Hypothetical gene – no function ascribed to it
Thank You!!!