CNIT Final Presentation
Chris Thompson
April 18th, 2013
CNIT 227
Table of Contents
Introduction
Materials
Methods
Results and Conclusion
INTRODUCTION
Bioinformatics
Bioinformatics – an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing, and analyzing biological data.
Bioinformatics is important because without the technologies produced and developed through it, many of the experiments and assays we do today would not be possible.
CNIT
CNIT is the bioinformatics course at Purdue, focused on annotating the genomes of mycobacteriophages.
The overall goal is to annotate the genome of the RiverMonster phage so that other researchers can use it in the future.
Bacteriophages
• A virus that infects and replicates in bacteria
• Among the most common and populous organisms in existence
• Many have a mosaic genome
• Unlimited potential usage
• Mycobacteriophages infect M. smegmatis
Clusters
• System to organize bacteriophages
• Phages sorted by factors such as genome length, presence of certain genes, organization of genome, GC content, and plaque size and characteristics
• A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, Singleton, and T
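GC content, one of the sorting factors listed above, is simply the fraction of bases that are G or C. A minimal sketch of the calculation follows; the function name and the example sequence are made up for illustration and are not taken from RiverMonster or from any of the tools in this presentation.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    # Count only unambiguous bases so N or gap characters do not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


example = "ATGCGCGCTTAA"  # hypothetical sequence, for illustration only
print(f"GC content: {gc_content(example):.1%}")
```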
RiverMonster
• Discovered in 2010 in West Lafayette
• Mycobacteriophage
• Cluster E
• 144 genes in total
• Many protein products are unknown
• Overall geographical presence is unknown
Through CNIT and bioinformatics we are trying to answer some of the unknowns about RiverMonster
MATERIALS
Bioinformatics Tools
• DNA Master
• Phamerator
• Glimmer
• GeneMark
• NCBI and BLAST
• EverNote
DNA Master
• Designed and written by Dr. Jeffrey Lawrence
• Annotation program
• Can auto-annotate entire genomes
• Uses information from Glimmer and GeneMark
• Can locally BLAST genes
Phamerator
• Developed in 2011
• Linux-based bioinformatic program
• Used for comparative phage genomics
• Can visualize entire phage genomes
• Separates phages into “phams”
Glimmer
• Stands for Gene Locator and Interpolated Markov ModelER
• Used for finding genes in microbial DNA
• Uses models and algorithms to distinguish between coding and non-coding DNA
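To give a feel for how a Markov model can separate coding from non-coding DNA, here is a toy sketch. It is not Glimmer's algorithm: Glimmer interpolates between models of several orders, while this example uses a single fixed-order model with add-one smoothing. The training sequences and function names are invented for illustration.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def train_markov(sequences, order=2):
    """Estimate P(next base | previous `order` bases) with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values())
        return (ctx.get(base, 0) + 1) / (total + len(BASES))

    return prob


def log_odds(seq, coding_prob, noncoding_prob, order=2):
    """Positive score: more like the coding model; negative: more like non-coding."""
    seq = seq.upper()
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        score += math.log(coding_prob(context, base) / noncoding_prob(context, base))
    return score


# Hypothetical training data; real tools train on known genes and intergenic regions.
coding_model = train_markov(["ATGGCCGCCGCCGGCGCC"])
noncoding_model = train_markov(["ATATATTTAAATATTTAT"])

print(log_odds("GCCGCCGCC", coding_model, noncoding_model))   # scores as coding-like
print(log_odds("ATATTTATAT", coding_model, noncoding_model))  # scores as non-coding-like
```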
GeneMark
• A family of gene prediction programs developed at the Georgia Institute of Technology
• Determines the protein-coding potential of a DNA sequence
• Uses many of the same algorithms and models as Glimmer
NCBI and BLAST
• National Center for Biotechnology Information
• Basic Local Alignment Search Tool
• Program that compares DNA sequences with a large database of known sequences
• Used to find similar gene sequences
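The idea of a local alignment score can be sketched with the Smith-Waterman algorithm. This is an illustration only: BLAST itself uses a faster seed-and-extend heuristic rather than this exhaustive method, and the scoring values, function name, and the tiny "database" below are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    a, b = a.upper(), b.upper()
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical query and database entries, for illustration only.
query = "ACGT"
database = {"seq1": "TTACGTTT", "seq2": "GGACGGG", "seq3": "AAAA"}
ranked = sorted(database, key=lambda name: local_alignment_score(query, database[name]), reverse=True)
print(ranked)  # entries ordered from most to least similar to the query
```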
EverNote
• Started in 2008
• Designed for note-taking and archiving
• Used as an online lab notebook for CNIT
METHODS
Organization
• Genome split into two sections
• Genes 0 to 65 by Jon and Bill
• Genes 66 to 144 by Chris and Nyema
• Split again into four sections
• 0 to 23 by Jon
• 24 to 65 by Bill
• 100 to 123 by Chris
• 124 to 144 by Nyema
Process
• Documented the auto-annotated gene call
• Ran the Shine-Dalgarno test
• BLASTed gene and compared scores
• Compared homologous genes in Phamerator
• Made final call
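The Shine-Dalgarno step looks for a ribosome binding site shortly upstream of a candidate start codon. The sketch below is a simplified stand-in, not DNA Master's scoring: it assumes the commonly cited consensus AGGAGG, a 20-base upstream window, and a plain match count. The function name and the example sequence are hypothetical.

```python
SD_CONSENSUS = "AGGAGG"  # commonly cited consensus; real tools use scoring matrices


def best_sd_match(genome, start, window=20):
    """Find the best consensus match in the `window` bases upstream of `start`.

    `start` is the 0-based index of the first base of the start codon.
    Returns (matches, spacing), where spacing is the number of bases between
    the end of the matched site and the start codon.
    """
    genome = genome.upper()
    upstream = genome[max(0, start - window):start]
    k = len(SD_CONSENSUS)
    best = (0, None)
    for i in range(len(upstream) - k + 1):
        site = upstream[i:i + k]
        matches = sum(1 for x, y in zip(site, SD_CONSENSUS) if x == y)
        spacing = len(upstream) - (i + k)
        if matches > best[0]:
            best = (matches, spacing)
    return best


# Hypothetical sequence with a perfect site upstream of the ATG at index 16.
example = "CCCTAGGAGGTCAACCATGAAACGT"
print(best_sd_match(example, 16))
```

Comparing this result across the candidate start codons of one gene gives one piece of evidence for the final call, alongside the BLAST and Phamerator comparisons.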
First Section
• Genes 66 to 144
• Split up evens and odds
• I had even-numbered genes
• No outstandingly tricky gene calls
• Gene 88 appears to belong to a family of kinases, many of them hypothetical
• Gene 92 belongs to a family of RNA ligases
• Gene 94 is transcription factor WhiB
Second Section
• Genes 101 to 123
• Every gene
• Gene 101 belongs to a protease family
• Gene 112 contains genes for polymerases
• Genes 116 and 117 were reverse genes
• 117 had many inconsistencies and was difficult to call
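A reverse gene is read from the opposite strand, so its coding sequence is the reverse complement of the forward-strand bases at its coordinates. A minimal sketch follows; it assumes 1-based inclusive coordinates, and the function names and example genome are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_gene_sequence(genome, start, stop):
    """Coding sequence of a reverse-strand gene.

    `start` and `stop` are 1-based inclusive forward-strand coordinates
    with start < stop (an assumed convention for this example).
    """
    return reverse_complement(genome[start - 1:stop])


# Hypothetical genome: bases 3 to 11 hold a reverse gene.
genome = "GGTTACCCCATCC"
print(reverse_gene_sequence(genome, 3, 11))  # ATGGGGTAA
```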
RESULTS AND CONCLUSION
Accomplishments
• Personally called 39 genes
• Called 144 genes as a class
• Analyzed protein products
• Completed a final draft of the RiverMonster genome
Significance
• Genome can be used by future scientists
• Proves validity of undergraduate research
• Learned about bioinformatics, bacteriophages, genomes, annotation, and biotechnology
Future Work
• Check and finalize all gene calls
• Compile the DNA Master file
• Send to HHMI and SEA-PHAGES to be put in Phamerator
The End