The Wold Lab BioHub
Cory Tobin
Goal
• Standardize the relationships between biological data
• Integrate all of the data seamlessly
• Provide novel methods to search for and analyze data
Background
[Figure: genes in Species A and Species B, with the relationships between them labeled “Paralogs” and “Orthologs”]
The more general term is “homology”
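The distinction between the two terms can be sketched in a few lines of Python. This is only an illustration, not BioHub code: the `Gene` class and `homology_type` function are hypothetical names, the two genes are assumed to be already known homologs, and the rule used (same species means paralogs, different species means orthologs) is a simplification. Strictly, the distinction depends on whether the genes diverged by duplication or by speciation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record, for illustration only."""
    name: str     # gene label
    species: str  # species the gene belongs to


def homology_type(a: Gene, b: Gene) -> str:
    """Classify two genes that are already known to be homologous.

    Simplified rule (an assumption): homologs within one species are
    called paralogs, homologs in different species are called orthologs.
    """
    return "paralog" if a.species == b.species else "ortholog"


g1 = Gene("gene 1", "Species A")
g2 = Gene("gene 2", "Species A")
g3 = Gene("gene 3", "Species B")

print(homology_type(g1, g2))  # same species
print(homology_type(g1, g3))  # different species
```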
Requirements
• Be more accurate and flexible than HomoloGene
• Work in real time
• Make sense of HomoloGene’s misleading data
Rationale
[Figure: the same set of genes shown in two panels, titled “HomoloGene” and “BioHub”, with the captions “They are similar” and “They are related like this”]
Rationale Continued
[Figure: the Human Genome contains Seq A and Seq B; the Mouse Genome contains Seq C]
HomoloGene would BLAST seq A against mouse and determine that seq C is an ortholog of seq A.
HomoloGene would also BLAST seq B against mouse and determine that seq C is an ortholog of seq B.
BioHub will BLAST seq A against mouse, find seq C, then BLAST seq C back against human to see whether there are any better matches. It will find that seq B is a better match for seq C than seq A is.
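The back-and-forth check described above is commonly called a reciprocal best hit test. A minimal sketch follows, under stated assumptions: it does not run BLAST, the `SCORES` table holds invented numbers standing in for BLAST scores, and `best_hit` and `reciprocal_best_hit` are hypothetical helper names, not BioHub functions.

```python
from typing import Dict, List, Optional, Tuple

# Invented similarity scores standing in for BLAST results (assumption).
# Keys are (query, subject) pairs; seq B matches seq C better than seq A does.
SCORES: Dict[Tuple[str, str], float] = {
    ("A", "C"): 180.0,
    ("B", "C"): 240.0,
    ("C", "A"): 180.0,
    ("C", "B"): 240.0,
}

# Which sequences live in which genome, as on the slide.
GENOMES: Dict[str, List[str]] = {"human": ["A", "B"], "mouse": ["C"]}


def best_hit(query: str, target_genome: str) -> Optional[str]:
    """Return the highest-scoring sequence in target_genome, or None."""
    candidates = [
        (SCORES[(query, subject)], subject)
        for subject in GENOMES[target_genome]
        if (query, subject) in SCORES
    ]
    if not candidates:
        return None
    return max(candidates)[1]


def reciprocal_best_hit(
    query: str, query_genome: str, target_genome: str
) -> Optional[str]:
    """Return the forward best hit only if searching back recovers the query."""
    forward = best_hit(query, target_genome)
    if forward is None:
        return None
    backward = best_hit(forward, query_genome)
    return forward if backward == query else None


print(best_hit("A", "mouse"))                      # one-way search finds C
print(reciprocal_best_hit("A", "human", "mouse"))  # None: C matches B better
print(reciprocal_best_hit("B", "human", "mouse"))  # C
```

With these made-up scores, a one-way search pairs both seq A and seq B with seq C, while the reciprocal check keeps only the seq B and seq C pair.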
Methods
• Design data relationships that make sense biologically
• Generate the low-level database interaction code
• Parse and load HomoloGene’s data into our database
• Write biologically useful functions
• Create a web-based interface for easy use
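The parse-and-load and query steps above might look roughly like the sketch below. Everything specific in it is an assumption: the tab-delimited column layout (group id, taxonomy id, gene id, gene symbol), the invented sample rows, the `homolog` table, and the function names `load_groups` and `homologs_of`. Python's built-in `sqlite3` module is used as a stand-in for PostgreSQL so the example is self-contained, and the SQL is written by hand rather than generated by Pymerase.

```python
import csv
import io
import sqlite3

# Invented sample rows in the ASSUMED layout:
# group_id <TAB> taxonomy_id <TAB> gene_id <TAB> gene_symbol
# (9606 and 10090 are the NCBI taxonomy ids for human and mouse.)
SAMPLE = "1\t9606\t101\tGENE1\n1\t10090\t202\tGene1\n2\t9606\t303\tGENE2\n"


def load_groups(text: str, conn: sqlite3.Connection) -> int:
    """Parse tab-delimited rows and insert them; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS homolog ("
        "group_id INTEGER, taxonomy_id INTEGER, gene_id INTEGER, symbol TEXT)"
    )
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        group_id, taxonomy_id, gene_id, symbol = row
        rows.append((int(group_id), int(taxonomy_id), int(gene_id), symbol))
    conn.executemany("INSERT INTO homolog VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def homologs_of(conn: sqlite3.Connection, gene_id: int):
    """Return (taxonomy_id, symbol) for the other genes in the same group."""
    cur = conn.execute(
        "SELECT h2.taxonomy_id, h2.symbol FROM homolog h1 "
        "JOIN homolog h2 ON h1.group_id = h2.group_id "
        "WHERE h1.gene_id = ? AND h2.gene_id != ? "
        "ORDER BY h2.taxonomy_id",
        (gene_id, gene_id),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
print(load_groups(SAMPLE, conn))  # number of rows loaded
print(homologs_of(conn, 101))     # other members of gene 101's group
```

Keeping one row per gene with a shared group id is only one possible design; it makes "find all homologs of this gene" a single self-join.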
Materials
• ArgoUML – Design Aid
• Pymerase – Design Implementation
• PostgreSQL – Database
• HomoloGene – Data Source
• Python – Programming Language
Current State
• Design data relationships that make sense biologically
• Generate the low-level database interaction code
• Parse and load HomoloGene’s data into our database
• Write biologically useful functions
• Create a web-based interface for easy use
Example Usage
Sequence of Interest
…GGATACAAAATTCCTC…
Are there any known genes in this sequence?
acetyl-coenzyme A dehydrogenase (Human)
Example Usage Continued
acetyl-coenzyme A dehydrogenase (Human)
Are there any homologs?
Mouse
Rat
Mosquito
Fruit fly
Nematode
Example Usage Continued