Simon Twigger, PhD
MCW Driving Biological Project
Monday, September 27, 2010
Rat Genome Database
What's the problem?
• Large-scale repositories with unused or inaccessible information
• How can these databases be made more useful?
• How can we help researchers find and use this information to connect genes to disease?
Rat researchers ask...
What tissue is this gene expressed in?
What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
Are any of these genes associated with my phenotype?
Has this gene been seen in the brain?
What rat expression studies have been done on Mammary Cancer (aka breast neoplasms, breast cancer, cancer of the breast, breast carcinoma...)?
Has anyone done any expression studies using congenic rats?
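
Several of the questions above hinge on synonyms: the same strain or disease appears under many names. A minimal sketch of the idea, assuming a hand-built synonym table (the table, the function names and the choice of canonical labels are illustrative, not part of RGD or GMiner):

```python
# Hypothetical synonym table built from the names quoted on the slide.
SYNONYMS = {
    "SD": ["SD", "SD/NHsd", "Harlan Sprague Dawley", "Sprague Dawley"],
    "Mammary Cancer": [
        "Mammary Cancer",
        "breast neoplasms",
        "breast cancer",
        "cancer of the breast",
        "breast carcinoma",
    ],
}


def build_lookup(synonyms):
    """Invert the table so every lower-cased synonym points at its canonical label."""
    lookup = {}
    for canonical, names in synonyms.items():
        for name in names:
            lookup[name.lower()] = canonical
    return lookup


def normalize(term, lookup):
    """Return the canonical label for a free-text term, or None if it is unknown."""
    return lookup.get(term.strip().lower())


LOOKUP = build_lookup(SYNONYMS)
print(normalize("Harlan Sprague Dawley", LOOKUP))
print(normalize("breast carcinoma", LOOKUP))
```

An ontology-backed annotator plays the role of this table in practice, with synonyms maintained in the ontology rather than by hand.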
[Figure: annotation pipeline. GEO Records -> Create annotation jobs & queue up -> Q-In (RabbitMQ) -> 1..n annotation workers (index text at OBA, parse results, put results into queue for save) -> Q-Out (RabbitMQ) -> results saved to GMiner database]
What's the strategy?
• Focus on GEO (microarray)
• Use the NCBO annotator to mark up text, review the annotations, and then use them for tools and visualization
• Combine annotations with biological data to derive new insights.
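
The queue-and-worker layout used for annotation can be sketched as follows. This is a sketch under stated assumptions, not the GMiner code: Python's in-process `queue.Queue` stands in for the RabbitMQ queues, `annotate` is a hypothetical substring matcher standing in for the annotator call, and the record IDs and texts are invented.

```python
import queue
import threading


def annotate(text, vocabulary):
    """Hypothetical stand-in for the annotator: vocabulary terms found in the text."""
    lowered = text.lower()
    return sorted(term for term in vocabulary if term in lowered)


def run_pipeline(records, vocabulary, n_workers=3):
    q_in, q_out = queue.Queue(), queue.Queue()

    # Create one annotation job per record and queue them all up front.
    for record_id, text in records.items():
        q_in.put((record_id, text))

    def worker():
        # Each worker pulls jobs until the input queue is drained.
        while True:
            try:
                record_id, text = q_in.get_nowait()
            except queue.Empty:
                return
            q_out.put((record_id, annotate(text, vocabulary)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # "Save" step: collect results from the output queue into a dict.
    saved = {}
    while not q_out.empty():
        record_id, terms = q_out.get()
        saved[record_id] = terms
    return saved


records = {  # invented example records
    "record_1": "Kidney tissue from hypertensive rat",
    "record_2": "Brain cortex, Sprague Dawley",
}
vocabulary = {"kidney", "brain", "hypertensive"}
results = run_pipeline(records, vocabulary)
print(results)
```

Separating the input and output queues lets the number of workers change without touching the code that creates jobs or the code that saves results.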
Current Ontologies
http://bioportal.bioontology.org/
Progress
Linking annotations to data
Tm2d1
RGD1306410
Svs4
Hbb
Scgb2a1
Alb
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
62,000 samples x ca. 25,000 genes/sample = ca. 1.5B data points
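
Statements such as "Hbb is_expressed_in rat kidney" come from joining a sample's tissue annotation with the genes detected in that sample. A minimal sketch, assuming simple dictionaries for both inputs (the sample ID, the data layout and the function name are illustrative), followed by a check of the scale estimate:

```python
def expression_triples(sample_tissue, sample_genes, species="rat"):
    """Join tissue annotations with per-sample gene lists into (gene, predicate, tissue) triples."""
    triples = set()
    for sample_id, tissue in sample_tissue.items():
        for gene in sample_genes.get(sample_id, []):
            triples.add((gene, "is_expressed_in", f"{species} {tissue}"))
    return sorted(triples)


# Hypothetical inputs: one annotated sample and the genes called present in it.
sample_tissue = {"sample_1": "kidney"}
sample_genes = {"sample_1": ["Hbb", "Tm2d1"]}

triples = expression_triples(sample_tissue, sample_genes)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)

# Scale check: 62,000 samples at roughly 25,000 genes per sample.
data_points = 62_000 * 25_000
print(f"{data_points:,}")  # 1,550,000,000, i.e. about 1.5 billion
```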
Probeset results on GMiner
Probeset L08490cds_at for Gabra1 - gamma-aminobutyric acid (GABA) A receptor, alpha 1
Hs GABRA1
[Figure: Strain 1 != Strain 2 (hypertensive). Genes (G) within a QTL are linked to Component, Function, Process, Pathway, Anatomy (Kidney) and Phenotype (Hypertension) annotations]
QTL Gene ‘Highlighter’
[Figure: genes (G) within a QTL are matched against Disease/Phenotype annotations; data from GMiner, RGD, OBO etc. is held in AllegroGraph]
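
The highlighting idea can be sketched in a few lines: take the genes that overlap a QTL interval and flag those whose annotations match the disease or phenotype terms of interest. This is a sketch under stated assumptions, not the QTL Highlighter implementation; the gene symbols, coordinates and annotations are invented, and in-memory lists stand in for queries against the triple store.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    symbol: str
    chrom: str
    start: int
    stop: int
    annotations: set = field(default_factory=set)


def genes_in_qtl(genes, chrom, qtl_start, qtl_stop):
    """Genes on the QTL's chromosome that overlap the QTL interval."""
    return [
        g for g in genes
        if g.chrom == chrom and g.start <= qtl_stop and g.stop >= qtl_start
    ]


def highlight(genes, terms_of_interest):
    """Map each gene symbol to the terms of interest it is annotated with."""
    hits = {}
    for g in genes:
        matched = g.annotations & terms_of_interest
        if matched:
            hits[g.symbol] = sorted(matched)
    return hits


# Invented example data.
genes = [
    Gene("GeneA", "1", 100, 200, {"kidney", "hypertension"}),
    Gene("GeneB", "1", 300, 400, {"brain"}),
    Gene("GeneC", "1", 900, 950, {"hypertension"}),  # outside the QTL
    Gene("GeneD", "2", 150, 250, {"kidney"}),        # wrong chromosome
]

candidates = genes_in_qtl(genes, chrom="1", qtl_start=50, qtl_stop=500)
highlighted = highlight(candidates, {"kidney", "hypertension"})
print(highlighted)
```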
RDF/OWL sources
• Cell Ontology: http://www.berkeleybop.org/ontologies/obo-all/cell/cell.owl
• Mouse Adult Gross Anatomy: http://www.berkeleybop.org/ontologies/obo-all/adult_mouse_anatomy/adult_mouse_anatomy.owl
• Mammalian Phenotype: http://www.berkeleybop.org/ontologies/obo-all/mammalian_phenotype/mammalian_phenotype.owl
• GO Function: http://www.berkeleybop.org/ontologies/obo-all/molecular_function/molecular_function.owl
• GO Process: http://www.berkeleybop.org/ontologies/obo-all/biological_process/biological_process.owl
• GO Component: http://www.berkeleybop.org/ontologies/obo-all/cellular_component/cellular_component.owl
Rat Genome Database
Wide variety of data types, genomic and physiological, many with corresponding ontologies
RGD->RDF
Existing RGD ‘object types’ & mappings to SO
RGD Gene
RGD QTL
QTL Highlighter
• Rails source code will be available on GitHub
• RDFizer (Ruby): http://github.com/simont/MCW-RDF
Next Steps
• Register PURL for RGD
• Create RGD core object ontology (OWL/RDF)
• Select appropriate URIs for RGD data
• Ontology annotations - how best to represent in triple store?
• Export GMiner data to RDF -> triple store
• Document & refine biological use cases related to candidate gene selection/evaluation
• Identify additional data required for candidate gene selection, RDFize as appropriate, load into triple store.
• Connections to other RDF collections/LOD, etc.?
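
To make the URI and export steps above concrete, here is a minimal sketch of minting a URI for an RGD object and serialising one statement as an N-Triples line. The base URL, the path layout, the numeric ID and the helper names are assumptions for illustration only; no PURL scheme has been chosen. The `rdfs:label` predicate is the standard RDF Schema property.

```python
from urllib.parse import quote

# Hypothetical base; a real PURL for RGD would replace this.
BASE = "http://purl.example.org/rgd/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def rgd_uri(object_type, rgd_id):
    """Build a URI of the assumed form BASE/<type>/<numeric id>."""
    return f"{BASE}{quote(object_type.lower())}/{int(rgd_id)}"


def label_triple(subject_uri, label):
    """Serialise a single rdfs:label statement as an N-Triples line."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_uri}> <{RDFS_LABEL}> "{escaped}" .'


gene_uri = rgd_uri("Gene", 12345)  # 12345 is a made-up ID
line = label_triple(gene_uri, "Hbb")
print(line)
```

Keeping URI construction in one function means a later change of PURL or path layout touches a single place before the data is reloaded into the triple store.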