The European Bioinformatics Institute
MGED ontology for consistent annotation of microarray experiments
Manchester Bioinformatics Week, Ontologies Workshop
March 23-24th 2002
Philippe Rocca-Serra
Microarray Informatics Team
EBI-EMBL, Hinxton, Cambridge
ArrayExpress: a database for gene expression studies
[Figure: gene expression data matrix, with genes along one axis and samples along the other]
ArrayExpress goals
- To create a public repository for gene expression data that will:
  – apply a standard format
  – apply curation to the data (high quality control)
  – provide easy access to information
  – allow information to be searched and retrieved
- To compare experiments
- To perform analysis and data mining using complex queries
What kind of data should be stored?
[Figure: gene expression data matrix. Labels: experiment (platform, conditions…); samples; genes & transcription units; annotations]
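The kinds of data listed on this slide can be pictured as one matrix plus three sets of annotations. The following is a minimal sketch, assuming a simple in-memory layout: a genes × samples matrix of expression values, with separate annotation records for the experiment, each sample and each gene. The class, field and method names and the toy values are hypothetical and are not taken from ArrayExpress or MAGE-OM.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionStudy:
    """Hypothetical container: a genes x samples matrix plus annotations."""
    experiment: dict                      # experiment-level annotation (platform, conditions...)
    genes: list                           # row identifiers
    samples: list                         # column identifiers
    values: list                          # values[i][j] = expression of gene i in sample j
    gene_annotations: dict = field(default_factory=dict)
    sample_annotations: dict = field(default_factory=dict)

    def __post_init__(self):
        # The matrix must have one row per gene and one column per sample.
        if len(self.values) != len(self.genes):
            raise ValueError("one row per gene expected")
        for row in self.values:
            if len(row) != len(self.samples):
                raise ValueError("one column per sample expected")

    def value(self, gene, sample):
        """Look up a single expression value by gene and sample identifier."""
        return self.values[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key, wanted):
        """Return the ids of samples whose annotation `key` equals `wanted`."""
        return [s for s in self.samples
                if self.sample_annotations.get(s, {}).get(key) == wanted]


# Illustrative toy data only.
study = ExpressionStudy(
    experiment={"platform": "spotted cDNA array", "conditions": "time course"},
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    values=[[1.0, 2.5, 0.8],
            [0.3, 0.4, 1.9]],
    sample_annotations={"s1": {"organism part": "liver"},
                        "s2": {"organism part": "liver"},
                        "s3": {"organism part": "kidney"}},
)
print(study.value("geneB", "s3"))                      # 1.9
print(study.samples_where("organism part", "liver"))   # ['s1', 's2']
```

Keeping the annotations outside the numeric matrix is what makes queries such as "all liver samples" possible; without sample annotations the matrix alone cannot be compared across experiments.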
Important issues about data annotation
Sufficient annotation of the experiment, genes and samples
MIAME requirements: addressing the issue of sufficient annotation
- Experimental design: the set of hybridisation experiments as a whole
- Array design: each array used and each element (spot) on the array
- Samples: samples used, extract preparation and labelling
- Hybridisations: procedures and parameters
- Measurements: images, quantitation, specifications
- Normalisation controls: types, values, specifications
(Brazma et al., Nature Genetics, 2001)
Recorded information should be sufficient to interpret and replicate the experiment.
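The six MIAME sections lend themselves to a mechanical completeness check before submission. The following is a minimal sketch, assuming a submission is held as a plain dictionary keyed by section name. The dictionary layout and the function `missing_miame_sections` are hypothetical and do not represent MIAMExpress or any ArrayExpress tool. A real check would also have to look inside each section.

```python
# The six MIAME sections listed above (Brazma et al., 2001).
MIAME_SECTIONS = (
    "experimental design",
    "array design",
    "samples",
    "hybridisations",
    "measurements",
    "normalisation controls",
)


def missing_miame_sections(submission: dict) -> list:
    """Return the MIAME sections that are absent or empty in `submission`.

    `submission` is a hypothetical dict keyed by section name; a section
    counts as present only if its value is non-empty.
    """
    return [s for s in MIAME_SECTIONS if not submission.get(s)]


# Illustrative draft: "measurements" is empty, "normalisation controls" absent.
draft = {
    "experimental design": "time course, 3 time points",
    "array design": "example array layout",
    "samples": [{"organism": "Mus musculus", "labelling": "Cy3"}],
    "hybridisations": [{"temperature_celsius": 42}],
    "measurements": {},
}
print(missing_miame_sections(draft))  # ['measurements', 'normalisation controls']
```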
Second challenge: addressing the issue of annotation efficiency
Efficient annotation requires machine-understandable annotations:
  – avoid free text and natural language
  – avoid synonyms, e.g. adrenaline / epinephrine
  – make general use of controlled vocabularies (CV) and ontologies, e.g. gene annotation using GO, and pathway analysis
Create a new ontology where necessary:
  – task assigned to MGED for biomaterial (sample) description
One of MGED's main goals is to facilitate the adoption of standards for DNA-array experiment annotation and data representation.
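Avoiding synonyms in practice means mapping whatever a submitter types onto one preferred term. The following is a minimal sketch of that normalisation step, assuming a tiny hypothetical controlled vocabulary in which "epinephrine" is the preferred term. The vocabulary contents and the name `normalise_term` are illustrative only and are not taken from the MGED ontology.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
CONTROLLED_VOCABULARY = {
    "epinephrine": {"adrenaline", "adrenalin"},
}

# Invert it once so any synonym (or the preferred term itself) maps to the preferred term.
_LOOKUP = {
    name: preferred
    for preferred, synonyms in CONTROLLED_VOCABULARY.items()
    for name in synonyms | {preferred}
}


def normalise_term(text: str) -> str:
    """Map a free-text term onto its preferred CV term; reject unknown terms."""
    key = text.strip().lower()
    if key not in _LOOKUP:
        raise KeyError(f"{text!r} is not in the controlled vocabulary")
    return _LOOKUP[key]


print(normalise_term("Adrenaline"))  # epinephrine
```

Rejecting unknown terms, rather than passing them through as free text, is the design choice that keeps the stored annotations machine-understandable.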
Ontology integration in the object model describing the ArrayExpress database
ArrayExpress DB is an implementation of the MAGE-OM model (a UML model).
The MAGE model by construction includes the use of ontology entries:
- 37 locations for an "OntologyEntry"
- 36 of these are simple controlled vocabularies, e.g. image format (TIFF, JPEG)
- 1 required the development of specific modelling: biomaterial (sample) description
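The 36 "simple" cases amount to checking a category/value pair against a flat list of allowed values. The following is a minimal sketch of that idea, assuming a simplified stand-in for the MAGE-OM OntologyEntry that keeps only a category, a value and an optional description. The registry `SIMPLE_CVS` and the function `is_valid` are hypothetical. Only the image-format example from the slide is filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyEntry:
    """Simplified stand-in for a MAGE-OM OntologyEntry: a category/value pair."""
    category: str
    value: str
    description: Optional[str] = None


# Hypothetical registry of simple controlled vocabularies, keyed by category.
SIMPLE_CVS = {
    "ImageFormat": {"TIFF", "JPEG"},
}


def is_valid(entry: OntologyEntry) -> bool:
    """True if the entry's category has a simple CV and the value belongs to it."""
    allowed = SIMPLE_CVS.get(entry.category)
    return allowed is not None and entry.value in allowed


print(is_valid(OntologyEntry("ImageFormat", "TIFF")))  # True
print(is_valid(OntologyEntry("ImageFormat", "GIF")))   # False
```

A flat set works for image formats. It does not work for biomaterial description, where terms have structure (organism, strain, organism part, treatments…). That is why the remaining case needed its own modelling.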
MAGE BioMaterial Model
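In MAGE-OM, the abstract BioMaterial class has three concrete subclasses: BioSource, BioSample and LabeledExtract. Each can carry characteristics expressed as ontology entries. The following is a greatly simplified sketch of that chain from source to hybridised extract. Characteristics are reduced to (category, value) tuples. The Treatment/BioMaterialMeasurement objects that link materials in MAGE are collapsed into a single hypothetical `derived_from` reference. The `lineage` helper and the example names are also illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (category, value) pairs standing in for MAGE OntologyEntry objects.
Term = Tuple[str, str]


@dataclass
class BioMaterial:
    """Simplified stand-in for the abstract MAGE BioMaterial class."""
    name: str
    characteristics: List[Term] = field(default_factory=list)
    # Hypothetical shortcut: MAGE links materials through Treatment objects;
    # this sketch keeps a single parent reference instead.
    derived_from: Optional["BioMaterial"] = None

    def lineage(self) -> List[str]:
        """Names from this material back to the original source."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.derived_from
        return chain


class BioSource(BioMaterial):
    """The original biological source, e.g. an organism or tissue."""


class BioSample(BioMaterial):
    """A sample produced from a source or another sample."""


@dataclass
class LabeledExtract(BioMaterial):
    """The labelled material that is hybridised to the array."""
    label: str = ""


source = BioSource("mouse liver", [("Organism", "Mus musculus"), ("OrganismPart", "liver")])
rna = BioSample("total RNA", derived_from=source)
extract = LabeledExtract("Cy3-labelled RNA", derived_from=rna, label="Cy3")
print(extract.lineage())  # ['Cy3-labelled RNA', 'total RNA', 'mouse liver']
```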
Facts about the MGED biomaterial ontology
Authors: Chris Stoeckert (U. Penn) and Helen Parkinson (EBI)
Coordinated with the ArrayExpress database model (mapping available)
Technical choices: use of the OIL language, a new standard for building ontologies that provides support for formal semantics and reasoning:
  – class/property modelling primitives based on frame-based systems
  – semantics capture based on Description Logics
  – syntax for encoding primitives and semantics based on existing Web
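Reasoning over a Description Logic ontology is much richer than anything shown here. The following sketch illustrates only the simplest inference a reasoner can draw: one class is subsumed by another because a chain of explicitly stated is-a links connects them. It is not OIL and not a DL classifier. The class names and the function `subsumed_by` are hypothetical.

```python
# Hypothetical asserted ("told") is-a links: child -> set of direct parents.
IS_A = {
    "LiverBiopsy": {"TissueSample"},
    "TissueSample": {"Sample"},
    "CellLineSample": {"Sample"},
    "Sample": {"Material"},
}


def subsumed_by(cls: str, ancestor: str) -> bool:
    """True if `ancestor` is reachable from `cls` through is-a links (reflexive)."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(IS_A.get(current, ()))
    return False


print(subsumed_by("LiverBiopsy", "Material"))        # True
print(subsumed_by("CellLineSample", "TissueSample"))  # False
```

Nothing in `IS_A` states that a liver biopsy is a material. The answer is inferred, which is the kind of conclusion formal semantics lets software draw from annotations without human interpretation.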
Referencing external ontologies
- NCBI taxonomy database
- Jackson Lab mouse strains and genes
- Edinburgh mouse atlas anatomy
- GO (Gene Ontology)
- HUGO nomenclature for human genes
- Chemical and compound ontologies, e.g. the Merck index
- TAIR
- FlyBase
...and many more: www.mged.org/ontology/
Planning MGED ontology’s future
Making the ontology available where it’s needed:
- develop a browser or other interface for the ontology and link it to LIMS
- incorporate the ontology into submission/annotation and curation tools (MIAMExpress)
Planning MGED ontology’s future
[Figure: submission routes into ArrayExpress DB. Labelled components: large centres' LIMS (direct submission in MAGE-ML); other submitters (submission via MIAMExpress); curation DB; MGED/ArrayExpress ontology; external ontologies]
Ontology availability made simple?
Planning MGED ontology’s future
- Further ontology development: new instances, class refinement
- Better integration of available ontologies
- Writing guidelines on how to use ontologies for annotating data:
  – developing use cases (a non-trivial task)
Resources
- List of ontology resources from the MGED pages
- MAGE-MIAME-ontology mappings, MIAME glossary
- Schemas for both ArrayExpress and MIAMExpress
- Annotation examples in MAGE-ML