-
ILSI-HESI agreement with EBI: ArrayExpress, public repository for toxicogenomics data
Susanna Assunta
Informatics Team, European Bioinformatics Institute (EBI)
Hoffmann-La Roche
-
Acknowledgments
Microarray Informatics Team, EBI, esp.: Alvis Brazma, Helen Parkinson, Mohammad Shojatalab, Ugis Sarkans
Industry Support team, EBI
MGED steering committee
MIAME working group
Chris Stoeckert, U. Penn., and members of MGED
-
Talk structure
Part I = ArrayExpress at EBI: a public repository for gene expression data
Demo = MIAMExpress: submission/annotation tool
Part II = ILSI-HESI IMD: toxicogenomics data transfer to ArrayExpress
-
Part I - Talk structure
Data standardization: MGED group, MIAME concepts, MGED Ontology
Uses of MIAME concepts: ArrayExpress database, MAGE-OM the object model
Data flow in and out of ArrayExpress
-
Part I - Talk structure
Data standardization: MGED group, MIAME concepts, MGED Ontology
-
Data standardization - MGED
MGED = Microarray Gene Expression Database group
EBI + the world's largest labs (TIGR, Sanger, Stanford, Agilent, Affymetrix, etc.)
www.mged.org
Aims:
Facilitate the adoption of standards for annotation and data representation
Introduce experimental controls and data normalization methods
-
Data standardization - Why?
Size of the datasets
Different platforms: nylon, glass
Different technologies: oligos, spotted
References to external databases are not stable!
Gene expression data only have a meaning in the context of a detailed experiment description
-
MIAME - Minimum Information About a Microarray Experiment
The MGED group has published the MIAME v1.0 document (Brazma et al., Nature Genetics, 2001)
MIAME is the minimum information that must be reported about a microarray experiment in order to ensure:
its interpretability
potential verification of the results
-
MIAME - Minimum Information About a Microarray Experiment
Describes the 6 parts of a microarray experiment
[Diagram: the 6 parts of a microarray experiment, with external links and publication]
-
MIAME - Experimental design
[Diagram: the 6 parts of a microarray experiment (Experiment, Array, Sample, Hybridisation, Data, Normalisation), linked to external sources: Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication]
Experiment: the set of hybridisation experiments as a whole
-
MIAME - Experimental design
One or more hybridisation experiments, in some way related and addressing related questions:
Author, contact information, citations
Type of experiment, e.g.: time course, normal vs diseased comparison
Experimental factors, i.e. the parameters tested in the experiment, e.g.: time, dose, response to a compound
List of organisms used in the experiment
List of platforms used
-
MIAME - Experimental design
List of samples, arrays and hybridisations and their relationships, e.g.:
Samples: S1, S2, S3
Arrays: A1, A2, A3
Hybridisations: H1 is S1 and S2 on A1; H2 is S2 and S3 on A2; H3 is S1 and S2 on A3
Which hybridisations are replicates, e.g.: H1 and H3 are replicates
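The sample/array/hybridisation relationships in this example can be written down as a small data structure. The Python sketch below is illustrative only: the `Hybridisation` class and the `candidate_replicates` rule (same samples applied to different arrays) are assumptions, since under MIAME the submitter states which hybridisations are replicates.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hybridisation:
    """One hybridisation: a set of labelled samples applied to one physical array."""
    hyb_id: str
    samples: frozenset
    array: str


hybridisations = [
    Hybridisation("H1", frozenset({"S1", "S2"}), "A1"),
    Hybridisation("H2", frozenset({"S2", "S3"}), "A2"),
    Hybridisation("H3", frozenset({"S1", "S2"}), "A3"),
]


def candidate_replicates(hybs):
    """Pairs of hybridisations that apply the same samples to different arrays.

    Hypothetical helper: this rule only suggests replicates; the
    submitter's own declaration is what MIAME records.
    """
    return [
        (a.hyb_id, b.hyb_id)
        for a, b in combinations(hybs, 2)
        if a.samples == b.samples and a.array != b.array
    ]


print(candidate_replicates(hybridisations))  # [('H1', 'H3')]
```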
-
MIAME - Experimental design
Quality-related indicators, e.g.: type of replicates
Free-text description of the experiment, or link to an e-publication
-
MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment, with external links and publication; Array highlighted]
-
MIAME - Array design
[Diagram: the 6 MIAME parts, Array highlighted]
Each array used and each element (spot) on the array
-
MIAME - Array design
For the database, the array description should normally be submitted only once
For each physical array used in the experiment, a unique ID and the array type are given
Array design related information, e.g.:
platform type = in situ synthesized or spotted, array provider, etc.
surface type = glass, membrane, etc.
-
MIAME - Array design
Properties of each type of element on the array that is generated by similar protocols, e.g.: synthesized oligos, PCR products, plasmids, colonies, etc.
Each element (spot) on the array:
Elements may be simple or composite (Affymetrix)
Each element must be identified by its sequence, clone ID, PCR primer pair, or in any other unambiguous way
Composite elements may be identified by a reference sequence
Elements may be linked to genes (preferably)
This information is normally provided in a separate file, e.g.: spreadsheet
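Because the element information usually arrives as a separate spreadsheet, a submission tool can check that every spot carries at least one unambiguous identifier. The sketch below assumes a tab-separated file; the column names (`spot_id`, `sequence`, `clone_id`, `pcr_primer_pair`, `gene`) and the example rows are hypothetical, not an ArrayExpress format.

```python
import csv
import io

# Hypothetical identifier columns; MIAME accepts the sequence, a clone ID,
# a PCR primer pair, or any other unambiguous identifier.
IDENTIFIER_COLUMNS = ("sequence", "clone_id", "pcr_primer_pair")


def unidentified_elements(table_text):
    """Return the spot IDs of rows that carry none of the identifier columns."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    missing = []
    for row in reader:
        # Empty cells come back as "", absent cells as None.
        if not any((row.get(col) or "").strip() for col in IDENTIFIER_COLUMNS):
            missing.append(row["spot_id"])
    return missing


# Hypothetical three-spot element table (tab-separated).
example = (
    "spot_id\tsequence\tclone_id\tpcr_primer_pair\tgene\n"
    "1\tACGTACGTAC\t\t\t\n"
    "2\t\tIMAGE:12345\t\t\n"
    "3\t\t\t\t\n"
)
print(unidentified_elements(example))  # ['3']
```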
-
MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment, with external links and publication; Sample highlighted]
-
MIAME - Sample
[Diagram: the 6 MIAME parts, Sample highlighted]
Samples used, the extract preparation and labelling
-
MIAME - Sample
Sample source, e.g.: organism, cell source and type, developmental stage, organism part (tissue), animal/plant strain or line, genetic variation, disease state or normal
Typically only some of these qualifiers are relevant, and annotation for the sample source still needs to be implemented! (To be continued)
-
MIAME - Sample
Sample treatment, e.g.: in vivo / in vitro, compounds
Annotation for sample treatment still needs to be implemented! (To be continued)
Hybridisation extract preparation: laboratory protocol, including the extraction method, whether RNA, mRNA or genomic DNA is extracted, and the amplification method
Labelling: laboratory protocol, including the amount of nucleic acids labelled and the label used (e.g. Cy3, Cy5, 33P, etc.)
-
MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment, with external links and publication; Hybridisation highlighted]
-
MIAME - Hybridisations
[Diagram: the 6 MIAME parts, Hybridisation highlighted]
Procedures and parameters
-
MIAME - Hybridisations
Laboratory protocol, including:
The solution, e.g.: concentration of solutes
Blocking agent
Wash procedure
Quantity of labelled target used
Time, concentration, volume, temperature
Description of the hybridisation instruments
-
MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment, with external links and publication; Data highlighted]
-
MIAME - Data
[Diagram: the 6 MIAME parts, Data highlighted]
Images, quantitation, specifications
-
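MIAME - Data
Three data processing levels: raw data (array scans), intermediate data (spot quantitations), final data (gene expression levels)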
-
MIAME - Data
Why three data processing levels? Each experiment uses different units, so the information is not reliable: there are no units for gene expression measurements!
What do we do in the absence of standards? Record raw, intermediate and final analysis data, together with a detailed annotation of the analysis
This passes the responsibility of interpreting the final data on to the user
-
MIAME - Data
Raw data: array scans
The scanner image file, e.g.: TIFF, DAT
Scanning information:
Scan parameters: laser power, spatial resolution, pixel space, PMT voltage
Laboratory protocol for scanning
Scanning hardware and software
No MGED consensus on raw data!
-
MIAME - Data
Intermediate data: spot quantitations
Image analysis and quantitation: complete image analysis output for each element, normally given as a separate file, e.g.: spreadsheet
Image analysis information: image analysis software specifications, all parameters
-
MIAME - Data
Final data: gene expression levels (a matrix of genes by conditions)
Summarised information from possible replicates:
Derived measurement values summarising related elements, as used by the author
Reliability information for these values, given as a separate file, e.g.: spreadsheet
Specifications of these two, e.g.: median value of the replicates, standard deviation
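The slide's example of summarised final data (a median value over replicates plus a standard deviation as reliability information) can be sketched with Python's standard `statistics` module. The gene names and values below are hypothetical and carry arbitrary units, which is exactly the "lack of units" problem raised earlier.

```python
from statistics import median, stdev

# Hypothetical replicate measurements: gene -> values from replicate
# hybridisations under one condition (arbitrary units).
replicates = {
    "gene_a": [1.8, 2.1, 2.4],
    "gene_b": [0.5, 0.7, 0.6],
}


def summarise(values):
    """Derived value (median) plus reliability information (sample standard deviation)."""
    return {"median": median(values), "sd": stdev(values)}


final_data = {gene: summarise(vals) for gene, vals in replicates.items()}
for gene, summary in final_data.items():
    print(gene, round(summary["median"], 2), round(summary["sd"], 2))
```

Keeping the derived value and its reliability measure side by side mirrors the MIAME request that both be reported, together with how they were computed.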
-
MIAME - Minimum Information About a Microarray Experiment
[Diagram: the 6 parts of a microarray experiment, with external links and publication; Normalisation highlighted]
-
MIAME - Normalisation
[Diagram: the 6 MIAME parts, Normalisation highlighted]
A typical experiment involves a number of hybridisations in which the data from multiple samples are analysed and compared
For this comparison, the reported hybridisation intensities (from the image processing) must first be normalised
-
MIAME - Normalisation
Normalisation adjusts for a number of technical variations between and within hybridisations
Normalisation strategy, e.g.: spiking, housekeeping genes, total array
Normalisation algorithm
Control array elements
Hybridisation extract preparation
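As a rough illustration of two of the strategies listed above, the sketch below rescales each hybridisation so that a reference total matches the mean across hybridisations. Using every element corresponds to the "total array" strategy; passing a set of control elements (e.g. housekeeping genes or spikes) gives a control-based variant. This is one simple scaling scheme assumed for illustration, not the algorithm used by any particular author, and the intensities are hypothetical.

```python
def normalise(hybs, controls=None):
    """Scale each hybridisation so that its reference total equals the mean total.

    hybs: dict of hybridisation id -> {element id: intensity}
    controls: optional set of element ids (e.g. housekeeping genes or spikes)
        used to compute the scale factor; if None, all elements are used
        ("total array" strategy).
    """
    def reference_total(intensities):
        keys = intensities.keys() if controls is None else controls
        return sum(intensities[k] for k in keys)

    totals = {h: reference_total(v) for h, v in hybs.items()}
    target = sum(totals.values()) / len(totals)  # mean reference total
    return {
        h: {e: x * target / totals[h] for e, x in v.items()}
        for h, v in hybs.items()
    }


# Hypothetical intensities: H2 is uniformly twice as bright as H1.
hybs = {
    "H1": {"g1": 100.0, "g2": 300.0},
    "H2": {"g1": 200.0, "g2": 600.0},
}
print(normalise(hybs))
print(normalise(hybs, controls={"g1"}))
```

After scaling, both hybridisations report the same values for g1 and g2, so the purely technical brightness difference no longer looks like differential expression.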
-
MIAME - Annotation
[Diagram: the 6 parts of a microarray experiment, linked to external sources: Source (e.g. Taxonomy), Gene (e.g. EMBL), Publication]
Annotation needs to be implemented
Gene expression data only have a meaning in the context of a detailed sample (source, treatment) and array (gene) description
-
MIAME - Gene annotation
[Diagram: the 6 MIAME parts, Gene (e.g. EMBL) link highlighted]
Unambiguous identification is needed to interpret the data
Synonyms! Alternatives to gene names: community-approved names
Usable external sources, e.g.: EMBL-GenBank (sequence accession numbers), Jackson Lab (approved mouse gene names), HUGO (approved human gene names)
-
MIAME - Sample annotation
[Diagram: the 6 MIAME parts, Source (e.g. Taxonomy) link highlighted]
Unambiguous identification is needed to interpret the data
Usable external sources, e.g.: NCBI Taxonomy (organisms), Jackson Lab (mouse strains), Mouse Atlas (mouse anatomy), Merck Index, CAS # (compounds)
CVs and ontologies are needed to: reduce free-text descriptions, facilitate data queries and analysis
-
What are a CV and an ontology?
CV = Controlled Vocabulary: a set of restricted terms used to describe something; in the simplest case it could be a list
Ontology: describes the relationships between the terms in a structured way
Provides semantics and constraints
Allows for computational inferences and reliable comparisons
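The difference between a CV and an ontology can be shown in a few lines. In this hypothetical Python sketch, the CV is a plain list that can only answer "is this term allowed?", while the toy ontology adds is-a links, so a program can infer that "synthesized oligos" is an "array element type" even though no direct link says so. The terms come from these slides; the hierarchy itself is an assumption for illustration.

```python
# A controlled vocabulary: just a list of allowed terms.
ELEMENT_TYPE_CV = ["synthesized oligos", "PCR products", "plasmids", "colonies"]

# A toy ontology: terms plus explicit is-a relationships (hypothetical hierarchy).
IS_A = {
    "synthesized oligos": "oligos",
    "oligos": "array element type",
    "PCR products": "array element type",
    "plasmids": "array element type",
    "colonies": "array element type",
}


def ancestors(term):
    """All broader terms reachable through is-a links, nearest first."""
    result = []
    while term in IS_A:
        term = IS_A[term]
        result.append(term)
    return result


def is_a(term, broader):
    """Inference: a term is also an instance of every broader term above it."""
    return term == broader or broader in ancestors(term)


print("PCR products" in ELEMENT_TYPE_CV)                  # True: the CV only checks membership
print(is_a("synthesized oligos", "array element type"))   # True: inferred in two steps
print(is_a("PCR products", "oligos"))                     # False
```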
-
Ontology example
Build an ontology for, e.g.: Affymetrix GeneChip Rat Toxicology U34 Array
(Top-level class) Array element type
(Sub-class) oligos
(Slot constraint) manufactured by Affymetrix
(Instance) GeneChip Rat Toxicology U34 Array
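The class / sub-class / slot constraint / instance layers of this example map loosely onto Python classes. The sketch below is a hypothetical modelling choice, not how the MGED ontology is encoded. The "manufactured by Affymetrix" constraint is enforced when an instance is created, and it holds only within this toy example.

```python
from dataclasses import dataclass


@dataclass
class ArrayElementType:
    """Top-level class."""
    name: str


@dataclass
class Oligos(ArrayElementType):
    """Sub-class with one slot whose value is constrained."""
    manufactured_by: str = "Affymetrix"

    def __post_init__(self):
        # Slot constraint from the example: manufactured by Affymetrix.
        if self.manufactured_by != "Affymetrix":
            raise ValueError(
                f"slot constraint violated: manufactured_by={self.manufactured_by!r}"
            )


# The instance.
rt_u34 = Oligos(name="GeneChip Rat Toxicology U34 Array")
print(isinstance(rt_u34, ArrayElementType))  # True: inherited from the top-level class
```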
-
MIAME - MGED Ontology
MGED Sample (BioMaterial) ontology:
Under construction by Chris Stoeckert
www.cbil.upenn.edu/Ontology/MGED_ontology.html
Motivated by MIAME
Defines terms, provides constraints, develops CVs for microarray experiment submissions
Also links to external CVs and ontologies
-
MIAME Q,V,S triplets
MIAME definitions include the Q,V,S triplets: a user-defined qualifier, value, source triplet, used to describe a new term
qualifier = what the term describes (cell type)
value = its value (epithelial)
source = its source (Gray's Anatomy, 38th ed.)
User-defined terms are added to the MGED ontology
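A Q,V,S triplet is naturally a small record. The Python sketch below stores triplets as a `NamedTuple` and merges them into an ontology modelled, hypothetically, as a dictionary of qualifier -> {value: source}. The in-memory dictionary is only a stand-in; it is not the MGED ontology's actual format.

```python
from typing import NamedTuple


class QVS(NamedTuple):
    qualifier: str  # what the term describes
    value: str      # its value
    source: str     # where the term comes from


def add_user_terms(ontology, triplets):
    """Merge user-defined Q,V,S terms into an ontology modelled as qualifier -> {value: source}."""
    for t in triplets:
        ontology.setdefault(t.qualifier, {})[t.value] = t.source
    return ontology


# Hypothetical in-memory stand-in for the MGED ontology.
ontology = {"sex": {"female": "MGED", "male": "MGED"}}
add_user_terms(ontology, [QVS("cell type", "epithelial", "Gray's Anatomy, 38th ed.")])
print(ontology["cell type"])  # {'epithelial': "Gray's Anatomy, 38th ed."}
```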
-
Part I - Talk structure
Data standardization: MGED group, MIAME concepts, MGED Ontology
Uses of MIAME concepts: ArrayExpress database, MAGE-OM the object model
-
Uses of MIAME concepts
MIAME specifies the content of the information:
Sufficient information must be recorded to correctly interpret and replicate the experiments
Structured information must be recorded to correctly retrieve and analyse the data
Uses:
Creation of MIAME-compliant databases, e.g.: ArrayExpress at EBI
Development of submission/annotation tools for generating MIAME-compliant information, e.g.: MIAMExpress
-
ArrayExpress
A public repository for gene expression data
MIAME-compliant
-
ArrayExpress - Object Model
MAGE-OM (Microarray Gene Expression Object Model): a MIAME-compliant standard
Joint submission to the OMG, 2001, by MGED and Rosetta
The OMG (Object Management Group) is an international non-profit software consortium that sets standards in the area of distributed object computing
-
ArrayExpress - Object Model
MAGE-ML (Mark-up Language): derived from MAGE-OM; used to describe and communicate MIAME information; DTD = predominantly computer readable
UML (Unified Modelling Language): UML specifications are used to develop and describe MAGE-OM; UML = human readable
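To give a feel for how an object model becomes a mark-up document, the sketch below serialises a simple Python object to XML with the standard-library `xml.etree.ElementTree`. The class, element and attribute names are hypothetical and deliberately simplified. This is not the MAGE-ML DTD, which should be consulted for real submissions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class BioSource:
    """Toy object-model class; name and fields are illustrative only."""
    identifier: str
    organism: str
    organism_part: str


def to_xml(source):
    # Element and attribute names are hypothetical, NOT taken from the MAGE-ML DTD.
    el = ET.Element("BioSource", identifier=source.identifier)
    ET.SubElement(el, "Characteristic", qualifier="Organism").text = source.organism
    ET.SubElement(el, "Characteristic", qualifier="OrganismPart").text = source.organism_part
    return ET.tostring(el, encoding="unicode")


print(to_xml(BioSource("S1", "Mus musculus", "thymus")))
```

The design point is the one made on the slide: the object model (human readable, via UML) is the source of truth, and the XML is a mechanical, computer-readable rendering of it.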
-
MAGE-OM - UML specifications
Related classes are grouped together in packages
MAGE-OM has 16 packages
-
MAGE-OM mapping to MIAME
[Diagram: MAGE-OM packages mapped onto the 6 MIAME parts]
+ 7 other auxiliary packages: AuditAndSecurity, Protocol, Measurement, BioEvent, BQS, Description, HigherLevelAnalysis
-
Part I - Talk structure
Data standardization: MGED group, MIAME concepts, MGED Ontology
Uses of MIAME concepts: ArrayExpress database, MAGE-OM the object model
Data flow in and out of ArrayExpress
-
Data flow in and out of ArrayExpress
[Diagram: users' submissions (MAGE-ML) pass through a loader into ArrayExpress]
Data model: MIAME-compliant, implemented in ORACLE
Deals with: raw data, processed data, data transformation
Independent of: experimental platform, image analysis method, normalization method
-
Data flow in and out of ArrayExpress
[Diagram: users submit via MIAMExpress or as MAGE-ML; a loader feeds ArrayExpress (central database and data warehouse)]
MIAMExpress:
Submission/annotation tool
Generates MIAME-compliant information
Beta-testers
Demo version (general)
Target-specific interfaces, e.g.: species-specific, toxicology-specific
-
Talk structure
Part I = ArrayExpress at EBI: a public repository for gene expression data
Demo = MIAMExpress: submission/annotation tool
Part II = ILSI-HESI IMD: toxicogenomics data transfer to ArrayExpress
-
Part II - Talk structure
Data transfer from IMD to ArrayExpress: Can the data be parsed? Are they MIAME-compliant?
Toxicology-specific MIAMExpress interface: ILSI toxicogenomics data submission
Areas of collaboration - Summary
-
Part II - Talk structure
Data transfer from IMD to ArrayExpress: Can the data be parsed? Are they MIAME-compliant?
-
Data parsing?
From IMD to ArrayExpress:
Lexical parsing
Mapping information to MAGE-OM
Semantic parsing!: glossary issues
-
Data mapping - Semantics!
[Diagram: MIAME parts Array, Sample, Hybridisation, Data, Normalisation]
-
Data mapping - Semantics!
[Diagram: MIAME parts, with the Array part labelled by its IMD terms]
IMD = chip, microarray chip
Synonyms!
-
Data mapping - Semantics!
[Diagram: MIAME parts, with the Array part labelled by its IMD terms]
IMD = chip description, microarray chip description
Synonyms!
-
Data mapping - Semantics!
[Diagram: MIAME parts, with the Array part labelled by its IMD terms]
IMD = chip design, microarray chip design
Synonyms!
-
Data mapping - Semantics!
[Diagram: MIAME parts, with the Array part labelled by its IMD terms]
IMD = platform, microarray platform, microarray platform type
Synonyms!
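One way to tackle the synonym problem during semantic parsing is a curated glossary that maps each IMD term onto a single MIAME part, with anything unmapped routed to manual review. In the Python sketch below the IMD terms are taken from these slides, but the glossary itself (including the assumption that all of them correspond to the Array part) and the `map_term` helper are hypothetical.

```python
# Hypothetical glossary: IMD terms (from the slides) -> the MIAME part
# they are assumed to correspond to.
IMD_TO_MIAME = {
    "chip": "Array",
    "microarray chip": "Array",
    "chip description": "Array",
    "microarray chip description": "Array",
    "chip design": "Array",
    "microarray chip design": "Array",
    "platform": "Array",
    "microarray platform": "Array",
    "microarray platform type": "Array",
}


def map_term(term):
    """Return the MIAME part for an IMD field name, or None if it needs manual review."""
    # Lexical clean-up first: lower case, underscores to spaces, collapse whitespace.
    key = " ".join(term.lower().replace("_", " ").split())
    return IMD_TO_MIAME.get(key)


for field in ["Microarray_Chip", "platform  type", "chip design"]:
    print(field, "->", map_term(field))
```

The lexical step (case, underscores, spacing) is mechanical. The semantic step only works because someone decided, term by term, what each synonym means, which is the glossary issue flagged on the slide.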
-
MIAME-compliant?
Is IMD MIAME-compliant? MIAME is a minimal system for data exchange and comparisons
Current status of the toxicogenomics data: not MIAME-compliant
Additional information is required:
to be flagged as MIAME-compliant
to build queries to the database: ArrayExpress has an object-model query mechanism
Why additional information?
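A "MIAME-compliant" flag ultimately rests on a check of which required information is present. The sketch below reduces that check to a hypothetical per-part field checklist. The field names and the example record are invented for illustration, and real MIAME compliance depends on the content and detail of the annotation, not merely on fields being filled in.

```python
# Hypothetical checklist: a few required fields per MIAME part.
REQUIRED = {
    "Experiment": ["experiment_type", "experimental_factors"],
    "Array": ["array_id", "platform_type"],
    "Sample": ["organism", "treatment"],
    "Hybridisation": ["hybridisation_protocol"],
    "Data": ["raw_data", "intermediate_data", "final_data"],
    "Normalisation": ["normalisation_strategy"],
}


def missing_fields(record):
    """Map each MIAME part to the required fields absent from the record."""
    missing = {}
    for part, fields in REQUIRED.items():
        absent = [f for f in fields if not record.get(f)]
        if absent:
            missing[part] = absent
    return missing


def miame_flag(record):
    return not missing_fields(record)


# Hypothetical IMD-style record: final data only, no protocols.
record = {
    "experiment_type": "toxicology study",
    "experimental_factors": ["dose", "time"],
    "array_id": "A1",
    "platform_type": "in situ synthesized",
    "organism": "Rattus norvegicus",
    "treatment": "methapyrilene, oral (gavage)",
    "final_data": "fold_change table",
}
print(miame_flag(record))
print(missing_fields(record))
```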
-
ILSI-HESI Objective
ILSI-HESI objective:
To have publicly available information to assist in developing consensus on potential applications and interpretation of microarray data with respect to mechanism-based risk assessment
To critically assess the potential utility of these new methods for the process of hazard identification
Toxicologists (other than ILSI-HESI members):
Can correctly interpret and replicate the toxicogenomics experiments
Can correctly retrieve and analyse the toxicogenomics data
Sufficient and structured information must be recorded in order to achieve the ILSI-HESI objective
-
IMD - Data
Three types of data:
Required: fold_change of spot intensity
Optional: relative_intensity, coefficient_variation of relative_intensity
Additional: present/absent/marginal_call (for Affymetrix), P_value (for replicates)
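The required and optional IMD values can be sketched as simple computations. The Python sketch below assumes that fold change is the ratio of treated to control intensity, and that the coefficient of variation is the sample standard deviation divided by the mean. IMD's exact definitions (for example, log ratios or negative values for down-regulation) are not given on the slide, and the intensities are hypothetical.

```python
from statistics import mean, stdev


def fold_change(treated, control):
    """Ratio of treated to control spot intensity (one common definition, assumed here)."""
    return treated / control


def coefficient_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical relative intensities for one spot across replicate hybridisations.
treated = [4.0, 5.0, 6.0]
control = [2.0, 2.0, 2.0]

print(fold_change(mean(treated), mean(control)))  # 2.5
print(round(coefficient_variation(treated), 3))   # 0.2
```

A fold change alone hides how it was computed, so without the underlying intensities and the analysis description another toxicologist cannot reproduce or reinterpret it. That is the gap the MIAME data requirements address.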
-
MIAME-compliant - Data
Requirements:
-
MIAME-compliant - Data
Why three data processing levels? There are no units for gene expression measurements!
What do we do in the absence of standards? Record raw, intermediate and final analysis data, together with a detailed annotation of the analysis
This allows toxicologists (other than ILSI-HESI members) to interpret the final data
It increases the value of the toxicology data, helping to achieve the ILSI-HESI objective and to give a critical mass to the ILSI-HESI studies
-
IMD Experiment description
Hepatotoxicity, e.g.: Oral (gavage) Study in Male SD Rats on Methapyrilene
-
IMD Experiment description
Good level of information
Still not complete enough to be MIAME-compliant, e.g.: detailed protocols are required (hybridization chamber type, scanner type, label quantity, etc.)
Need for: CVs and ontologies
-
Excerpt from a Sample Description (courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences)
Organism: Mus musculus [ NCBI taxonomy browser ]
Cell source: in-house bred mice
Sex: female [ MGED ]
Age: 3 - 4 weeks after birth [ MGED ]
Growth conditions: normal controlled environment; 20 - 22 °C average temperature; housed in cages according to EU legislation; specified pathogen free conditions (SPF); 14 hours light cycle; 10 hours dark cycle
Developmental stage: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]
Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]
Strain or line: C57BL/6 [ International Committee on Standardized Genetic Nomenclature for Mice ]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrains 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J, N, Ola. [ International Committee on Standardized Genetic Nomenclature for Mice ]
Treatment: in vivo [ MGED ] [ intraperitoneal ] injection of [ Dexamethasone ] into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [ MGED ] synthetic [ glucocorticoid ] [ Dexamethasone ], dissolved in PBS
-
Part II - Talk structure
Data transfer from IMD to ArrayExpress: Can the data be parsed? Are they MIAME-compliant?
Toxicology-specific MIAMExpress interface: ILSI toxicogenomics data submission
Areas of collaboration - Summary
-
Toxicology-specific MIAMExpress
Toxicology-specific interface options: in vivo or in vitro; study-specific (Hepatotoxicity, Nephrotoxicity, Genotoxicity)
CVs and ontologies to be developed:
CVs in pull-down menus
Q,V,S user-driven ontologies
Extend the MGED ontology to include toxicology-specific terms
Dynamic, fast and easy to use
Browse: protocols, arrays
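CVs behind pull-down menus can be modelled as enumerations, so the menu lists exactly the allowed terms and any other value is rejected. The Python sketch below uses the interface options named on this slide. The class and function names are hypothetical and are not taken from MIAMExpress.

```python
from enum import Enum


class StudySystem(Enum):
    IN_VIVO = "in vivo"
    IN_VITRO = "in vitro"


class StudyType(Enum):
    HEPATOTOXICITY = "Hepatotoxicity"
    NEPHROTOXICITY = "Nephrotoxicity"
    GENOTOXICITY = "Genotoxicity"


def menu_options(cv):
    """Labels a pull-down menu would offer for a given CV."""
    return [term.value for term in cv]


def parse_choice(cv, label):
    """Accept only a value from the CV; Enum lookup raises ValueError otherwise."""
    return cv(label)


print(menu_options(StudyType))  # ['Hepatotoxicity', 'Nephrotoxicity', 'Genotoxicity']
print(parse_choice(StudySystem, "in vivo"))  # StudySystem.IN_VIVO
```

Extending a CV (for example, with terms proposed through Q,V,S triplets) then means adding a member, and every menu built from the enumeration picks it up.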
-
Areas of collaboration
Data transfer:
Parser from IMD to ArrayExpress (MAGE-ML)
Additional information required: MIAME-compliant flag (e.g. data, protocols, sample pooling, etc.); building complex queries
Data submission:
Submission via the toxicology-specific MIAMExpress
CVs and ontologies
Interface options
Protocols
Other data:
Volume (79 from Hepatotoxicity)
Clinical chemistry, histopathology
Format (images also?) and volume
Mailing list