Provenance of Microarray Experiments for a Better Understanding of Experiment Results

Helena F. Deus (University of Texas), Jun Zhao (University of Oxford), Satya Sahoo (Wright State University), Matthias Samwald (DERI, Galway), Eric Prud’hommeaux (W3C), Michael Miller (Tantric Designs), M. Scott Marshall (Leiden University Medical Center), Kei-Hoi Cheung (Yale University)
Aug 29, 2014

Page 2: provenance of microarray experiments

Outline

- Background: microarrays, gene expression, and why provenance is important for experimental biomedical data
- Objectives
- Data: microarray workflow and gene results
- The provenance model
- Demo
- Future work
- Summary

Page 3: provenance of microarray experiments

Introduction

High throughput experiments, such as microarray technologies, have revolutionized the way we study disease and basic biology.

Microarray experiments allow scientists to quantify thousands of genomic features in a single experiment

Source: http://www.scq.ubc.ca/

Affymetrix microarray gene chips

Genes can be used as biomarkers for disease

Page 4: provenance of microarray experiments

Introduction

Since 1997, the number of publications per year based on analysis of gene expression microarray data has grown from 30 to over 5,000

Microarray data repositories and standards exist, but they lack provenance information and interoperable data access

Source: YJBM (2007) 80(4):165-78

Page 5: provenance of microarray experiments

Introduction Cont.

- A pilot study of the W3C HCLS BioRDF task force
- Bottom-up approach
- Use microarray experiments for Alzheimer's disease as the test bed
- Aggregate results across microarray experiments
- Combine different types of data

Page 6: provenance of microarray experiments

Objectives

- To facilitate a better understanding of microarray gene results:
  - Efficiently query gene results
  - Efficiently combine existing life science datasets
- To transform microarray gene results into Semantic Web format
- To encode provenance information about these gene results in the same format as the data itself

Page 7: provenance of microarray experiments

Microarray Workflow

Biological question → experiment design (sample gathering, etc.) → microarray experiment → image analysis → normalization → estimation, clustering, discrimination, t-test, … → differentially expressed genes

Stage groupings in the figure: data extraction; data analysis and modeling
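The data analysis and modeling stage typically reduces each gene (probe set) to an effect size and a test statistic. The sketch below is a minimal illustration, assuming two groups of already-normalized intensities and Welch's unequal-variance t-test. The slide only says "T-test", so the Welch variant and the sample values are assumptions made here, not the studies' actual data or method.

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    """log2 of the ratio of mean expression (case / control).

    Assumes positive intensities that have not yet been log-transformed.
    """
    return math.log2(mean(case) / mean(control))


def welch_t(case, control):
    """Welch's t statistic for two samples with unequal variances."""
    # statistics.variance uses the n - 1 (sample) denominator
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


# Hypothetical normalized intensities for one probe set (illustrative only)
ad_samples = [820.0, 910.0, 870.0, 860.0]
control_samples = [410.0, 450.0, 430.0, 440.0]

print(round(log2_fold_change(ad_samples, control_samples), 3))  # 1.0
print(round(welch_t(ad_samples, control_samples), 2))  # 21.24
```

A p-value additionally needs the t distribution; with SciPy, `scipy.stats.ttest_ind(case, control, equal_var=False)` returns both the statistic and the p-value.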

Page 8: provenance of microarray experiments

An example of differentially expressed genes

Page 9: provenance of microarray experiments

An example of gene lists from different studies

Page 10: provenance of microarray experiments

What microarray experiments analyze samples taken from the entorhinal cortex region of Alzheimer's patients?

Page 11: provenance of microarray experiments

What genes are overexpressed in the entorhinal cortex region and what is their expression fold change and associated p-value?

Page 12: provenance of microarray experiments

What other diseases may be associated with the same genes found to be linked to Alzheimer's disease (AD)?
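In the project these questions are posed as SPARQL queries over RDF. As a rough, library-free illustration of the idea, the sketch below matches SPARQL-style triple patterns against a handful of in-memory triples to answer the first two questions. Apart from biordf:derives_from_region, every `ex:` term, identifier and value is hypothetical test data, not the BioRDF vocabulary or real results.

```python
# Hypothetical test triples; "ex:" terms are invented for this sketch.
# biordf:derives_from_region is the only term taken from the slides.
TRIPLES = [
    ("ex:experiment1", "ex:analyzes_sample", "ex:sample1"),
    ("ex:sample1", "biordf:derives_from_region", "ex:entorhinal_cortex"),
    ("ex:sample1", "ex:has_disease", "ex:alzheimers_disease"),
    ("ex:experiment2", "ex:analyzes_sample", "ex:sample2"),
    ("ex:sample2", "biordf:derives_from_region", "ex:hippocampus"),
    ("ex:sample2", "ex:has_disease", "ex:alzheimers_disease"),
    ("ex:genelist1", "ex:produced_by", "ex:experiment1"),
    ("ex:genelist1", "ex:has_result", "ex:result1"),
    ("ex:result1", "ex:gene", "ex:geneA"),
    ("ex:result1", "ex:fold_change", 2.1),
    ("ex:result1", "ex:p_value", 0.003),
    ("ex:genelist1", "ex:has_result", "ex:result2"),
    ("ex:result2", "ex:gene", "ex:geneB"),
    ("ex:result2", "ex:fold_change", 0.6),
    ("ex:result2", "ex:p_value", 0.01),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def query(triples, patterns, binding=None):
    """Yield every binding that satisfies all patterns (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from query(triples, patterns[1:], extended)


# Page 10: experiments analyzing entorhinal cortex samples from AD patients
Q_EXPERIMENTS = [
    ("?exp", "ex:analyzes_sample", "?s"),
    ("?s", "biordf:derives_from_region", "ex:entorhinal_cortex"),
    ("?s", "ex:has_disease", "ex:alzheimers_disease"),
]
experiments = sorted({b["?exp"] for b in query(TRIPLES, Q_EXPERIMENTS)})

# Page 11: overexpressed genes in that region, with fold change and p-value
Q_GENES = [
    ("?exp", "ex:analyzes_sample", "?s"),
    ("?s", "biordf:derives_from_region", "ex:entorhinal_cortex"),
    ("?list", "ex:produced_by", "?exp"),
    ("?list", "ex:has_result", "?r"),
    ("?r", "ex:gene", "?gene"),
    ("?r", "ex:fold_change", "?fc"),
    ("?r", "ex:p_value", "?p"),
]
overexpressed = [
    (b["?gene"], b["?fc"], b["?p"])
    for b in query(TRIPLES, Q_GENES)
    if b["?fc"] > 1.0  # plays the role of a SPARQL FILTER
]

print(experiments)    # ['ex:experiment1']
print(overexpressed)  # [('ex:geneA', 2.1, 0.003)]
```

The third question cannot be answered from the gene lists alone: it needs gene-disease associations from another dataset.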

Page 13: provenance of microarray experiments

A Bottom-up Approach

- Separate concerns/perspectives
- Too many existing vocabularies to choose from
- Lack of standardization among existing provenance vocabularies
- Lack of a clear understanding of what needs to be captured
- Process:
  - Identify user query
  - Define terms
  - Test the query using test data
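The three process steps can be made concrete with a small check, sketched here under the assumption that motivating queries are written as triple patterns: the terms each query needs are collected (define terms), then compared against the predicates actually present in the test data (test the query). The query names, `ex:` terms and data are hypothetical.

```python
# Hypothetical motivating queries as triple patterns; "?x" marks a variable.
MOTIVATING_QUERIES = {
    "experiments_by_region": [
        ("?exp", "ex:analyzes_sample", "?s"),
        ("?s", "biordf:derives_from_region", "?region"),
    ],
    "software_used": [
        ("?list", "ex:produced_by", "?exp"),
        ("?exp", "ex:used_software", "?tool"),
    ],
}

# Hypothetical test data
TEST_DATA = [
    ("ex:experiment1", "ex:analyzes_sample", "ex:sample1"),
    ("ex:sample1", "biordf:derives_from_region", "ex:entorhinal_cortex"),
    ("ex:genelist1", "ex:produced_by", "ex:experiment1"),
]


def required_terms(queries):
    """Define terms: every non-variable predicate the queries rely on."""
    return {
        p
        for patterns in queries.values()
        for (_, p, _) in patterns
        if not p.startswith("?")
    }


def uncovered_terms(queries, data):
    """Test with test data: required terms that the data never uses."""
    used = {p for (_, p, _) in data}
    return required_terms(queries) - used


print(sorted(required_terms(MOTIVATING_QUERIES)))
print(sorted(uncovered_terms(MOTIVATING_QUERIES, TEST_DATA)))  # ['ex:used_software']
```

An uncovered term signals either a gap in the test data or a term the model has not yet been exercised on.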

Page 17: provenance of microarray experiments

A Bottom-up Approach

Raw Data → Results → Questions
- Which genes are markers for neurodegenerative diseases?
- Was gene ALG2 differentially expressed in multiple experiments?

Provenance of the microarray experiment:
- What software was used to analyse the data?
- How can the experiment be replicated?

Existing resources: provenance models; workflow and experimental design; domain ontologies (DO, GO, …); community models

Page 18: provenance of microarray experiments

The Provenance Data Model: Four Types of Provenance

http://purl.org/net/biordfmicroarray/ns#

Page 19: provenance of microarray experiments

RDF genelist representation
- Institutional level: metadata associated with each genelist, such as the laboratory where the experiments were performed or the reference to the genelist.
- Experimental context level: experimental protocols, such as the brain region and the disease (terms were partially mapped to MGED, DO and NIF).

Page 20: provenance of microarray experiments

RDF genelist representation
- Data analysis and significance: the statistical analysis methodology used to select the relevant genes.
- Dataset descriptions: the version of a source dataset and who published it, described with the Vocabulary of Interlinked Datasets (voiD) and Dublin Core terms (dct).
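To make the four levels concrete, here is a minimal sketch that writes one genelist's provenance as N-Triples using only the Python standard library. The biordf namespace and biordf:derives_from_region come from the slides; void:Dataset, void:inDataset, dct:publisher and dct:bibliographicCitation are real voiD and Dublin Core terms. Using owl:versionInfo for the dataset version, and every `http://example.org/` property, resource and literal, are assumptions made for illustration.

```python
BIORDF = "http://purl.org/net/biordfmicroarray/ns#"  # namespace from the slides
EX = "http://example.org/"                           # hypothetical data namespace
VOID = "http://rdfs.org/ns/void#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_VERSION_INFO = "http://www.w3.org/2002/07/owl#versionInfo"


def iri(value):
    return f"<{value}>"


def literal(value):
    # Plain string literal; escapes only backslashes and double quotes
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# One genelist described at each of the four provenance levels.
# All EX resources, EX properties and literal values are hypothetical.
TRIPLES = [
    # 1) Institutional level
    (iri(EX + "genelist1"), iri(EX + "performed_at"), iri(EX + "lab1")),
    (iri(EX + "genelist1"), iri(DCT + "bibliographicCitation"), literal("Hypothetical citation")),
    # 2) Experimental context level
    (iri(EX + "sample1"), iri(BIORDF + "derives_from_region"), iri(EX + "entorhinal_cortex")),
    (iri(EX + "sample1"), iri(EX + "has_disease"), iri(EX + "alzheimers_disease")),
    # 3) Data analysis and significance
    (iri(EX + "genelist1"), iri(EX + "selected_by"), iri(EX + "ttest_analysis1")),
    (iri(EX + "ttest_analysis1"), iri(EX + "p_value_cutoff"), literal("0.05")),
    # 4) Dataset description (voiD and Dublin Core terms)
    (iri(EX + "genelist1"), iri(VOID + "inDataset"), iri(EX + "dataset1")),
    (iri(EX + "dataset1"), iri(RDF_TYPE), iri(VOID + "Dataset")),
    (iri(EX + "dataset1"), iri(DCT + "publisher"), iri(EX + "publisher1")),
    (iri(EX + "dataset1"), iri(OWL_VERSION_INFO), literal("1.0")),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) strings as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(TRIPLES))
```

Because the provenance is plain RDF, it can be loaded into the same store as the gene results and queried together with them.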

Page 21: provenance of microarray experiments

Provenance types are perspectives on the data

Page 25: provenance of microarray experiments

Query federation with Diseasome: is there a gene network for AD?

Source: PNAS (2007) 104(21):8685
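At its core, this federation step is a join on gene identifiers between the local gene lists and a gene-disease network. The sketch below performs that join in memory, assuming both sources already share gene identifiers; the gene names, diseases and values are hypothetical. Against live SPARQL endpoints the same join would be written as a federated query (for example with the SPARQL 1.1 `SERVICE` keyword) rather than in Python.

```python
# Hypothetical local gene results from one AD study: gene -> (fold change, p-value)
AD_GENE_RESULTS = {
    "geneA": (2.1, 0.003),
    "geneB": (1.7, 0.020),
    "geneC": (0.6, 0.010),
}

# Hypothetical gene-disease associations standing in for a remote source such as Diseasome
GENE_DISEASE = [
    ("geneA", "disease X"),
    ("geneA", "disease Y"),
    ("geneC", "disease Z"),
    ("geneD", "disease X"),
]


def other_diseases(gene_results, associations, min_fold_change=1.0):
    """Join overexpressed local genes with remote associations on the gene ID."""
    overexpressed = {g for g, (fc, _) in gene_results.items() if fc > min_fold_change}
    network = {}
    for gene, disease in associations:
        if gene in overexpressed:
            network.setdefault(gene, []).append(disease)
    return network


print(other_diseases(AD_GENE_RESULTS, GENE_DISEASE))
# {'geneA': ['disease X', 'disease Y']}
```

geneB is overexpressed but has no associations in the second source, so it drops out of the joined result, just as it would in an inner join across endpoints.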

Page 26: provenance of microarray experiments

Demo

Go to http://purl.org/net/biordfmicroarray/demo

Page 27: provenance of microarray experiments

Conclusions
- Levels of provenance: 1) institutional; 2) experimental context; 3) statistical analysis and significance; 4) dataset description
- Provenance as RDF: SPARQL queries can express constraints on both the origins and the context of the data
- The data model is driven by the biological question: a bottom-up approach shields the model from rapidly evolving ontologies while still enabling links to widely used ontologies
- Mapping is facilitated: mapping to existing provenance vocabularies such as OPM, PML and Provenir is eased because biordf:has_input_value can be made a sub-property of the inverse of the OPM property used, and biordf:derives_from_region can become a sub-property of the OPM property wasDerivedFrom
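The two mappings amount to sub-property axioms, and their consequences are easy to compute. This is a minimal forward-chaining sketch that handles only these two axioms, assuming the OPM terms come from the OPM Vocabulary (OPMV) namespace; the `http://example.org/` resources are hypothetical. An OWL reasoner would derive the same triples from `rdfs:subPropertyOf` and `owl:inverseOf` declarations.

```python
BIORDF = "http://purl.org/net/biordfmicroarray/ns#"
# Assumption: OPM terms taken from the OPM Vocabulary (OPMV) namespace
OPMV = "http://purl.org/net/opmv/ns#"

# property -> (OPM property it maps to, whether subject and object are swapped)
# derives_from_region  rdfs:subPropertyOf  opmv:wasDerivedFrom
# has_input_value      rdfs:subPropertyOf  [ owl:inverseOf opmv:used ]
MAPPINGS = {
    BIORDF + "derives_from_region": (OPMV + "wasDerivedFrom", False),
    BIORDF + "has_input_value": (OPMV + "used", True),
}


def entail_opm(triples):
    """Return the OPM triples implied by the two sub-property axioms."""
    inferred = set()
    for s, p, o in triples:
        if p in MAPPINGS:
            target, inverse = MAPPINGS[p]
            inferred.add((o, target, s) if inverse else (s, target, o))
    return inferred


EX = "http://example.org/"  # hypothetical resources
data = [
    (EX + "sample1", BIORDF + "derives_from_region", EX + "entorhinal_cortex"),
    (EX + "value1", BIORDF + "has_input_value", EX + "analysis1"),
]
for triple in sorted(entail_opm(data)):
    print(triple)
```

Because has_input_value maps to the inverse of used, its subject and object swap places: a triple `x has_input_value y` entails `y used x`.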

Page 28: provenance of microarray experiments

Summary and Future Work

Provenance modeling in a Semantic Web application:
- Query genes gathered from specific samples, in a given condition, or from given organizations
- Query genes produced through a particular statistical analysis process
- Query for information about genes from the most recent dataset

The bottom-up approach:
- Separate concerns of interest
- Create a minimum set of terms required for the motivating queries

Future work:
- Integrate our model with provenance information generated in scientific workflow workbenches
- Integrate provenance information into the Excel spreadsheets in which most biologists report their results
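The "most recent dataset" query depends on dataset-level provenance. A minimal sketch, assuming each dataset carries an issue date (as dct:issued would provide) and each gene result records the dataset it came from; all identifiers, dates and values are hypothetical.

```python
from datetime import date

# Hypothetical dataset descriptions: dataset -> (issued date, version label)
DATASETS = {
    "ex:dataset_v1": (date(2009, 3, 1), "1.0"),
    "ex:dataset_v2": (date(2010, 6, 15), "2.0"),
}

# Hypothetical gene results: (gene, fold change, source dataset)
GENE_RESULTS = [
    ("geneA", 2.1, "ex:dataset_v1"),
    ("geneA", 1.9, "ex:dataset_v2"),
    ("geneB", 1.4, "ex:dataset_v2"),
]


def latest_dataset(datasets):
    """Pick the dataset with the latest issued date."""
    return max(datasets, key=lambda d: datasets[d][0])


def results_from_latest(results, datasets):
    """Keep only the gene results that come from the newest dataset."""
    newest = latest_dataset(datasets)
    return [(gene, fc) for gene, fc, ds in results if ds == newest]


print(latest_dataset(DATASETS))                     # ex:dataset_v2
print(results_from_latest(GENE_RESULTS, DATASETS))  # [('geneA', 1.9), ('geneB', 1.4)]
```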

Page 29: provenance of microarray experiments

Acknowledgement

W3C BioRDF group: Kei Cheung, Michael Miller, M. Scott Marshall, Eric Prud’hommeaux, Satya Sahoo, Matthias Samwald

The HCLS IG, as well as Helen Parkinson, James Malone, Misha Kapushesky and Jonas Almeida