Page 1

Making Sense of Public Domain Expression Data: GeneVestigator

Metsada Pasmanik-Chor, TAU Bioinformatics Unit, 19/3/09

Page 2

On the Agenda

Microarray databases: characteristics, pros and cons

Examples:
• GEO and ArrayExpress
• GeneVestigator: a meta-analytical approach

Page 3

Meta-data in Microarray Experiments

Gene expression studies generate large amounts of data!

Source: http://titan.biotec.uiuc.edu/cs491jh/slides/cs491jh-Yong.ppt#268,6,Capturing Data and Meta-data in Microarray Experiments

Page 4

Properties of High-throughput Data

Microarray databases have the ability to accept, store and export (share) large quantities of data.

Stored data contain:
• Many genes
• Many samples
• Various organisms/tissues
• A variety of biological phenomena
• Time courses
• Replicates
• Different technologies, hence various data formats

Data retrieval: user-friendly web-based interfaces

Links to analysis tools

Page 5

Gene Expression Matrix

The final gene expression matrix (on the right) is needed for higher-level analysis and mining.

[Figure: images and spots yield spot/image quantitations, which lead to the matrix of gene expression levels (axes: genes, samples).]

Source: http://titan.biotec.uiuc.edu/cs491jh/slides/cs491jh-Yong.ppt#271,8,Gene Expression Matrix
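The step from spot quantitations to a gene expression matrix can be sketched as follows. This is a minimal illustration only: the sample names, gene names and intensities are made up, and averaging replicate spots is an assumed summarization rule, not the method of any particular platform.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical spot-level quantitations: (sample, gene, intensity).
# A gene may be measured by several spots on the same array.
spots = [
    ("sample_1", "geneA", 100.0),
    ("sample_1", "geneA", 120.0),
    ("sample_1", "geneB", 40.0),
    ("sample_2", "geneA", 210.0),
    ("sample_2", "geneB", 35.0),
    ("sample_2", "geneB", 45.0),
]

def build_expression_matrix(spots):
    """Average replicate spots into one value per (gene, sample)."""
    grouped = defaultdict(list)
    for sample, gene, intensity in spots:
        grouped[(gene, sample)].append(intensity)
    genes = sorted({gene for gene, _ in grouped})
    samples = sorted({sample for _, sample in grouped})
    # Rows are genes, columns are samples; None marks a missing measurement.
    matrix = [
        [mean(grouped[(g, s)]) if (g, s) in grouped else None for s in samples]
        for g in genes
    ]
    return genes, samples, matrix

genes, samples, matrix = build_expression_matrix(spots)
```

Here `matrix[i][j]` holds the expression level of `genes[i]` in `samples[j]`, which is the form that higher-level analysis tools usually expect.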

Page 6

Microarray Data Precision and Loss

90% of CEL files generated from microarray experiments have never been deposited in any repository. Stokes et al., BMC Bioinformatics 2008, 9(Suppl 6):S18

http://www.bio-miblab.org/arraywiki

[Figure labels: "Only provided in 0.1% of public experiments"; "Electron microscopy"]

Processed data loses precision!

Page 8

Complete description of complex experiments is desired.

We don't always know what's important: "noise" probes could end up being informative (e.g. detection of a splice variant).
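A small sketch of why probe-level (raw) data can matter. The intensities are invented, and a plain mean stands in for a real summarization algorithm; the point is only that two samples can share the same summarized value while their probe-level patterns differ.

```python
from statistics import mean

# Hypothetical probe-level intensities for one probe set in two samples.
# The last three probes drop in sample_b, as might happen if they target
# an exon that is skipped in a splice variant.
probes_sample_a = [200.0, 210.0, 190.0, 205.0, 195.0, 200.0]
probes_sample_b = [300.0, 310.0, 290.0, 100.0, 95.0, 105.0]

def summarize(probes):
    """Collapse a probe set into one value (a plain mean, for illustration)."""
    return mean(probes)

def per_probe_ratio(sample, reference):
    """Probe-by-probe ratio, available only when the raw data was kept."""
    return [s / r for s, r in zip(sample, reference)]

summary_a = summarize(probes_sample_a)
summary_b = summarize(probes_sample_b)
ratios = per_probe_ratio(probes_sample_b, probes_sample_a)
```

Both summaries are identical, so the processed data shows no change, while the per-probe ratios split into two clearly different groups.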

The Future
• Better (more accurate) summarization algorithms will emerge.
• New uses for raw data may emerge.

Challenge: store the raw data in an accessible form.

Different labs have different needs; a central system is needed!

Page 9

Complexity and Categories of Data: the 6 Parts of MIAME

The MIAME (Minimum Information About a Microarray Experiment) guidelines contain standards for the publication of information. Brazma et al. (2001), Nature Genetics 29(4), 365-71

[Figure: MIAME categories]
• Experimental design
• Array design
• Sample: source and treatment, preparation and labelling
• Hybridisation
• Data measurements
• Normalization
Other labels in the figure: Publication; Source (e.g., Taxonomy); Gene (e.g., EMBL)

Source: http://www.ict.ox.ac.uk/odit/projects/digitalrepository/docs/workshop/Helen_Parkinson-RDMW0608.ppt#429,18,Slide 18
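As a rough illustration of how the six MIAME parts could be used as a checklist, the sketch below reports which parts are missing from a submission record. The field names and the record contents are hypothetical and do not follow any official schema.

```python
# The six MIAME parts named on the slide; the dictionary keys are
# hypothetical field names chosen for this sketch, not an official schema.
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridisations",
    "measurements",
    "normalization",
)

def missing_miame_parts(record):
    """Return the MIAME parts that are absent or empty in a record."""
    return [part for part in MIAME_PARTS if not record.get(part)]

# A made-up submission with one empty part and one part left out entirely.
submission = {
    "experimental_design": "time course, 3 replicates per time point",
    "array_design": "hypothetical array design identifier",
    "samples": ["sample_1", "sample_2"],
    "hybridisations": [],
    "measurements": "raw and processed data files",
}

missing = missing_miame_parts(submission)
```

In this example the record would be reported as lacking hybridisation and normalization information.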

Page 10

Microarray Database Repositories are Biased

The relative size of each pie corresponds to the number of experiments contained in each repository.

[Figure: one pie chart per repository, annotated: all human data; mostly human data; mostly old data; mostly custom arrays; mainly Affy chips]

Stokes et al., BMC Bioinformatics 2008, 9(Suppl 6):S18 http://www.biomedcentral.com/1471-2105/9/S6/S18

Page 11

Overlaps of Data Between Repositories

Total experiments: 2376 (August 2005 - June 2006)

Stokes et al., BMC Bioinformatics 2008, 9(Suppl 6):S18 http://www.biomedcentral.com/1471-2105/9/S6/S18
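Overlaps between repositories can be counted with plain set operations. The sketch below uses invented experiment identifiers and a placeholder third repository; it does not reproduce the figures from Stokes et al.

```python
from itertools import combinations

# Hypothetical experiment identifiers per repository (not real accessions).
repositories = {
    "GEO": {"exp1", "exp2", "exp3", "exp4"},
    "ArrayExpress": {"exp3", "exp4", "exp5"},
    "OtherRepository": {"exp4", "exp6"},
}

def pairwise_overlaps(repos):
    """Number of experiments shared by each pair of repositories."""
    return {
        (a, b): len(repos[a] & repos[b])
        for a, b in combinations(sorted(repos), 2)
    }

overlaps = pairwise_overlaps(repositories)
# Experiments counted once, however many repositories hold them.
total_unique = len(set().union(*repositories.values()))
```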

Page 12

User-Friendly Microarray Databases

Many gene expression databases exist, both commercial and non-commercial.

Most focus on a particular technology, a particular organism, or both.

We will discuss the most promising ones:
• ArrayExpress (AE; EBI)
• The Gene Expression Omnibus (GEO; NCBI)
• GeneVestigator

Page 13

The Gene Expression Omnibus (GEO) is a public repository in the Entrez database that includes high-throughput gene expression data, hosted at the National Library of Medicine (NIH).

GEO was designed to accommodate diverse types of data.

http://www.ncbi.nlm.nih.gov/geo/
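Because GEO is part of Entrez, its DataSets can also be searched programmatically through the NCBI E-utilities. The sketch below only builds an ESearch URL against the `gds` database and sends no request; the search term is an arbitrary example, and the endpoint shown is the current E-utilities address rather than anything given on the slide.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def geo_search_url(term, max_results=20):
    """Build an ESearch URL against the Entrez 'gds' (GEO DataSets) database."""
    params = {"db": "gds", "term": term, "retmax": max_results}
    return EUTILS_ESEARCH + "?" + urlencode(params)

url = geo_search_url("Duchenne muscular dystrophy")
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the response lists the identifiers of matching GEO records.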

Page 14

Gene Expression Omnibus: experiment-centered view (GDS)

Page 15

Gene Expression Omnibus: gene-centered view

Example: GDS563

Expression profile of the Dystrophin gene in a DataSet examining skeletal muscle biopsies from 12 Duchenne muscular dystrophy patients and 12 normal subjects.

• Red bars: the level of abundance of an individual transcript across the Samples that make up a DataSet. Values are presented in arbitrary units. Single channel: Values are normalized signal count data. Dual channel: submitted Values are normalized log ratios.
• Blue squares: rank order, an indication of where the expression of that gene falls with respect to all other genes on that array (enrichment).
• Faded bars/squares: these correspond to Affymetrix 'Detection call' = Absent.

[Figure: experimental design groups: Duchenne, Normal]
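The idea behind the rank order (blue squares) can be illustrated with a simple percentile rank of one gene among all genes measured on the same array. This is an assumed, simplified definition for illustration and not necessarily the exact procedure GEO uses; the values are invented, with `DMD` standing for the dystrophin gene.

```python
def percentile_rank(values_by_gene, gene):
    """Percentage of genes on the array with a value at or below this gene's."""
    target = values_by_gene[gene]
    at_or_below = sum(1 for v in values_by_gene.values() if v <= target)
    return 100.0 * at_or_below / len(values_by_gene)

# Hypothetical values for a single sample (arbitrary units).
sample_values = {
    "DMD": 850.0,
    "geneB": 120.0,
    "geneC": 40.0,
    "geneD": 990.0,
    "geneE": 300.0,
}

rank = percentile_rank(sample_values, "DMD")
```

A rank near 100 means the gene is among the most highly expressed on that array, independent of the arbitrary units of the raw values.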

Page 16

ArrayExpress: http://www.ebi.ac.uk/microarray-as/ae/

Page 17

Query ArrayExpress

[Screenshot labels: gene name; condition; species; click; experiments and description; annotations]

Results: a list of all experiments, ordered by p-value. For each experiment: a short description, experimental factors and gene expression.

Page 18

Query ArrayExpress: similarly expressed genes

Select the 'find 3 closest genes' option. IER2, FOS and JUN have expression similar to NFKBIA.
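One common way to find the genes "closest" to a query gene is to rank all other genes by the correlation of their expression profiles with the query. The sketch below assumes Pearson correlation and uses invented profiles; the similarity measure actually used by ArrayExpress is not stated on the slide.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def closest_genes(profiles, query, k=3):
    """Return the k genes whose profiles correlate best with the query gene."""
    scores = [
        (pearson(profiles[query], profile), gene)
        for gene, profile in profiles.items()
        if gene != query
    ]
    scores.sort(reverse=True)
    return [gene for _, gene in scores[:k]]

# Hypothetical expression profiles across four conditions.
profiles = {
    "NFKBIA": [1.0, 2.0, 4.0, 3.0],
    "IER2": [1.1, 2.1, 3.9, 3.2],
    "FOS": [0.9, 2.2, 4.5, 2.8],
    "JUN": [2.0, 3.0, 5.1, 4.1],
    "geneX": [4.0, 3.0, 1.0, 2.0],
    "geneY": [2.0, 2.1, 1.9, 2.0],
}

closest = closest_genes(profiles, "NFKBIA")
```

Correlation compares the shape of the profiles rather than their absolute levels, so a gene with a shifted but parallel profile still counts as close.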

Page 20

GeneVestigator: a reference expression database and meta-analysis system

https://www.genevestigator.com/gv/index.jsp

Page 21

Genevestigator: a system for the meta-analysis of microarray data

• A database and Web-browser data mining interface for Affymetrix GeneChip data, based on the new concept of "Meta-Profiles", relying on reference expression databases.
• Allows biologists to study the expression and regulation of genes in a broad variety of contexts by summarizing information from hundreds of manually curated microarray experiments.
• Workspaces and views can be stored in files and re-opened for another analysis session (*.gvw, which stands for Genevestigator Workspace).

[Figure: overview of the Genevestigator system: application server, Java application, analysis output]

Source: http://bar.utoronto.ca/ICAR19/ICAR19_BioinfoWorkshop%20-%20Genevestigator.ppt#257,2,Overview of the Genevestigator system

Page 22

Database Content and Quality

The database consists of a large and varied collection of manually curated and quality-controlled Affymetrix chips.

Quality control of EACH experiment is done manually by Genevestigator curators, using a pipeline of Bioconductor packages that perform normalization and probe-level analysis.

Low-quality arrays are those that:
• fall out of range relative to the other arrays from the same experiment,
• exhibit higher RNA degradation,
• are particularly noisy,
• do not correlate with replicate samples.
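Two of the criteria listed above (out of range, poor correlation with replicates) can be sketched in a few lines. This is not the Genevestigator pipeline, which relies on Bioconductor packages and manual curation; the thresholds, array names and values below are arbitrary assumptions, and RNA degradation and noise are not covered.

```python
from math import sqrt
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def flag_low_quality(arrays, min_corr=0.8, max_median_shift=0.5):
    """Flag replicate arrays that look suspicious.

    arrays maps an array name to its log-scale expression values.
    Both thresholds are arbitrary choices for this sketch.
    """
    medians = {name: median(values) for name, values in arrays.items()}
    overall = median(medians.values())
    flagged = {}
    for name, values in arrays.items():
        reasons = []
        others = [v for other, v in arrays.items() if other != name]
        # The median correlation keeps one bad array from tainting good ones.
        typical_corr = median(pearson(values, o) for o in others)
        if typical_corr < min_corr:
            reasons.append("poor correlation with replicates")
        if abs(medians[name] - overall) > max_median_shift:
            reasons.append("out of range")
        if reasons:
            flagged[name] = reasons
    return flagged

# Hypothetical replicate arrays from one experiment.
arrays = {
    "rep1": [5.0, 7.0, 9.0, 6.0, 8.0],
    "rep2": [5.1, 7.2, 8.8, 6.1, 7.9],
    "rep3": [5.2, 6.9, 9.1, 5.9, 8.1],
    "rep4": [8.0, 6.5, 7.5, 9.0, 6.0],    # does not follow the replicates
    "rep5": [8.0, 10.0, 12.0, 9.0, 11.0],  # same shape, shifted upwards
}

flagged = flag_low_quality(arrays)
```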

Page 23

User Hardware Requirements

Genevestigator is a web-based application running in Java. A Java applet provides several advantages:
• users don't have to install any software
• users always work with the latest software release
• Java is more powerful than HTML/JavaScript for data manipulation

To run the application, client machines must have the Java Runtime Environment (JRE; version 1.4.2 or higher) installed (usually available by default on PCs). The JRE is freely available for download from Sun Microsystems (http://www.Java.com).

To work optimally with the Genevestigator application, we recommend:
• screen resolution: 1024 x 768 or higher
• memory: preferably 512 MB RAM or more

Page 24

GeneVestigator Species Availability

Mammals: Human, Mouse, Rat

Species      Arrays                                      Number of arrays
Human        Human 133_2 & Human Genome: 10k, 20k, 47k   1109, 3786, 2782
Mouse        Mouse Genome: 12k, 40k                      3071, 1967
Rat          Rat Genome: 8k, 31k                         2146, 858

Plants: Arabidopsis, Barley, Rice, Soybean

Species      Arrays                                      Number of arrays
Arabidopsis  Arabidopsis Genome: 22k                     3110
Barley       Barley Genome: 22k                          706
Rice         Rice Genome: 22k                            -

Page 25

Data Sources and Referencing

The Genevestigator analysis platform comprises a large database of manually curated microarray experiments collected from the public domain or from individual contributors. The array annotations necessary for data analysis were retrieved from public repositories and/or, if insufficiently available, from the authors themselves.

Genevestigator contains data from the following repositories and databases:

Database: Link
• Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
• ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
• ChipperDB: http://chipperdb.chip.org/adb/adb-home
• The Arabidopsis Information Resource (TAIR): http://www.arabidopsis.org/
• MUSC Microarray Database: http:proteogenomics.musc.eduma
• Public Expression Profiling Resource (PEPR): http://pepr.cnmcresearch.org
• NASC Microarray Database (NASCArrays): http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl
• NIH Neuroscience Microarray Consortium: http://arrayconsortium.tgen.org/np2/home.do
• Gene Expression Open Source System (GEOSS): https://genes.med.virginia.edu/intro to geoss.html
• RNA Abundance Database (RAD): http://www.cbil.upenn.edu/RAD/php/index.php

Page 26

GeneVestigator: focus on gene expression in the context of:
1. Time (gene expression during stages of development/life cycle).
2. Space (tissue-specific expression).
3. Response (expression caused by stimuli: biotic stress, abiotic stress, chemical, hormone, light, drug treatment, disease).

Access: free / by license

Users can query the database to retrieve the expression patterns of individual genes throughout chosen environmental conditions, growth stages, or organs.

Conversely, mining tools allow users to identify genes specifically expressed during selected stresses, growth stages, or in particular organs.
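The two query directions described above can be sketched on a toy data set: summarizing one gene across annotation categories, and searching for genes specific to one category. The organs, gene names, values and the fold-change threshold are all hypothetical, and averaging per category is an assumed simplification of what a meta-profile does.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotated arrays: (organ, {gene: expression value}).
arrays = [
    ("root", {"geneA": 9.0, "geneB": 2.0}),
    ("root", {"geneA": 11.0, "geneB": 3.0}),
    ("leaf", {"geneA": 2.0, "geneB": 2.5}),
    ("leaf", {"geneA": 3.0, "geneB": 3.5}),
    ("flower", {"geneA": 1.0, "geneB": 3.0}),
]

def meta_profile(arrays, gene):
    """Mean expression of one gene per annotation category."""
    by_category = defaultdict(list)
    for category, values in arrays:
        by_category[category].append(values[gene])
    return {category: mean(v) for category, v in by_category.items()}

def genes_specific_to(arrays, category, min_fold=2.0):
    """Genes whose mean in one category is at least min_fold times every other."""
    genes = arrays[0][1].keys()
    hits = []
    for gene in genes:
        profile = meta_profile(arrays, gene)
        others = [v for c, v in profile.items() if c != category]
        if all(profile[category] >= min_fold * v for v in others):
            hits.append(gene)
    return hits

# Gene-to-context query: where is geneA expressed?
profile_a = meta_profile(arrays, "geneA")
# Context-to-gene query: which genes are specific to roots?
root_specific = genes_specific_to(arrays, "root")
```

The same functions would apply unchanged to growth stages or stimuli if each array were annotated with that category instead of an organ.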

Page 27

http://sbw.kgi.edu/

Page 28

Dr. Metsada Pasmanik-Chor
Bioinformatics Unit, Life Science, TAU

Tel: x 6992
E-mail: [email protected]

Bioinfo. Unit webpage: http://bioinfo.tau.ac.il