ArrayExpress A public database for microarray based gene expression data http://www.ebi.ac.uk/microarray/ European Bioinformatics Institute EMBL-EBI Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo + team MGED IV, Boston, February 2002

Mar 26, 2015

Page 1: ArrayExpress A public database for microarray based gene expression data  European Bioinformatics Institute EMBL-EBI Alvis.

ArrayExpress
A public database for microarray based gene expression data
http://www.ebi.ac.uk/microarray/

European Bioinformatics Institute

EMBL-EBI

Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo + team

MGED IV, Boston, February 2002

Page 2

ArrayExpress

• Standards: MIAME-compliant
• Data model: MAGE-OM
• Data input: MAGE-ML, web
• Data output: HTML, MAGE-ML, TAB-delimited, link to Expression Profiler
• Data curation: Team of curators
• Data sets: Yeast, human

Opened to public: Tuesday, February 12th, 2002

Page 3

General overview

[Diagram labels: ArrayExpress, MIAMExpress, Expression Profiler, MAGE-ML, Internet, www]

Page 4

ArrayExpress component architecture

• Main database: SQL, derived from MAGE-OM
• Data warehouse: gene-centred queries
• Application server: Java servlets, MAGE-OM
• Images: file server

[Other diagram labels: ArrayExpress, MAGE-ML, Submission/curation, Internet, www]

Page 5

ArrayExpress - features

• MIAME-compliant, MAGE-ML, MAGE-OM

• Can deal with:
  • raw quantitation data
  • processed data
  • data transformations

• Independent of:
  • experimental platforms
  • image analysis methods
  • data normalization methods

Page 6

ArrayExpress: details

• Database schema derived from MAGE-OM

• Standard SQL, we use Oracle

• Data loader for MAGE-ML (generated)

• Web interface (first release 12 February 2002)
  • Queries by experiment, array, sample
  • Browsing

• Object model-based query mechanism, automatic mapping to SQL
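
The slide does not show how the object model-based query mechanism is implemented. As a rough illustration of the general idea only, the sketch below maps a small query object to a parameterised SQL statement. The class `ExperimentQuery`, its field names, and the table name `experiment` are all hypothetical and are not taken from MAGE-OM or from the ArrayExpress schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class ExperimentQuery:
    """Hypothetical query object; field names are illustrative only."""
    accession: Optional[str] = None
    species: Optional[str] = None
    array_accession: Optional[str] = None


def to_sql(query: ExperimentQuery, table: str = "experiment") -> Tuple[str, List[str]]:
    """Translate the set fields of a query object into SQL plus bind parameters.

    Column names come from the dataclass definition (not from user input),
    and values are passed as bind parameters rather than pasted into the SQL.
    """
    clauses: List[str] = []
    params: List[str] = []
    for f in fields(query):
        value = getattr(query, f.name)
        if value is not None:
            clauses.append(f"{f.name} = ?")
            params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    print(to_sql(ExperimentQuery(species="yeast")))
```

The point of the sketch is that the caller only fills in attributes of an object and the SQL text is derived mechanically from them; a real mapping from a full object model would also have to handle associations between classes.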

Page 7

Simplified ArrayExpress model

Page 8

MIAMExpress

• Data annotation and submission tool

• MIAME based web interface

• Experiment, Array, Protocol submissions

• Uses CV/ontology wherever possible

• Creates MAGE-ML files for loading into ArrayExpress

• Based on MySQL, Perl, CGI, Apache

Page 9
Page 10

MIAMExpress submission procedure

[Flow diagram]
• Create account
• Login
• Pending/New Experiment
• Sample1, Sample2, Sample3, Samplen; Sample protocol
• Extracts 1…n for each sample (E1, E2, En); Extraction protocol
• Hybridisations; Hyb protocol
• Array1, Array2, Array3, Arrayn; Scanning protocol
• Data1, Data2, Data3, Datan; Image analysis protocol
• Combined Experiment Data Transformation protocol
• Submit; Final free text comment

Page 11

MIAMExpress design and future

• Species and domain specific pages and ontologies, ontology development
• Life-span of data submissions is long
• Curation control, submissions tracking
• Interaction with ArrayExpress
• Full MAGE-OM, data updating
• Usability, flexibility, scalability, platform independence
• User needs, free in-house installation

Page 12

ArrayExpress curation effort

• User support and help documentation
• Submission support for MIAMExpress
• Support on ontologies and CVs
• Minimize free text, removal of synonyms
• MIAME encouragement
• Help on MAGE-ML
• Goal: to provide high-quality, well-annotated data to allow automated data analysis

Page 13

Accession numbers

• E-MEXP-234: Experiment 234 via MIAMExpress

• E-SANG-25: Experiment 25 from Sanger Institute

• A-AFFY-1034: Array description 1034 from Affymetrix

• P-LABL-5: Protocol 5 for labeling
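
The four examples on this slide suggest a three-part layout: a one-letter object type (E, A or P), a four-letter code, and a number. The sketch below splits an accession along those lines. The layout is inferred from the slide's examples only; the regular expression, the `Accession` type and the `parse_accession` function are assumptions for illustration, not an official ArrayExpress specification.

```python
import re
from typing import NamedTuple

# Object types as named on the slide: E = experiment, A = array description,
# P = protocol. Any other prefix is rejected in this sketch.
KINDS = {"E": "experiment", "A": "array description", "P": "protocol"}

# Assumed layout, inferred from the slide's four examples:
# <type letter>-<four upper-case letters>-<number>
ACCESSION_RE = re.compile(r"^([EAP])-([A-Z]{4})-(\d+)$")


class Accession(NamedTuple):
    kind: str    # e.g. "experiment"
    code: str    # e.g. "MEXP"
    number: int  # e.g. 234


def parse_accession(text: str) -> Accession:
    """Split an accession such as 'E-MEXP-234' into its three parts."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {text!r}")
    prefix, code, number = match.groups()
    return Accession(KINDS[prefix], code, int(number))


if __name__ == "__main__":
    for acc in ("E-MEXP-234", "E-SANG-25", "A-AFFY-1034", "P-LABL-5"):
        print(acc, parse_accession(acc))
```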

Page 14

Data in ArrayExpress

Now
• Human data (ironchip) from EMBL
• Yeast data from EMBL
• S. pombe data from Sanger Institute

Work underway
• TIGR array descriptions
• Affymetrix chip designs
• Direct pipeline from Sanger (Rob Andrews)
• HGMP mouse
• EMBL mosquito
• (Add your name here!)

Page 15

Data browsing and queries

Page 16
Page 17

Experiment info

Page 18

Sample info

Page 19

General overview

[Diagram labels: ArrayExpress, MIAMExpress, Expression Profiler, MAGE-ML, Internet, www]

Page 20

Expression Profiler: EPCLUST

[Diagram labels]
• DATA, SELECT, FOLDER, ANALYZE
• A “CLUSTER”
• URLMAP
• GeneOntology, Pathways, Databases, SPEXS, Other tools

Page 21

>YAL036C chromo=1 coord=(76154-75048(C)) start=-600 end=+2 seq=(76152-76754)

TGTTCTTTCTTCTTCTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTAGTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTGCTTCTTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGCACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGCTGCTTTCTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCGGCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACTCTTTTTCGCCATCTTTCACTTATCTGATGTTCCTGATTGCCCTTCTTATCCCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTTCAATGGGCTTAAAGCTTGAAAAATTTTTTCACATCACAAGCGACGAGGGCCCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGATATTACGGTGTGATGAGGGCGCAATGATAGGAAGTGTTTGAAGCTAGATGCAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_

>YAL025C chromo=1 coord=(101147-100230(C)) start=-600 end=+2 seq=(101145-101747)

CTTAGAAGATAAAGTAGTGAATTACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGGGTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACCACGAATTGCTGAGTAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTATCCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTTGTAAAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCATACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAGAATTTATAATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTTTTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATGCAGTAGGGTAATAAACCTTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTTTCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGATATTGCATTGCTTAGTTCTTTCTTTTGACAGTGTTCTCTTCAGTACATAACTACAACGGTTAGAATACAACGAGGAT_ATG_

...

>YBR084W chromo=2 coord=(411012-413936) start=-600 end=+2 seq=(410412-411014)

CCATGTATCCAAGACCTGCTGAAGATGCTTACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTTTCGCAGCTGTTATTATCATCACCCCAGCATTACGAACATTCTCCACATCAAAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGTCTACATACATACATACATCTCGTACATAAATACGCATACGTATCTTCGTAGTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTCAAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTTCTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGACGCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTCACTTCAACGGACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCAGCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACATCAAAAAACAACTTTCATTACTGTGATTCTCTCAGTCTGTTCATTTGTCAGATATTTAAGGCTAAAAGGAA_ATG_

101 Sequences relative to ORF start

GATGAG.T 1:52/70 2:453/508 R:7.52345 BP:1.02391e-33
G.GATGAG.T 1:39/49 2:193/222 R:13.244 BP:2.49026e-33
AAAATTTT 1:63/77 2:833/911 R:4.95687 BP:5.02807e-32
TGAAAA.TTT 1:45/53 2:333/350 R:8.85687 BP:1.69905e-31
TG.AAA.TTT 1:53/61 2:538/570 R:6.45662 BP:3.24836e-31
TG.AAA.TTTT 1:40/43 2:254/260 R:10.3214 BP:3.84624e-30
TGAAA..TTT 1:54/65 2:608/645 R:5.82106 BP:1.0887e-29
...

GATGAG.T
TGAAA..TTT

YGR128C + 100
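
Each pattern row on this slide, for example `GATGAG.T 1:52/70 2:453/508 R:7.52345 BP:1.02391e-33`, is a pattern followed by four labelled fields. The slide does not say what the fields mean, so the sketch below only splits a row into typed values without interpreting them. The names `PatternRow`, `set1`, `set2` and `parse_row` are hypothetical.

```python
from typing import NamedTuple, Tuple


class PatternRow(NamedTuple):
    pattern: str
    set1: Tuple[int, int]  # the two integers after "1:" (meaning not stated on the slide)
    set2: Tuple[int, int]  # the two integers after "2:" (meaning not stated on the slide)
    r: float               # value after "R:"
    bp: float              # value after "BP:"


def _pair(text: str) -> Tuple[int, int]:
    """Turn '52/70' into (52, 70)."""
    first, second = text.split("/")
    return int(first), int(second)


def parse_row(line: str) -> PatternRow:
    """Split one whitespace-separated report row into typed fields."""
    pattern, *rest = line.split()
    # Each remaining token looks like 'label:value'.
    values = dict(token.split(":", 1) for token in rest)
    return PatternRow(
        pattern=pattern,
        set1=_pair(values["1"]),
        set2=_pair(values["2"]),
        r=float(values["R"]),
        bp=float(values["BP"]),
    )


if __name__ == "__main__":
    print(parse_row("GATGAG.T 1:52/70 2:453/508 R:7.52345 BP:1.02391e-33"))
```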

Page 22

Upstream sequence (600bp)

GATGAG.T TGAAA..TTT

GATGAG.T W/30 TGAAA..TTT

1 mismatch
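
To make the pattern notation concrete, the sketch below searches a sequence for a pattern such as `GATGAG.T`, optionally tolerating mismatches. It assumes that `.` stands for any single nucleotide, which the slide does not state explicitly, and it does not try to interpret the `W/30` annotation. The function names are hypothetical, and this is not the PATMATCH or SPEXS implementation.

```python
from typing import List


def matches_at(sequence: str, pattern: str, start: int, max_mismatches: int = 0) -> bool:
    """Check whether `pattern` matches `sequence` at offset `start`.

    A '.' in the pattern is treated as a wildcard (assumption) and never
    counts as a mismatch; every other position must be equal, except for
    up to `max_mismatches` differing characters.
    """
    window = sequence[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False
    mismatches = sum(1 for p, s in zip(pattern, window) if p != "." and p != s)
    return mismatches <= max_mismatches


def find_pattern(sequence: str, pattern: str, max_mismatches: int = 0) -> List[int]:
    """Return all 0-based start offsets where the pattern matches."""
    last_start = len(sequence) - len(pattern)
    return [
        i for i in range(last_start + 1)
        if matches_at(sequence, pattern, i, max_mismatches)
    ]


if __name__ == "__main__":
    # Short fragment taken from the YAL036C upstream sequence on the previous slide.
    fragment = "CATCGATGAGCTATAAG"
    print(find_pattern(fragment, "GATGAG.T"))
    # Same pattern against a synthetic sequence with one altered base.
    print(find_pattern("AAGATGACCTTT", "GATGAG.T", max_mismatches=1))
```

The direct character-by-character comparison is used here instead of a regular expression because it extends naturally to the "1 mismatch" case.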

Page 23
Page 24

Components of Expression Profiler
http://ep.ebi.ac.uk/

• EPCLUST: expression data
• GENOMES: sequence, function, annotation
• SPEXS: discover patterns
• URLMAP: provide links
• PATMATCH: visualise patterns
• EP:GO: GeneOntology
• EP:PPI: protein-protein interactions
• SEQLOGO

[Other diagram labels: Expression data; External data, tools (pathways, function, etc.)]

Page 25

Acknowledgments: the team (3)

Alvis Brazma, Alan Robinson, Jaak Vilo

1999 November
MGED 1 in Hinxton, EBI

Page 26

Acknowledgments: the team (5)

Alvis Brazma, Alan Robinson

Database: Ugis Sarkans

Expression Profiler: Jaak Vilo

Research, students: Thomas Schlitt

2000 August

Page 27

Acknowledgments: the team (9)

Alvis Brazma

Database, Curation, MIAMExpress: Ugis Sarkans, Helen Parkinson, Mohammadreza Shojatalab

Expression Profiler: Jaak Vilo

Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren

2001 June

Page 28

Acknowledgments: the team (19)

Alvis Brazma

Database, Curation, MIAMExpress: Ugis Sarkans, Gonzalo Garcia, Helen Parkinson, Mohammadreza Shojatalab

Expression Profiler: Jaak Vilo

Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren

Misha Kapushesky, Lev Soinov, Koichi Tazaki, Anastasia Samsonova, Susanna Sansone, Philippe Rocca-Serra, Ele Holloway, Niran Abeygunawardena, Ahmet Oezcimen

2002 February