
ArrayExpress

A public database for microarray-based gene expression data

http://www.ebi.ac.uk/microarray/

European Bioinformatics Institute

EMBL-EBI

Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo + team

MGED IV, Boston, February 2002

ArrayExpress

• Standards: MIAME-compliant

• Data model: MAGE-OM

• Data input: MAGE-ML, web

• Data output: HTML, MAGE-ML, tab-delimited, link to Expression Profiler

• Data curation: team of curators

• Data sets: yeast, human

Opened to the public on Tuesday, February 12th, 2002

General overview

[Figure: ArrayExpress, MIAMExpress and Expression Profiler exchanging data as MAGE-ML, with access over the Internet (www)]

ArrayExpress component architecture

[Figure: component diagram]

• Main database: SQL schema derived from MAGE-OM

• Data warehouse: gene-centred queries

• Application server: Java servlets, MAGE-OM

• Images: file server

• MAGE-ML: submission/curation

• Access over the Internet (www)

ArrayExpress - features

• MIAME-compliant, MAGE-ML, MAGE-OM

• Can deal with:

  • raw quantitation data

  • processed data

  • data transformations

• Independent of:

  • experimental platforms

  • image analysis methods

  • data normalization methods
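The slide says the database handles raw data, processed data and the transformations between them, without depending on any particular normalization method. One way to picture this is to store every derived data set together with a record of the step that produced it, and to treat the method as an opaque label. The sketch below is minimal and rests on assumptions: the classes `QuantitationData` and `Transformation` and the helper `derive` are hypothetical, not the real ArrayExpress schema or MAGE-OM classes, and the intensities are made up.

```python
import math
from dataclasses import dataclass, field
from statistics import median
from typing import Callable, Dict, List


# Hypothetical, simplified classes: not the ArrayExpress schema or MAGE-OM.
@dataclass
class Transformation:
    method: str                          # free-text label; storage never interprets it
    inputs: List["QuantitationData"]


@dataclass
class QuantitationData:
    name: str
    values: Dict[str, float]             # feature id -> value
    derived_by: List[Transformation] = field(default_factory=list)


def derive(name: str, method: str,
           fn: Callable[..., Dict[str, float]],
           *inputs: QuantitationData) -> QuantitationData:
    """Apply fn to the inputs and record the step, whatever method fn implements."""
    values = fn(*(d.values for d in inputs))
    return QuantitationData(name, values, [Transformation(method, list(inputs))])


# Two example methods; any other normalization could be plugged in the same way.
def log2_ratio(ch1: Dict[str, float], ch2: Dict[str, float]) -> Dict[str, float]:
    return {k: math.log2(ch1[k] / ch2[k]) for k in ch1}


def median_center(values: Dict[str, float]) -> Dict[str, float]:
    m = median(values.values())
    return {k: v - m for k, v in values.items()}


# Made-up raw intensities for three features on a two-channel array.
raw_ch1 = QuantitationData("raw channel 1", {"f1": 800.0, "f2": 200.0, "f3": 400.0})
raw_ch2 = QuantitationData("raw channel 2", {"f1": 200.0, "f2": 200.0, "f3": 200.0})

ratios = derive("log ratios", "log2 ratio", log2_ratio, raw_ch1, raw_ch2)
normalized = derive("normalized", "median centering", median_center, ratios)
print(normalized.values)  # {'f1': 1.0, 'f2': -1.0, 'f3': 0.0}
```

Because the store only keeps the method name and the links between inputs and outputs, adding a new normalization method requires no change to the data model.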

ArrayExpress: details

• Database schema derived from MAGE-OM

• Standard SQL (we use Oracle)

• Data loader for MAGE-ML (automatically generated)

• Web interface (first release 12.2.2002)

• Queries by experiment, array, sample

• Browsing

• Object model-based query mechanism, automatic mapping to SQL
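The details above describe a schema derived from the object model and queries phrased against objects that are mapped automatically to SQL. The sketch below illustrates that idea under stated assumptions. The classes `Experiment` and `BioSample` are hypothetical stand-ins for MAGE-OM classes, and the helpers `create_table_sql`, `query_sql` and `find` are illustrative names. It uses Python's built-in `sqlite3` instead of Oracle so that it runs on its own.

```python
import sqlite3
from dataclasses import dataclass, fields


# Hypothetical stand-ins for MAGE-OM classes; the real model is far larger.
@dataclass
class Experiment:
    identifier: str
    name: str


@dataclass
class BioSample:
    identifier: str
    organism: str
    experiment: str  # identifier of the owning Experiment


SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL"}


def create_table_sql(cls) -> str:
    """Derive a CREATE TABLE statement from the class's typed fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__} ({cols}, PRIMARY KEY (identifier))"


def insert(conn: sqlite3.Connection, obj) -> None:
    names = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO {type(obj).__name__} ({', '.join(names)}) VALUES ({placeholders})",
        tuple(getattr(obj, n) for n in names),
    )


def query_sql(cls, **criteria):
    """Map an object-level query (attribute == value) to parameterized SQL."""
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(criteria) - known)
    if unknown:
        raise AttributeError(f"{cls.__name__} has no attribute(s) {unknown}")
    where = " AND ".join(f"{name} = ?" for name in criteria) or "1 = 1"
    return f"SELECT * FROM {cls.__name__} WHERE {where}", tuple(criteria.values())


def find(conn: sqlite3.Connection, cls, **criteria):
    """Run the mapped query and rebuild objects from the returned rows."""
    sql, params = query_sql(cls, **criteria)
    return [cls(*row) for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
for model in (Experiment, BioSample):
    conn.execute(create_table_sql(model))

# Hypothetical sample identifiers.
insert(conn, Experiment("E-MEXP-234", "example experiment"))
insert(conn, BioSample("S-1", "Saccharomyces cerevisiae", "E-MEXP-234"))
insert(conn, BioSample("S-2", "Homo sapiens", "E-MEXP-234"))

print(find(conn, BioSample, organism="Homo sapiens"))
```

The query is validated against the class's attributes before any SQL is built. This means a query that names a nonexistent attribute fails at the object level rather than as a database error.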

Simplified ArrayExpress model

MIAMExpress

• Data annotation and submission tool

• MIAME-based web interface

• Experiment, array and protocol submissions

• Uses controlled vocabularies (CVs) and ontologies wherever possible

• Creates MAGE-ML files for loading into ArrayExpress

• Based on MySQL, Perl, CGI, Apache
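MIAMExpress collects annotation through web forms and writes it out as files for the ArrayExpress loader. The following is a minimal sketch of that last step, assuming form fields have already been gathered into a dictionary. The element names (`Submission`, `Experiment`, `Protocol`, `Sample`) are illustrative only and do not follow the MAGE-ML DTD. The function `form_to_xml` and the sample identifier `S-1` are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical form contents; element names below are illustrative, not the MAGE-ML DTD.
form = {
    "experiment": {"identifier": "E-MEXP-234", "name": "example experiment"},
    "protocols": [
        {"identifier": "P-LABL-5", "type": "labeling", "text": "labeling protocol text"},
    ],
    "samples": [
        {"identifier": "S-1", "organism": "Saccharomyces cerevisiae"},
    ],
}


def form_to_xml(form: dict) -> str:
    """Serialize the collected form fields as one XML document for the loader."""
    root = ET.Element("Submission")
    exp = ET.SubElement(root, "Experiment", identifier=form["experiment"]["identifier"])
    ET.SubElement(exp, "Name").text = form["experiment"]["name"]

    protocols = ET.SubElement(root, "Protocols")
    for p in form["protocols"]:
        ET.SubElement(protocols, "Protocol",
                      identifier=p["identifier"], type=p["type"]).text = p["text"]

    samples = ET.SubElement(root, "Samples")
    for s in form["samples"]:
        ET.SubElement(samples, "Sample", identifier=s["identifier"], organism=s["organism"])
    return ET.tostring(root, encoding="unicode")


xml_text = form_to_xml(form)
parsed = ET.fromstring(xml_text)  # round trip: what a loader would read back
print(parsed.find("Experiment").get("identifier"))  # E-MEXP-234
```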

MIAMExpress submission procedure

[Figure: MIAMExpress submission workflow, each step with its protocol]

• Login (or create an account)

• Pending/new experiment

• Samples 1…n: sample protocol

• Extracts 1…n for each sample (E1…En): extraction protocol

• Hybridisations: hybridisation protocol

• Arrays 1…n: scanning protocol

• Data 1…n: image analysis protocol

• Combined experiment data: data transformation protocol

• Submit: final free-text comment
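In the submission procedure, each stage has a protocol attached, from samples through to the combined data. A submission tool could check that each stage is complete before allowing the final submit. The sketch below is minimal and assumes the stage list shown on the slide. The classes `Stage` and `Submission`, the mapping `REQUIRED_PROTOCOL`, and the completeness rule itself are hypothetical; this is not how MIAMExpress is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Stage -> protocol it needs, in the order shown on the slide (labels are hypothetical).
REQUIRED_PROTOCOL = {
    "samples": "sample protocol",
    "extracts": "extraction protocol",
    "hybridisations": "hybridisation protocol",
    "arrays": "scanning protocol",
    "data": "image analysis protocol",
    "combined data": "data transformation protocol",
}


@dataclass
class Stage:
    items: List[str] = field(default_factory=list)
    protocol: Optional[str] = None


@dataclass
class Submission:
    stages: Dict[str, Stage] = field(
        default_factory=lambda: {name: Stage() for name in REQUIRED_PROTOCOL})
    comment: str = ""  # the final free-text comment

    def problems(self) -> List[str]:
        """List what still blocks submission, stage by stage."""
        issues = []
        for name, needed in REQUIRED_PROTOCOL.items():
            stage = self.stages[name]
            if not stage.items:
                issues.append(f"{name}: no entries")
            elif stage.protocol is None:
                issues.append(f"{name}: missing {needed}")
        return issues

    def ready(self) -> bool:
        return not self.problems()


sub = Submission()
sub.stages["samples"] = Stage(["Sample1", "Sample2"], protocol="sample protocol text")
sub.stages["extracts"] = Stage(["E1", "E2"])  # extraction protocol not entered yet
print(sub.problems())
```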

MIAMExpress design and future

• Species- and domain-specific pages and ontologies; ontology development

• Data submissions have a long life span

• Curation control, submission tracking

• Interaction with ArrayExpress

• Full MAGE-OM, data updating

• Usability, flexibility, scalability, platform independence

• User needs; free in-house installation

ArrayExpress curation effort

• User support and help documentation

• Submission support for MIAMExpress

• Support on ontologies and CVs

• Minimize free text, remove synonyms

• MIAME encouragement

• Help on MAGE-ML

• Goal: to provide high-quality, well-annotated data to allow automated data analysis
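"Minimize free text, remove synonyms" amounts to mapping each free-text value onto one canonical controlled-vocabulary term and flagging anything unknown for a curator. The sketch below is minimal and uses a tiny made-up organism vocabulary. The names `ORGANISM_CV`, `build_index` and `normalize` are hypothetical, and real curation would draw on a maintained ontology rather than a hand-written dictionary.

```python
from typing import Dict, Optional, Set

# Tiny hypothetical vocabulary: canonical term -> synonyms seen in free text.
ORGANISM_CV: Dict[str, Set[str]] = {
    "Saccharomyces cerevisiae": {"S. cerevisiae", "baker's yeast", "budding yeast"},
    "Schizosaccharomyces pombe": {"S. pombe", "fission yeast"},
    "Homo sapiens": {"human", "H. sapiens"},
}


def _key(text: str) -> str:
    """Case- and whitespace-insensitive lookup key."""
    return " ".join(text.lower().split())


def build_index(cv: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map every term and synonym to its canonical term, rejecting ambiguous synonyms."""
    index: Dict[str, str] = {}
    for term, synonyms in cv.items():
        for variant in {term, *synonyms}:
            key = _key(variant)
            if index.get(key, term) != term:
                raise ValueError(f"{variant!r} maps to more than one term")
            index[key] = term
    return index


INDEX = build_index(ORGANISM_CV)


def normalize(free_text: str) -> Optional[str]:
    """Canonical CV term for a free-text value, or None to flag it for a curator."""
    return INDEX.get(_key(free_text))


print(normalize("  Budding   Yeast"))  # Saccharomyces cerevisiae
print(normalize("zebrafish"))          # None
```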

Accession numbers

• E-MEXP-234: experiment 234 via MIAMExpress

• E-SANG-25: experiment 25 from the Sanger Institute

• A-AFFY-1034: array description 1034 from Affymetrix

• P-LABL-5: protocol 5 for labeling
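The four example accession numbers share one shape: a letter for the object type (E for experiment, A for array, P for protocol), a four-letter source or category code, and a serial number. The parser below is a sketch that assumes this shape holds in general, although the slide shows only these four examples. The names `parse_accession` and `Accession` are hypothetical.

```python
import re
from typing import NamedTuple

# Pattern inferred from the four examples only: type letter, 4-letter code, number.
ACCESSION = re.compile(r"^([EAP])-([A-Z]{4})-(\d+)$")
KINDS = {"E": "experiment", "A": "array", "P": "protocol"}


class Accession(NamedTuple):
    kind: str
    source: str
    number: int


def parse_accession(text: str) -> Accession:
    """Split an accession such as 'E-MEXP-234' into its three parts."""
    m = ACCESSION.match(text.strip())
    if m is None:
        raise ValueError(f"not an accession number: {text!r}")
    prefix, source, number = m.groups()
    return Accession(KINDS[prefix], source, int(number))


for acc in ("E-MEXP-234", "E-SANG-25", "A-AFFY-1034", "P-LABL-5"):
    print(parse_accession(acc))
```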

Data in ArrayExpress

• Human data (ironchip) from EMBL

• Yeast data from EMBL

• S. pombe data from the Sanger Institute

• TIGR array descriptions

• Affymetrix chip designs

• Direct pipeline from Sanger (Rob Andrews)

• HGMP mouse

• EMBL mosquito

• (Add your name here!)

Now / Work underway

Data browsing and queries

[Screenshots: experiment info, sample info]


Expression Profiler: EPCLUST

[Screenshot: EPCLUST with the DATA, SELECT, FOLDER and ANALYZE tabs; a selected “CLUSTER” is linked via URLMAP to Gene Ontology, pathways, databases, SPEXS and other tools]
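EPCLUST groups genes by their expression profiles. The slide does not say which algorithm or distance measure it uses. The sketch below assumes agglomerative clustering with average linkage and a correlation distance (1 minus the Pearson correlation), which is one common choice for expression data. The functions `pearson_distance` and `average_linkage` are hypothetical, and the profiles are made up.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson correlation: 0 for identical shapes, 2 for mirror images."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 1.0  # a flat profile has no defined correlation; treat as unrelated
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)
    return 1.0 - r


def average_linkage(profiles: Dict[str, Sequence[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Naive agglomerative clustering; returns merges in the order they happen."""
    dist = {}
    for a, b in combinations(profiles, 2):
        dist[(a, b)] = dist[(b, a)] = pearson_distance(profiles[a], profiles[b])

    clusters: List[Cluster] = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Average-linkage distance: mean over all cross-cluster gene pairs.
        c1, c2, d = min(
            ((p, q, mean(dist[(g, h)] for g in p for h in q))
             for p, q in combinations(clusters, 2)),
            key=lambda t: t[2],
        )
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, d))
    return merges


# Made-up log-ratio profiles over four conditions.
profiles = {
    "gA": [1.0, 2.0, 3.0, 4.0],
    "gB": [2.0, 4.0, 6.0, 8.1],
    "gC": [4.0, 3.0, 2.0, 1.0],
    "gD": [4.1, 3.0, 2.0, 0.9],
}
for left, right, d in average_linkage(profiles):
    print(left, right, round(d, 3))
```

The correlation distance groups genes by the shape of their profiles rather than their absolute level. This is why gA and gB end up together even though gB's values are roughly twice as large.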

>YAL036C chromo=1 coord=(76154-75048(C)) start=-600 end=+2 seq=(76152-76754)

TGTTCTTTCTTCTTCTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTAGTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTGCTTCTTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGCACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGCTGCTTTCTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCGGCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACTCTTTTTCGCCATCTTTCACTTATCTGATGTTCCTGATTGCCCTTCTTATCCCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTTCAATGGGCTTAAAGCTTGAAAAATTTTTTCACATCACAAGCGACGAGGGCCCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGATATTACGGTGTGATGAGGGCGCAATGATAGGAAGTGTTTGAAGCTAGATGCAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_

>YAL025C chromo=1 coord=(101147-100230(C)) start=-600 end=+2 seq=(101145-101747)

CTTAGAAGATAAAGTAGTGAATTACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGGGTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACCACGAATTGCTGAGTAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTATCCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTTGTAAAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCATACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAGAATTTATAATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTTTTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATGCAGTAGGGTAATAAACCTTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTTTCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGATATTGCATTGCTTAGTTCTTTCTTTTGACAGTGTTCTCTTCAGTACATAACTACAACGGTTAGAATACAACGAGGAT_ATG_

...

>YBR084W chromo=2 coord=(411012-413936) start=-600 end=+2 seq=(410412-411014)

CCATGTATCCAAGACCTGCTGAAGATGCTTACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTTTCGCAGCTGTTATTATCATCACCCCAGCATTACGAACATTCTCCACATCAAAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGTCTACATACATACATACATCTCGTACATAAATACGCATACGTATCTTCGTAGTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTCAAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTTCTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGACGCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTCACTTCAACGGACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCAGCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACATCAAAAAACAACTTTCATTACTGTGATTCTCTCAGTCTGTTCATTTGTCAGATATTTAAGGCTAAAAGGAA_ATG_

101 Sequences relative to ORF start

GATGAG.T 1:52/70 2:453/508 R:7.52345 BP:1.02391e-33

G.GATGAG.T 1:39/49 2:193/222 R:13.244 BP:2.49026e-33

AAAATTTT 1:63/77 2:833/911 R:4.95687 BP:5.02807e-32

TGAAAA.TTT 1:45/53 2:333/350 R:8.85687 BP:1.69905e-31

TG.AAA.TTT 1:53/61 2:538/570 R:6.45662 BP:3.24836e-31

TG.AAA.TTTT 1:40/43 2:254/260 R:10.3214 BP:3.84624e-30

TGAAA..TTT 1:54/65 2:608/645 R:5.82106 BP:1.0887e-29

...
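Each discovered pattern is listed with counts in set 1 and set 2, a ratio R and a probability BP. The slide does not define these fields. The sketch below reads them as follows, which is an assumption: "1:52/70" means 52 of the set-1 sequences contain the pattern (70 occurrences in total), R is the ratio of the set-1 fraction to the set-2 fraction, and BP is the binomial probability of seeing at least that many set-1 hits at the set-2 rate. The set-1 size of 101 comes from the slide. The set-2 size is not given, so `N_BACKGROUND` is a placeholder and the output will not reproduce the slide's numbers exactly. The functions `binomial_tail` and `enrichment` are hypothetical and are not taken from SPEXS.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment(k1: int, n1: int, k2: int, n2: int) -> tuple:
    """Compare how often a pattern occurs in set 1 versus set 2.

    k1 of n1 sequences in set 1 and k2 of n2 in set 2 contain the pattern.
    Returns (ratio of the two fractions, chance of >= k1 hits in set 1
    if sequences there contained the pattern at the set-2 rate).
    """
    p2 = k2 / n2
    return (k1 / n1) / p2, binomial_tail(k1, n1, p2)


N_BACKGROUND = 6000  # placeholder: the size of set 2 is not given on the slide
ratio, prob = enrichment(52, 101, 453, N_BACKGROUND)
print(f"R={ratio:.2f} BP={prob:.3g}")
```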

[Figure: matches of GATGAG.T and TGAAA..TTT (shown together as GATGAG.TTGAAA..TTT; “GATGAG.T W/30 TGAAA..TTT”, 1 mismatch) in the upstream sequences (600 bp) of YGR128C + 100 genes]
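The figure shows where the two patterns occur in the upstream sequences, allowing one mismatch. In these patterns, "." stands for any single base. The sketch below finds such matches and also finds pairs of patterns that occur close together. It reads "W/30" as "the second pattern starts within 30 bases after the first ends". That reading is an assumption, as are the function names `find_pattern` and `find_pair`. The test sequence is made up.

```python
from typing import List, Tuple


def matches_at(seq: str, pattern: str, pos: int, max_mismatches: int) -> bool:
    """True if pattern fits seq at pos; '.' matches any base."""
    mismatches = 0
    for offset, expected in enumerate(pattern):
        if expected != "." and seq[pos + offset] != expected:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def find_pattern(seq: str, pattern: str, max_mismatches: int = 0) -> List[int]:
    """0-based start positions of pattern in seq, allowing up to max_mismatches."""
    return [i for i in range(len(seq) - len(pattern) + 1)
            if matches_at(seq, pattern, i, max_mismatches)]


def find_pair(seq: str, first: str, second: str, window: int,
              max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """(i, j) where second starts no more than window bases after first ends."""
    starts_a = find_pattern(seq, first, max_mismatches)
    starts_b = find_pattern(seq, second, max_mismatches)
    return [(i, j) for i in starts_a for j in starts_b
            if 0 <= j - (i + len(first)) <= window]


seq = "CCGATGAGATTGAAACGTTTCC"  # made-up test sequence
print(find_pattern(seq, "GATGAG.T"))                  # [2]
print(find_pair(seq, "GATGAG.T", "TGAAA..TTT", 30))   # [(2, 10)]
print(find_pattern("GATGCGAT", "GATGAG.T", 1))        # [0] (one mismatch: C for A)
```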

Components of Expression Profiler (http://ep.ebi.ac.uk/)

• EPCLUST: expression data

• GENOMES: sequence, function, annotation

• SPEXS: discover patterns

• PATMATCH: visualise patterns

• URLMAP: provide links to external data and tools (pathways, function, etc.)

• EP:GO: Gene Ontology

• EP:PPI: protein-protein interactions

• SEQLOGO

Acknowledgments: the team (3)

Alvis Brazma, Alan Robinson, Jaak Vilo

1999 November: MGED 1 in Hinxton, EBI

Acknowledgments: the team (5)

Alvis Brazma, Alan Robinson

Database: Ugis Sarkans

Expression Profiler: Jaak Vilo

Research, students: Thomas Schlitt

2000 August

Acknowledgments: the team (9)

Alvis Brazma

Database, Curation, MIAMExpress: Ugis Sarkans, Helen Parkinson, Mohammadreza Shojatalab

Expression Profiler: Jaak Vilo

Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren

2001 June

Acknowledgments: the team (19)

Alvis Brazma

Database, Curation, MIAMExpress: Ugis Sarkans

Gonzalo Garcia

Helen Parkinson, Mohammadreza Shojatalab

Expression Profiler: Jaak Vilo

Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung

Patrick Kemmeren, Misha Kapushesky

Lev Soinov

Koichi Tazaki

Anastasia Samsonova

Susanna Sansone, Philippe Rocca-Serra, Ele Holloway

Niran Abeygunawardena

Ahmet Oezcimen

2002 February
