Denise Carvalho-Silva, Ensembl Outreach Team
On behalf of the Ensembl and ENA teams
European Molecular Biology Laboratory
European Bioinformatics Institute
SME Bioinformatics Forum
Barcelona, 8-9 October 2012
Ensembl and ENA: High level overview and use cases
Outline
Ensembl project: background and goals
Data available
Data access and Ensembl tools
Use cases: Ensembl and ENA
Ensembl Outreach and Support
Acknowledgements
Ensembl project
Launched in 1999, before the release of the draft human genome
Joint project between the EBI and the WTSI
www.ensembl.org launched in March 2000
Goals:
• Provide comprehensive annotation of genomes
• Integrate the annotation with other biological data
• Make all of the data publicly available
+ many more
Ensembl: an integration point
Ensembl: 66 vertebrate genomes (Release 68, July 2012)
Ensembl Genomes (www.ensemblgenomes.org), launched in 2009: annotation of non-vertebrate genomes
• Extends the use of Ensembl to other species
• Wider taxonomic range (release 15: 354 genomes)
Data available in Ensembl 68
• Gene annotation for 66 vertebrate species
• Variation data for 19 species
• Comparative genomics data for 69 species
• Regulation data for 16 species
Data access: browser sites
www.ensembl.org (current release)
pre.ensembl.org (Pre! Ensembl: new genome assemblies before full annotation)
archive.ensembl.org (previous releases)
Data access: BioMart
• Web interface to export Ensembl data
• No programming skills required
Workflow: choose a DATASET, set FILTERS, select ATTRIBUTES, view RESULTS
www.ensembl.org/biomart/martview
BioMart results: tables or sequences, exported to a file or sent by email
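The same DATASET, FILTERS, ATTRIBUTES, RESULTS steps can also be expressed as a single BioMart XML query, which is what the web interface builds behind the scenes. The sketch below assumes the `martservice` endpoint under `www.ensembl.org/biomart/` accepts such a query in a `query` URL parameter, and that `hsapiens_gene_ensembl`, `chromosome_name`, `ensembl_gene_id` and `external_gene_name` are valid dataset, filter and attribute names; attribute names have changed between releases, so check them against the XML export in MartView before relying on them. The helper name `build_biomart_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: BioMart's "martservice" accepting an XML query string.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def build_biomart_url(dataset, filters, attributes):
    """Hypothetical helper: build a BioMart query URL.

    Mirrors the web workflow: DATASET -> FILTERS -> ATTRIBUTES -> RESULTS.
    """
    # One <Filter> element per (name, value) pair.
    filter_xml = "".join(
        f'<Filter name="{name}" value="{value}"/>' for name, value in filters.items()
    )
    # One <Attribute> element per requested output column.
    attribute_xml = "".join(f'<Attribute name="{name}"/>' for name in attributes)
    query = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1">'
        f'<Dataset name="{dataset}" interface="default">'
        f"{filter_xml}{attribute_xml}"
        "</Dataset></Query>"
    )
    # URL-encode the XML so it can be sent as a GET parameter.
    return MARTSERVICE + "?" + urlencode({"query": query})


# Example: human genes on chromosome 21, with stable ID and gene name.
url = build_biomart_url(
    "hsapiens_gene_ensembl",
    {"chromosome_name": "21"},
    ["ensembl_gene_id", "external_gene_name"],
)
```

Fetching `url` (for example with `urllib.request.urlopen`) would return a tab-separated table, the same content the RESULTS page offers for export.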
Data access: APIs and FTP
• Ensembl databases (open source): Perl API, MySQL
http://www.ensembl.org/info/docs/api/index.html
• FTP download site
http://www.ensembl.org/info/data/ftp/index.html
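Programmatic access usually starts from two pieces of information: the public MySQL server (the Perl API's `Bio::EnsEMBL::Registry->load_registry_from_db` connects to it as the read-only `anonymous` user) and the per-release layout of the FTP site. The sketch below only composes names and does not connect to anything. It assumes core databases follow the pattern `<species>_core_<release>_<assembly>` and that genomic FASTA lives under `/pub/release-<N>/fasta/<species>/dna/`; both helpers are hypothetical, and the port is left out on purpose because it should be taken from the current Ensembl documentation.

```python
# Public Ensembl MySQL server and read-only account (no password).
ENSEMBL_MYSQL = {
    "host": "ensembldb.ensembl.org",
    "user": "anonymous",
}


def core_db_name(species, release, assembly_version):
    """Hypothetical helper: core database name, assuming the
    <species>_core_<release>_<assembly> naming convention."""
    return f"{species}_core_{release}_{assembly_version}"


def fasta_dna_dir(species, release):
    """Hypothetical helper: FTP directory for genomic DNA FASTA files,
    assuming the /pub/release-<N>/fasta/<species>/dna/ layout."""
    return f"ftp://ftp.ensembl.org/pub/release-{release}/fasta/{species}/dna/"


# Example for human in Release 68 (assembly GRCh37).
human_db = core_db_name("homo_sapiens", 68, 37)
human_fasta = fasta_dna_dir("homo_sapiens", 68)
```

Pinning the release number in both names keeps a script reproducible: the same code keeps pointing at Release 68 data even after newer releases appear.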
Ensembl Tools
http://www.ensembl.org/tools.html
Assembly converter
ID history converter
Virtual Machine
Region Report
Variant Effect Predictor
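The Variant Effect Predictor accepts, among other formats, a simple whitespace-separated input with one variant per line: chromosome, start, end, REF/ALT alleles and strand. A `-` marks an empty allele, so an insertion is written with end = start - 1. The sketch below formats variants in that layout; `vep_line` is a hypothetical helper, and the column order should be checked against the VEP input documentation for the version you run.

```python
def vep_line(chrom, start, ref, alt, strand="+"):
    """Hypothetical helper: one variant in VEP's default input layout.

    Columns: chromosome, start, end, REF/ALT, strand.
    '-' denotes an empty allele; the end coordinate is derived from the
    length of the reference allele, so insertions get end = start - 1.
    """
    ref_len = 0 if ref == "-" else len(ref)
    end = start + ref_len - 1
    return "\t".join([str(chrom), str(start), str(end), f"{ref}/{alt}", strand])


snv = vep_line("1", 100, "A", "G")              # single-base substitution
insertion = vep_line("1", 881907, "-", "C")     # C inserted between 881906 and 881907
deletion = vep_line("1", 881907, "C", "-")      # C at 881907 deleted
```

Deriving `end` from the reference allele keeps substitutions, deletions and insertions consistent without special-casing each variant type.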
Gene annotation
• Automatic pipeline: genome-wide determination
• Manual curation (Havana): gene determination on a case-by-case basis by an annotator
[Figure: gene annotation sources: Ensembl, Havana and merged (“gold”); + 63 species; + gene lists for 5 species]
Gene annotation on the browser
• Merged (“gold”) gene set: identical annotation from Ensembl and Havana for human, mouse and zebrafish
• High confidence and quality
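The merged (“gold”) criterion above, identical annotation from both sources, can be illustrated by comparing exon chains. This is a deliberately simplified sketch, not the actual Ensembl/Havana merge pipeline, which considers far more than exon coordinates; `classify_transcript` and its labels are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Exon = Tuple[int, int]  # (start, end), genomic coordinates


def classify_transcript(
    ensembl: Optional[Sequence[Exon]], havana: Optional[Sequence[Exon]]
) -> str:
    """Hypothetical, simplified labelling of a transcript model by source."""
    if ensembl is not None and havana is not None:
        # Same exons from both sources: treat as merged ("gold").
        if sorted(ensembl) == sorted(havana):
            return "merged (gold)"
        return "ensembl and havana differ"
    if havana is not None:
        return "havana"
    if ensembl is not None:
        return "ensembl"
    return "none"


same = classify_transcript([(1, 10), (20, 30)], [(20, 30), (1, 10)])
different = classify_transcript([(1, 10)], [(1, 12)])
havana_only = classify_transcript(None, [(1, 10)])
```

Sorting before comparing makes the check independent of the order in which each source lists its exons.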
Exons are drawn as boxes. Filled boxes are translated (coding) exons, empty boxes are untranslated regions (UTRs).
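Whether a box is filled or empty follows from where each exon falls relative to the coding region (CDS). The sketch below splits one exon into UTR and coding segments, assuming 1-based inclusive coordinates and reporting segments left to right on the forward strand; `split_exon` is a hypothetical helper, not part of the Ensembl API.

```python
def split_exon(exon_start, exon_end, cds_start, cds_end):
    """Hypothetical helper: split an exon into UTR (empty box) and
    coding (filled box) segments, given the CDS boundaries.

    Coordinates are 1-based and inclusive; segments are returned
    left to right as (start, end, kind) tuples.
    """
    segments = []
    # Part of the exon lying before the CDS is untranslated.
    if exon_start < cds_start:
        segments.append((exon_start, min(exon_end, cds_start - 1), "UTR"))
    # Overlap with the CDS is translated (coding).
    coding_start, coding_end = max(exon_start, cds_start), min(exon_end, cds_end)
    if coding_start <= coding_end:
        segments.append((coding_start, coding_end, "coding"))
    # Part of the exon lying after the CDS is untranslated.
    if exon_end > cds_end:
        segments.append((max(exon_start, cds_end + 1), exon_end, "UTR"))
    return segments


spanning_start = split_exon(80, 150, 100, 500)
```

Here `spanning_start` is `[(80, 99, 'UTR'), (100, 150, 'coding')]`: the exon would be drawn as an empty box followed by a filled one.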
Biological Evidence
• International Nucleotide Sequence Databases (INSDC)
• Protein sequence databases
• NCBI RefSeq
• RNA-seq (transcriptomic) data
The European Nucleotide Archive (ENA) provides a comprehensive, accessible and publicly available repository for nucleotide sequence data.
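A common first step with sequence data downloaded from ENA in FASTA format is loading it into memory keyed by record header. The minimal parser below makes no assumptions about ENA's header layout: it keeps the whole header line after `>` as the key. `parse_fasta` is a hypothetical helper, and the example records are made up.

```python
def parse_fasta(text):
    """Hypothetical helper: parse FASTA text into {header: sequence}.

    Sequence lines are concatenated per record; blank lines and any
    sequence lines before the first header are ignored.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up example records.
example = parse_fasta(">seq1 demo\nACGT\nTTGA\n>seq2\nGG\n")
```

For very large files, a streaming parser (for example Biopython's `SeqIO.parse`) avoids holding every sequence in memory at once.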
Funded by the Wellcome Trust, NIH-NHGRI, EU and EMBL
Ensembl Team Retreat 2012 Norwich, United Kingdom
Clara Amid, Ewan Birney, Lawrence Bower, Ana Cerdeño-Tárraga, Ying Cheng, Iain Cleland, Nadeem Faruque, Richard Gibson, Neil Goodgame, Christopher Hunter, Mikyung Jang, Rasko Leinonen, Xin Liu, Arnaud Oisel, Nima Pakseresht, Sheila Plaister, Rajesh Radhakrishnan, Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane