
Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting

Jul 17, 2015

Transcript
Page 1: Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting

InterMine Integrated Data Warehouse

Use Cases: Arabidopsis & Medicago Genome Projects

Vivek Krishnakumar, Plant Genomics Group (EUK)

IFX Research WIPS Meeting, 03 October 2014

Page 2

Overview

• Introduction

• InterMine

– Integrated data warehouse, extensible data model, flexible query system

– Web and programmatic interface

– Other InterMine instances

• Use cases

– Arabidopsis Information Portal (AIP)

– Medicago truncatula Genome Database (MTGD)

• Summary

– Advantages

– Caveats

Page 3

Introduction

For genome projects that wish to expose their data via the web (query, visualize, warehouse) to foster scientific collaboration, there are several technologies available:

• JCVI-developed software

– Manatee (backed by an RDBMS)

• Externally developed software

– BioMart (federated from various databases)

– Tripal (powered by Drupal, backed by a CHADO database)

– InterMine

Page 4

InterMine

• Functions as a data warehouse for the integration of complex biological data. Integration across data types is based on a common identifier (e.g. gene primary ID)

• Uses a flexible and extensible data model, controlled by XML files and driven by ontologies (Sequence Ontology [SO], Gene Ontology [GO], etc.)

– Genomics, Proteomics, Interactions, Homology, Expression, Pathways (and more data types)

– Parsers for commonly used biological data formats

– Provides a framework for adding your own data

• Offers a flexible query system, optimized via precomputed tables (no need for schema denormalization)

Smith, R.N. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics (2012) 28 (23): 3163-3165
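The first bullet describes integration keyed on a shared identifier. The sketch below illustrates only that idea in plain Python: toy records from three hypothetical sources are merged into one object per gene primary ID, with an assumed first-source-wins rule for conflicting fields. It is not InterMine's own (Java) integration code, and the record values are made up.

```python
from collections import defaultdict


def integrate(*sources):
    """Merge lists of dict records that share a 'primaryIdentifier' key.

    Each source may contribute new fields; if two sources set the same
    field, the value from the earlier source is kept (an assumed
    first-source-wins rule, chosen here only for illustration).
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            key = record["primaryIdentifier"]
            for field, value in record.items():
                merged[key].setdefault(field, value)
    return dict(merged)


# Hypothetical toy records standing in for three data types.
genome = [{"primaryIdentifier": "GENE1", "chromosome": "Chr1"}]
expression = [{"primaryIdentifier": "GENE1", "tissue": "root"}]
interactions = [{"primaryIdentifier": "GENE1", "partner": "GENE2"}]

if __name__ == "__main__":
    genes = integrate(genome, expression, interactions)
    print(genes["GENE1"])
```

Because every record carries the same identifier, one lookup returns the combined view; this is also why the choice of identifier matters so much when loading new sources.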

Page 5

InterMine (contd.)

• Provides a user-friendly web interface exposing powerful features:

– Analysis of lists (facilitates enrichment studies)

– Full-featured report pages (one-stop shop)

– Interactive result tables (sort, filter, summarize)

– Visual query builder (no need to write SQL!)

– Quick search and region-based search

• Fosters development of external applications using data hosted within InterMine via Application Programming Interfaces (APIs):

– RESTful web services

– Client libraries in Perl, Python, Ruby, Java, JavaScript

Kalderimis, A. et al. InterMine: extensive web services for modern biology. Nucl. Acids Res. (1 July 2014) 42 (W1): W468-W472
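List analysis reports whether an annotation (e.g. a GO term) is over-represented in a user's gene list relative to a background. A common way to score this is the one-sided hypergeometric test; my understanding is that InterMine's enrichment widgets are based on it (plus a multiple-testing correction), but treat that as an assumption. This sketch computes only the raw p-value, using the Python standard library.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric.

    N: genes in the background population
    K: background genes annotated with the term
    n: genes in the user's list
    k: list genes annotated with the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k, k+1, ..., min(n, K) annotated
    # genes; math.comb returns 0 when the lower argument exceeds the upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy counts: all 5 list genes carry a term held by 5 of 10 genes.
    print(hypergeom_enrichment_p(k=5, n=5, K=5, N=10))
```

For the toy counts shown, the p-value is exactly 1/252: only one of the C(10, 5) possible 5-gene lists consists entirely of annotated genes.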

Page 6

Public “Mines”

• InterMine supports querying across mines for cross-database integration

• A large number of warehouses powered by InterMine already exist

Page 7

Arabidopsis Information Portal (AIP)

• AIP origins

– Funded by NSF in response to community needs, following the termination of funding to TAIR

• AIP objectives: develop a community web resource that…

– is sustainable, fundable, and community-extensible

– hosts analysis & visualization tools and user data spaces

– Federation: integrate diverse data sets from distributed data sources; foster development of tools for and by the community

– Maintenance of the Col-0 gold standard annotation

• AIP methods

– Assimilate TAIR data

– Host an InterMine instance devoted to Arabidopsis (thale cress)

– Offer and consume RESTful web services

– Integrate and utilize iPlant resources

Page 8

ThaleMine (https://apps.araport.org/thalemine)

• An InterMine interface to Arabidopsis genomic data

• Integrates a wide variety of data types (A-E, H), some of which are warehoused and others are federated via web services

• Embedded elements visualize gene structure (JBrowse, not shown), interaction networks (F), and expression patterns (G)
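Programmatic access to a mine such as ThaleMine goes through its RESTful web services. The sketch below builds an InterMine PathQuery (XML with a `model`, a space-separated `view` of output paths, and coded `constraint` elements) and the GET URL for the `/query/results` endpoint, using only the Python standard library. The service root is derived from the slide's 2014 URL and may no longer resolve; the helper names are hypothetical, and the request is deliberately not sent so the sketch runs offline. The official `intermine` Python client wraps these steps for real use.

```python
import xml.etree.ElementTree as ET
from string import ascii_uppercase
from urllib.parse import urlencode

# Assumed service root, derived from the ThaleMine URL on this slide.
THALEMINE_SERVICE = "https://apps.araport.org/thalemine/service"


def build_pathquery(view, constraints, model="genomic"):
    """Build PathQuery XML (hypothetical helper).

    view: list of output paths, e.g. ["Gene.primaryIdentifier"]
    constraints: list of (path, op, value) tuples, coded A, B, C, ...
    """
    if len(constraints) > len(ascii_uppercase):
        raise ValueError("at most 26 constraints supported in this sketch")
    query = ET.Element("query", {"model": model, "view": " ".join(view)})
    for code, (path, op, value) in zip(ascii_uppercase, constraints):
        ET.SubElement(
            query,
            "constraint",
            {"path": path, "op": op, "value": value, "code": code},
        )
    return ET.tostring(query, encoding="unicode")


def results_url(service_root, pathquery_xml, fmt="json"):
    """Build a GET URL for the /query/results endpoint (not sent here)."""
    params = urlencode({"query": pathquery_xml, "format": fmt})
    return f"{service_root}/query/results?{params}"


if __name__ == "__main__":
    xml = build_pathquery(
        ["Gene.primaryIdentifier", "Gene.symbol", "Gene.length"],
        [("Gene.primaryIdentifier", "=", "AT1G01010")],
    )
    print(xml)
    print(results_url(THALEMINE_SERVICE, xml))
```

Building the XML with ElementTree rather than string formatting ensures that constraint values containing quotes or angle brackets are escaped correctly.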

Page 9

Visual Query Builder

Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)

Page 10

Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)

Interactive Result Tables

Region-based search
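Region-based search returns the features that overlap a chromosomal interval. The sketch below shows the core overlap test on toy, made-up features; the `chrom:start..end` input notation, the 1-based inclusive coordinates, and all names are assumptions for illustration, and a real warehouse would use an index rather than this linear scan.

```python
import re
from typing import NamedTuple


class Feature(NamedTuple):
    identifier: str
    chromosome: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive


_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def parse_region(text):
    """Parse 'chrom:start..end' into (chrom, start, end)."""
    m = _REGION.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized region: {text!r}")
    start, end = int(m["start"]), int(m["end"])
    if start > end:  # tolerate reversed coordinates
        start, end = end, start
    return m["chrom"], start, end


def region_search(features, region):
    """Return features on the same chromosome whose span overlaps the region."""
    chrom, start, end = parse_region(region)
    return [f for f in features
            if f.chromosome == chrom and f.start <= end and f.end >= start]


# Toy features; identifiers and coordinates are invented.
features = [
    Feature("GENE1", "Chr1", 100, 500),
    Feature("GENE2", "Chr1", 600, 900),
    Feature("GENE3", "Chr2", 100, 500),
]

if __name__ == "__main__":
    print([f.identifier for f in region_search(features, "Chr1:450..650")])
```

Two closed intervals overlap exactly when each starts no later than the other ends, which is the single comparison used in `region_search`.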

Page 11

MedicMine (http://medicmine.jcvi.org)

• NSF-funded project to assist with the curation of the Medicago truncatula genome assembly and annotation (funding ended August 2014)

• To warehouse the project data and keep it available beyond the funding period, an InterMine interface for Medicago was implemented (backed by a CHADO database)

• Provides functionality similar to that of ThaleMine

Page 12

Summary

• Advantages

– InterMine is a powerful biological data warehouse

– Performs complex data integration

– Allows fast and flexible querying

– Well-documented programmatic interface

– Cookie-cutter, user-friendly web interface

– Facilitates cross-talk between “mines”

• Caveats

– Adding more data requires a full database rebuild (incremental loading is not possible) because of the integration step

• About InterMine

– Developed by the Micklem Lab at the University of Cambridge, UK

– Written in Java, backed by PostgreSQL, deployed under Tomcat. Documentation and downloads are available at http://www.intermine.org

Page 13

Chris Town, PI

Lisa McDonald, Education and Outreach Coordinator

Chris Nelson, PM

Jason Miller, Co-PI, Technical Lead

Erik Ferlanti, SE

Vivek Krishnakumar, BE

Svetlana Karamycheva, BE

Eva Huala, Project lead, TAIR

Bob Muller, Technical lead, TAIR

Gos Micklem, co-PI

Sergio Contrino, Software Engineer

Matt Vaughn, co-PI

Steve Mock, Advanced Computing Interfaces

Rion Dooley, Web and Cloud Services

Matt Hanlon, Web and Mobile Applications

Maria Kim, BE

Ben Rosen, BA