tranSMART Community Meeting 5-7 Nov 13 - Session 1: Chilly-Mazarin Meeting Objectives

Post on 10-May-2015

DESCRIPTION

tranSMART Community Meeting 5-7 Nov 13 - Session 1: Chilly-Mazarin Meeting Objectives Sherry Cao and Keith Elliston

Transcript

TranSMART Core: From tool to ecosystem

Kees van Bochove

tranSMART Workshop Amsterdam

June 17, 2013

Today, we have a chance to write history.

• Microarray data analysis support
• Load public microarray data from GEO
• Store and retrieve saved analyses
• Search on gene name, disease name etc.
• Genomic variants and VCF support
• Load TCGA studies we have access to
• Load 1000 Genomes data

$$$$$$$$$$$$

• Microarray data analysis support
• Load public microarray data from GEO
• Store and retrieve saved analyses
• Search on gene name, disease name etc.
• Genomic variants and VCF support
• Load TCGA studies we have access to
• Load 1000 Genomes data

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

There has to be a better way.

Costs $0!

No-brainer!

Ehm.. wait a minute…

Let’s have a look at how these scientists in academia are doing.

They love to collaborate, right?!

In 2003… (ancient history; before Facebook)

Yet Another ‘New’ Web-based Solution for the Management of Microarray Data ?!

Not Invented Here Syndrome

Image from Rob Hooft, CTO Netherlands Bioinformatics Centre: http://nothinkingbeyondthispoint.blogspot.nl/2011/11/decision-tree-for-scientific.html

What about all these great FP6, FP7, IMI, … projects?

Source code of major projects is readily available on GitHub

But… I’m afraid it’s still up to you and me to put the pieces together.

Phenotype Database

Written in Grails, supports several types of omics data, provides data integration and visualization, has R, Groovy and PHP APIs. Sounds familiar?

http://phenotypefoundation.org

share

reuse

specialize

Writing good software is hard.

So far…

• TranSMART has a huge business potential. It’s no silver bullet though.

• Scientists sometimes have trouble re-using each other’s work. Especially when it comes to open source software.

Do they?

Time to look at some success stories.

R and Bioconductor

Who doesn’t love R?

Website looks as if it dates from the Stone Age. Must be those LaTeX-loving physicists.

Very active community, and… lots of packages.

Governance of R community

Brian Ripley: “The R Project is governed by a self-perpetuating oligarchy, a group with a lot of power. R was principally developed for the benefit of the core team.”

As cited on http://blog.revolutionanalytics.com/2011/08/brian-ripley-on-the-r-development-process.html

Galaxy

Galaxy is the most widely used open source bioinformatics web interface AFAIK.

Probably in no small part thanks to their continuous dedication to improving the UI. But there’s something else.

Galaxy Toolshed

• Plone: an open source CMS (Content Management System) written in Python, nowadays backing thousands of production-grade websites

• Started by 2 developers in 2000, now an active open source project with hundreds of active developers

• In 2004, the Plone Foundation was formed to formalize IP and secure the future of Plone

• Plone Collective has hundreds of plugins

What do all these success stories have in common?

• Bioconductor Packages
• Galaxy Toolshed
• Plone Collective
• Drupal Modules

Lessons for tranSMART

TranSMART needs a marketplace and a thriving community to survive.

To get to a functioning marketplace, we need a well-designed core.

There is also another reason.

TranSMART Contributions - Pharma

• Janssen
  – Initial version of tranSMART
  – Genomics viewer using IGV and GenePattern
  – Faceted Search interface (results browsing)

• Millennium
  – Loading TCGA and many GEO studies
  – R interface for interacting with data directly in R
  – Several R analyses available directly in GUI

TranSMART Contributions - Pharma

• Sanofi
  – Cleaner user interface
  – Added metadata layer for all concepts
  – Study/Program categorization & file management

• Pfizer
  – GWAS upload (VCF), data storage and analysis
  – Enhanced data export capabilities

This is a mess.

Another reason why we need that core.

Start the Core: I2B2 Refactoring

1. I2B2 was integrated with tranSMART, but the I2B2 API abstractions were leaked all over the place in the tranSMART application.

2. We agreed in the London meeting that all parties would set aside some time for working on the core.

3. Taken together, it made sense to start with the clinical data API, properly using the I2B2 API where possible, and to re-implement all I2B2 functionality in a new ‘core-db’ plugin.

The first version of core-integration was completed in mid-April.

By then, all webservice calls that formerly went to an outdated version of the I2B2 Ontology and CRC cells were handled by the newly implemented core-db plugin.

Also, a set of tests was written in the process and API documentation generated.
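
The layering described above can be sketched roughly as follows. This is an illustration in Python, not tranSMART's actual Grails/Groovy code: the names `Concept`, `ConceptsResource`, `InMemoryConceptsResource` and `list_study_concepts`, and the "/"-separated concept paths, are all assumptions made for the example. It shows application code depending only on an abstract core API, while a swappable implementation (standing in for a plugin such as core-db) does the storage work.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Concept:
    """A node in the clinical ontology tree (hypothetical shape)."""
    path: str
    name: str


class ConceptsResource(ABC):
    """Hypothetical core API: callers only ever see this interface."""

    @abstractmethod
    def children_of(self, parent_path: str) -> List[Concept]:
        """Return the direct children of the given ontology path."""


class InMemoryConceptsResource(ConceptsResource):
    """Stand-in for a database-backed implementation of the core API."""

    def __init__(self, concepts: List[Concept]) -> None:
        self._concepts = list(concepts)

    def children_of(self, parent_path: str) -> List[Concept]:
        prefix = parent_path.rstrip("/") + "/"
        children = []
        for concept in self._concepts:
            if not concept.path.startswith(prefix):
                continue
            remainder = concept.path[len(prefix):].strip("/")
            # A direct child adds exactly one path segment to the parent.
            if remainder and "/" not in remainder:
                children.append(concept)
        return children


def list_study_concepts(resource: ConceptsResource, study_path: str) -> List[str]:
    """Application code: depends on the interface, never on storage details."""
    return sorted(concept.name for concept in resource.children_of(study_path))
```

Because `list_study_concepts` only knows about `ConceptsResource`, the same function can be exercised in a test with the in-memory implementation and run in production against a different one, which is the kind of separation the refactoring is after.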

In the long run, I believe forming a good distributed working group on the core API is a more important deliverable of this workshop than cranking out a stable 1.1 version.

That’s how we write that history

Kees van Bochove - The Hyve

Current tranSMART Architecture

TranSMART’s Strong Points

• Powerful, ready to go user interface for common analyses (survival analysis, gene expression heatmaps etc.)

• Leverages i2b2 data model for clinical data and offers unified view over different studies

• Uses a lot of good open source technology under the hood (Grails, R, Solr, Pentaho), leveraging existing community developments

TranSMART Building Blocks

• R: open source statistics package with CRAN, an active repository in which many algorithms and statistical packages are published

• Grails: a rapid application development framework in Groovy leveraging Java technology such as Hibernate, Spring, Quartz

• i2b2: domain-specific open source package for storing and querying clinical data

• GenePattern, maybe soon: Galaxy, KNIME?

TranSMART’s Weaknesses

• Large monolithic codebase with little modularization beyond the standard Grails MVC setup

• Code quality is problematic, especially JavaScript

• Test coverage is low: no functional / web tests and few unit and integration tests

• No clear internal APIs, only a service level that does the plumbing

• i2b2 integration violates i2b2 abstractions

tranSMART Plans

• Use a clearly modularized architecture with separation of clinical, high dimensional, search and metadata storage; workflow execution engines and knowledge repository

• Define clear API and rewrite current implementations with good test coverage

• Use i2b2 data model, re-harmonize with latest i2b2 APIs, and don’t use i2b2 binaries directly

• Separate analysis definitions and abstract from workflow execution engine
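
The last point, keeping analysis definitions apart from the engine that runs them, might look something like the sketch below. It is an assumption-laden illustration rather than the planned tranSMART design: `AnalysisDefinition`, `ExecutionEngine`, `LocalEngine` and `run_analysis` are hypothetical names, and the two toy analyses stand in for real ones such as survival analysis.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Protocol


@dataclass(frozen=True)
class AnalysisDefinition:
    """Declarative description of an analysis; contains no execution logic."""
    name: str
    description: str = ""


class ExecutionEngine(Protocol):
    """Anything that can run a definition (in-process, R, a workflow system...)."""

    def execute(self, definition: AnalysisDefinition, values: List[float]) -> float:
        ...


class LocalEngine:
    """Hypothetical engine that runs analyses in-process."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[[List[float]], float]] = {
            "mean": mean,
            "max": max,
        }

    def execute(self, definition: AnalysisDefinition, values: List[float]) -> float:
        try:
            function = self._functions[definition.name]
        except KeyError:
            raise ValueError(f"unsupported analysis: {definition.name}") from None
        return function(values)


def run_analysis(engine: ExecutionEngine,
                 definition: AnalysisDefinition,
                 values: List[float]) -> float:
    """Caller code: picks a definition and hands it to whichever engine is configured."""
    return engine.execute(definition, values)
```

With this split, adding a new engine means writing another class with an `execute` method; the definitions and the calling code stay unchanged.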

http://prezi.com/t6twshyctdsk/transmart-core-refactoring

Target tranSMART Architecture

Further reading

• Description of core API efforts: http://thehyve.nl/rewiring-transmart

• In depth description of i2b2 refactoring: http://thehyve.nl/inital-work-on-transmarts-core

• Overview of tranSMART Core API so far: http://thehyve.github.io/transmart-core-api/

• Example of continuous integration test suite (of core-db): https://ci.ctmmtrait.nl/browse/TM-COREDB-JOB1-51/test
