SysMO-DB: Just Enough Exchange for Systems Biology Data and Models Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester Wolfgang Müller, O. Krebs, Isabel Rojas – EML Research gGmbH (=not for profit) Jacky Snoep - University of Stellenbosch eScience Workshop, Pittsburgh, PA
MS eScience Workshop, Pittsburgh, PA
SysMO=SYStems biology of Micro Organisms
11 projects, 91 partners, 9 countries, started 2007
SysMO-DB
Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites
Sensitively retrofit a data access, model handling and data integration platform.
Support and manage the diversity of data, models and competencies.
Web-based solution for: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.
SysMO-DB Team
University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs
EML Research gGmbH, Germany: Wolfgang Müller, Olga Krebs, Isabel Rojas
University of Stellenbosch, South Africa: Jacky Snoep
Connect projects, connect to outside
[Diagram: levels Personal (My Disk: data, models, workflows), Project, SysMO-DB (inter-project) and Public; labels: project-specific solutions, internally used tools & data, outside data and tools]
Own solutions: own data solutions and collaboration environments, e.g. wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.
Suspicion: suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians.
Data issues: many do not have data, do not follow the standards that exist, or do not know who is doing what. Much of the data cannot be compared: different organisms, different strains.
Resource issues: no extra resources for the consortia; 91 institutes, 11 consortia, some overlapping.
Principles…
Go for a series of small victories
Realistic
Don't reinvent
Migrate to standards
Sustainable and extensible
Provide instant gratification
Address doubt and anxiety
Build it
Three types of people
[Diagram: Modellers, Experimentalists and Bioinformaticians, connected by exchange]
"Natural" collaboration within SysMO
Short, simplified, black and white:
Collaboration during project design
Varying methods of collaboration during the project:
Binomes (one modeller, one experimentalist)
Groups collaborating with groups (occasional/formalized exchange of information)
Largely a story about how to handle Excel sheets for the users' benefit
SysMO Just Enough Exchange
[Diagram: project systems COSMIC (Alfresco), BaCell-SysMO (Alfresco), MOSES (Wiki) and SysMO-LAB (Wiki), together with spreadsheets, BASE and public resources (SABIO-RK)]
Need for tradeoff
Huge number of systems
Huge number of standards (MIBBI, OBO…), some of them big standards
Too much for a few people to cope with, but:
Comparison needs standardisation
Search needs standardisation
Need to move incrementally towards a just-enough standard implementation
Path = goal
The journey is part of the reward
Let people use what they use anyway
If changes are necessary, be as unintrusive as possible
Be aware of legacy data
Nudge people towards best practices
Give instantly useful added value to as many users as possible: simple search, simple exchange, simple tool use
A roadmap
Provide convincing Web 2.0 functionality for use and as appetizer: Yellow pages, SOPs
Upload service: hand-triggered upload of link/file; hand-added metadata
Harvesting + change detection service: automatic download; hand-added metadata
Support for Excel templates: promote internal standards by use + tooling; mappers + parsers; classifiers
Use other data types where appropriate: SBML, Matlab, Mathematica…
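The harvesting + change detection service can be sketched in a few lines, assuming that a file counts as changed when the SHA-256 hash of its content differs from the hash recorded at the previous harvest. The function names and the dictionary-based hash store are hypothetical; this illustrates the idea and is not the SysMO-DB code.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying a file's content."""
    return hashlib.sha256(data).hexdigest()


def detect_changes(previous, current_files):
    """Compare the files fetched in this harvest with the hashes stored last time.

    previous:      dict of path -> content hash recorded at the last harvest
    current_files: dict of path -> raw bytes fetched in this harvest
    Returns (new, changed, removed, current_hashes); only new and changed
    files would need processing and hand-added metadata.
    """
    current = {path: content_hash(data) for path, data in current_files.items()}
    new = sorted(p for p in current if p not in previous)
    changed = sorted(p for p in current if p in previous and previous[p] != current[p])
    removed = sorted(p for p in previous if p not in current)
    return new, changed, removed, current


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    first = {"growth.xls": b"v1", "model.xml": b"<sbml/>"}
    new, changed, removed, state = detect_changes({}, first)
    print(new, changed, removed)
    second = {"growth.xls": b"v2", "model.xml": b"<sbml/>"}
    new, changed, removed, state = detect_changes(state, second)
    print(new, changed, removed)
```

Keeping `removed` separate lets this sketch flag files that disappeared from a project repository instead of silently deleting their records.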
Stability hierarchy
[Diagram, in order of increasing stability: single group (template for a group of experiments); single SysMO project (project-level template); whole SysMO (more stable JERM data model, template best practice)]
Parsers/annotators enter data into the JERM data model
Use mappers where needed
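A mapper can be as simple as a per-project lookup table from local column headers to shared attribute names. The sketch below assumes hypothetical projects, headers and attribute names (`organism`, `strain`, `time_min`); it is not the JERM schema.

```python
# Hypothetical per-project mappings from local spreadsheet headers to shared
# attribute names; neither the headers nor the attribute names come from SysMO-DB.
PROJECT_MAPPINGS = {
    "project_a": {"Organism": "organism", "Strain ID": "strain", "t (min)": "time_min"},
    "project_b": {"species": "organism", "strain": "strain", "time [min]": "time_min"},
}


def map_row(project, row):
    """Translate one spreadsheet row into shared attribute names.

    Returns (mapped, unmapped). Columns without a mapping are kept aside
    rather than dropped, so a mapping can be added for them later.
    """
    mapping = PROJECT_MAPPINGS.get(project, {})
    mapped, unmapped = {}, {}
    for column, value in row.items():
        if column in mapping:
            mapped[mapping[column]] = value
        else:
            unmapped[column] = value
    return mapped, unmapped


if __name__ == "__main__":
    # Made-up example rows.
    print(map_row("project_a", {"Organism": "B. subtilis", "t (min)": "30"}))
    print(map_row("project_b", {"species": "B. subtilis", "time [min]": "30", "OD600": "0.4"}))
```

Two projects with different spreadsheet headers end up with the same shared keys, which is what makes comparison and search across projects possible; unmapped columns such as `OD600` are returned separately so nothing is silently lost.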
JERM Extraction Architecture
[Diagram: components Harvester, Classifier/Dispatcher, Template recognizer, Data handler, Extractor, Parser and Mapper, taking files from project repositories and producing data and metadata]
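To make the roles of the Template recognizer and the Classifier/Dispatcher more concrete, here is a much simplified sketch in which a template is recognised by the set of header cells it requires, and each recognised sheet is routed to a registered extractor. The registry, the `growth_curve` template and its headers are hypothetical; harvesting, parsing of real Excel files and mapping are left out.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Rows = List[List[str]]
# An extractor turns a recognised sheet into (data rows, metadata).
Extractor = Callable[[Rows], Tuple[Rows, Dict[str, str]]]

# Hypothetical registry: template name -> (required header cells, extractor).
TEMPLATES: Dict[str, Tuple[FrozenSet[str], Extractor]] = {}


def register(name: str, headers: set):
    """Decorator registering an extractor for one template signature."""
    def wrap(fn: Extractor) -> Extractor:
        TEMPLATES[name] = (frozenset(headers), fn)
        return fn
    return wrap


def recognize(header_row: List[str]) -> Optional[str]:
    """Template recognizer: first registered template whose required
    headers all appear in the sheet's header row."""
    present = {cell.strip() for cell in header_row}
    for name, (required, _) in TEMPLATES.items():
        if required <= present:
            return name
    return None


def dispatch(rows: Rows):
    """Classifier/dispatcher: route a sheet to its extractor.

    Unrecognised sheets are passed through unchanged with empty metadata,
    to be annotated by hand.
    """
    name = recognize(rows[0]) if rows else None
    if name is None:
        return "unrecognized", rows, {}
    _, extractor = TEMPLATES[name]
    data, metadata = extractor(rows)
    return name, data, metadata


@register("growth_curve", {"time", "OD600"})
def extract_growth_curve(rows: Rows) -> Tuple[Rows, Dict[str, str]]:
    """Hypothetical extractor: split off the header and record it as metadata."""
    header, body = rows[0], rows[1:]
    return body, {"template": "growth_curve", "columns": ",".join(header)}


if __name__ == "__main__":
    print(dispatch([["time", "OD600"], ["0", "0.05"], ["30", "0.11"]]))
    print(dispatch([["sample", "note"], ["s1", "ok"]]))
```

Sheets that match no template are passed through as raw data, in line with the slides' preference for accepting whatever people already use and adding metadata by hand.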
Oops: some projects were not prolonged
Need all project data in the system fast, so…
JERM Extraction Architecture
[Diagram: the extraction architecture repeated (Harvester, Classifier/Dispatcher, Template recognizer, Data handler, Extractor, Parser, Mapper), with data flowing in from the project repositories]
Lessons we're learning
Some interesting bits along the way
Subsetting: Don't overwhelm
Standards need to be comprehensive
Goal: "Minimum information"… (MIBBI)
Tends to be a superset of what is needed for a project
Examples of non-applicable attributes: tissue of a single cell, gender
Useful to use adapted subset templates
Experimental design selection list
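Subsetting a comprehensive checklist can be pictured as filtering out the attributes that do not apply in a given context. In this sketch the checklist and the context name are hypothetical stand-ins; only the two excluded attributes (tissue and gender) come from the slide.

```python
# Hypothetical comprehensive checklist standing in for a "minimum information"
# standard; the attribute names are illustrative, not taken from MIBBI.
FULL_CHECKLIST = [
    "organism", "strain", "tissue", "gender",
    "growth_medium", "temperature", "experimental_design",
]

# Attributes that do not apply in a given context; the two examples come from
# the slide (a single cell has no tissue, and gender does not apply).
NOT_APPLICABLE = {
    "single_cell_organism": {"tissue", "gender"},
}


def subset_template(context):
    """Return the checklist without the attributes that do not apply to
    `context`, keeping the original order; unknown contexts get everything."""
    excluded = NOT_APPLICABLE.get(context, set())
    return [attr for attr in FULL_CHECKLIST if attr not in excluded]


if __name__ == "__main__":
    print(subset_template("single_cell_organism"))
```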
From biofolksonomy to ontology
Observation:
Fast-growing set of standards
Standards are a moving target
Incremental approach:
Keyword annotation
Controlled selection lists
Home-brewed taxonomies
Use of/contribution to standard ontologies
Provide migration tools
Tags + suggestions
Home-brewed taxonomy
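The step from free tags to controlled terms can be supported by suggestions. A minimal sketch, assuming a small hypothetical vocabulary and using fuzzy string matching from Python's standard `difflib` module (a swapped-in technique, not necessarily what SysMO-DB used):

```python
import difflib

# Hypothetical controlled vocabulary, e.g. terms of a home-brewed taxonomy.
CONTROLLED_TERMS = ["glycolysis", "growth curve", "metabolomics", "transcriptomics"]


def suggest_terms(tag, n=3, cutoff=0.6):
    """Suggest controlled terms similar to a free-text tag (best first)."""
    return difflib.get_close_matches(tag.strip().lower(), CONTROLLED_TERMS, n=n, cutoff=cutoff)


def migrate_tags(tags):
    """Migration helper: map each tag to its closest controlled term.

    Returns (mapping, unmatched); unmatched tags are left for a curator.
    """
    mapping, unmatched = {}, []
    for tag in tags:
        best = suggest_terms(tag, n=1)
        if best:
            mapping[tag] = best[0]
        else:
            unmatched.append(tag)
    return mapping, unmatched


if __name__ == "__main__":
    print(suggest_terms("Glycolisis"))
    print(migrate_tags(["Glycolisis", "growth curves", "xyz"]))
```

The same lookup serves both items on the slide: suggesting terms while a user tags, and acting as a migration tool for tags that already exist.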
A word on software
Template tooling: Excel, Java
SysMO-SEEK (open source under Apache license): Ruby on Rails
Convention over configuration
Libraries & plugins: Rails-specific (e.g. acts_as_authenticated); Solr & Lucene introduce a Java/Ruby mix
Database: MySQL, also tested with SQLite (exclude DB dependencies)
Summary
SysMO-DB as a virtual meeting point for different flavours of systems biologists
SysMO-DB's mantra: just enough, just in time
Flexible JERM extraction architecture
Just enough metadata (incremental)
A lot done, still a lot to do
Challenges ahead…
Social: PALs work great and are motivated; now need more, more, more data, data, data
Technical:
Publishing into public repositories
Search + exploration: the test for data quality
Hierarchical faceted search
Distributed search via Taverna workflows
More workflows via SysMO-SEEK
Improve modelling support
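Hierarchical faceted search can be illustrated by storing each facet value as a path (here, a taxonomic path), so that selecting a node matches everything below it, and by counting records per child node to drive the next level of browsing. The records and facet names below are made-up examples.

```python
from collections import Counter

# Hypothetical catalogue entries; each facet value is a "/"-separated path
# (taxonomic group, then finer group, then species for organisms).
RECORDS = [
    {"title": "Growth curves", "organism": "Bacteria/Firmicutes/Bacillus subtilis", "type": "Data"},
    {"title": "Glycolysis model", "organism": "Bacteria/Firmicutes/Lactococcus lactis", "type": "Model"},
    {"title": "Transcriptome time series", "organism": "Bacteria/Proteobacteria/Escherichia coli", "type": "Data"},
    {"title": "Metabolite levels", "organism": "Fungi/Saccharomyces cerevisiae", "type": "Data"},
]


def under(value, prefix):
    """True if `value` equals `prefix` or lies below it in the hierarchy."""
    return value == prefix or value.startswith(prefix + "/")


def search(records, selections):
    """Keep records that fall under every selected facet path."""
    return [r for r in records
            if all(under(r.get(facet, ""), path) for facet, path in selections.items())]


def facet_counts(records, facet, prefix=""):
    """Count records under each direct child of `prefix` for one facet,
    i.e. what a hierarchical facet browser would show at the next level."""
    depth = len(prefix.split("/")) if prefix else 0
    counts = Counter()
    for r in records:
        value = r.get(facet)
        if value is None or (prefix and not under(value, prefix)):
            continue
        parts = value.split("/")
        if len(parts) > depth:
            counts["/".join(parts[:depth + 1])] += 1
    return counts


if __name__ == "__main__":
    print(facet_counts(RECORDS, "organism"))
    print(facet_counts(RECORDS, "organism", "Bacteria"))
    hits = search(RECORDS, {"organism": "Bacteria/Firmicutes", "type": "Data"})
    print([r["title"] for r in hits])
```

Matching on whole path segments (`prefix + "/"`) keeps a selection such as `Bact` from accidentally matching `Bacteria`.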
Bonus track: what if…
…the average data quality is below par?
"Nagging functionality":
Remind people of potentially faulty metadata
Give suggestions on what to improve and how
Offer the possibility of creating automatic mappings
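A minimal sketch of such "nagging" checks, assuming hypothetical rules (a list of required fields and an allowed list of experimental designs): it reminds the user of empty fields and suggests a close allowed value for unknown ones, using Python's standard `difflib`.

```python
import difflib

# Hypothetical rules: which fields are required and which values are allowed
# for the experimental design; neither list comes from SysMO-DB.
REQUIRED_FIELDS = ("organism", "strain", "experimental_design")
ALLOWED_DESIGNS = ["time series", "knockout comparison", "dose response"]


def nag(metadata):
    """Return reminders about potentially faulty metadata, each with a
    suggestion of what to improve where one can be made."""
    reminders = []
    for field in REQUIRED_FIELDS:
        if not metadata.get(field, "").strip():
            reminders.append(f"'{field}' is empty: please fill it in.")
    design = metadata.get("experimental_design", "").strip()
    if design and design not in ALLOWED_DESIGNS:
        close = difflib.get_close_matches(design, ALLOWED_DESIGNS, n=1)
        hint = f" Did you mean '{close[0]}'?" if close else ""
        reminders.append(f"Unknown experimental design '{design}'.{hint}")
    return reminders


if __name__ == "__main__":
    # Made-up example record.
    record = {"organism": "Bacillus subtilis", "strain": "", "experimental_design": "time-series"}
    for reminder in nag(record):
        print(reminder)
```

Accepted suggestions (e.g. `time-series` → `time series`) could be stored as automatic mappings so that the same fix applies to later uploads; that step is not shown.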