MIBBI: Background, Context and Plans Chris Taylor chrisftaylor @ gmail.com

http://mibbi.org/

Mechanisms of scientific advance

Well-oiled cogs meshing perfectly (would be nice)

How well are things working?
— Cue the Tower of Babel analogy…
— Situation is improving with respect to standards
— But few tools, fewer carrots (though some sticks)

Why do we care about that..?
— Data exchange / deposition
— Comprehensibility (/quality) of work
— Scope for reuse (parallel or orthogonal)

“Publicly-funded research data are a public good, produced in the public interest”

“Publicly-funded research data should be openly available to the maximum extent possible.”

ProteoRED’s MIAPE satisfaction survey

Spanish multi-site collaboration: provision of proteomics services

MIAPE customer satisfaction survey (compiled November 2008)
— http://www.proteored.org/MIAPE_Survey_Results_Nov08.html
— Responses from 31 proteomics experts representing 17 labs

Yes: 95%; No: 5%

Technologically-delineated views of the world
  A: transcriptomics
  B: proteomics
  C: metabolomics
  …and…

Biologically-delineated views of the world
  A: plant biology
  B: epidemiology
  C: microbiology
  …and…

Generic features (‘common core’)
— Description of source biomaterial
— Experimental design components

[Figure: technology labels: Arrays, Scanning, Columns, Gels, MS, FTIR, NMR]

Modelling the biosciences (inefficiently)

‘Omics’ is about as useful as a chocolate teapot

Investigation: Medical syndrome, environmental effect, etc.

Study: Toxicology, environmental science, etc.

Assay: Omics and miscellaneous techniques

Reporting guidelines — a case in point

MIAME, MIAPE, MIAPA, MIACA, MIARE, MIFACE, MISFISHIE, MIGS, MIMIx, MIQAS, MIRIAM, (MIAFGE, MIAO), My Goodness…

‘MI’ checklists usually developed independently, by groups working within particular biological or technological domains

— Difficult to obtain an overview of the full range of checklists

— Tracking the evolution of single checklists is non-trivial

— Checklists are inevitably partially redundant one against another

— Where they overlap, arbitrary decisions on wording and sub-structuring make integration difficult

Significant difficulties for those who routinely combine information from multiple biological domains and technology platforms

— Example: An investigation looking at the impact of toxins on a sentinel species using proteomics (‘eco-toxico-proteomics’)

— What reporting standard(s) should they be using?

The MIBBI Project (mibbi.org)

[†] Denotes that a specification is provided as a suite of related documents

[Table: coverage of concepts by MIBBI-registered checklists; the ● cell marks could not be recovered]

Checklists (columns): CIMR [†], MIACA, MIAME, MIAME/Env, MIAME/Nutr, MIAME/Plant, MIAME/Tox, MIAPA, MIAPE [†], MIARE, MIFlowCyt, MIGen, MIGS/MIMS, MIMIx, MIMPP, MINI

CONCEPT: SPECIALISATION (rows)

study inputs: study design; generic organism; cells / microbes; plant; animal; mouse; human; population; environmental sample; environment / habitat; in silico model

study procedures: organism maintenance; animal husbandry; cell / microbe culture; plant cultivation; acclimation; preconditioning / pretreatment; organism manipulation

assay inputs: generic study input; organism part; organism state; organism trait; biomolecule; synthetic analyte; silencing RNA reagent

Version 0.7 (2008-04-10)

Comparison of MIBBI-registered projects [21]

[Figure legend: Granularity: Coarse, Medium, Fine. Maturity: Planned, Drafting, Release]



Interaction graph for projects (line thickness & colour saturation show similarity)
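One way such a graph could be derived is sketched below. The slide does not say how similarity was measured, so the Jaccard index used here is an assumption, and the checklist names and concept sets in `demo` are hypothetical placeholders rather than real checklist content. Each returned edge carries a score that a plotting tool could map to line thickness and colour saturation.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared concepts divided by all concepts (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(checklists: dict, threshold: float = 0.0) -> list:
    """Return (name1, name2, score) for every pair scoring above the threshold."""
    edges = []
    for (n1, c1), (n2, c2) in combinations(sorted(checklists.items()), 2):
        score = jaccard(c1, c2)
        if score > threshold:
            edges.append((n1, n2, score))
    # Strongest links first, so they can be drawn thickest
    return sorted(edges, key=lambda e: e[2], reverse=True)


# Hypothetical concept sets, for illustration only
demo = {
    "MI-A": {"study design", "organism", "sample preparation"},
    "MI-B": {"study design", "organism", "instrument settings"},
    "MI-C": {"instrument settings", "data processing"},
}
edges = similarity_edges(demo)
```

Pairs with no shared concepts produce no edge, which keeps the graph readable as the number of registered projects grows.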


Drafting MIBBI Foundry modules

Analytical approach proved ‘challenging’
  Cross analyses were either too coarse or too depressing
  Conclusion: no ‘perfect’ solution…

If in doubt, hack (a.k.a. ‘iterative development’)
  Start with one set of guidelines, breaking it into ‘paragraphs’
  Add another set, breaking it up similarly (‘shared subject’)
  Where there are overlaps, seek to resolve
    — If similar, aim for an ‘average’ module
    — If distinct, use core and extension modules
    — Record dependencies in a matrix (for reference)
  ‘Normalise’ (look for efficiencies, to a point)

Validation
  Asking for something like MIxxx should get something like MIxxx
  Weigh the conflicts/compromises; reexamine extensions etc.
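The drafting loop above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Foundry's actual procedure: a ‘paragraph’ is modelled as a set of reporting items, the `similar_at` cut-off is invented, the ‘average’ module is crudely approximated by a union, and the dependency matrix is held as a sparse dictionary. All module names are hypothetical.

```python
def reconcile(name_a, items_a, name_b, items_b, similar_at=0.6):
    """Resolve two overlapping 'paragraphs' into modules plus dependencies."""
    shared = items_a & items_b
    if not shared:
        # No overlap: keep both paragraphs as independent modules
        return {name_a: set(items_a), name_b: set(items_b)}, {}
    score = len(shared) / len(items_a | items_b)
    if score >= similar_at:
        # Similar: one merged module (a union standing in for the 'average')
        return {f"{name_a}+{name_b}": items_a | items_b}, {}
    # Distinct: shared items form a core, the rest become extensions
    core = f"core({name_a},{name_b})"
    modules, deps = {core: set(shared)}, {}
    for name, items in ((name_a, items_a), (name_b, items_b)):
        extra = items - shared
        if extra:
            modules[f"{name}-ext"] = extra
            deps[f"{name}-ext"] = {core}  # one row of the dependency matrix
    return modules, deps


def rebuild(modules, deps, wanted):
    """Validation helper: collect the items of the wanted modules and their dependencies."""
    items = set()
    for name in wanted:
        items |= modules[name]
        for dep in deps.get(name, ()):
            items |= modules[dep]
    return items


# Hypothetical paragraphs from two checklists
a = {"organism", "tissue", "age", "sex"}
b = {"organism", "habitat", "depth"}
modules, deps = reconcile("A", a, "B", b)
```

The validation rule on the slide corresponds to a round trip: `rebuild(modules, deps, ["A-ext"])` should return the item set that checklist A started with.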

Current coverage: Portal versus Foundry

Checklists covered to date (x): MIGS/MIMS, MIAPE, MIFlowCyt, MIARE, ‘Env’ extensions

Modules developed to date: 35 (set to rise rapidly)…

‘Pedro’ tool → XML → (via XSLT) Wiki code (etc.)
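The XML-to-wiki step might look something like the sketch below. The pipeline on the slide uses XSLT; Python's standard library has no XSLT processor, so this illustration swaps in `xml.etree.ElementTree` for the transformation. The `<module>` and `<item>` elements and the `required` attribute are hypothetical and do not reflect the real MIBBI XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical module document; the real MIBBI XML schema is not shown in the slides
SAMPLE = """<module name="organism">
  <item required="true">Species name</item>
  <item required="false">Strain</item>
</module>"""


def module_to_wiki(xml_text: str) -> str:
    """Render one module as a wiki heading followed by a bullet list."""
    root = ET.fromstring(xml_text)
    lines = [f"== {root.get('name')} =="]
    for item in root.findall("item"):
        marker = "" if item.get("required") == "true" else " (optional)"
        lines.append(f"* {(item.text or '').strip()}{marker}")
    return "\n".join(lines)


wiki = module_to_wiki(SAMPLE)
print(wiki)
```

Keeping the XML as the single source means other output types can be produced by writing further renderers of the same shape.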

MICheckout: Supporting Users

Future direction for MICheckout?

Current status
  Very simple interface
    — Pick what you want, in the order you want
    — Download or view in the format you want
  Issues with the current interface
    — Pick what you want, in the order you want (= anarchy)
    — No way to work out everything that you need (fiddly bits)

Different approaches
1. Wizard-based Q&A for normal users, plus ‘advanced’ interface
    — Simple ordered (ISA) questions for users; high level concepts
    — Advanced interface similar to the current one
2. Domain-specific-MI-based concepts as keys/shortcuts
    — “I normally get MIxxx – please give me the equivalent”
    — Similar advanced access to #1
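The ‘no way to work out everything that you need’ issue is essentially a dependency-resolution problem. A minimal sketch follows, assuming each module declares the modules it requires; the `requires` mapping and the module names are hypothetical and this is not MICheckout code. It expands a user's picks into a complete list with prerequisites placed first.

```python
def resolve(selected, requires):
    """Return the selected modules plus all their dependencies, dependencies first."""
    ordered, seen = [], set()

    def visit(module, trail):
        if module in trail:
            raise ValueError(f"circular dependency via {module}")
        if module in seen:
            return
        for dep in sorted(requires.get(module, ())):
            visit(dep, trail | {module})
        seen.add(module)
        ordered.append(module)

    for module in selected:
        visit(module, frozenset())
    return ordered


# Hypothetical dependency table
requires = {
    "sample-env-ext": {"sample-core"},
    "assay-ms": {"sample-core", "instrument-core"},
}
checkout = resolve(["assay-ms", "sample-env-ext"], requires)
print(checkout)
```

A wizard or an MIxxx shortcut would then only need to produce the initial `selected` list; the ‘fiddly bits’ are filled in by the resolver.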

http://isa-tools.org

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project

Example of guiding the experimentalist to search and select a term from the EnvO ontology, to describe the habitat of a sample

(Ontologies accessed in real time via the Ontology Lookup Service and BioPortal.)

BII @ the NERC Environmental Bioinformatics Centre

ISA Software Users

Several groups have now begun to use all or part of the ISA software suite

Easy to get going by using the data entry tool alone (ISAcreator)

Power users can reconfigure ISAcreator to meet local need (ISAconfigurator)

Some skill required to install the full suite (back end stuff)

Satisfies two needs:
• Internal data management
• Requirement to share data

The BioSharing project provides stable web-based catalogues and a user forum. The project seeks to:

• Build links between journals, funders and well-constituted standardization efforts in the biosciences; e.g., BMC http://is.gd/WIMqz3

• Expedite the production of an integrated standards-based framework for the biosciences

Coming soon:

• IDs/DOIs for all items

• Domain-specific views of standards; feedback required: http://is.gd/biosharing_feedback

(@ISMB 2011: http://is.gd/biosharing_ISMB_2011)

MIBBI and BioSharing: Proposals to PSI

BioSharing
  Provide/maintain up-to-date information (content)
  Offer feedback on the site’s functionality as it matures

MIBBI: three options
1. Maintain status quo: MIBBI (and BioSharing) scrape information
    — Passive participation only; no real impact (or additional benefit)
    — Draw on MIBBI for description of sample and study context only
2. Use the MIBBI Portal as the source for the most current MIAPE (+?)
    — MIBBI XML can be transformed into several output types
    — MIBBI and BioSharing sites increasingly visible to users
3. Participate in the MIBBI Foundry activity (as well as the Portal)
    — Maintain ‘independent’ MIAPE documents (Portal), but...
    — Take (joint) ownership of the appropriate Foundry modules
    — Use the Foundry to re-engineer MIAPE+ where necessary
    — Show support for integrated cross-domain reporting

Acknowledgements

MIBBI: Chris Taylor (EBI, NEBC), Susanna-Assunta Sansone (U. Oxford), Dawn Field (NEBC), contributions from participants in MIBBI-registered projects.

BioSharing: Susanna Sansone (U. Oxford), Dawn Field (NEBC), Philippe Rocca-Serra (U. Oxford), Annapaola Santarsiero (Mario Negri Institute; U. Oxford), Eamonn Maguire (U. Oxford), Chris Taylor (EBI, NEBC), contributions from numerous communities and individuals.

ISA Infrastructure: Susanna-Assunta Sansone, Philippe Rocca-Serra, Eamonn Maguire (U. Oxford); Chris Taylor, Marco Brandizi, Gabriella Rustici, Nataliya Sklyar, Manon Delahaye, Richard Evan (EBI); Kimberly Begley, Dorothy Reilly, Oliver Hofmann, Winston Hide (Harvard School of Public Health); Hong Fang, Joshua Xu, Martin Jackson, Jie Zhang, Stephen Harris, Weida Tong (FDA Center for Bioinformatics); Tim Booth, Bela Tiwari, Norman Morrison, Dawn Field (NEBC); Steffen Neumann (Leibniz Institute of Plant Biochemistry); Peter Sterk, Jack Gilbert, Folker Meyer, Linda Amaral-Zettler, Dawn Field (GSC); Alain Zasadzinski, Marie-Christine Jacquemot, Florian Mazur, Damien Fleury, Yahia Berchi, Morad Mercheref, Claude Niederlander, Magali Roux (CNRS Institute of Biological Sciences); Audrey Kauffman (Bergonie Cancer Institute); Miroslaw Dylag (Mentor Software Ltd.).

Funding: NEBC, NERC, BBSRC.

BUT…

The objections to fuller reporting

Why should I dedicate resources to providing data to others?
— Pro bono arguments have no impact (altruism is a myth)
— Sticks wielded by funders and publishers get the bare minimum
— No traceability in most contexts (intellectual property = ?)
— Loss of competitive advantage (both direct and indirect)

This is just a ‘make work’ scheme for bioinformaticians
— Bioinformaticians get a buzz out of having big databases
— Parasites benefitting from others’ work (mutualism..?)

I don’t trust anyone else’s data; I’d rather repeat work
— Problems of quality, which are justified to an extent
— But what of people lacking resources or specific expertise?

How on earth am I supposed to do this anyway..?
— Perception that there is no money to pay for this
— No mature free tools; Excel sheets are no good for HT
— Worries about vendor support, legacy systems (business models)

Credit where credit’s due

Data sharing is more or less a given now, and tools are emerging
— Lots of sticks, but they only get the bare minimum
— How to get the best out of data generators?
— Need standards- and user-friendly tools, and meaningful credit

Central registries of data sets that can record deposit and reuse
— Well-presented, detailed papers get cited more frequently
— The same principle should apply to data sets (metadata, etc.)
— ORCIDs for people (orcid.org), DOIs for data (datacite.org)

Side-benefits, challenges
— Would also clear up problems around paper authorship
— Would enable other kinds of credit (training, curation, etc.)
— Community policing: researchers ‘own’ their credit portfolio (enforcement body useful, but most likely to be reviewers)
— Problem of ‘micro data sets’ and legacy data
