Dec 14, 2015
Mechanisms of scientific advance
Well-oiled cogs meshing perfectly (would be nice)
How well are things working?
— Cue the Tower of Babel analogy…
— Situation is improving with respect to standards
— But few tools, fewer carrots (though some sticks)
Why do we care about that..?
— Data exchange / deposition
— Comprehensibility (/quality) of work
— Scope for reuse (parallel or orthogonal)
“Publicly-funded research data are a public good, produced in the public interest”
“Publicly-funded research data should be openly available to the maximum extent possible.”
ProteoRED’s MIAPE satisfaction survey
Spanish multi-site collaboration: provision of proteomics services
MIAPE customer satisfaction survey (compiled November 2008)
— http://www.proteored.org/MIAPE_Survey_Results_Nov08.html
— Responses from 31 proteomics experts representing 17 labs
Yes: 95%; No: 5%
Technologically-delineated views of the world (A: transcriptomics; B: proteomics; C: metabolomics) …and…
Biologically-delineated views of the world (A: plant biology; B: epidemiology; C: microbiology) …and…
Generic features (‘common core’):
— Description of source biomaterial
— Experimental design components
[Figure: technology-specific components: arrays & scanning, columns, gels, MS, FTIR, NMR]
Modelling the biosciences (inefficiently)
‘Omics’ is about as useful as a chocolate teapot
Investigation: Medical syndrome, environmental effect, etc.
Study: Toxicology, environmental science, etc.
Assay: Omics and miscellaneous techniques
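The Investigation / Study / Assay split above is essentially a nested data model. As a minimal sketch, assuming only the three levels named on the slide (the class and field names are hypothetical, not taken from any ISA software), it might look like this:

```python
from dataclasses import dataclass, field

# Hypothetical rendering of the Investigation / Study / Assay nesting;
# field names are illustrative only.

@dataclass
class Assay:
    technique: str  # e.g. an 'omics' or other measurement technique

@dataclass
class Study:
    topic: str  # e.g. toxicology, environmental science
    assays: list[Assay] = field(default_factory=list)

@dataclass
class Investigation:
    question: str  # e.g. a medical syndrome or environmental effect
    studies: list[Study] = field(default_factory=list)

    def techniques(self) -> set[str]:
        """Collect every assay technique used anywhere in the investigation."""
        return {a.technique for s in self.studies for a in s.assays}

inv = Investigation(
    question="impact of toxins on a sentinel species",
    studies=[
        Study("toxicology", [Assay("proteomics"), Assay("metabolomics")]),
        Study("environmental science", [Assay("transcriptomics")]),
    ],
)
print(sorted(inv.techniques()))  # → ['metabolomics', 'proteomics', 'transcriptomics']
```

Keeping the technology at the Assay level is what lets a single investigation mix techniques without being filed under one ‘omics’ label.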
Reporting guidelines — a case in point
MIAME, MIAPE, MIAPA, MIACA, MIARE, MIFACE, MISFISHIE, MIGS, MIMIx, MIQAS, MIRIAM, (MIAFGE, MIAO), My Goodness…
‘MI’ checklists are usually developed independently, by groups working within particular biological or technological domains
— Difficult to obtain an overview of the full range of checklists
— Tracking the evolution of single checklists is non-trivial
— Checklists are inevitably partially redundant with respect to one another
— Where they overlap, arbitrary decisions on wording and substructuring make integration difficult
Significant difficulties for those who routinely combine information from multiple biological domains and technology platforms
— Example: An investigation looking at the impact of toxins on a sentinel species using proteomics (‘eco-toxico-proteomics’)
— What reporting standard(s) should they be using?
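To make the eco-toxico-proteomics problem concrete, the sketch below uses a tiny, hypothetical registry: the scope tags and items attached to MIAPE, MIAME/Tox and MIAME/Env are illustrative placeholders, not the real checklist contents. It picks every checklist touching the investigation's domains, then lists the items that more than one of them asks for, i.e. the partial redundancy described above.

```python
# Hypothetical registry: scope tags and items are illustrative placeholders,
# not the actual contents of these checklists.
CHECKLISTS = {
    "MIAPE": {"scope": {"proteomics"},
              "items": {"sample description", "protocol", "instrument"}},
    "MIAME/Tox": {"scope": {"toxicology", "transcriptomics"},
                  "items": {"sample description", "protocol", "dose"}},
    "MIAME/Env": {"scope": {"environment", "transcriptomics"},
                  "items": {"sample description", "protocol", "habitat"}},
}

def applicable(needs: set[str]) -> list[str]:
    """Checklists whose scope touches any domain or technology the work involves."""
    return sorted(name for name, c in CHECKLISTS.items() if c["scope"] & needs)

def redundant_items(names: list[str]) -> set[str]:
    """Items requested by more than one of the selected checklists."""
    seen: set[str] = set()
    repeated: set[str] = set()
    for name in names:
        for item in CHECKLISTS[name]["items"]:
            if item in seen:
                repeated.add(item)
            seen.add(item)
    return repeated

# 'Eco-toxico-proteomics': environment + toxicology + proteomics.
chosen = applicable({"environment", "toxicology", "proteomics"})
print(chosen)                           # → ['MIAME/Env', 'MIAME/Tox', 'MIAPE']
print(sorted(redundant_items(chosen)))  # → ['protocol', 'sample description']
```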
The MIBBI Project (mibbi.org)
[†] Denotes that a specification is provided as a suite of related documents
[Figure: Comparison of MIBBI-registered projects [21], Version 0.7 (2008-04-10), arranged by concept specialisation.
Checklists: CIMR [†], MIACA, MIAME, MIAME/Env, MIAME/Nutr, MIAME/Plant, MIAME/Tox, MIAPA, MIAPE [†], MIARE, MIFlowCyt, MIGen, MIGS/MIMS, MIMIx, MIMPP, MINI.
Concepts covered:
study inputs / study design: generic organism; cells / microbes; plant; animal; mouse; human; population; environmental sample; environment / habitat; in silico model
study procedures: organism maintenance; animal husbandry; cell / microbe culture; plant cultivation; acclimation; preconditioning / pretreatment; organism manipulation
assay inputs: generic study input; organism part; organism state; organism trait; biomolecule; synthetic analyte; silencing RNA reagent
Legend: granularity (coarse, medium, fine); maturity (planned, drafting, release).]
Interaction graph for projects (line thickness & colour saturation show similarity)
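The slides do not say how project similarity was measured for the graph. One plausible choice, assumed here, is the Jaccard index over the concepts each project covers; the coverage sets below are hypothetical, and the resulting edge weights could drive line thickness or colour saturation.

```python
from itertools import combinations

# Hypothetical concept coverage per project (illustrative, not the real matrix).
COVERAGE = {
    "MIAME/Env": {"environmental sample", "environment / habitat", "biomolecule"},
    "MIGS/MIMS": {"environmental sample", "environment / habitat", "cells / microbes"},
    "MIAPE": {"biomolecule", "organism part"},
}

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared concepts divided by all concepts covered by either project."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_edges(coverage: dict[str, set[str]], threshold: float = 0.0):
    """Weighted edges for an interaction graph, strongest first."""
    edges = []
    for p, q in combinations(sorted(coverage), 2):
        w = jaccard(coverage[p], coverage[q])
        if w > threshold:  # drop unconnected pairs
            edges.append((p, q, round(w, 2)))
    return sorted(edges, key=lambda e: e[2], reverse=True)

for p, q, w in similarity_edges(COVERAGE):
    print(f"{p} -- {q}: {w}")
```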
Drafting MIBBI Foundry modules
Analytical approach proved ‘challenging’
Cross analyses were either too coarse or too depressing
Conclusion: no ‘perfect’ solution…
If in doubt, hack (a.k.a. ‘iterative development’)
1. Start with one set of guidelines, breaking it into ‘paragraphs’
2. Add another set, breaking it up similarly (‘shared subject’)
3. Where there are overlaps, seek to resolve:
— If similar, aim for an ‘average’ module
— If distinct, use core and extension modules
— Record dependencies in a matrix (for reference)
4. ‘Normalise’ (look for efficiencies, to a point)
Validation
— Asking for something like MIxxx should get something like MIxxx
— Weigh the conflicts/compromises; re-examine extensions etc.
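The ‘if in doubt, hack’ procedure and its validation step can be sketched as a single merge pass. This is an illustration under stated assumptions, not the MIBBI Foundry's actual tooling: checklist names and items are placeholders, similarity is a Jaccard score against an arbitrary threshold, and an ‘average’ module is approximated as the union of the overlapping paragraphs.

```python
from itertools import combinations

# Hypothetical checklists split into 'paragraphs' (subject -> reporting items).
# Names and items are illustrative placeholders, not real MIBBI content.
CHECKLISTS = {
    "MIxxx": {"sample": {"organism", "tissue", "sex"},
              "instrument": {"model", "vendor"}},
    "MIyyy": {"sample": {"organism", "tissue", "age"},
              "habitat": {"location", "depth"}},
}

def jaccard(a: set, b: set) -> float:
    """Share of items two paragraphs have in common (assumed similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def build_modules(checklists: dict, threshold: float = 0.75):
    """One merge pass. Returns (modules, deps): module name -> items, and
    checklist -> module names it uses (one row of the dependency matrix each)."""
    modules: dict[str, set[str]] = {}
    deps: dict[str, set[str]] = {name: set() for name in checklists}
    subjects = sorted({s for paras in checklists.values() for s in paras})
    for subject in subjects:
        owners = [n for n in checklists if subject in checklists[n]]
        item_sets = [checklists[n][subject] for n in owners]
        if all(jaccard(a, b) >= threshold for a, b in combinations(item_sets, 2)):
            # Similar (or unshared): one 'average' module, taken here as the union.
            modules[subject] = set().union(*item_sets)
            for n in owners:
                deps[n].add(subject)
        else:
            # Distinct: a shared core module plus one extension per checklist.
            core = set.intersection(*item_sets)
            if core:
                modules[f"{subject}:core"] = core
            for n in owners:
                if core:
                    deps[n].add(f"{subject}:core")
                extension = checklists[n][subject] - core
                if extension:
                    modules[f"{subject}:{n}"] = extension
                    deps[n].add(f"{subject}:{n}")
    return modules, deps

def validate(checklists: dict, modules: dict, deps: dict) -> dict:
    """Asking for MIxxx should give back something like MIxxx: report gaps and extras."""
    report = {}
    for name, paras in checklists.items():
        wanted = set().union(*paras.values())
        got = set().union(*(modules[m] for m in deps[name]))
        report[name] = {"missing": wanted - got, "extra": got - wanted}
    return report

for t in (0.75, 0.5):
    mods, deps = build_modules(CHECKLISTS, threshold=t)
    print(t, sorted(mods))
```

With the stricter threshold the two ‘sample’ paragraphs (overlap 0.5) become a core plus two extensions and validation comes back clean; at 0.5 they collapse into one module and each checklist picks up an item it never asked for, which is exactly the kind of compromise the validation step is meant to surface.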
Current coverage: Portal versus Foundry
Checklists covered to date: MIGS/MIMS, MIAPE, MIFlowCyt, MIARE, ‘Env’ extensions
Modules developed to date: 35 (set to rise rapidly)…
‘Pedro’ tool → XML → (via XSLT) Wiki code (etc.)
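The slide's publishing pipeline runs XSLT over the XML exported by the Pedro tool. Python's standard library has no XSLT processor, so this sketch swaps in xml.etree.ElementTree to show the same XML-to-wiki step; the element and attribute names (checklist, module, item, required) are hypothetical, since the real schema is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical checklist XML; the real Pedro/MIBBI schema is not given in the slides.
SOURCE = """<checklist name="MIxxx">
  <module name="sample">
    <item required="true">organism</item>
    <item required="false">age</item>
  </module>
  <module name="instrument">
    <item required="true">model</item>
  </module>
</checklist>"""

def to_wiki(xml_text: str) -> str:
    """Render a checklist as MediaWiki-style headings and bullet lists."""
    root = ET.fromstring(xml_text)
    lines = [f"= {root.get('name')} ="]
    for module in root.iter("module"):
        lines.append(f"== {module.get('name')} ==")
        for item in module.iter("item"):
            flag = " (required)" if item.get("required") == "true" else ""
            lines.append(f"* {item.text.strip()}{flag}")
    return "\n".join(lines)

print(to_wiki(SOURCE))
```

Other output types would just be further render functions over the same parsed tree, which mirrors keeping one XSLT stylesheet per target format.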
MICheckout: Supporting Users
Future direction for MICheckout?
Current status: very simple interface
— Pick what you want, in the order you want
— Download or view in the format you want
Issues with the current interface
— Pick what you want, in the order you want (= anarchy)
— No way to work out everything that you need (fiddly bits)
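The ‘fiddly bits’ problem is a dependency-resolution problem. Assuming the Foundry's dependency matrix can be read as ‘module X requires module Y’ (the module names below are hypothetical), a checkout tool could compute everything a selection needs and present it in a sensible order using the standard library's graphlib (Python 3.9+):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency matrix: module -> modules it requires.
REQUIRES = {
    "study core": set(),
    "sample core": {"study core"},
    "assay core": {"study core"},
    "sample prep": {"sample core"},
    "MS instrument": {"assay core"},
    "gel electrophoresis": {"assay core", "sample prep"},
}

def closure(selected: set[str]) -> set[str]:
    """The selection plus every module it transitively depends on."""
    needed: set[str] = set()
    stack = list(selected)
    while stack:
        module = stack.pop()
        if module not in needed:
            needed.add(module)
            stack.extend(REQUIRES.get(module, set()))
    return needed

def ordered(selected: set[str]) -> list[str]:
    """Prerequisites first, so no detail is requested before its context."""
    graph = {m: REQUIRES.get(m, set()) for m in closure(selected)}
    return list(TopologicalSorter(graph).static_order())

print(ordered({"gel electrophoresis"}))
```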
Different approaches
1. Wizard-based Q&A for normal users, plus ‘advanced’ interface
— Simple ordered (ISA) questions for users; high-level concepts
— Advanced interface similar to the current one
2. Domain-specific MI-based concepts as keys/shortcuts
— “I normally get MIxxx – please give me the equivalent”
— Similar advanced access to #1
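Approach 2 amounts to a lookup table from familiar checklist names to Foundry modules, and approach 1's ‘simple ordered (ISA) questions’ to sorting those modules by ISA level. A minimal sketch, in which the shortcut table, module names and level assignments are all hypothetical:

```python
# Hypothetical lookup tables: the ISA level of each module, and the modules that
# stand in for a checklist a user already knows. All names are placeholders.
MODULE_LEVEL = {
    "investigation design": "Investigation",
    "organism": "Study",
    "environment / habitat": "Study",
    "MS instrument": "Assay",
    "gel electrophoresis": "Assay",
}
SHORTCUTS = {
    "MIxxx": {"gel electrophoresis", "organism", "investigation design"},
    "MIyyy": {"MS instrument", "environment / habitat", "organism"},
}
ISA_ORDER = ("Investigation", "Study", "Assay")

def equivalent(checklist: str) -> list[str]:
    """Modules standing in for a familiar checklist, ordered Investigation, Study, Assay."""
    modules = SHORTCUTS[checklist]
    return sorted(modules, key=lambda m: (ISA_ORDER.index(MODULE_LEVEL[m]), m))

def wizard_questions(checklist: str) -> list[str]:
    """One simple question per module, in ISA order, for a wizard-style interface."""
    return [f"[{MODULE_LEVEL[m]}] Please describe the {m}" for m in equivalent(checklist)]

for question in wizard_questions("MIyyy"):
    print(question)
```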
http://isa-tools.org
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
Example of guiding the experimentalist to search and select a term from the EnvO ontology, to describe the habitat of a sample
(Ontologies accessed in real time via the Ontology Lookup Service and BioPortal.)
BII @ the NERC Environmental Bioinformatics Centre
ISA Software Users
Several groups have now begun to use all or part of the ISA software suite
Easy to get going by using the data entry tool alone (ISAcreator)
Power users can reconfigure ISAcreator to meet local needs (ISAconfigurator)
Some skill required to install the full suite (back-end stuff)
Satisfies two needs:
• Internal data management
• Requirement to share data
The BioSharing project provides stable web-based catalogues and a user forum. The project seeks to:
• Build links between journals, funders and well-constituted standardization efforts in the biosciences; e.g., BMC: http://is.gd/WIMqz3
• Expedite the production of an integrated standards-based framework for the biosciences
Coming soon:
• IDs/DOIs for all items
• Domain-specific views of standards; feedback required: http://is.gd/biosharing_feedback
(@ISMB 2011: http://is.gd/biosharing_ISMB_2011)
MIBBI and BioSharing: Proposals to PSI
BioSharing
— Provide/maintain up-to-date information (content)
— Offer feedback on the site’s functionality as it matures
MIBBI: three options
1. Maintain status quo: MIBBI (and BioSharing) scrape information
— Passive participation only; no real impact (or additional benefit)
— Draw on MIBBI for description of sample and study context only
2. Use the MIBBI Portal as the source for the most current MIAPE (+?)
— MIBBI XML can be transformed into several output types
— MIBBI and BioSharing sites increasingly visible to users
3. Participate in the MIBBI Foundry activity (as well as the Portal)
— Maintain ‘independent’ MIAPE documents (Portal), but...
— Take (joint) ownership of the appropriate Foundry modules
— Use the Foundry to re-engineer MIAPE+ where necessary
— Show support for integrated cross-domain reporting
Acknowledgements
MIBBI: Chris Taylor (EBI, NEBC), Susanna-Assunta Sansone (U. Oxford), Dawn Field (NEBC), contributions from participants in MIBBI-registered projects.
BioSharing: Susanna Sansone (U. Oxford), Dawn Field (NEBC), Philippe Rocca-Serra (U. Oxford), Annapaola Santarsiero (Mario Negri Institute; U. Oxford), Eamonn Maguire (U. Oxford), Chris Taylor (EBI, NEBC), contributions from numerous communities and individuals.
ISA Infrastructure: Susanna-Assunta Sansone, Philippe Rocca-Serra, Eamonn Maguire (U. Oxford); Chris Taylor, Marco Brandizi, Gabriella Rustici, Nataliya Sklyar, Manon Delahaye, Richard Evan (EBI); Kimberly Begley, Dorothy Reilly, Oliver Hofmann, Winston Hide (Harvard School of Public Health); Hong Fang, Joshua Xu, Martin Jackson, Jie Zhang, Stephen Harris, Weida Tong (FDA Center for Bioinformatics); Tim Booth, Bela Tiwari, Norman Morrison, Dawn Field (NEBC); Steffen Neumann (Leibniz Institute of Plant Biochemistry); Peter Sterk, Jack Gilbert, Folker Meyer, Linda Amaral-Zettler, Dawn Field (GSC); Alain Zasadzinski, Marie-Christine Jacquemot, Florian Mazur, Damien Fleury, Yahia Berchi, Morad Mercheref, Claude Niederlander, Magali Roux (CNRS Institute of Biological Sciences); Audrey Kauffman (Bergonie Cancer Institute); Miroslaw Dylag (Mentor Software Ltd.).
Funding: NEBC, NERC, BBSRC.
BUT…
The objections to fuller reporting
Why should I dedicate resources to providing data to others?
— Pro bono arguments have no impact (altruism is a myth)
— Sticks wielded by funders and publishers get the bare minimum
— No traceability in most contexts (intellectual property = ?)
— Loss of competitive advantage (both direct and indirect)
This is just a ‘make work’ scheme for bioinformaticians
— Bioinformaticians get a buzz out of having big databases
— Parasites benefitting from others’ work (mutualism..?)
I don’t trust anyone else’s data; I’d rather repeat work
— Problems of quality, which are justified to an extent
— But what of people lacking resources or specific expertise?
How on earth am I supposed to do this anyway..?
— Perception that there is no money to pay for this
— No mature free tools; Excel sheets are no good for high-throughput (HT) data
— Worries about vendor support, legacy systems (business models)
Credit where credit’s due
Data sharing is more or less a given now, and tools are emerging
— Lots of sticks, but they only get the bare minimum
— How to get the best out of data generators?
— Need standards- and user-friendly tools, and meaningful credit
Central registries of data sets that can record deposit and reuse
— Well-presented, detailed papers get cited more frequently
— The same principle should apply to data sets (metadata, etc.)
— ORCIDs for people (orcid.org), DOIs for data (datacite.org)
Side-benefits, challenges
— Would also clear up problems around paper authorship
— Would enable other kinds of credit (training, curation, etc.)
— Community policing: researchers ‘own’ their credit portfolio (enforcement body useful, but most likely to be reviewers)
— Problem of ‘micro data sets’ and legacy data
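A central registry that records deposit and reuse could be as simple as an event log keyed by data-set and person identifiers. The sketch below assumes DOIs for data sets and ORCIDs for people, as suggested above, but uses placeholder strings rather than real identifiers; the Registry class and its credit rule (reuse by others counts, self-reuse does not) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str   # "deposit" or "reuse"
    doi: str    # data set identifier (placeholder standing in for a DOI)
    orcid: str  # person identifier (placeholder standing in for an ORCID)

class Registry:
    """Hypothetical central registry recording deposit and reuse of data sets."""

    def __init__(self) -> None:
        self.events: list[Event] = []
        self.depositors: dict[str, set[str]] = {}

    def deposit(self, doi: str, orcid: str) -> None:
        self.depositors.setdefault(doi, set()).add(orcid)
        self.events.append(Event("deposit", doi, orcid))

    def reuse(self, doi: str, by_orcid: str) -> None:
        if doi not in self.depositors:
            raise KeyError(f"unknown data set {doi}")
        self.events.append(Event("reuse", doi, by_orcid))

    def credit(self, orcid: str) -> Counter:
        """Credit portfolio: data sets deposited, plus reuses of them by others."""
        c: Counter = Counter()
        for e in self.events:
            if e.kind == "deposit" and e.orcid == orcid:
                c["deposited"] += 1
            elif e.kind == "reuse" and orcid in self.depositors[e.doi] and e.orcid != orcid:
                c["reused by others"] += 1
        return c

reg = Registry()
reg.deposit("doi-1", "orcid-A")
reg.deposit("doi-1", "orcid-B")
reg.deposit("doi-2", "orcid-A")
reg.reuse("doi-1", "orcid-C")
reg.reuse("doi-1", "orcid-C")
reg.reuse("doi-2", "orcid-A")  # self-reuse: not counted as credit
print(reg.credit("orcid-A"))
```

Other kinds of credit mentioned above (training, curation) would simply be further event kinds in the same log.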