Rin goble-published

Post on 10-May-2015

DESCRIPTION

My experiences of the SysMO-DB project for data sharing and data management in the field by systems biologists.

Transcript

Data sharing
Data management

The SysMO-SEEK Story

Professor Carole Goble FREng FBCS CITP
University of Manchester, UK
carole.goble@manchester.ac.uk

13 teams
91 institutes, 300 scientists
Multi-site, multi-disciplinary
Each three-year duration

Data generation
Data consumption
Data analysis

Data management: Local – Shared – Long term

Pan European Systems Biology

http://www.sysmo.net

Own data solutions: wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.

Extreme caution over sharing.
Modellers vs experimentalists tribalism

Many institutions, many projects, overlapping memberships, changing membership. Projects ending, starting, carrying on the same, carrying on differently.

Legacy

Suspicion

Dynamics

Expert scientists, inexpert informaticians. Few resources.

Skills

Patchy standards, incomparable data, afterthought.

Data

Scientist Lab Collaborators Competitors

Programme

Published

Post-Publication

Pre-Publication

Data mine-ing

“my impression of researchers, and I can criticize myself in this, is that we’re much more interested in sharing data when we mean sharing somebody else’s as opposed [to] sharing ours.”

E-infrastructure - taking forward the strategy, RIN report, 2010

Competitive advantage.
Adoption.

Kudos & Credit.
Help.
Fame.

Reputation.

Being scooped.
Scrutiny.

Misinterpretation.
Cost.

Blame. Reputation.

Rewards

Risks

Nature 461, 145 (10 September 2009)

1. Sharing

“It’s not ready yet”

“I need to get (another) publication first”

“We don’t have the resources or skills to prepare it for others, esp. now we finished that project”

“It’s faster/easier to do it myself, and will keep the credit/control too”

“It’s not described enough to be usable”

“I don’t trust the quality. It’s not reliable enough. It’s too noisy.”

“Others won’t use it properly.”

“It’s not worth my while”

“They are my competitors!!”

Pseudo Sharing

2. Preparation for Use
Curation
Standards
Reusability
Reproducibility
Accountability & Quality
Data discipline
Silo busting

CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist

BioPAX: Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experiment
MGED: Microarray Experimental Conditions
http://www.mibbi.org/index.php/MIBBI_portal

Minimum Information for Biological and Biomedical Investigations

Metadata Minefield

http://usefulchem.wikispaces.com/page/code/EXPLAN001

http://www.mygrid.org.uk/tools/taverna/

Publishing Process

models

software

methods

scripts

http://openwetware.org

standard operating procedures

Community Curation Responsibility

Blue Collar Science
John Quackenbush

Difficult and time consuming

Poor Credit or Reward

Shabby Career Paths & Prospects

3. Credit Crisis
• Reward sharing, curation and reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and standards too.

• Technical (DataCite.org).
• Cultural (Respected policy).
• Institutional.
• Funding bodies.

4. Infrastructure, Capability & Capacity
• Three year PhD/project cycle
• Local data control
• Realistic paths to adoption by busy people.

• Spreadsheets, wikis, catalogues and yellow pages.

• Content and Tools

http://www.biosharing.org

Identity Management
Sharednames
DataCite
LSID
DOIs
ORCID

5. Data Ecosystem

Resources

6. Sustained Resources
• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded

• Institution.
• Funding councils.
• Funding panels.
• Publishers
• Libraries
• National data centres
• International data centres

Free. Like Puppies

Incentives.
Sensitivity to Behaviours

Infrastructure

Community building

Trusted service

Coordination
Governance

Policy

Capability

Community Integration

A Partnership
• Software engineers
• Computational scientists
• Experimental Scientists
• Domain informaticians
• Service providers
• Funding agencies

• But the community credit crisis continues….

Summary
• Science is a complex social activity undertaken by tribes of people and dominated by trust issues.

• Infrastructure has to be there and fit for purpose, but it's not the real problem.

• Need a cultural shift (on all sides) that truly honours data.
