Science as an Open Enterprise: Open Data for Open Science Professor Brian Collins CB, FREng UCL, June 2012 Emerging conclusions from a Royal Society Policy Report

Dec 23, 2015


Nickolas Gaines
Transcript
Page 1:

Science as an Open Enterprise: Open Data for Open Science

Professor Brian Collins CB, FREng

UCL, June 2012

Emerging conclusions from a Royal Society Policy Report

 

Page 2:

Open data as the engine of the “scientific revolution”

Publish scientific theories – and the experimental and observational data on which they are based – to permit others to scrutinise them, to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge.

Henry Oldenburg

Page 3:

Why is “open data” a big current issue?

The data deluge from powerful acquisition tools

coupled with

powerful tools for storing, manipulating, analysing, displaying and transmitting data

and

citizens' interest in scrutinising scientific claims

have created

new challenges & new opportunities that require new forms of openness and novel social dynamics in science

Page 4:

Challenges

• Maintaining scientific self-correction (closing the concept-data gap)

• Responding to citizens’ demands for evidence in “public interest science”

Opportunities

• Exploiting data-intensive science – a 4th paradigm?

• The potential of linked data

• “Data is the new raw material for business”

• Exposing malpractice and fraud

• Stimulating citizen science

Aspiration: all scientific literature online, all data online, and for them to interoperate

Page 5:

Openness of data per se has no value. Open science is more than disclosure.

For effective communication, we need intelligent openness. Data must be:

• Accessible

• Intelligible

• Assessable

• Re-usable

Only when these four criteria are fulfilled are data properly open
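
The rule on this slide is a simple conjunction: data count as properly open only if all four criteria hold. A minimal sketch of that rule follows. The class name `OpennessCheck` and its fields are hypothetical, introduced here for illustration only and not taken from the Royal Society report.

```python
from dataclasses import dataclass


@dataclass
class OpennessCheck:
    """Hypothetical checklist mirroring the four criteria on the slide."""
    accessible: bool    # the data can be located and obtained
    intelligible: bool  # the data can be understood by the intended audience
    assessable: bool    # the reliability of the data can be judged
    reusable: bool      # the data can be reused by others

    def is_properly_open(self) -> bool:
        # All four criteria must be fulfilled; disclosure alone is not enough.
        return all((self.accessible, self.intelligible,
                    self.assessable, self.reusable))


# Data that have merely been disclosed, with nothing to make them usable.
disclosed_only = OpennessCheck(accessible=True, intelligible=False,
                               assessable=False, reusable=False)
# Data that meet all four criteria.
intelligently_open = OpennessCheck(accessible=True, intelligible=True,
                                   assessable=True, reusable=True)

print(disclosed_only.is_properly_open())      # False
print(intelligently_open.is_properly_open())  # True
```

In practice each criterion is a judgement that depends on the audience, so the booleans here stand in for assessments a curator would have to make.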

Metadata must be audience-sensitive

METADATA

Scientific data rarely fits neatly into an Excel spreadsheet!

Page 6:

Boundaries of openness?

• Legitimate commercial interests

• Privacy (complete anonymisation is impossible)

• Safety & Security

But the boundaries are fuzzy & complex

Page 7:

Benefits/costs of open data to the science process

Pathfinder disciplines where benefit is recognised and habits are changing

• Bioinformatics (-omics disciplines)

• Biological science

• Particle physics

• Nanotechnology

• Environmental science

• Longitudinal societal data

• Astronomy & space science

Costs

Tier 1 – International databases – e.g. Worldwide Protein Data Bank: >65 staff; $6.5M pa;

1% of cost of collecting data

Tier 3 – Institutional data management - UK 2011, average UK university repository

- 1.36 FTE (managerial, administrative, technical)

e.g. Gene Expression Omnibus (GEO) – 2700 GEO uploads by non-contributors in 2000 led to 1150 papers

(>1000 additional papers over the 16 that would be expected from investment of $400,000)
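
The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the variable names are illustrative, and it assumes the "1%" applies to the $6.5M annual budget of the Tier 1 example.

```python
# Figures quoted on the slide; variable names are illustrative only.
investment_usd = 400_000        # investment cited for the GEO example
expected_papers = 16            # papers expected from that investment
papers_from_reuse = 1150        # papers attributed to the GEO uploads

# Papers beyond what the investment alone would be expected to yield.
additional_papers = papers_from_reuse - expected_papers
# How many times the expected output the reuse delivered.
reuse_multiple = papers_from_reuse / expected_papers
# Implied cost of one "expected" paper.
cost_per_expected_paper = investment_usd / expected_papers

print(additional_papers)            # 1134
print(round(reuse_multiple, 1))     # 71.9
print(cost_per_expected_paper)      # 25000.0

# Tier 1 example: if $6.5M per year is 1% of the cost of collecting the data
# (an assumption about how the slide should be read), the implied collection
# cost is 100 times the curation budget.
curation_budget_usd = 6_500_000
implied_collection_cost_usd = curation_budget_usd * 100
print(implied_collection_cost_usd)  # 650000000
```

On these numbers the "more than 1000 additional papers" on the slide corresponds to 1134 papers, roughly 72 times the output expected from the same investment.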

Page 8:

Levels of data curation

Tier 1 – International databases

Tier 2 – National

(e.g. Research Councils)

Tier 3 – Institutions

(Universities & Institutes)

Tier 4 – “Small science” researchers

& research groups

Financial sustainability?

Upward data migration

Data loss

Page 9:

Priorities for action - 1

1) Change the mindset: publicly funded data is a public resource

2) Credit for useful data and productive, novel collaboration (the Tim Gowers phenomenon)

3) Mandatory access to data underlying publications

4) Common standards for communicating data

5) Sustainability (the power needs of current modes of data storage will outstrip the global electricity supply within the decade)

Page 10:

Priorities for action - 2

• R & D on software tools (enabling dynamic data; managing the data lifecycle; tracking provenance, citation, indexing and searching, standards & inter-operability, sustainability - note that the ICT industry is often way ahead - & the US prioritises investment here)

• Institutional responsibility for the knowledge they create (cumulative small science data > cumulative big science data)

• Data scientists (they are being trained, and the commercial demand is large)

“Big Iron” is a national infrastructure priority

“Big data” is a science priority – the big costs are people and software, not computers

Page 11:

Targets for recommendations

• Scientists – changing cultural assumptions

• Employers (universities/institutes) – data responsibilities; crediting researchers

• Funders of research - the cost of curation is a cost of research

• Learned societies – influencing their communities

• Publishers of research – mandatory open data

• Business – exploiting the opportunity; awareness & skills

• Government – efficiency of the science base; exploiting its data

• Governance processes for privacy, safety, security - proportionality