Data, data standards and sharing Dr Daniel Swan Bioinformatics Support Unit d.c.swan@ncl.ac.uk http://bsu.ncl.ac.uk

Dec 22, 2015

Page 1: Data, data standards and sharing Dr Daniel Swan Bioinformatics Support Unit d.c.swan@ncl.ac.uk .

Data, data standards and sharing

Dr Daniel Swan

Bioinformatics Support Unit

d.c.swan@ncl.ac.uk

http://bsu.ncl.ac.uk

Science used to be like this..

But now it’s something like this..

Problem..

Data doesn’t fit in:

So where do we store it instead?

Why is this a problem?

• Hard drives explode (more often than you think...)

• How is *your* “My Documents” filing system?

– Most of us live in folder chaos!

• How well does your hard drive integrate with your lab book?

– Well, generally not at all... you might be able to match things on dates if you’re lucky!

• Big data is EXPENSIVE to generate

• It makes sense to get the most value out of it

• Your funding bodies know this!

MRC

“The MRC expects valuable data arising from MRC-funded research to be made available to the scientific community with as few restrictions as possible. Such data must be shared in a timely and responsible manner.”

BBSRC

“BBSRC expects research data generated as a result of BBSRC support to be made available with as few restrictions as possible in a timely and responsible manner to the scientific community for examination and use.”

(even more pointedly, they also suggest that IP and commercialisation concerns should NOT preclude you from releasing data in a timely fashion)

Opens up a new problem

• How do we make sure that we can exchange and understand the data that we share with other researchers?

• Standardised formats for reporting certain experimental data types have been developed

• These were pre-dated by the massive open access biological sequence databases (GenBank, DDBJ, EMBL, PDB, UniProtKB etc.), but those suffer from the fact that there are 20+ ‘standards’ for representing DNA or protein sequence data.

• A new set of data standards has emerged for modern biological data

• Often called ‘MI’ data standards

• Capture ‘minimum information’ metadata (data about data) required to comprehend and share scientific data
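As a rough illustration of what a ‘minimum information’ check means in practice, the sketch below tests a metadata record against a checklist of required fields. The field names, the `REQUIRED_FIELDS` checklist and the `missing_fields` helper are all hypothetical and are not taken from MIAME or any other published MI standard; a real standard defines its own checklist.

```python
# Hypothetical checklist: these field names are illustrative only and are
# NOT taken from MIAME or any other published MI standard.
REQUIRED_FIELDS = (
    "organism",
    "sample_description",
    "protocol",
    "instrument",
    "raw_data_file",
)


def missing_fields(metadata):
    """Return the required fields that are absent or blank, in checklist order."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        # Treat a missing key, None, or whitespace-only text as "not captured".
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Example record (made-up values) with one blank and one absent field.
record = {
    "organism": "Mus musculus",
    "sample_description": "liver, wild type, replicate 1",
    "protocol": "",  # left blank, so it should be flagged
    "raw_data_file": "run_001.cel",
}

print(missing_fields(record))  # ['protocol', 'instrument']
```

Running a check like this when the data is generated, rather than at submission time, is one way to find gaps while the details are still fresh.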

Particularly for high throughput data

• All started with MIAME (minimum information about a microarray experiment)

• Now extends to proteomics, neurophysiology, genome sequences – even gel electrophoresis

• If you are going to publish a microarray experiment, it is very likely that the journal you publish in will MANDATE that the data is annotated to MIAME standards AND deposited in a recognised repository for that data

– GEO

– ArrayExpress

Why stop at data?

• Whilst the UK research councils (RCUK) are moving to policies where data is openly deposited, other scientific information is also being openly released

• Open Access publication – a new paradigm for journals (they charge no subscription fees)

• Scientists are beginning to really utilise the internet to share data and ideas, and to foster collaborations

• But why?

– The realisation that the data in your lab books is ‘tombed’. Unless you’re going to commercialise it, or it’s going to win you a Nobel... why not share?

But still people argue about sharing

• I don’t want to be scooped!

• My data isn’t very good

• I am hoping to commercialise this some day

Open notebook science

• A new concept being pioneered by some scientists

• Using ‘Web 2.0’ tools (i.e. user generated content)

• A combination of

– Blogs

  • Even if you’re not sharing data, why not share some ideas?

– Wikis

  • Wikis are like lab books on steroids, and you can link them to all kinds of external resources and open them up to the world

– Other collaborative tools

UsefulChem

Forging 21st century collaborations

• You’re not limited to talking to your peers in the coffee room

• But where can you start interacting with other scientists?

To sum up

• Be aware of your funding body’s expectations for releasing your data to the public

• The more metadata you capture about your work, the easier it will be to comply with data standards regulations later

• Don’t be afraid to use technology: keeping track of science is hard, and there’s no way to Google a lab book!

• Engage with online communities – many a collaboration has been formed via a blog post!

• Want to talk about how best to analyse and store your digital data? Come talk to us!