Data Management (from day 0)

Egon Willighagen (@egonwillighagen), Department of Bioinformatics - BiGCaT
3 April 2014, Masterclass RDM in NL

Aug 11, 2014



Practical experiences with data handling from a chemist now working in biology. The story starts on the day that I started managing my PhD thesis research in a version control system, originally Subversion, later Git. It then moves on to the issues around data in publishing, data licensing (be sure you understand what you're using), online repositories, a bit of data citation, repository-integrated data analysis, finding a conclusion, and returning to day 1.
Transcript

Day 0: data plan

Before you start doing an experiment, you get a lab notebook.

(Some universities already require electronic lab notebooks!)

Day 1: the electronic lab notebook

• Version Control System
  – Allows backups
  – Allows annotation
  – Dated changes

Day 2: be careful what data you use

• Availability in 4 years?
  – Does your library/university have a copy?
• Can you read the format?
• Can you copy the data and share it (e.g. with collaborators)?
• What if the journal you publish in requires you to share data?

Day 3: store everything

• Experiments
  – Description
  – Results (images, measurements, …)
• Written output
  – Reports, papers, presentations

Day 4: Analyse data directly from a repository

Willighagen E. (2014) Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3. doi:10.7287/peerj.preprints.185v3

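library(rrdf)  # provides sparql.remote(); assumed, the slide does not show this call

# SNP data from Ensembl BioMart ("snp" was the 2014 mart name; newer
# Ensembl releases use biomart="ENSEMBL_MART_SNP")
mart = biomaRt::useMart(biomart="snp", dataset="hsapiens_snp")
brca1 = c("rs16940", "rs16941", "rs16942", "rs799916", "rs799917")
# attribs is not defined on the slide; these attribute names are an assumption
attribs = c("refsnp_id", "chr_name", "chrom_start")
data = biomaRt::getBM(attributes=attribs, filters=c("snp_filter"),
                      values=brca1, mart=mart)

# All properties of one ChEMBL assay, from a (2014) ChEMBL SPARQL endpoint
results = sparql.remote(
  "http://rdf.farmbio.uu.se/chembl/sparql", paste(
    "SELECT DISTINCT ?predicate ?object WHERE {",
    " ?assay <http://www.w3.org/2000/01/rdf-schema#label> \"CHEMBL615603\" ;",
    " ?predicate ?object . }"
  ))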
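A SPARQL endpoint like the one above can also be reached over plain HTTP: under the SPARQL 1.1 Protocol, the query may be sent URL-encoded in a `query` parameter of a GET request. The base-R sketch below only builds such a request URL for the same query; the helper name `sparql_get_url` is hypothetical, it makes no claim about how rrdf talks to the endpoint internally, and it does not contact the server (which may no longer be online).

```r
# Sketch: build a SPARQL Protocol GET URL (query passed in the "query" parameter).
# No network request is made; only the URL string is constructed.
sparql_get_url <- function(endpoint, query) {
  # reserved = TRUE percent-encodes everything except A-Z a-z 0-9 . _ ~ -
  # so spaces, ?, #, <, >, { and quotes in the query are all escaped
  paste0(endpoint, "?query=", utils::URLencode(query, reserved = TRUE))
}

query <- paste(
  "SELECT DISTINCT ?predicate ?object WHERE {",
  " ?assay <http://www.w3.org/2000/01/rdf-schema#label> \"CHEMBL615603\" ;",
  " ?predicate ?object . }"
)
url <- sparql_get_url("http://rdf.farmbio.uu.se/chembl/sparql", query)
print(url)
```

Any HTTP client can fetch such a URL; the point is that the analysis can start from the repository itself rather than from a local copy.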

Day 4: Analyses inside your report

http://yihui.name/knitr/

<p>We can also produce plots (centered by the option <code>fig.align='center'</code>):</p>

<!--begin.rcode html-cars-scatter, message=FALSE, fig.align='center'
library(ggplot2)
plot(mpg~hp, mtcars)
qplot(hp, mpg, data=mtcars)+geom_smooth()
end.rcode-->

Day 5: Large Repositories

• UniProt, ChEMBL, Gene Ontology
  – Is there a deposition workflow?
• Growing repositories
  – WikiPathways
• Set up a new database (paper+1)
  – e.g. DrugMet
  – Problem: what about small data?
• Journal driven
  – CSD
  – PDB

Day 5: Database Seeds

• Set up a new database (paper += 1)
  – e.g. DrugMet

S. Lampa + me; CC-SA, but data CC0

Day 5: National Repositories

Day 5: Small Data @ FigShare

Day 5: Scientific dissemination

• Data sharing: copyright
  – Can data be copyrighted?
  – Data source: you, lab mates, others?
  – Ownership
• Data sharing: license
  – Do you want your data reused?
  – And be modified (format!)?
  – Commercial use?

Day 6: Format? Why not SemWeb?

• 5 Star Open Data (5stardata.info)
  – openly available
  – reusable
  – open format
  – URIs (ontologies etc.)
  – linked data
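The last two stars can be illustrated with a few lines of base R that write N-Triples, the simplest RDF serialization: every thing is named by a URI, and pointing at a URI in someone else's dataset is what makes it linked data. This is a sketch only: the `example.org` URIs are hypothetical placeholders, and the helpers `uri()`, `literal()` and `triple()` are illustrative rather than taken from any package; `rdfs:label` and `rdfs:seeAlso` are standard RDF Schema terms.

```r
# Minimal N-Triples writer (sketch): one triple per line, terminated by " ."
uri <- function(x) sprintf("<%s>", x)

literal <- function(x) {
  x <- gsub("\\\\", "\\\\\\\\", x)  # escape backslashes first
  x <- gsub("\"", "\\\\\"", x)      # then escape double quotes
  sprintf("\"%s\"", x)
}

triple <- function(s, p, o) paste(s, p, o, ".")

RDFS <- "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical placeholder identifiers; real data would use stable, resolvable URIs
subj  <- "http://example.org/dataset/measurement/1"
other <- "http://example.org/other-dataset/compound/42"

doc <- c(
  # 4th star: the measurement and the property are identified by URIs
  triple(uri(subj), uri(paste0(RDFS, "label")), literal("logP of \"compound 42\"")),
  # 5th star: a link to a resource in another dataset
  triple(uri(subj), uri(paste0(RDFS, "seeAlso")), uri(other))
)
writeLines(doc)
```

In practice an RDF library would handle serialization and the full escaping rules; this sketch only escapes backslashes and double quotes in literals.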

Linked Open Data Cloud

Day 7: are people using your work?

Day 8: back to step 0

• Take feedback (“peer review”), study new uses

• Plan your next study

CC-BY frankensteinnn@flickr