Towards Incidental Collaboratories For Experimental Data
Anita de Waard, VP Research Data Collaborations
Elsevier RDS, Jericho, VT, USA
Thanks: Maryann Martone, Anita Bandrowski, NIF, UCSD; Nathan Urban, Shreejoy Thripathy, CMU; Ed Hovy, Gully Burns, ISI/CMU; Phil Bourne, UCSD
Problem: a rose is not a rose:
• “…there was significant variability of the injected venom composition from specimen to specimen, in spite of their common biogeographic origin.”
Jose A. Rivera-Ortiz, Herminsul Cano, Frank Marí, Intraspecies variability of the injected venom of Conus ermineus, doi:10.1016/j.peptides.2010.11.014
• “…Strains DV-3/84 and DV-7/84 (group 3) showed 76.6% similarity to each other and were similar to all other strains at the 67.6% level.”
Zofia Dzierżewicz et al., Intraspecies variability of Desulfovibrio desulfuricans strains determined by the genetic profiles, FEMS Microbiology Letters, Volume 219, Issue 1, 14 February 2003, Pages 69–74, doi:10.1016/S0378-1097(02)01199-0
=> A specimen is not a species!
Problem: gene expression varies with:
Age: “SIRT1-Associated genes are deregulated in the aged brain”
Philipp Oberdoerffer et al., SIRT1 Redistribution on Chromatin Promotes Genomic Stability but Alters Gene Expression during Aging, Cell, Volume 135, Issue 5, 28 November 2008, Pages 907–918, doi:10.1016/j.cell.2008.10.025
Smell: “…major urinary proteins […] mediate the pregnancy blocking effects of male urine”
P.A. Brennan et al., Patterns of expression of the immediate-early gene egr-1 in the accessory olfactory bulb of female mice exposed to pheromonal constituents of male urine, Neuroscience, Volume 90, Issue 4, June 1999, Pages 1463–1470, doi:10.1016/S0306-4522(98)00556-9
Hunger: “Out of the ~30K genes, about 10K are differentially expressed in liver cells when an animal is in different states of satiety.”
Zhang F, Xu X, Zhou B, He Z, Zhai Q (2011) Gene Expression Profile Change and Associated Physiological and Pathological Effects in Mouse Liver Induced by Fasting and Refeeding. PLoS ONE 6(11): e27553. doi:10.1371/journal.pone.002755
Light: “Longer-term enrichment training also altered the mRNA levels of many genes associated with structural changes that occur during neuronal growth.”
Cailotto C., et al. (2009) Effects of Nocturnal Light on (Clock) Gene Expression in Peripheral Organs: A Role for the Autonomic Innervation of the Liver. PLoS ONE 4(5): e5650. doi:10.1371/journal.pone.0005650
=> Knowing genes is not knowing how they are expressed!
Problem: no man (or mouse) is an island…
• “We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “Colonization of an infant’s gastrointestinal tract begins at birth. The acquisition and normal development of the neonatal microflora is vital for the healthy maturation of the immune system.”
Mackie RI, Sghir A, Gaskins HR., Developmental microbial ecology of the neonatal gastrointestinal tract. Am J Clin Nutr. 1999 May;69(5):1035S-1045S
=> An animal is an ecosystem!
Interactions create more complexity:
• Computing cancer: “No amount of information about what happens inside a single cell can ever tell you what a tissue is going to do,” [Glazier] said. “Much of the information and complexity of tissues and life is embedded in the way cells talk to each other and the extracellular environment.”
• Megadata: “These complex emergent systems are impossible to understand,” “[we] founded Applied Proteomics to create a protein diagnostic that reveals not just where a cancer is, but how it interacts with the body.” Nature Special Issue Vol. 491 No. 7425
• Intraspecies variability => A specimen is not a species!
• Gene expression variability => Knowing genes is not knowing how they are expressed!
• Microbiome => An animal is an ecosystem!
• Systems biology => The whole is more than the sum of its parts!
• Models vs. experiment => Are we talking about the same things? In a way we can all use?
• Dynamics => Life is not in equilibrium!
Life is complicated!
Reductionism doesn’t work for living systems.
Statistics to the rescue!
With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.”
Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
Enable ‘incidental collaboratories’:
• Collect: store data at the level of the experiment:
– Accessible through a single interface
– Add enough metadata to know what was done/seen
– In a way that can be used by modelers!
• Keep:
– Long-term preservation of data and software
– Fulfill Data Management Plan requirements
– Allow ‘gated’ access when and to whom researcher wants
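The ‘collect’ and ‘keep’ points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the `can_access` helper, and all example values are hypothetical and do not describe any existing system. It shows one experiment stored with its metadata, and ‘gated’ access decided by who is asking and when.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ExperimentRecord:
    """One experiment's data plus the metadata needed to interpret it (hypothetical schema)."""
    experiment_id: str
    owner: str
    materials: dict      # what was used, e.g. {"antibody": "AB-001"}
    method: str          # what was done
    observations: list   # what was seen
    trusted_users: set = field(default_factory=set)  # "to whom" access is granted early
    open_from: Optional[date] = None                 # "when" it opens up; None = never


def can_access(record: ExperimentRecord, user: str, today: date) -> bool:
    """Gated access: owner and trusted users always, everyone else from open_from on."""
    if user == record.owner or user in record.trusted_users:
        return True
    return record.open_from is not None and today >= record.open_from


# Illustrative values only.
record = ExperimentRecord(
    experiment_id="exp-0001",
    owner="pi_lab_a",
    materials={"antibody": "AB-001"},
    method="whole-cell recording",
    observations=[-65.2, -64.8, -66.1],
    trusted_users={"reviewer_1"},
    open_from=date(2014, 1, 1),
)
```

A reviewer listed in `trusted_users` can read the record before the opening date, while anyone else has to wait until `open_from`.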
Problem: biological research is quite insular
• Biology is small: size 10^-5 – 10^2 m, scientist can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: many people with similar skill sets, vying for the same grants.
• In summary: the structure of biological research does not inherently promote collaboration (vs., for instance, big physics or astronomy).
Let’s look at a typical lab:
[Workflow diagram: Prepare, Observe, Analyze, Ponder, Communicate]
• How to get the right antibody IDs
• And messy bits
• From the lab notebook
• Into the PI’s command center?
Objections and rebuttals re. data sharing
Objection: “But our lab notebooks are all on paper”
Rebuttal: Develop smartphone/tablet apps for data input
Objection: “I need to see a direct benefit from something I spend my time on”
Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
Objection: “I want things to be peer reviewed before I expose them”
Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
Objection: “I am afraid other people might scoop my discoveries”
=> Reward system moves from a competition to a ‘shared mission’
Data sharing enables collaboratories:
[Diagram labels: Prepare, Analyze, Communicate (repeated per lab); Think; Observations]
• Labs go from being information islands to being ‘sensors in a network’
• ‘Conglomeration of evidence’ can happen
• Allow place to share negative data – reproducing experiments.
So we can do joint experiments:
[Diagram labels: Prepare, Analyze, Communicate (repeated per lab); Observations]
Across labs, experiments: track reagents and how they are used
Compare outcome of interactions with these entities
Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
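One way to picture such a ‘virtual reagent spectrogram’ is as a table of reagent versus entity, filled from observations pooled across labs. The sketch below is an assumption-laden illustration, not the method used in the talk: the tuple layout, the outcome score, and the `reagent_spectrogram` function are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pooled observations: (lab, reagent, entity, outcome score).
observations = [
    ("lab_a", "AB-001", "protein_x", 0.9),
    ("lab_b", "AB-001", "protein_x", 0.7),
    ("lab_a", "AB-001", "protein_y", 0.1),
    ("lab_c", "AB-002", "protein_x", 0.4),
]


def reagent_spectrogram(rows):
    """Map each reagent to {entity: (mean outcome, number of labs reporting)}."""
    scores = defaultdict(lambda: defaultdict(list))
    labs = defaultdict(lambda: defaultdict(set))
    for lab, reagent, entity, outcome in rows:
        scores[reagent][entity].append(outcome)
        labs[reagent][entity].add(lab)
    return {
        reagent: {
            entity: (round(mean(values), 3), len(labs[reagent][entity]))
            for entity, values in by_entity.items()
        }
        for reagent, by_entity in scores.items()
    }


spectrogram = reagent_spectrogram(observations)
```

Keeping the number of reporting labs next to each mean makes it visible when a reagent's apparent behavior rests on a single lab's data.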
A single environment to perform, store, share and report on experiments:
1. Store metadata on all materials
2. Track the methods while doing them
3. Write papers that ‘wrap around’ this
4. Don’t ‘send’ your papers – just expose them to the outside world
5. Invite reviews; open data to trusted parties, at a trusted time
6. Allow apps/tools to integrate
[Diagram labels: metadata; Calculate, coordinate…; Compile, comment, compare…; Review, Edit, Revise]
[Sample paper text: “Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”]
Elsevier Research Data Services:
1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
2. Help increase the value of the data shared by increasing annotation, normalization, provenance, enabling enhanced interoperability
3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
Plans with CMU/Neuroelectro.org:
• Do a pilot in Q3 2013, using:
– 7” tablets for data input
– Can we link to barcodes for antibodies (ABs), scanned on the tablet, so we can include the batch’s provenance?
– Links to local software to connect to runs
– Dashboard for the PI to keep track of/play with experiments
– Gated exports to Neuroelectro.org and NIF
– Address NSF Data Management Plan requirements?
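The barcode idea in the pilot plan could look roughly like the following. This is a minimal sketch under assumptions: the registry contents, the barcode value, the vendor name, and the `annotate_run` helper are all hypothetical and only illustrate attaching a scanned antibody's batch provenance to a run record.

```python
# Hypothetical antibody registry keyed by barcode; all values are illustrative.
ANTIBODY_BATCHES = {
    "0123456789012": {"antibody_id": "AB-001", "vendor": "ExampleVendor", "batch": "B42"},
}


def annotate_run(run: dict, scanned_barcode: str, registry: dict = ANTIBODY_BATCHES) -> dict:
    """Return a copy of the run record with the scanned antibody's provenance attached."""
    annotated = dict(run)
    # Keep the raw barcode even when it is unknown, so the record can be fixed later.
    annotated["antibody_barcode"] = scanned_barcode
    annotated["antibody_provenance"] = registry.get(scanned_barcode)  # None if unregistered
    return annotated


run = annotate_run({"run_id": "run-001"}, "0123456789012")
unknown = annotate_run({"run_id": "run-002"}, "9999999999999")
```

Returning a copy rather than mutating the input keeps the original run record untouched if the scan later turns out to be wrong.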
In summary:
• Life is complicated!
• We need to connect experiments
• To do so, overcome technical barriers and social barriers (more difficult)