Incidental Collaboratories For Experimental Data, Or: Why life is so complicated
(and what we might be able to do about it)
Anita de Waard, VP Research Data Collaborations, Elsevier RDS
Jericho, VT, USA
Outline • Brief bio • The problem: life is complicated • What we can do to understand it • About Elsevier Research Data Services • A pilot project • Some questions.
Brief bio: • Background: – Low-temperature physics (Leiden & Moscow) – Joined Elsevier in 1988 as publisher in solid state physics – 1991: arXiv => publishers will go out of business very soon!
• 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge: – Identifying key knowledge elements in articles (linguistics thesis) – Building claim-evidence networks (through collaborations) – Helping build communities to accelerate rate of change (Force11)
• Starting 1/1/2013: VP Research Data Collaborations - why? – Douglas Engelbart’s thinking: connect minds! – My (non-biologist’s) understanding of biology:
Problem: a rose is not a rose: • “Single specimens of C. ermineus show unchanged injected venom mass spectra and HPLC profiles over time. However, there was significant variability of the injected venom composition from specimen to specimen, in spite of their common biogeographic origin.”
Jose A. Rivera-Ortiz, Herminsul Cano, Frank Marí, Intraspecies variability of the injected venom of Conus ermineus, doi:10.1016/j.peptides.2010.11.014
• “D. desulfuricans CFA profiles for all intestinal strains (group 1) were approximately identical (98.2 to 99.8% similarity). A 92.4% similarity was evaluated in a group 2, containing six soil strains. The members of this group had 87% similarity with the type soil strain. All intestinal strains and soil strains were similar at the 85.5% level. Strains DV-3/84 DV-7/84 (group 3) showed 76.6% similarity to each other and were similar to all other strains at the 67.6% level.”
Zofia Dzierżewicz et al., Intraspecies variability of Desulfovibrio desulfuricans strains determined by the genetic profiles, FEMS Microbiology Letters, Volume 219, Issue 1, 14 February 2003, Pages 69–74, doi:10.1016/S0378-1097(02)01199-0
=> A specimen is not a species!
Problem: gene expression varies with: Age: “SIRT1-Associated genes are deregulated in the aged brain”
Philipp Oberdoerffer et al., SIRT1 Redistribution on Chromatin Promotes Genomic Stability but Alters Gene Expression during Aging, Cell, Volume 135, Issue 5, 28 November 2008, Pages 907–918, doi:10.1016/j.cell.2008.10.025
Smell: “…major urinary proteins […] mediate the pregnancy blocking effects of male urine”
P.A. Brennan et al., Patterns of expression of the immediate-early gene egr-1 in the accessory olfactory bulb of female mice exposed to pheromonal constituents of male urine, Neuroscience, Volume 90, Issue 4, June 1999, Pages 1463–1470, doi:10.1016/S0306-4522(98)00556-9
Hunger: “Out of the ~30K genes, about 10K are differentially expressed in liver cells when an animal is in different states of satiety.”
Zhang F, Xu X, Zhou B, He Z, Zhai Q (2011) Gene Expression Profile Change and Associated Physiological and Pathological Effects in Mouse Liver Induced by Fasting and Refeeding. PLoS ONE 6(11): e27553. doi:10.1371/journal.pone.002755
Light: “Longer-term enrichment training also altered the mRNA levels of many genes associated with structural changes that occur during neuronal growth.”
Cailotto C., et al. (2009) Effects of Nocturnal Light on (Clock) Gene Expression in Peripheral Organs: A Role for the Autonomic Innervation of the Liver. PLoS ONE 4(5): e5650. doi:10.1371/journal.pone.0005650
=> Knowing genes is not knowing how they are expressed!
• “We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “Colonization of an infant’s gastrointestinal tract begins at birth. The acquisition and normal development of the neonatal microflora is vital for the healthy maturation of the immune system.”
Mackie RI, Sghir A, Gaskins HR., Developmental microbial ecology of the neonatal gastrointestinal tract. Am J Clin Nutr. 1999 May;69(5):1035S-1045S
Problem: no man (or mouse) is an island…
=> An animal is an ecosystem!
Problem: system interactions create even greater complexity:
• Computing cancer: “No amount of information about what happens inside a single cell can ever tell you what a tissue is going to do,” [Glazier] says. “Much of the information and complexity of tissues and life is embedded in the way cells talk to each other and the extracellular environment.”
• Megadata: “These complex emergent systems are impossible to understand,” [Agus] says. “Our level of understanding is just so cursory that we have to start to look for what they call, in physics, coarse-grained elements.” “[We] founded Applied Proteomics to create a protein diagnostic that reveals not just where a cancer is, but how it interacts with the body.”
Nature Special Issue Vol. 491 No. 7425 ‘Physical Scientists Take On Cancer’
=> The whole is more than the sum of its parts!
Big problem:
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
=> A specimen is not a species
=> Knowing genes is not knowing how they are expressed
=> An animal is an ecosystem
=> The whole is more than the sum of its parts
LIFE IS COMPLICATED!!
Statistics to the rescue! With enough observations, trends and anomalies can be detected: • “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.”
Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
• “A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples”
Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/publicPolicy_Eradication.pdf
• Collect: store data at the level of the experiment: – Accessible through a single interface – With enough metadata to know what was done/seen
• Connect: allow analyses over: – Similar experiment types – Experiments done with/on similar biological ‘things’: • Species, strains, systems, cells • Anatomical components (e.g. spleen, hypothalamus) • Antibodies, biomarkers, bioactive chemicals, etc.
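The Collect and Connect steps above could be sketched as a small data model. This is only an illustration under stated assumptions: the record fields (`lab`, `experiment_type`, `entities`, `metadata`), the helper functions, and the sample records are all hypothetical and do not describe any existing Elsevier or NIF system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ExperimentRecord:
    """Collect: one experiment, stored with metadata on what was done/seen.

    All field names here are hypothetical.
    """
    lab: str
    experiment_type: str
    entities: Set[str]  # biological 'things': species, anatomy, antibodies...
    metadata: Dict[str, str] = field(default_factory=dict)


def find_by_entity(records: List[ExperimentRecord], entity: str) -> List[ExperimentRecord]:
    """Connect: experiments done with/on the same biological 'thing'."""
    return [r for r in records if entity in r.entities]


def find_by_type(records: List[ExperimentRecord], experiment_type: str) -> List[ExperimentRecord]:
    """Connect: experiments of a similar type."""
    return [r for r in records if r.experiment_type == experiment_type]


# Made-up example records from three imaginary labs.
records = [
    ExperimentRecord("lab_a", "immunohistochemistry", {"mouse", "hypothalamus"},
                     {"antibody": "example-antibody-1"}),
    ExperimentRecord("lab_b", "electrophysiology", {"mouse", "hypothalamus"}),
    ExperimentRecord("lab_c", "immunohistochemistry", {"rat", "spleen"}),
]

# Which labs have observations on the hypothalamus, whatever the technique?
labs_on_hypothalamus = sorted(r.lab for r in find_by_entity(records, "hypothalamus"))
print(labs_on_hypothalamus)
```

The point of the sketch is that two labs that never coordinated become comparable once both tag their records with the same biological entity.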
We need ‘incidental collaboratories’
Problem: biological research is quite insular: • Biology is small: because objects/equipment are 10^-5 – 10^2 m, you can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: different people with similar skill sets, vying for the same grants.
• In summary: it does not promote inherent collaboration (vs., for instance, big physics or astronomy).
[Figure: lab workflow cycle: Prepare, Observe, Analyze, Ponder, Communicate]
We need to pop the lab bubble!
[Figure: three lab workflows (Prepare, Analyze, Communicate; one also Think), each producing Observations]
Labs go from being information islands, to being ‘sensors in a network’.
Some objections, and rebuttals:
• Objection: “But our lab notebooks are all on paper” – Rebuttal: Develop smartphone/tablet apps for data input
• Objection: “I need to see a direct benefit from something I spend my time on” – Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
• Objection: “I am afraid other people might scoop my discoveries” – Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export
• Objection: “I want things to be peer reviewed before I expose them” – Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…” – Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
Elsevier Research Data Services: Goals
1. Help add more data into (existing, open) data repositories: more data in, annotated, available
2. Make them more interoperable: work towards collaboratory model by connecting databases
3. Find ways to make them sustainable, e.g.: – Service-level agreements: to funders/institutes – With Lab notebook: subscriptions to projects – Back-end analytics: to companies
RDS Guiding Principles: • In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
• Collaboration is tailored to data repositories’ unique needs/interests and of a ‘service-model’ type: – Aspects where collaboration is needed are discussed – A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
– All communication, finance, IPR etc. is completely transparent at all times.
• Very small (2–3 people) department; immediate communication; instant deployment of ideas
RDS Approach:
• Collaborate and build on relationships with data repositories
• Integrate with other content sources, if possible • Build annotation and standardisation tools and processes to implement this
• Develop next-generation infrastructure solutions for back-end integration
• Explore creative revenue opportunities
NIF Antibody Registry: Problem: • 95 antibodies were identified in 8 papers • 52 did not contain enough information to determine the antibody used • Some provided details in another paper • Failed to give species, vendor, catalog #
Solution #1: • Journals ask authors to provide antibody catalog number • Link to NIF Registry from manufacturers’/vendors’ sites
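The reporting check implied by the problem above (species, vendor, and catalog number must all be given) could be sketched as follows. This is a minimal illustration, assuming an antibody citation is a plain dictionary; the field names, the helper functions, and the example values are hypothetical and are not the NIF Registry's actual schema or API.

```python
# Fields a citation needs before the antibody can be identified.
# These names are assumptions made for this sketch.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


def missing_fields(citation: dict) -> list:
    """Return the required fields that are absent or blank in a citation."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(citation.get(name) or "").strip()
    ]


def is_identifiable(citation: dict) -> bool:
    """A citation is identifiable only if no required field is missing."""
    return not missing_fields(citation)


# Made-up citations, as an author might enter them at submission time.
complete = {
    "name": "anti-ExampleProtein",
    "species": "rabbit",
    "vendor": "ExampleVendor",
    "catalog_number": "EX-0001",
}
incomplete = {"name": "anti-ExampleProtein", "vendor": "ExampleVendor"}

for citation in (complete, incomplete):
    gaps = missing_fields(citation)
    status = "ok" if not gaps else "missing: " + ", ".join(gaps)
    print(citation["name"], "->", status)
```

A journal submission form could run a check of this kind and ask the author for the missing items before accepting the methods section.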
Solution #2: • Pilot with a lab:
Let’s start with the Urban Lab
• Getting antibodies • And messy bits • From the notebook • Into Nathan Urban’s command center
• By providing – 7” Tablets – Links to IgorPro – A dashboard UI
My questions to you: • Thoughts on this approach: – In principle? – In practice?
• Do you see serious hurdles: – Are we overlapping with other initiatives; if so, are we complementary?
– How does this connect to libraries/local repositories? – Are there sensitivities/pain points we are overlooking?
• Where to start: – Is antibodies ok? – Is a neuroscience lab ok? – Thoughts on data repositories/platforms to connect to?
Your questions to me?
[email protected] http://elsatglabs.com/labs/anita/
http://www.slideshare.net/anitawaard
Thanks go to: • Anita Bandrowski and Maryann Martone, NIF • Nathan Urban, Shreejoy Tripathy, CMU • David Marques, SVP RDS