
Why life is so complicated

May 10, 2015


Anita de Waard

Talk for the 3-dimensional virtual cell symposium, San Diego, CA, December 14, 2012
Transcript
Page 1: Why life is so complicated

Incidental Collaboratories For Experimental Data, Or: Why life is so complicated

(and what we might be able to do about it)

Anita de Waard, VP Research Data Collaborations, Elsevier RDS

Jericho, VT, USA

Page 2: Why life is so complicated

Outline:
• Brief bio
• The problem: life is complicated
• What we can do to understand it
• About Elsevier Research Data Services
• A pilot project
• Some questions

Page 3: Why life is so complicated

Brief bio:
• Background:
  – Low-temperature physics (Leiden & Moscow)
  – Joined Elsevier in 1988 as publisher in solid state physics
  – 1991: arXiv => publishers will go out of business very soon!

• 1997–now: Disruptive Technologies Director, focusing on better representation of scientific knowledge:
  – Identifying key knowledge elements in articles (linguistics thesis)
  – Building claim-evidence networks (through collaborations)
  – Helping build communities to accelerate the rate of change (Force11)

• Starting 1/1/2013: VP Research Data Collaborations. Why?
  – Douglas Engelbart’s thinking: connect minds!
  – My (non-biologist’s) understanding of biology:

Page 4: Why life is so complicated

Problem: a rose is not a rose:
• “Single specimens of C. ermineus show unchanged injected venom mass spectra and HPLC profiles over time. However, there was significant variability of the injected venom composition from specimen to specimen, in spite of their common biogeographic origin.”

Jose A. Rivera-Ortiz, Herminsul Cano, Frank Marí, Intraspecies variability of the injected venom of Conus ermineus, doi:10.1016/j.peptides.2010.11.014

• “D. desulfuricans CFA profiles for all intestinal strains (group 1) were approximately identical (98.2 to 99.8% similarity). A 92.4% similarity was evaluated in a group 2, containing six soil strains. The members of this group had 87% similarity with the type soil strain. All intestinal strains and soil strains were similar at the 85.5% level. Strains DV-3/84 DV-7/84 (group 3) showed 76.6% similarity to each other and were similar to all other strains at the 67.6% level.”

Zofia Dzierżewicz et al., Intraspecies variability of Desulfovibrio desulfuricans strains determined by the genetic profiles, FEMS Microbiology Letters, Volume 219, Issue 1, 14 February 2003, Pages 69–74, doi:10.1016/S0378-1097(02)01199-0

=> A specimen is not a species!

Page 5: Why life is so complicated

Problem: gene expression varies with:

Age: “SIRT1-Associated genes are deregulated in the aged brain”

Philipp Oberdoerffer et al., SIRT1 Redistribution on Chromatin Promotes Genomic Stability but Alters Gene Expression during Aging, Cell, Volume 135, Issue 5, 28 November 2008, Pages 907–918, doi:10.1016/j.cell.2008.10.025

Smell: “…major urinary proteins […] mediate the pregnancy blocking effects of male urine”

P.A. Brennan et al., Patterns of expression of the immediate-early gene egr-1 in the accessory olfactory bulb of female mice exposed to pheromonal constituents of male urine, Neuroscience, Volume 90, Issue 4, June 1999, Pages 1463–1470, doi:10.1016/S0306-4522(98)00556-9

Hunger: “Out of the ~30K genes, about 10K are differentially expressed in liver cells when an animal is in different states of satiety.”

Zhang F, Xu X, Zhou B, He Z, Zhai Q (2011) Gene Expression Profile Change and Associated Physiological and Pathological Effects in Mouse Liver Induced by Fasting and Refeeding. PLoS ONE 6(11): e27553. doi:10.1371/journal.pone.002755

Light: “Longer-term enrichment training also altered the mRNA levels of many genes associated with structural changes that occur during neuronal growth.”

Cailotto C., et al. (2009) Effects of Nocturnal Light on (Clock) Gene Expression in Peripheral Organs: A Role for the Autonomic Innervation of the Liver. PLoS ONE 4(5): e5650. doi:10.1371/journal.pone.0005650

=> Knowing genes is not knowing how they are expressed!

Page 6: Why life is so complicated

Problem: no man (or mouse) is an island…

• “We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals.”

The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234

• “Colonization of an infant’s gastrointestinal tract begins at birth. The acquisition and normal development of the neonatal microflora is vital for the healthy maturation of the immune system.”

Mackie RI, Sghir A, Gaskins HR. Developmental microbial ecology of the neonatal gastrointestinal tract. Am J Clin Nutr. 1999 May;69(5):1035S–1045S

=> An animal is an ecosystem!

Page 7: Why life is so complicated

Problem: system interactions create even greater complexity:

• Computing cancer: “No amount of information about what happens inside a single cell can ever tell you what a tissue is going to do,” [Glazier] says. “Much of the information and complexity of tissues and life is embedded in the way cells talk to each other and the extracellular environment.”

• Megadata: “These complex emergent systems are impossible to understand,” [Agus] says. “Our level of understanding is just so cursory that we have to start to look for what they call, in physics, coarse-grained elements.” “[We] founded Applied Proteomics to create a protein diagnostic that reveals not just where a cancer is, but how it interacts with the body”

Nature Special Issue, Vol. 491, No. 7425, ‘Physical Scientists Take On Cancer’

=> The whole is more than the sum of its parts!

Page 8: Why life is so complicated

Big problem:

[Image: Duck of Vaucanson, http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg]

=> A specimen is not a species
=> Knowing genes is not knowing how they are expressed
=> An animal is an ecosystem
=> The whole is more than the sum of its parts

LIFE IS COMPLICATED!!

Page 9: Why life is so complicated

Statistics to the rescue! With enough observations, trends and anomalies can be detected:

• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.”

The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234

• “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.”

Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.

• “A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples”

Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/publicPolicy_Eradication.pdf
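The bar-code registry idea in the last quote can be sketched in a few lines. This is a minimal sketch under stated assumptions. It is not the scheme the cited call for action specifies. A profile is assumed to map each locus to the set of alleles seen there, and the bar code is simply a canonical string of those values. Near matches are scored with a Dice coefficient: twice the shared alleles divided by all alleles at loci typed in both profiles, checked against an assumed 0.8 threshold. All names (`BarcodeRegistry`, `shared_allele_score`, the sample labels) and all allele values are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical profile format: locus name -> set of alleles observed at that locus.
Profile = dict[str, frozenset[str]]


def barcode(profile: Profile) -> str:
    """Serialize a profile into a canonical 'bar code' string (loci and alleles sorted)."""
    return ";".join(
        f"{locus}={','.join(sorted(alleles))}"
        for locus, alleles in sorted(profile.items())
    )


def shared_allele_score(a: Profile, b: Profile) -> float:
    """Dice coefficient over alleles at loci typed in both profiles: 2*shared / total."""
    loci = a.keys() & b.keys()
    shared = sum(len(a[locus] & b[locus]) for locus in loci)
    total = sum(len(a[locus]) + len(b[locus]) for locus in loci)
    return 2 * shared / total if total else 0.0


@dataclass
class BarcodeRegistry:
    """In-memory stand-in for a shared registry: DNA source name -> profile."""
    entries: dict[str, Profile] = field(default_factory=dict)

    def register(self, source: str, profile: Profile) -> str:
        self.entries[source] = profile
        return barcode(profile)

    def matches(self, query: Profile, threshold: float = 0.8) -> list[tuple[str, float]]:
        """Registered sources scoring at least `threshold` against `query`, best first."""
        scored = [(src, shared_allele_score(query, p)) for src, p in self.entries.items()]
        return sorted((hit for hit in scored if hit[1] >= threshold), key=lambda hit: -hit[1])


# Illustrative values only, not real samples.
registry = BarcodeRegistry()
code_a = registry.register("line-A", {"TH01": frozenset({"6", "9.3"}), "TPOX": frozenset({"8", "11"})})
registry.register("line-B", {"TH01": frozenset({"7"}), "TPOX": frozenset({"9", "10"})})
query = {"TH01": frozenset({"6", "9.3"}), "TPOX": frozenset({"8"})}
print(code_a)                   # TH01=6,9.3;TPOX=11,8
print(registry.matches(query))  # line-A scores 6/7 (about 0.857); line-B scores 0 and is dropped
```

Keeping the bar code as a plain sorted string makes exact lookups trivial. The score handles partial profiles. A real registry would use whatever scoring standard its community agrees on.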

 

Page 10: Why life is so complicated

• Collect: store data at the level of the experiment:
  – Accessible through a single interface
  – With enough metadata to know what was done/seen

• Connect: allow analyses over:
  – Similar experiment types
  – Experiments done with/on similar biological ‘things’:
    • Species, strains, systems, cells
    • Anatomical components (e.g. spleen, hypothalamus)
    • Antibodies, biomarkers, bioactive chemicals, etc.

We need ‘incidental collaboratories’
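The Collect/Connect split above can be illustrated with a toy data model. The sketch below is an assumption, not an RDS design. Each experiment is one record carrying its own metadata: what was done, on which species, and with which anatomy and reagents. "Connecting" means grouping records from different labs by any shared attribute. The `Experiment` class, the `connect` function and all sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical experiment-level record ('Collect'): what was done, on what."""
    exp_id: str
    lab: str
    experiment_type: str                 # e.g. "immunostaining", "patch clamp"
    species: str
    anatomy: tuple[str, ...] = ()        # anatomical components, e.g. ("hypothalamus",)
    reagents: tuple[str, ...] = ()       # antibodies, biomarkers, bioactive chemicals


def connect(experiments: list[Experiment], attribute: str) -> dict[str, list[str]]:
    """'Connect': group experiment IDs from all labs by a shared attribute value."""
    groups: dict[str, list[str]] = defaultdict(list)
    for exp in experiments:
        value = getattr(exp, attribute)
        # Multi-valued fields (tuples) place the experiment in several groups.
        for item in (value if isinstance(value, tuple) else (value,)):
            groups[item].append(exp.exp_id)
    return dict(groups)


# A single list stands in for the "single interface" over every lab's data.
store = [
    Experiment("e1", "lab-1", "immunostaining", "Mus musculus",
               anatomy=("olfactory bulb",), reagents=("anti-GFAP",)),
    Experiment("e2", "lab-2", "patch clamp", "Mus musculus", anatomy=("olfactory bulb",)),
    Experiment("e3", "lab-3", "immunostaining", "Rattus norvegicus", reagents=("anti-GFAP",)),
]
print(connect(store, "anatomy"))   # {'olfactory bulb': ['e1', 'e2']}
print(connect(store, "reagents"))  # {'anti-GFAP': ['e1', 'e3']}
```

No lab set out to collaborate with the others here. The shared metadata alone links their experiments, which is the sense of "incidental" in the slide.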

Page 11: Why life is so complicated

Problem: biological research is quite insular:
• Biology is small: because objects and equipment range from 10^-5 to 10^2 m in size, you can work alone (‘King’ and ‘subjects’).

• Biology is messy: it doesn’t happen behind a terminal.

• Biology is competitive: different people with similar skill sets vie for the same grants.

• In summary: the field does not inherently promote collaboration (unlike, for instance, big physics or astronomy).

[Diagram: the research cycle of a single lab: Prepare, Observe, Analyze, Ponder, Communicate]

Page 12: Why life is so complicated

We need to pop the lab bubble!

[Diagram: three labs, each with Prepare, Analyze and Communicate steps and its own Observations; one lab also shows Think]

Labs go from being information islands to being ‘sensors in a network’.

Page 13: Why life is so complicated

Some objections, and rebuttals:

• Objection: “But our lab notebooks are all on paper”
  Rebuttal: Develop smartphone/tablet apps for data input

• Objection: “I need to see a direct benefit from something I spend my time on”
  Rebuttal: Develop a ‘data manipulation dashboard’ for the PI, giving better access to the full experimental output of his/her lab

• Objection: “I am afraid other people might scoop my discoveries”
  Rebuttal: Develop intra-lab data communication systems first, and allow timed/granular data export

• Objection: “I want things to be peer reviewed before I expose them”
  Rebuttal: Allow reviewers access to the experimental database before publication (of data or paper)

• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
  Rebuttal: Add a social networking component to the data repository, so you know who (down to the individual) created each data point.
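Two of the rebuttals amount to attaching provenance and an embargo date to every data point: timed/granular export, and knowing who created each data point. Below is a minimal sketch under stated assumptions. Each point records its creator, its lab and a release date. A lab always sees its own data, and other labs see only points whose release date has passed. `DataPoint`, `export` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical data point carrying its provenance and an embargo date."""
    value: float
    created_by: str   # the individual who created the data point
    lab: str
    release_on: date  # not visible outside `lab` before this date


def export(points: list[DataPoint], requester_lab: str, today: date) -> list[DataPoint]:
    """Own lab sees everything (intra-lab first); other labs see only released points."""
    return [p for p in points if p.lab == requester_lab or p.release_on <= today]


points = [
    DataPoint(1.2, "researcher-1", "lab-1", release_on=date(2013, 6, 1)),
    DataPoint(3.4, "researcher-2", "lab-1", release_on=date(2013, 1, 1)),
]
for p in export(points, requester_lab="lab-2", today=date(2013, 3, 1)):
    print(p.value, "created by", p.created_by)  # prints: 3.4 created by researcher-2
```

Setting the release date per point, rather than per dataset, is what makes the export "granular". An individual lab can release some observations early and hold others back.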

Page 14: Why life is so complicated

Elsevier Research Data Services: Goals

1. Help add more data into (existing, open) data repositories: more data in, annotated, available

2. Make them more interoperable: work towards a collaboratory model by connecting databases

3. Find ways to make them sustainable, e.g.:
  – Service-level agreements: to funders/institutes
  – With lab notebook: subscriptions to projects
  – Back-end analytics: to companies
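Goal 2, connecting databases, usually starts with a crosswalk. A crosswalk maps each repository's native field names onto a shared schema, after which records describing the same entity can be merged. The sketch below assumes two hypothetical repositories (`repo_a`, `repo_b`) with made-up field names and uses an antibody identifier as the join key. When repositories disagree it keeps the first value seen, which is only one of several reasonable conflict rules.

```python
# Hypothetical native field names for two repositories, mapped onto one shared schema.
CROSSWALK: dict[str, dict[str, str]] = {
    "repo_a": {"ab_id": "antibody_id", "target": "target", "host": "host_species"},
    "repo_b": {"AntibodyID": "antibody_id", "Antigen": "target", "HostOrganism": "host_species"},
}


def to_common(source: str, record: dict) -> dict:
    """Rename a repository-native record's fields to the shared schema."""
    mapping = CROSSWALK[source]
    return {common: record[native] for native, common in mapping.items() if native in record}


def merge_on(key: str, batches: list[tuple[str, list[dict]]]) -> dict[str, dict]:
    """Combine records from several repositories that describe the same `key` value."""
    merged: dict[str, dict] = {}
    for source, records in batches:
        for record in records:
            common = to_common(source, record)
            entry = merged.setdefault(common[key], {"sources": []})
            for name, value in common.items():
                entry.setdefault(name, value)  # first repository wins on conflicting values
            entry["sources"].append(source)
    return merged


batches = [
    ("repo_a", [{"ab_id": "AB_001", "target": "GFAP", "host": "rabbit"}]),
    ("repo_b", [{"AntibodyID": "AB_001", "Antigen": "GFAP"},
                {"AntibodyID": "AB_002", "Antigen": "NeuN", "HostOrganism": "mouse"}]),
]
merged = merge_on("antibody_id", batches)
print(merged["AB_001"]["sources"])  # ['repo_a', 'repo_b']
```

Because each source keeps its own native schema and only the crosswalk changes, this style of integration fits the principle that data, URLs and front ends stay with the repository.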

Page 15: Why life is so complicated

RDS Guiding Principles:
• In principle, all open data stays open, and URLs, front end, etc. stay where they are (i.e. with the repository)

• Collaboration is tailored to each data repository’s unique needs/interests and follows a ‘service model’:
  – Aspects where collaboration is needed are discussed
  – A collaboration plan is drawn up as a Service-Level Agreement: agree on time, conditions, etc.
  – All communication, finance, IPR, etc. is completely transparent at all times.

• Very small department (2–3 people): immediate communication; instant deployment of ideas

 

Page 16: Why life is so complicated

RDS Approach:

• Collaborate and build on relationships with data repositories

• Integrate with other content sources, if possible

• Build annotation and standardisation tools and processes to implement this

• Develop next-generation infrastructure solutions for back-end integration

• Explore creative revenue opportunities

Page 17: Why life is so complicated

NIF Antibody Registry:

Problem:
• 95 antibodies were identified in 8 papers
• 52 did not contain enough information to determine the antibody used
• Some provided details in another paper
• Failed to give species, vendor, catalog #

Solution #1:
• Journals ask authors to provide antibody catalog nr
• Link to NIF Registry from manufacturers’/vendors’ sites

Solution #2:
• Pilot with a lab
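Solution #1 amounts to checking, before publication, that each antibody description names the fields the slide lists as missing. On the slide's own numbers, 95 - 52 = 43 of the 95 antibodies (about 45%) could be identified. The sketch below assumes each description is a simple dictionary and treats ‘species’ as the host species. The field names, functions and sample entries are hypothetical and are not part of the NIF Antibody Registry.

```python
# Fields the slide reports as often missing from antibody descriptions in papers.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


def missing_fields(report: dict) -> list[str]:
    """List required fields that are absent or blank in one antibody description."""
    return [name for name in REQUIRED_FIELDS if not str(report.get(name, "")).strip()]


def identifiable_fraction(reports: list[dict]) -> float:
    """Share of descriptions complete enough to determine which antibody was used."""
    if not reports:
        return 0.0
    complete = sum(1 for report in reports if not missing_fields(report))
    return complete / len(reports)


reports = [  # illustrative entries, not taken from the eight papers
    {"target": "GFAP", "species": "rabbit", "vendor": "vendor-x", "catalog_number": "12345"},
    {"target": "NeuN", "vendor": "vendor-y"},
]
print(missing_fields(reports[1]))      # ['species', 'catalog_number']
print(identifiable_fraction(reports))  # 0.5
```

A journal could run a check like this at submission and return the list of missing fields to the authors. That list is also where a link to the registry entry would naturally be requested.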

 

Page 18: Why life is so complicated

Let’s start with the Urban Lab:

• Getting antibodies
• and messy bits
• from the notebook
• into Nathan Urban’s command center

• By providing:
  – 7” tablets
  – Links to IgorPro
  – A dashboard UI

Page 19: Why life is so complicated

My questions to you:
• Thoughts on this approach:
  – In principle?
  – In practice?

• Do you see serious hurdles?
  – Are we overlapping with other initiatives; if so, are we complementary?
  – How does this connect to libraries/local repositories?
  – Are there sensitivities/pain points we are overlooking?

• Where to start:
  – Are antibodies OK?
  – Is a neuroscience lab OK?
  – Thoughts on data repositories/platforms to connect to?

Page 20: Why life is so complicated

Your questions to me?

[email protected]
http://elsatglabs.com/labs/anita/

http://www.slideshare.net/anitawaard

Thanks go to:
• Anita Bandrowski and Maryann Martone, NIF
• Nathan Urban, Shreejoy Tripathy, CMU
• David Marques, SVP RDS