Publishing 3.0, or: Why we will all be disintermediated (and that is a good thing!) Anita de Waard, Disruptive Technologies Director, Elsevier Labs, Burlington, VT (= not what the program says!) AAMC GREAT/GRAND Meeting, September 21, 2012

deWaardAAMC2012

May 10, 2015


Anita de Waard

AAMC Great/Grand meeting plenary lecture, September 21, 2012, Nashville, TN
Transcript
Page 1: deWaardAAMC2012

Publishing 3.0, or: Why we will all be disintermediated
(and that is a good thing!)

Anita de Waard, Disruptive Technologies Director,
Elsevier Labs, Burlington, VT (= not what the program says :-)!)

AAMC GREAT/GRAND Meeting, September 21, 2012

 

Page 2: deWaardAAMC2012

What’s the big deal with big data?

Decoding the human genome involves analysing 3 billion base pairs; it took ten years the first time it was done, in 2003, but can now be achieved in one week. (Data, Data Everywhere, The Economist, February 25, 2010)

Mobile Internet devices will outnumber humans this year, Cisco predicts… Global mobile data traffic is expected to increase 18-fold over the next five years to 10.8 exabytes per month. Cloud traffic is expected to account for 71%, or 7.6 exabytes per month, of total mobile data traffic by 2016.

‘Big data’ offers huge challenges for biomedicine in an era of massive data sets… (Francis Collins, Director of NIH, Yesterday)

Facebook stores 100 petabytes in Hadoop.

Page 3: deWaardAAMC2012

Your funders are telling you to share your data:

• NSF Data Sharing Policy: Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of the work under NSF grants.

• NIH Data Sharing Policy: Final Research Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data. Final Research Data means recorded factual material commonly accepted in the scientific community as necessary to document and support research findings. This does not mean summary statistics or tables; rather, it means the data on which summary statistics and tables are based.

Page 4: deWaardAAMC2012

So  are  you  sharing  your  data?      Really?  

Page 5: deWaardAAMC2012

Creating more data by the minute.

[Figure: clickstream diagram of a website's usage. Each node is annotated with Time (minutes), Age, Bounce rate and visit count N; the links between nodes show the share of visitors moving on to pages such as Home, Search, People manager, Policies and docs., Employment law, Emp. law ref. man., Statutory rates, Legal guidance, Legal reports and What’s new.]

Page 6: deWaardAAMC2012

This plant tweets!

• Internet of things: we can interact with ‘objects that blog’ or ‘Blogjects’, that:
– track where they are and where they’ve been
– have histories of their encounters and experiences
– have agency
– have a voice on the social web

Page 7: deWaardAAMC2012

Larry Smarr creates lots of data:

• He wears:
– A Fitbit to count his every step
– A Zeo to track his sleep patterns
– A Polar WearLink that lets him regulate his maximum heart rate during exercise

• 23andMe analyzed his DNA for disease susceptibility.

• Your Future Health analyzed blood and stool samples for 100 biomarkers:
– At one point, C-reactive protein stood out as higher than normal.
– A blood test showed that his CRP had climbed to 14.5 during the attack.
– He took antibiotics, the symptoms resolved, and his CRP dropped to 4.9, but that was still unusually high.
– Lactoferrin, too, rose several times to sky-high levels (200, whereas the normal count is less than 7.3), and in tandem with CRP.
– Smarr now thinks his diverticulitis attack was actually Crohn's disease, and his gastroenterologist (reluctantly) agreed.

Page 8: deWaardAAMC2012

As are lots of other ‘Quantified Selfers’:

Clearity Foundation: a translational medicine and public service foundation for:
• Providing doctors access to molecular profiling for their ovarian cancer patients
• Providing doctors and patients clinical trial options informed by individual tumor biology
• Providing financial support for the profiling work for patients (Oprah approved!)

Page 9: deWaardAAMC2012

But  who  uses  all  that  data?    

Page 10: deWaardAAMC2012

• It knows where you are
• And who you talked to
• And what you bought
• And how much you paid..
• And whether you need another pair of shoes
• And when and where you can get them…

does!  

Page 11: deWaardAAMC2012

Brittany Wenger does!

17-year-old Brittany Wenger developed a cloud-based neural network that is able to seamlessly and accurately assess tissue samples for signs/evidence of breast cancer, to give more credence to the currently used (less reliable) minimally invasive procedure called Fine Needle Aspirates (FNAs). By looking at nine different input features and comparing them to the training examples, Brittany’s cloud-based neural network can detect malignant breast tumors with an accuracy of 99.11%. Because her neural network is deployed in the cloud using Google’s App Engine, it can be accessed from existing medical systems as well as through a web browser or mobile apps.

Winner of the Google Science Fair 2012

Page 12: deWaardAAMC2012

Mark Wilkinson does!

Using what is known about interactions in fly & yeast, predict new interactions with a human protein, running over data on the web that he neither created nor knew about!

Given a protein P in Species X:
   Find proteins similar to P in Species Y
   Retrieve interactors in Species Y
   Sequence-compare Y-interactors with Species X genome
   (1) → Keep only those with homologue in Species X
   Find proteins similar to P in Species Z
   Retrieve interactors in Species Z
   Sequence-compare Z-interactors with (1)
   → Putative interactors in Species X
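The workflow on this slide can be sketched in a few lines of Python. This is only a minimal sketch, not Wilkinson's implementation: the three lookup tables and every protein name in them are hypothetical stand-ins for the Web services the talk describes, and the final "sequence-compare with (1)" step is simplified to a set intersection of homologues.

```python
# Toy stand-ins for remote Web services. All names and data are hypothetical;
# a real run would discover and query services on the web at run-time.
SIMILAR = {  # (protein, species) -> proteins in that species similar to it
    ("P", "fly"): {"f1"},
    ("P", "yeast"): {"y1"},
}
INTERACTORS = {  # protein -> proteins it is known to interact with
    "f1": {"f2", "f3"},
    "y1": {"y2", "y3"},
}
HOMOLOGUES = {  # (protein, species) -> homologues of it in that species
    ("f2", "human"): {"H2"},
    ("f3", "human"): {"H3"},
    ("y2", "human"): {"H2"},
    ("y3", "human"): {"H4"},
}


def homologues_of_interactors(protein, source_species, target_species):
    """Find proteins similar to `protein` in source_species, retrieve their
    interactors, and keep those with a homologue in target_species."""
    found = set()
    for similar in SIMILAR.get((protein, source_species), set()):
        for interactor in INTERACTORS.get(similar, set()):
            found |= HOMOLOGUES.get((interactor, target_species), set())
    return found


def putative_interactors(protein, species_x, species_y, species_z):
    """Candidates from Species Y (step 1) that are also supported by Species Z."""
    step_1 = homologues_of_interactors(protein, species_y, species_x)
    from_z = homologues_of_interactors(protein, species_z, species_x)
    return step_1 & from_z


print(sorted(putative_interactors("P", "human", "fly", "yeast")))  # prints ['H2']
```

Only H2 survives because it is the one human homologue reached through both the fly and the yeast interactors; H3 and H4 are each supported by a single species.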

Page 13: deWaardAAMC2012

Running the web like an experiment:

These are different Web services! (and neither of them Mark’s)
...selected at run-time based on the same model

Page 14: deWaardAAMC2012

Putting it another way:

Page 15: deWaardAAMC2012

Science  is  becoming  distributed:  

Tools  

Thoughts  

Data  

Page 16: deWaardAAMC2012

Science  is  becoming  distributed:  

Tools  

Thoughts  

Data  

Data is king!
• Data needs to say what it’s about
• Data needs to say where it comes from
• Data needs to know who owns it
• Data needs to be sensitive to privacy
• Data needs to know how it’s used

Page 17: deWaardAAMC2012

Science  is  becoming  distributed:  

Tools  

Thoughts  

Data

Tools rule!
Tools can be made by everyone:
Tools are open and free
Tools will know where data lives
Tools need to know about data:
• Privacy/ownership
• Trustworthiness
• Provenance

Page 18: deWaardAAMC2012

Science  is  becoming  distributed:  

Tools  

Thoughts  

Data  

If data and tools are ubiquitous, what matters most are the questions you ask:
• What is interesting?
• What is important?
• Who cares?

Page 19: deWaardAAMC2012

Science  is  becoming  more  distributed:  

So  where  does  that  leave  you?  

Page 20: deWaardAAMC2012

How can you prepare (your students) for this future?

Well, you can’t - not really. But there are a few habits you can instill (and model):

Page 21: deWaardAAMC2012

Habit #1: Be a good data producer
• Know that you are creating data
• Be aware of privacy and IPR issues re. your data
• Assume that someone, some time will be using this data for some purpose you cannot imagine
• Learn which data repositories exist in your field, how they work, what they need from you
• Set up your work habits to automatically create (or force you to add) metadata to enable discovery and use of your data.
• Store your data in the repositories. Every time.
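One way to make metadata creation automatic is to generate a small "sidecar" record every time a data file is saved. The sketch below is only an illustration under stated assumptions: the field names, the example data and the creator name are all hypothetical, and a real repository will have its own required schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def describe_dataset(data: bytes, title: str, creator: str, license_name: str) -> dict:
    """Build a minimal metadata record for a blob of data.

    The fields are hypothetical: what the data is about (title), who owns it
    (creator), how it may be used (license), plus a timestamp, size and
    checksum so later users can verify what they downloaded.
    """
    return {
        "title": title,
        "creator": creator,
        "license": license_name,
        "created": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Hypothetical example data; in practice this would be the file you just saved.
raw = b"rat_id,score\n1,0.42\n"
record = describe_dataset(raw, "Pain test scores", "A. Researcher", "CC-BY-4.0")
sidecar = json.dumps(record, indent=2)  # text to store next to the data file
print(sidecar)
```

Calling a function like this from the script that writes the data means the metadata cannot be forgotten, which is the point of the habit.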

Page 22: deWaardAAMC2012

Habit #2: Be a good data consumer.
• Find out which data exists that might be relevant to your work.
• Learn how to query available data.
• Be aware of privacy and IPR licenses.
• Give credit where it’s due:
– Cite any data sources that you use
– Share your knowledge on querying data
– Deposit any data you’ve derived from other data!

Page 23: deWaardAAMC2012

Habit #3: Learn to code.
• Brittany Wenger was born in 1995!
• All sorts of people are using technology that was invented after the birth of your oldest grandchild.
• Use anything at your disposal to learn:
– Your students
– Your kids
– Online forums
– Video tutorials
– Etc. etc.
• E.g. Coursera course on Clinical Research Informatics - see Cynthia Gadd (Vanderbilt)

Page 24: deWaardAAMC2012

Habit #4: Expect to keep learning.
• This will only get worse! (Or: better?)
• Listen to Douglas Engelbart (he invented the mouse and the cursor, as well as collaborative work):
“[For] improving the intellectual effectiveness of the individual human being…[o]ne of the tools that shows the greatest immediate promise is the computer…” (1962)
“The grand challenge is to boost the collective IQ of organizations and of society.” (2000)
• Expect to keep learning, from anyone, and anywhere. The only thing that can limit your success is the idea that you can’t/don’t have to learn/change/adapt/evolve.

Page 25: deWaardAAMC2012

Habit #5: Don’t find what you already know.

Richard Feynman on Scientific Integrity: if you're doing an experiment, you should report everything that you think might make it invalid - not only what you think is right about it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.

Page 26: deWaardAAMC2012

Habit #6: Anyone can come up with a great idea.
• To paraphrase Remy the Rat (Ratatouille): ‘Not everyone can be a great scientist, but a great scientist can come from anywhere’
• Grand challenges, hackathons, open invitations, etc. can offer great solutions to difficult problems (See Cameron for the story of Tim Gowers, who crowdsourced math)
• See also Collins’ talk yesterday: issues with race/ethnicity need to be overcome; involve students from around the world
• Involve K-12 students: get more kids excited about science!

Page 27: deWaardAAMC2012

Tools  

Thoughts  

Data  

Six habits that might help:

1. Be a good data producer
2. Be a good data consumer
3. Learn to code
4. Expect to keep learning
5. Don’t find what you already know
6. Anyone can come up with a great idea!

Page 28: deWaardAAMC2012

Anyway - how are we going to publish all of this?

Page 29: deWaardAAMC2012

Not  like  this!  

Page 30: deWaardAAMC2012

How  are  we  going  to    publish  all  of  this?    

We’re  not.    YOU  are.    

(With support from ‘us’ = publishers, libraries, institutions, crowd…)

Page 31: deWaardAAMC2012

Maybe  as  Executable  Papers….  

Page 32: deWaardAAMC2012

Or  by  linking  data  to  hospital  info  systems..    

Electronic Patient Records Clinical  Guideline  

Data

Step 1: Patient data + diagnosis link to Guideline recommendation

Step 2: Guideline recommendation links to evidence in report or data

Page 33: deWaardAAMC2012

Or  by  crea@ng  Linked  Data  stores...  


Images from: Luis Tari, Saadat Anwar, Shanshan Liang, James Cai and Chitta Baral, "Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism", Bioinformatics, Vol. 26, ECCB 2010, pages i547–i553, doi:10.1093/bioinformatics/btq382

Step 1: Manually identify DDIs and drug names in wide collection of content sources

Step 2: Develop a model of Drug-Drug Interaction and define candidates

Step 3: Automate this process and store as Linked Data
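To make "store as Linked Data" concrete, here is a minimal sketch that writes one candidate interaction as N-Triples, a plain-text Linked Data format. Everything specific in it is an assumption for illustration: the `example.org` namespace, the predicate names, and the placeholder drugs and enzyme are hypothetical, and the "inhibitor plus shared enzyme" model is just one plausible reading of a metabolism-based interaction, not the model used in the cited paper.

```python
BASE = "http://example.org/ddi/"  # hypothetical namespace, not a real vocabulary


def interaction_triples(drug_a, drug_b, enzyme):
    """Describe one candidate drug-drug interaction as (subject, predicate, object).

    Assumed model: drug_a inhibits an enzyme that metabolizes drug_b,
    so the pair is recorded as a candidate interaction.
    """
    return [
        (drug_a, "inhibits", enzyme),
        (drug_b, "metabolizedBy", enzyme),
        (drug_a, "interactsWith", drug_b),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples: one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{BASE}{s}> <{BASE}{p}> <{BASE}{o}> ." for s, p, o in triples)


# Placeholder names only; no real drugs or enzymes are implied.
print(to_ntriples(interaction_triples("drugA", "drugB", "enzymeE")))
```

Because every statement is a self-contained triple with web-style identifiers, stores built this way can be merged and queried together with data from other sources.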

Page 34: deWaardAAMC2012

Or by grafting stories onto your data…

1. Add metadata to everything
2. Use a workflow tool
3. Write in a shared space
4. Invite reviews
5. The reviewer approves (or comments, author revises, etc)
6. Run nifty apps over all of this.

[Diagram labels: metadata (attached to every item); Review, Edit, Revise; Calculate, coordinate…; Compile, comment, compare…]

[Sample text in the shared space:] Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-

Page 35: deWaardAAMC2012

Or by other ways…
• Force11.org: ‘Future of Research Communications and e-Science’:
– ‘Society’ for thinking about new ways of communicating science and the humanities
– Inviting general participation
– Please join!

Page 36: deWaardAAMC2012

In summary:
• Big data and linked tools are completely changing the face of science by distributing the creation of data, the building of tools, and the intelligent use of both
• Social media and open education are changing who can do science, and how it is done
• Publishing all of this will not be a simple act, and not something publishers can do alone.
• All of this offers tremendous opportunities to expand the practice and promise of science
• The best thing you can do is prepare to be amazed…

Page 37: deWaardAAMC2012

P.S.: Do we have any jobs for your graduates? Maybe! Some intriguing ideas:
• Internships/traineeships?
• Use cases for classes on informatics, e.g.:
– Elsevier provides content/ontologies
– Students develop ways to integrate data and publications
– Students help user testing/UI, model development
• Host joint grand challenges?
• Certainly there will be lots of work in the informatics arena, with publishers, digital repositories, startups, etc.

Page 38: deWaardAAMC2012

Questions?

[email protected]