Reproducibility in human cognitive neuroimaging: a community-driven data sharing framework for provenance information integration and interoperability

Jan 22, 2017

Nolan Nichols
Transcript
Page 1

Reproducibility in human cognitive neuroimaging: a community-driven data sharing framework for provenance information integration and interoperability

Nolan Nichols

Dissertation Defense
Biomedical and Health Informatics
University of Washington
Seattle, WA, USA
December 8, 2014

Page 2

Outline

• Introduction
• Background
• Research approach
• Conclusions and future directions

Page 3

Outline

• Introduction
  – Motivation for Research
  – Research Goal
• Background
• Research approach
• Conclusions and future directions

Page 4

Introduction: Motivation for Research

• Human Cognitive Neuroimaging
  – Investigates brain structure and function in normal and neuropsychiatric conditions to improve human health
  – Facilitates clinical decision making using imaging and cognitive phenotypes

Page 5

Introduction: Motivation for Research

• Biomedical Informatics (BMI)
  – The interdisciplinary field that studies and pursues the effective use of biomedical data, information, and knowledge for scientific inquiry, problem solving, and decision making, motivated by efforts to improve human health
• Neuroinformatics
  – Applies BMI principles to develop techniques and tools for acquiring, sharing, storing, publishing, analyzing, modeling, visualizing and simulating data across all levels of neuroscience

Page 6

Introduction: Motivation for Research

• Neuroinformatics Perspective
  – Research is a process with distinct stages
  – Provenance links together each stage

Poline et al. (2012), Frontiers in Neuroinformatics

Page 7

Introduction: Motivation for Research

• Problem: research is not reproducible
  – Ioannidis JPA: Why Most Published Research Findings Are False. PLoS Med 2005
  – Donoho D: An invitation to reproducible computational research. Biostatistics 2010
  – Yong E: Replication studies: Bad copy. Nature 2012
  – Editorial: Reducing our irreproducibility. Nature 2012
  – Begley CG: Six red flags for suspect work. Nature 2013
  – Collins FS, Tabak LA: Policy: NIH plans to enhance reproducibility. Nature 2014
• Reproducibility issues exist along a spectrum
  – Statistical issues
  – Computational issues

Page 8

Introduction: Motivation for Research

Reproducibility Spectrum (figure axis: Confidence in Findings)
• Repeatable: Can the same researchers in the same lab obtain consistent results using the same methodology and data?
• Replicable: Can different researchers from a different lab obtain consistent results using the same methodology?
• Reproducible: Can different researchers from a different lab obtain consistent results using a different methodology and data?

Page 9

Introduction: Motivation for Research

• Statistical issues
  – Reporting bias of brain volume (Ioannidis, 2011), fMRI activation foci (David, 2013)
  – Lack of statistical power in neuroscience (Button, 2013)
  – Data collection and analysis methods are highly flexible across fMRI studies (Carp, 2012)
• Computational issues
  – Lack of shared data, code, and analysis environments

Page 10

Introduction: Motivation for Research

• Reusable Research
  – Can different researchers from a different lab apply a methodology to process shared data from different researchers in a different lab?

Adapted from Peng (2011), Science.

Page 11

Introduction: Motivation for Research

Barriers to reusable research
• Data management systems are not interoperable
• Data acquisition and analysis methods lack provenance
• Terminologies are not harmonized (e.g., brain atlases, schemas)

Poline et al. (2012), Frontiers in Neuroinformatics

Page 12

Introduction: Research Goals

• To enhance the reusability of neuroimaging data and workflow code
• To advance an informatics data exchange standard that incorporates provenance as a core concept
• To engage the neuroinformatics community as a partner in the design process

Page 13

Outline

• Introduction
• Background
  – Data exchange
  – Provenance
  – Linked Open Data
• Research approach
• Conclusions and future directions

Page 14

Background: Data Exchange

http://xkcd.com/927/

•  My goal is to extend existing standards to facilitate data reusability and interoperability

Page 15

Background: Data Exchange

XCEDE XML Schema
• Experiment Hierarchy is composed of six levels of information relevant to neuroimaging data exchange
  – Project
  – Subject
  – Visit
  – Study
  – Episode
  – Acquisition

XML-based Clinical Experiment Data Exchange Schema, Gadde et al. 2012

Page 16

Background: Data Exchange

• Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability, or trustworthiness.
  – Entity (e.g., files, data, publications)
    • a physical, digital, conceptual, or other kind of thing with some fixed aspects
  – Activity (e.g., workflow, editing a manuscript)
    • something that occurs over a period of time and acts upon or with entities
  – Agent (e.g., person, software, organization)
    • something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity

W3C PROV Specification Suite

Page 17

Background: Provenance

• An image registration process
  – wasAssociatedWith a registration algorithm
  – used a native-space anatomical MRI
• A spatially-normalized anatomical MRI
  – wasGeneratedBy an image registration process
  – wasDerivedFrom a native-space anatomical MRI
  – wasAttributedTo a registration algorithm
• PROV is an extensible language to describe:
  – Responsibility
  – Data Flow
  – Process Flow
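
The image registration example on this slide can be written down as a small set of subject/relation/object statements. The sketch below is illustrative only: the relation names are the PROV terms from the slide, but the identifiers (`registration_process`, `native_space_mri`, and so on), the helper functions, and the grouping of relations into the three views are assumptions made for this example, not part of the PROV specification or of the author's software.

```python
from collections import namedtuple

# One provenance statement: subject --relation--> obj.
Statement = namedtuple("Statement", ["subject", "relation", "obj"])

# The five statements from the slide; identifiers are hypothetical labels.
STATEMENTS = [
    Statement("registration_process", "wasAssociatedWith", "registration_algorithm"),
    Statement("registration_process", "used", "native_space_mri"),
    Statement("normalized_mri", "wasGeneratedBy", "registration_process"),
    Statement("normalized_mri", "wasDerivedFrom", "native_space_mri"),
    Statement("normalized_mri", "wasAttributedTo", "registration_algorithm"),
]

# Rough grouping of relations by what they describe (an assumption of this sketch).
VIEWS = {
    "responsibility": {"wasAssociatedWith", "wasAttributedTo"},
    "data_flow": {"wasDerivedFrom"},
    "process_flow": {"used", "wasGeneratedBy"},
}


def view(name, statements=STATEMENTS):
    """Return only the statements that belong to one of the three views."""
    return [s for s in statements if s.relation in VIEWS[name]]


def sources_of(entity, statements=STATEMENTS):
    """Walk wasDerivedFrom links back from an entity, nearest source first."""
    found, frontier = [], [entity]
    while frontier:
        current = frontier.pop(0)
        for s in statements:
            if (s.subject == current and s.relation == "wasDerivedFrom"
                    and s.obj not in found):
                found.append(s.obj)
                frontier.append(s.obj)
    return found


print(sources_of("normalized_mri"))
print([s.relation for s in view("process_flow")])
```

Keeping every statement in the same three-part shape is what lets one list answer responsibility, data flow, and process flow questions with simple filters.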

Page 18

Background: Linked Open Data

Semantic Web and Resource Description Framework

• A language to make statements about unique locations (URLs) on the Web
• For example, at the URL of an anatomical MRI
  – ‘is a’ http://neurolex.org/wiki/Nlx_156814
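
As a rough illustration of the ‘is a’ statement above, the sketch below writes one RDF statement as an N-Triples line using only the Python standard library. In RDF, ‘is a’ is conventionally expressed with the `rdf:type` predicate. The MRI URL is a made-up placeholder and `to_ntriples` is a helper defined here for the example; only the NeuroLex URL comes from the slide.

```python
# IRI of the rdf:type predicate, the usual way to say "is a" in RDF.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_ntriples(subject, predicate, obj):
    """Serialize one statement whose three parts are all IRIs as an N-Triples line."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


# Hypothetical location of an anatomical MRI file (not from the slides).
mri_url = "http://example.org/data/anatomical_mri.nii.gz"

line = to_ntriples(mri_url, RDF_TYPE, "http://neurolex.org/wiki/Nlx_156814")
print(line)
```

A production system would more likely use an RDF library to handle literals, prefixes, and escaping; the string template here only covers the all-IRI case.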

Page 19

Background: Linked Open Data

Page 20

Outline

• Introduction
• Background
• Research approach
  – Specific Aims
  – Study Design
  – Phase 1
  – Phase 2
• Conclusions and future directions

Page 21

Research Approach: Specific Aims

•  Aim 1: Research and design a framework to represent, access, and query neuroimaging data provenance

•  Aim 2: Develop an information system of Web services to compute and discover data provenance from brain imaging workflow

Page 22

Research Approach: Study Design

• Phase 1 – Scalable Neuroimaging Initiative (SNI)
  – West Coast collaboration funded by the National Academies Keck Futures Initiative (NAKFI) on Imaging Science
  – I led 15 meetings and 1 face-to-face workshop, and presented preliminary results at 3 conferences
• Phase 2 – Neuroimaging Data Sharing (NIDASH)
  – Task force funded and organized by the International Neuroinformatics Coordinating Facility (INCF)
  – I gathered feedback and redesigned the initial SNI framework over 14 face-to-face workshops, 2 hackathons, and weekly meetings over two years

Page 23

Research Approach: Study Design

Page 24

Research Approach: Study Design

Phase 1 – SNI
• Aim 1 – Data Exchange: Evaluate metadata standards for data exchange (XCEDE)
• Aim 2 – Information System: Demonstrated a system for computational access to data (NiQuery)

Phase 2 – NIDASH
• Aim 1 – Data Exchange: Extend PROV using concepts from XCEDE (Neuroimaging Data Model)
• Aim 2 – Information System: Redesign NiQuery using a semantic Web service oriented architecture

Page 25

Outline

• Introduction
• Background
• Research approach
  – General Approach
  – Phase 1 – SNI
  – Phase 2 – NIDASH
• Conclusions and future directions

Page 26

Research Approach: Phase 1 – SNI

• Scalable Neuroimaging Initiative’s Mission:
  – To specify and demonstrate an application programming interface (API) that can support agile exploration of distributed neuroimaging data sources while allowing for heterogeneous and evolving data management systems, ontologies, image data formats, image processing tools, and standard anatomical spaces.
• Aim 1 – Data Exchange:
  – Applied XCEDE as a data exchange standard for two neuroimaging databases
• Aim 2 – Information System:
  – Implemented a system architecture for remote access to content within neuroimaging data

Page 27

Research Approach: Phase 1 – SNI

Aim 1
• Queries shipped out to multiple sources
• Links are passed to visualization app

Aim 2
• Extract time series from data remotely
• Browser and plotting all in real-time
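
The idea of shipping one query to multiple sources and passing the resulting links to a visualization app can be sketched as a simple fan-out and merge. This is not the NiQuery implementation: the two sources, their records, the `example.org` links, and the function names are all hypothetical stand-ins, and a real system would call each site's remote API rather than read from an in-memory dictionary.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for two remote neuroimaging databases.
SOURCES = {
    "site_a": [
        {"subject": "s01", "modality": "fMRI", "link": "http://example.org/a/s01"},
    ],
    "site_b": [
        {"subject": "s07", "modality": "fMRI", "link": "http://example.org/b/s07"},
        {"subject": "s08", "modality": "DTI", "link": "http://example.org/b/s08"},
    ],
}


def query_source(name, modality):
    """Run the same query against one source and tag each hit with its origin."""
    return [dict(rec, source=name) for rec in SOURCES[name]
            if rec["modality"] == modality]


def federated_query(modality):
    """Ship one query to every source in parallel and merge the results."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map returns results in the same order as `names`.
        per_source = pool.map(lambda n: query_source(n, modality), names)
        return [rec for hits in per_source for rec in hits]


# The merged links are what would be handed to a visualization app.
links = [rec["link"] for rec in federated_query("fMRI")]
print(links)
```

Tagging each record with its source keeps the merged list traceable back to the database that answered.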

Page 28

Research Approach: Phase 1 – SNI

[Figure: NiQuery architecture diagram (www.niquery.org). Labeled components: Web-based Applications, Query Integrator, Query Processing, Database Registry, Common Data Exchange Layer, and a Common API for each data source (Allen Institute ABA, UW XNAT, Stanford NIMS).]

NiQuery presented at Neuroinformatics 2012, Munich. Brinkley (2012), Query Integrator. JBI.

• System too slow for real-time access (~30 secs.)
• XCEDE too strict for changing datatype requirements
• Framework doesn’t incorporate formal provenance

Page 29

Research Approach: Phase 1 – SNI

Lessons learned
• Harmonizing the XCEDE and PROV Schemas
  – XCEDE has a strict hierarchical structure
  – PROV is designed as a graph and is compatible with semantic Web technologies
  – A harmonized XCEDE and PROV model could represent the stages of electronic data capture, not just the experiment hierarchy
• Solution 1: Extend PROV to represent XCEDE
• Solution 2: Redesign NiQuery using semantic Web design concepts
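
One way to picture the contrast between a strict hierarchy and a graph is to flatten a single path through the XCEDE experiment hierarchy into edges, after which statements that do not fit the hierarchy can sit alongside it. This is a minimal sketch under stated assumptions: the level names are those listed on the XCEDE slide, while the identifiers, the `partOf` relation name, and `scan_session_activity` are placeholders invented for the example and are not PROV, XCEDE, or NIDM terms (only `wasGeneratedBy` is a PROV relation).

```python
# XCEDE experiment hierarchy levels, outermost first, as listed on the XCEDE slide.
LEVELS = ["Project", "Subject", "Visit", "Study", "Episode", "Acquisition"]


def hierarchy_to_edges(path):
    """Turn one root-to-leaf path of (level, identifier) pairs into graph edges.

    'partOf' is a placeholder relation name used only in this sketch.
    """
    levels = [level for level, _ in path]
    if levels != LEVELS[:len(levels)]:
        raise ValueError("path must follow the hierarchy order from Project down")
    edges = []
    # Pair each item with the item one level above it.
    for (_, child), (_, parent) in zip(path[1:], path[:-1]):
        edges.append((child, "partOf", parent))
    return edges


# Hypothetical identifiers for one acquisition and its ancestors.
path = [("Project", "proj1"), ("Subject", "sub01"), ("Visit", "visit1"),
        ("Study", "study1"), ("Episode", "ep1"), ("Acquisition", "acq1")]
edges = hierarchy_to_edges(path)

# In graph form, a provenance statement can be added without changing the hierarchy.
edges.append(("acq1", "wasGeneratedBy", "scan_session_activity"))
print(edges)
```

The hierarchy survives as five `partOf` edges, and the sixth edge shows the kind of information a tree alone has no slot for.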

Page 30

Outline

• Introduction
• Background
• Research approach
  – General Approach
  – Phase 1 – SNI
  – Phase 2 – NIDASH
• Conclusions and future directions

Page 31

Research Approach: Phase 2 – NIDASH

• Neuroimaging Data Sharing Task Force Mission:
  – Aiming at reproducibility for the sake of reproducibility and enhanced research.
• Aim 1 – Data Exchange:
  – Applied XCEDE as a data exchange standard for two neuroimaging databases
• Aim 2 – Information System:
  – Implemented a system architecture for remote access to content within neuroimaging data

Page 32

Research Approach: Phase 2 – NIDASH

Neuroimaging Data Model (NIDM)

Page 33

Research Approach: Phase 2 – NIDASH

• Extensions to PROV using elements from the XCEDE experiment hierarchy, workflow tools, and derived data to create Domain Object Models
• Enables a model bridging information from experiment, workflow provenance, and derived data

Keator et al. 2013

Page 34

Research Approach: Phase 2 – NIDASH

Page 35

NIDM Collaboration
• Meetings on Monday and Wednesday to discuss previous week’s issues
• Satellite meetings at HBM, SfN, Imaging Genetics, and Neuroinformatics for 1-2 days each
• General Workflow to Contribute
  – Contributors create a “fork” from Github (an online version control system with
  – Changes to the vocabulary and examples are logged as “commits” in the contributor’s “fork”
  – Contributor submits a “pull request” to have changes reviewed
  – Discussion takes place online until consensus is reached

Page 36

Aim 2: Design and Methods

Web services for brain imaging: Demo Query App

Page 37

Page 38

Page 39

NIDM Results

• A full description is outside the scope of this talk… but

Page 40

NIDM Results
• A harmonized model for reporting task-based fMRI across SPM, FSL and (soon) AFNI

http://nidm.nidash.org/specs/nidm-results.html

Page 41

NIDM Results
• All terms are modeled with an identifier, a definition, domain/range, and examples
• Model fitting:

Page 42

NIDM Results

Page 43

Outline

• Introduction
• Background
• Research approach
• Conclusions and future directions
  – Contributions
  – Implications
  – Future Directions

Page 44

Conclusions and future directions
• Collaborative Framework Outcomes
  – Github is an effective tool for standards development
    • Closed 89 issues
    • 1,087 commits
    • 9 contributors
    • 1 publication, specification suite
• Software engineering outcomes
  – Implemented in Nipype for workflow management
  – Being used to model task fMRI
    • Implemented for SPM 12 and FSL
  – Being incorporated into NeuroVault for automated population of a database to share SPMs

Page 45

Acknowledgments

Committee Members
James Brinkley (Chair)
Susan Coldwell (GSR)
Thomas Grabowski
Nicholas Anderson

Neuroinformatics Community
Satra Ghosh, Rich Stoner, JB Poline, David Keator, Karl Helmer, Camille Maumet, Tom Nichols, Dan Marcus, Christian Haselgrove, Jessica Turner, David Kennedy, Jack van Horn… and many others!

Scalable Neuroimaging Initiative
UW: Todd Detwiler, Randy Frank
Stanford: Brian Wandell, Bob Dougherty, Gunnar Schaeffer

Integrated Brain Imaging Center
Katie Askren, Peter Boord, Elliot Collins, Tina Guan, Clark Johnson, Tara Madhyastha, Sonya Mehta, Todd Richards, Rosalia Tungaraza, Kurt Weaver, Karl Woelfer, Liza Young… and everyone else!
