BD2K @ NIH - A Vision Through 2020

Post on 16-Apr-2017

Philip E. Bourne, PhD, FACMI

Associate Director for Data Science

philip.bourne@nih.gov

First and foremost you should see this meeting as a celebration of the hard work of the past two years

Yes these are uncertain times, but …

There is a commitment to the BD2K program through 2020

BD2K cannot be viewed in isolation, but rather as part of a broader view of data science @ NIH …

Particularly as funding is increasingly from the IC’s

A View Which Includes:

• A vibrant research program of:
  – Fundamental developments in data science
  – Application of those fundamental developments
  – Flagship projects to which developments are applied:
    • PMI, Brain, Moonshot, ECHO
• A sustainable data ecosystem
  – Commons and the FAIR Principles adoption
  – Cross-cutting activities
• Increased workforce training
• A changing governance model

A Strategic Response can be Modeled on Three Axes:

Research

Resources

Outcomes

A Strategic Response

Research

• Fundamental
  – Machine learning
  – Data mining
  – Indexing
  – Predictive modeling …
• Applied
  – Sustainability, governance, economics of data
  – Privacy and security
  – Effective use of clouds …

Resources

• Standards
• Commons
  – APIs
  – Reference data sets
  – Workflows
  – Access & Authentication
• Workforce

Outcomes

• Evaluated pilots
• FAIR data
• Trained workforce
• Best practices
• Policies
• Effective use of clouds
• On-ramps for all IC’s

The Current Situation

• NIH-Funded Data
  – Total data from NIH-funded research currently estimated at 650 PB*
  – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB this year
• Dark Data
  – Only 12% of data described in published papers is in recognized archives; 88% is dark data^
• Cost
  – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives

* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
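
The proportions on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (650 PB total, 20 PB at NCBI/NLM, 10 PB expected growth, 12% archived, 3 PB for the Library of Congress in 2012). The variable names are illustrative, and reading the 10 PB growth as applying to the NCBI/NLM holdings is an assumption about how the bullet should be interpreted.

```python
# Figures quoted on the slide, in petabytes (PB).
TOTAL_NIH_PB = 650           # estimated data from NIH-funded research
NCBI_NLM_PB = 20             # portion held at NCBI/NLM
NCBI_GROWTH_PB = 10          # expected growth this year (assumed to refer to NCBI/NLM)
LIBRARY_OF_CONGRESS_PB = 3   # Library of Congress in 2012, per the footnote
ARCHIVED_FRACTION = 0.12     # share of published data found in recognized archives

# Derived ratios.
ncbi_share = NCBI_NLM_PB / TOTAL_NIH_PB             # share of all NIH-funded data
growth_rate = NCBI_GROWTH_PB / NCBI_NLM_PB          # one-year growth relative to current holdings
loc_multiple = TOTAL_NIH_PB / LIBRARY_OF_CONGRESS_PB  # NIH total expressed in 2012 Libraries of Congress
dark_share = 1 - ARCHIVED_FRACTION                  # share not in recognized archives

print(f"NCBI/NLM share of NIH-funded data: {ncbi_share:.1%}")
print(f"Expected one-year growth at NCBI/NLM: {growth_rate:.0%}")
print(f"NIH total as a multiple of the 2012 Library of Congress: {loc_multiple:.0f}x")
print(f"Dark data share: {dark_share:.0%}")
```

Under these assumptions the NCBI/NLM share comes out slightly above the rounded 3% on the slide, and the expected growth amounts to half of the current NCBI/NLM holdings in a single year.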

The Commons - Status

• Commons and FAIR principles* adopted across NIH
• Development and public release of a prototype Data Discovery Index
  – DataMed
    • Feb. v 1.0
    • Nov. v 1.5
• Cloud credits being issued for work in the Commons
• FOA’s for Commons Framework being issued
• Commons pilots under way

* https://www.ncbi.nlm.nih.gov/pubmed/26978244

Sustainability – Sample Other Activities

• Request for Information: Metrics to Assess Value of Biomedical Digital Repositories (NOT-OD-16-133)
  – To be discussed at Sustainability Session, Wed 1pm
• RFA to support community-based standards work was released in the fall for May 2017 award; session today 1pm
• Funding opportunity announcement: (BD2K) Enhancing the Efficiency and Effectiveness of Digital Curation for Biomedical Big Data (RFA-LM-17-001)
  – Applications due Dec 15

Sustainability – Looking Forward

• International collaboration on business models for sustainable data repositories
  – Sustainable Business Models for Data Repositories (OECD Global Science Forum)
  – Future of Life Sciences and Biomedical Databases (International Human Frontier Science Program)
• NIH long-term data repository support
  – Federal interagency Workshop on Measuring the Impact of Data Repositories, 2017
  – Recommend mechanism(s), review criteria, implementation plan

Example Cross-cutting Activities

• International partnerships
• Count everything – Secure count query framework
• California centers regional meetings
• GA4GH – Beacon project

NLM

• Working Group Report
  – http://acd.od.nih.gov/reports/Report-NLM-06112015-ACD.pdf
  – Recommendation – NLM should become the programmatic epicenter for data science at NIH …
• Patti Brennan – New NLM director

What We Hope to See in 2020

• New innovations brought about by large and complex data
• Evidence of translation, i.e. real application at the point of care
• Broad Commons adoption leading to
  – Improved sharing, reuse and hence cost effectiveness and reproducibility
• A balance between what is spent on data vs what is gained from that data
• Policies that are supportive of the above

… for your hard work and to the NIH staff from the ADDS office and from across the IC’s who have toiled to make BD2K a success
