AWS Government, Education, and Nonprofit Symposium, Washington, DC | June 25-26, 2015
Perspectives from the NIH Associate Director for Data Science (ADDS) Office
Vivien Bonazzi, Ph.D., Senior Advisor for Data Science Technologies & Innovation, NIH Office of the Associate Director for Data Science (ADDS)
Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Aug 05, 2015

Transcript
Page 1: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Vivien Bonazzi, Ph.D.
Senior Advisor for Data Science Technologies & Innovation

NIH Office of the Associate Director for Data Science (ADDS)

Page 2: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

BIOMEDICAL

Page 3: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

NIH Data

Page 4: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

NIH Data

Page 5: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Page 6: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Page 7: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

NIH Addresses Big Data

• In response to the incredible growth of large biomedical (digital) datasets, the Director of NIH established a special Data and Informatics Working Group (DIWG).

Volume, Velocity, Variety, Veracity

Page 8: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

US Government Memo: Increasing Access to the Results of Federally Funded Scientific Research

In Feb 2013 the US OSTP issued a memo calling for all Federal Agencies to make digital assets from federally funded research available. Each agency’s public access plan shall:

• Maximize access, by the general public and without charge, to digitally formatted scientific data created with Federal funds while: i) protecting confidentiality and personal privacy, ii) recognizing proprietary interests, business confidential information, and intellectual property rights and avoiding significant negative impact on intellectual property rights, innovation, and U.S. competitiveness, and iii) preserving the balance between the relative value of long-term preservation and access and the associated cost and administrative burden.

• Provide for the assessment of long-term needs for the preservation of scientific data and outline options for developing and sustaining repositories for scientific data in digital formats.

Page 9: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Federal Science Policy Changes

• NIH and other Federal Agencies are working to make digital assets from federally funded research available.

• Public Access to Data Memo: http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

• Applies to publications and digital scientific data

• Develop a strategy for:
– leveraging existing archives (where appropriate)
– fostering public-private partnerships with scientific journals relevant to the agency’s research

Page 10: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

NIH Response

• Establish new data science research and training programs: Big Data to Knowledge (BD2K), 2013
http://datascience.nih.gov/bd2k

• Establish a new position: NIH Associate Director for Data Science (ADDS)
Dr. Phil Bourne, 2014

Page 11: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Future of Open Data

• The nature of the scientific enterprise is evolving.

• Must transform into a digital enterprise (as have other industries: music, financial, advertising)

• To enable biomedical research as a digital enterprise through which new discoveries are made and knowledge generated by maximizing community engagement and productivity.

Page 12: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

ADDS Mission Statement

To use data science to foster an open digital ecosystem that will accelerate efficient, cost-effective biomedical research to enhance health, lengthen life, and reduce illness and disability

Page 13: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

ADDS Strategy

• Discovery and Innovation: Enabling major scientific discovery and innovation through the BD2K Initiative

• Workforce development: Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science

• Policy and process: Contribute to policies & processes involving data that further the NIH mission

• Leadership: Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders

• Sustainability: To foster a sustainable, efficient, and productive data science ecosystem

[Diagram labels: Sustainability, Workforce Development, Discovery & Innovation, Policy & Process, Leadership]

Page 14: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

ADDS Strategy

• Discovery and Innovation: Enabling major scientific discovery and innovation through the BD2K Initiative

• Workforce development: Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science

• Policy and process: Contribute to policies & processes involving data that further the NIH mission

• Leadership: Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders

• Sustainability: To foster a sustainable, efficient, and productive data science ecosystem: The Commons

[Diagram labels: Sustainability, Workforce Development, Discovery & Innovation, Policy & Process, Leadership]

Page 15: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: enabling the digital enterprise

Page 16: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

What is The Commons?

• Treats products of research (data, methods, papers, etc.) as digital objects

• These digital objects exist in a shared virtual space

• Digital objects conform to FAIR principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable

Page 17: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

What is The Commons?

• A shared virtual space where scientists can:
– Find
– Deposit
– Manage
– Share and
– Reuse data, software, metadata and workflows

• An environment to find and catalyze the use of shared digital research objects

Page 18: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: Components

• Computing environment
– cloud and/or HPC
– supports access, utilization, sharing and storage of digital objects

• Methods for Interoperability
– enable connectivity, shareability and interoperability between digital objects
– APIs, containers (Docker, etc.)

• Digital object compliance model
– describes the properties of digital objects that enable them to be discoverable and shareable
– Metadata, UIDs, clear access controls (human subject data)

• Indexing
– Means to find and catalog digital objects

Page 19: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: Components

Page 20: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Computing Environment: Cloud

• The ability to store, share and compute on digital research objects
• Especially useful for large data sets that are not easily computed locally
• Scalable and Elastic
• Pay per use - Cost effective
• An environment that fosters collaboration

Page 21: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: Cloud

• Commercial: AWS, Google, Microsoft, IBM, Others

• Academic: OSC (Open Science Cloud), iDASH (HIPAA compliant), The Broad, Others

Page 22: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: HPC

• Supercomputing Centers in the US
– Supported by DOE and NSF
  • NERSC (San Francisco)
  • ORNL (Oak Ridge)
  • TACC (Texas)
  • SDSC (San Diego)
  • Argonne (Urbana-Champaign)

• Optimized, high performance systems with IT support

Page 23: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: Interoperability

Page 24: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: Interoperability

• Software that supports connectivity and interoperability between digital (data) objects
– APIs (Application Programming Interfaces)
  • Expose and provide direct access to data
  • Enable data to be passed to analysis tools or pipelines
– Containers
  • Package and deploy software tools and pipelines to the cloud

Page 25: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: Digital Object Compliance

Page 26: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons Digital Object Compliance: FAIR

• Attributes of digital objects in the Commons
– Initial Phase
  • Unique digital object identifiers of some type
  • A minimal set of searchable metadata
  • Physically available in a cloud-based Commons provider
  • Clear access rules (especially important for human subjects data)
  • An entry (with metadata) in one or more indices
– Future Phases
  • Standard, community-based unique digital object identifiers
  • Conform to community-approved standard metadata for enhanced searching
  • Digital objects accessible via open standard APIs
  • Are physically and logically available to the Commons
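
The initial-phase attributes listed on this slide can be read as a checklist applied to each digital object. The sketch below is only an illustration of that reading: the `DigitalObject` record, its field names, and the `initial_phase_gaps` function are hypothetical and do not come from any Commons specification.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record for a research product held in the Commons."""
    identifier: str = ""                          # unique digital object identifier
    metadata: dict = field(default_factory=dict)  # minimal searchable metadata
    provider: str = ""                            # cloud-based Commons provider holding it
    access_rule: str = ""                         # e.g. "open" or "controlled"
    indices: list = field(default_factory=list)   # indices that carry an entry for it


def initial_phase_gaps(obj: DigitalObject) -> list:
    """Return the initial-phase attributes that the object is still missing."""
    gaps = []
    if not obj.identifier:
        gaps.append("unique identifier")
    if not obj.metadata:
        gaps.append("searchable metadata")
    if not obj.provider:
        gaps.append("cloud-based Commons provider")
    if not obj.access_rule:
        gaps.append("clear access rule")
    if not obj.indices:
        gaps.append("entry in at least one index")
    return gaps


# Illustrative objects; every value here is made up for the example.
complete = DigitalObject(
    identifier="example:0001",
    metadata={"title": "Example dataset", "type": "data"},
    provider="Cloud Provider A",
    access_rule="controlled",
    indices=["discovery-index"],
)
incomplete = DigitalObject(identifier="example:0002")

print(initial_phase_gaps(complete))    # no gaps for the fully described object
print(initial_phase_gaps(incomplete))  # everything except the identifier is missing
```

A check of this kind could run when an object is deposited, so that gaps are reported before the object is indexed.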

Page 27: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The Commons: PI Perspective

[Diagram labels: The Commons (infrastructure); Cloud Provider A, Cloud Provider B, Cloud Provider C; Investigator; PI; Discovery Index (Indexes, Enables Search); Digital object Compliance; Interoperability SW; 1. Efficiency]

Page 28: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Commons Pilot Projects

Page 29: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Commons Pilot Projects

• Evaluating Commons Framework & Populating the Commons
– NIH funded Large Resource groups BD2K groups (cloud)
– HMP Data and tools available in the cloud (AWS)
  • https://aws.amazon.com/datasets/1903160021374413
– NCI Cloud Pilots & Genomic Data Commons (AWS, Google)

• The Cloud Credits - business model for using cloud resources

Page 30: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Commons Credits (business model)

[Diagram labels: The Commons (infrastructure); Cloud Provider A, Cloud Provider B, Cloud Provider C; NIH (Provides credits); Investigator (Uses credits in the Commons); Discovery Index (Indexes, Enables Search); Option: Direct Funding]

Page 31: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Cloud Credits: Pros and Cons

Pros:
• Cost effective - Only pay for IT support used
• Drives competition – Better services at lower cost
• Supports data access and sharing by driving science into the Commons
• Can help determine metrics of data object usage
• Facilitates public-private partnership

Cons:
• Never been tried, so we don’t have data about likelihood of success
• Cost Models: Predicated prices among providers
• Service Providers: Predicated on service providers willing to make the investment to become conformant
• Persistence: The model is ‘Pay As You Go’ which means if you stop paying it stops going

Page 32: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Thank You.

This presentation will be loaded to SlideShare the week following the Symposium.

http://www.slideshare.net/AmazonWebServices

Vivien Bonazzi: [email protected]
Komatsoulis: [email protected]

Page 33: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Secure Genomics Analysis on Amazon Web Services

Angel Pizarro
Scientific Computing, Amazon Web Services

[email protected]

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Page 34: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Shared responsibility model

Page 35: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

The shared responsibility model

Audited (AWS): Facilities; Physical security; Compute infrastructure; Storage infrastructure; Network infrastructure; Virtualization layer (Amazon EC2); Hardened service endpoints; Rich AWS Identity & Access Management (IAM) capabilities

Customer + Partner: Applications; Auth & acct management; Authorization policies; Proper service configuration; Network configuration; Security groups; OS firewalls; Operating systems

• Re-focus your security professionals on a subset of the problem

• Partners can further reduce that burden

• Take advantage of high levels of uniformity and automation

Page 36: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Genomics Data Security

Page 37: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Store and analyze restricted-access genomics on AWS

bit.ly/aws-dbgap

Page 38: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

NIH security best practices

• Physical security
– Data center access and remote administrator access

• Electronic security
– User account security (for example, passwords)
– Use of access control lists (ACLs)
– Secure networking
– Encryption of data in transit and at rest
– OS and software patching

• Data access security
– Authorization of access to data
– Tracking copies; cleaning up after use

Page 39: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

[Diagram: AWS service overview]

• Enterprise Applications: Virtual Desktops; Collaboration and Sharing

• Platform Services
– Databases: Caching, Relational, NoSQL
– Analytics: Hadoop, Real-time, Data Workflows, Data Warehouse
– App Services: Queuing, Orchestration, App Streaming, Transcoding, Email, Search
– Deployment & Management: Containers, DevOps Tools, Resource Templates, Usage Tracking, Monitoring and Logs
– Mobile Services: Identity, Sync, Mobile Analytics, Notifications

• Foundation Services: Compute (VMs, Auto Scaling and Load Balancing); Storage (Object, Block, and Archive); Security & Access Control; Networking

• Infrastructure: Regions; Availability Zones; CDN and Points of Presence

Page 40: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Amazon Virtual Private Cloud (Amazon VPC)

Create secure network configurations for working with sensitive data

[Diagram: AWS region with VPC network isolation. VPC 10.0.0.0/16 across AZ A and AZ B; subnets SN 10.0.1.0/24 (DMZ) and SN 10.0.2.0/24 (Private); EC2 instances 10.0.1.11 and 10.0.2.12; public address (23.20.103.11); Internet; Internet GW Service; Virtual Gateway]
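
The address plan on this slide can be checked with Python's standard `ipaddress` module. This is a small sketch that uses only the CIDR blocks and addresses printed on the slide; the helper name `find_subnet` is introduced here for illustration, and nothing in it calls AWS.

```python
import ipaddress

# Address blocks as printed on the slide.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "DMZ": ipaddress.ip_network("10.0.1.0/24"),
    "Private": ipaddress.ip_network("10.0.2.0/24"),
}


def find_subnet(address: str):
    """Return the name of the slide's subnet that contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, network in subnets.items():
        if ip in network:
            return name
    return None


# Both subnets sit inside the VPC block and do not overlap each other.
for name, network in subnets.items():
    print(name, "inside VPC:", network.subnet_of(vpc))
print("subnets overlap:", subnets["DMZ"].overlaps(subnets["Private"]))

# The two EC2 addresses fall in different subnets; the third address is public.
for address in ("10.0.1.11", "10.0.2.12", "23.20.103.11"):
    ip = ipaddress.ip_address(address)
    print(address, "subnet:", find_subnet(address), "private range:", ip.is_private)
```

Keeping the private subnet free of publicly routable addresses is what lets instances holding sensitive data stay unreachable from the Internet unless traffic is routed through a gateway deliberately.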

Page 41: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Page 42: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Encrypt your data prior to sending to AWS

[Diagram: encrypted data moves from your applications in your data center and your applications in Amazon EC2 to AWS services: Amazon S3, Amazon Glacier, Amazon Redshift, Amazon Elastic Block Store]

Page 43: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

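Encryption: a brief primer

[Diagram: plaintext PHI is encrypted in hardware/software with a symmetric data key to give encrypted PHI (encrypted data in storage). The symmetric data key is encrypted with a master key to give an encrypted data key. Together these form a key hierarchy. The diagram also carries two "?" marks.]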

Page 44: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Encryption of AWS storage services

Amazon S3
• HTTPS
• AES-256 server-side encryption
• AWS or customer-provided or customer-managed keys
• Each object gets its own key

Amazon EBS
• End-to-end secure network traffic
• Whole volume encryption
• AWS or customer-managed keys
• Encrypted incremental snapshots
• Minimal performance overhead (uses Intel AES-NI)
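
For Amazon S3, the choice between AWS-managed and customer-managed keys is made per upload request. The sketch below only builds the keyword arguments for the boto3 S3 client's `put_object` call, so it runs without AWS credentials; `ServerSideEncryption` and `SSEKMSKeyId` are boto3 parameter names, while the helper `put_object_args`, the bucket, the object key, and the key ID are placeholders made up for the example.

```python
def put_object_args(bucket, key, body, kms_key_id=None):
    """Build put_object arguments that request server-side encryption.

    Without kms_key_id, S3 encrypts with AES-256 using keys it manages itself.
    With kms_key_id, S3 uses the named customer-managed key in AWS KMS.
    """
    args = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id is None:
        args["ServerSideEncryption"] = "AES256"
    else:
        args["ServerSideEncryption"] = "aws:kms"
        args["SSEKMSKeyId"] = kms_key_id
    return args


# Placeholder names; substitute real ones before use.
s3_managed = put_object_args("example-genomics-bucket", "sample.vcf", b"...")
kms_managed = put_object_args(
    "example-genomics-bucket", "sample.vcf", b"...", kms_key_id="example-key-id"
)
print(s3_managed["ServerSideEncryption"])
print(kms_managed["ServerSideEncryption"], kms_managed["SSEKMSKeyId"])

# With boto3 installed and credentials configured, the upload would be:
#   import boto3
#   boto3.client("s3").put_object(**s3_managed)
```

Customer-provided keys, the third option on the slide, use a different set of request parameters and are not shown here.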

Page 45: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

S3 server encryption with AWS fully-managed keys

[Diagram: customer PHI travels over HTTPS to the S3 web server, where plaintext PHI is encrypted with a symmetric data key and stored as encrypted PHI in the S3 storage fleet. The symmetric data key is encrypted with a master key to give an encrypted data key.]

A master key managed by S3 and protected by systems internal to AWS in a distinct system

Page 46: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

AWS Key Management Service

• A service that enables you to provision and use encryption keys to protect your data

• Allows you to create, use, and manage encryption keys from within…
– Your own applications via the AWS SDK
– Supported AWS services (Amazon S3, Amazon EBS, Amazon Redshift)

• Available in all commercial regions

• Can be used in a key hierarchy to secure data encryption keys protecting PHI

Page 47: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

AWS services integrate with AWS KMS

• 2-tiered key hierarchy using envelope encryption

• Data keys encrypt customer data

• AWS KMS customer master keys encrypt data keys

• Benefits:
– Limits blast radius of compromised resources and their keys
– Better performance
– Easier to manage a small number of master keys than billions of resource keys

[Diagram: master key(s) held in KMS encrypt Data Keys 1 to 5 ("Keys encrypted"); the data keys encrypt an S3 object, an EBS volume, an Amazon RDS instance, an Amazon Redshift cluster, and your application ("Data encrypted")]
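
The two-tier hierarchy can be sketched in a few lines of Python. This is an illustration of the envelope pattern only, under loud assumptions: `toy_cipher` is a stand-in built from the standard library (a SHA-256 keystream XOR) and is not secure, real systems use AES, and in AWS the master key stays inside KMS rather than in a local variable. All function and field names are hypothetical.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Toy stand-in for AES: NOT secure.

    Applying it twice with the same key and nonce returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Assumption: one master key; in AWS it would never leave the key service.
master_key = secrets.token_bytes(32)


def encrypt_resource(plaintext: bytes) -> dict:
    """Encrypt data under a fresh data key, then wrap that key under the master key."""
    data_key = secrets.token_bytes(32)
    data_nonce = secrets.token_bytes(16)
    key_nonce = secrets.token_bytes(16)
    return {
        "ciphertext": toy_cipher(data_key, data_nonce, plaintext),
        "data_nonce": data_nonce,
        "encrypted_data_key": toy_cipher(master_key, key_nonce, data_key),
        "key_nonce": key_nonce,
    }


def decrypt_resource(envelope: dict) -> bytes:
    """Unwrap the data key with the master key, then decrypt the data."""
    data_key = toy_cipher(master_key, envelope["key_nonce"], envelope["encrypted_data_key"])
    return toy_cipher(data_key, envelope["data_nonce"], envelope["ciphertext"])


first = encrypt_resource(b"sample record A")
second = encrypt_resource(b"sample record B")
print(decrypt_resource(first))
print(decrypt_resource(second))
```

Each resource gets its own data key, so exposing one data key affects one resource, while only the small set of master keys needs central management.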

Page 48: Perspectives from the NIH Associate Director for Data Science (ADDS) Office

Thank You.

This presentation will be loaded to SlideShare the week following the Symposium.

http://www.slideshare.net/AmazonWebServices
