
Big Data Analytics in Science and Research: New Drivers for Growth and Global Challenges

Richard A. Johnson CEO, Global Helix LLC and BLS, National Academy of Sciences

ICCP Foresight Forum – Big Data Analytics and Policies

22 October 2012

johnsri@alum.mit.edu

Session 3: 4 Questions for Discussion

Q1 – Importance of data openness and interoperability for science and research, especially in biomedicine and health?

Q2 – Are current IPR regimes well matched to (≈) data-intensive scientific discovery?

Q3 – Do we still need scientific methods (and traditional domain scientists) in an era of big data analytics?

Q4 – How, and why, does this matter for policy?

Convergence of Biology with Physical Sciences & Engineering through Data and Data Analytics = the “New Biology” or Third Revolution in the Life Sciences

Foundational trend in STI for next 20 years – NAS (2010); MIT (2011)

Genomic Data is Increasing Faster than Computing Power – Convergence of 3 key DATA DRIVERS with RESEARCH and ECONOMIC VALUE: (1) Sequencing + (2) Synthesis + (3) Reading AND Writing DNA

Data Tools in the Life Sciences: Moore’s Law on Steroids

Gene Expression Data Sets (Nature 2012)

Life Sciences and Biomedical Research as an Information Science: Quantitative, Data-driven, Simulation-oriented, Predictive Science

Data and Convergence Driving the Future: Data Analytic Tools, Platforms, and Measurement for New Sources of Growth


• Technology Convergence, Data Analytics and Metrology as Interdependent Drivers (Agilent 2012)

Synthetic Biology

Energy and the Environment

Advancing High Growth Economies

Portable, Mobile and Out-of-Lab

Nanotechnology

Food Safety

Personalized Medicine

Single Cells and Microbiome


Beyond Interoperability, The Power of Interconvertibility: FROM PHYSICAL LIVING MATERIAL/DNA to DIGITAL DATA, and back

1’s and 0’s ↔ A, C, T, G’s

“IT from Bits” (Poste 2012)

• Programming: increasing ability to both Read and Write DNA

• DNA Construction (analog to Read/Write; 1’s and 0’s manipulation) - Genetic Expression Operating Systems; Scale DNA construction engineering

• Data enables Decoupling: biological processes from evolution-based descent and replication + design from fabrication
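The "1’s and 0’s ↔ A, C, T, G’s" idea can be illustrated with a minimal sketch. The two-bits-per-base mapping below is an assumption chosen for illustration only; it is not a scheme described in the talk, and the names encode_to_dna and decode_from_dna are hypothetical. Practical DNA data encodings typically add constraints (for example, avoiding long runs of a single base) and error correction, which this sketch omits.

```python
# Illustrative mapping only (an assumption, not taken from the talk):
# every pair of bits becomes one nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_to_dna(data: bytes) -> str:
    """Write digital data as a string of bases, 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_from_dna(sequence: str) -> bytes:
    """Read a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    message = b"Hi"
    dna = encode_to_dna(message)
    print(dna)  # CAGACGGC
    # The round trip recovers the original bytes.
    assert decode_from_dna(dna) == message
```

The round trip (bytes to bases and back) is the point of the sketch: the same information can be held either as digital data or as a base sequence.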

Tools to Edit and Write Genomes: MAGE + CAGE (Church/Isaacs 2011, 2012)

Big Data and Data Analytics Drive new 21st Century Infrastructures and KNMs, and Create Opportunities for New Research, Better Health Outcomes, and Value Creation (Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and New Taxonomy of Disease: NAS 2011)

The Creative Destruction of Medicine (Topol 2012)

Data Sharing, Disease Modeling and Biomarkers to Accelerate the Development

Big Data and Engineering Biology as the Transformative “New Normal” in the Life Sciences Driving New Sources of Growth

Synthetic Biology - Standardization, Abstraction and Modularity

Predictive Platforms for Engineering Biology and Predictable Integration of new Genetic Designs built on Massive Data

• “an Engineering METHODOLOGY to construct complex systems and novel properties based on biological components” (EU-US Task Force, June 2010)

Data-driven and Engineering Biology “Value Proposition” Increasingly Drives Science, New Sources of Growth, and our ability to meet societal Grand Challenges – NAS 2011

Neuroscience – a 21st Century Frontier for Human Understanding and Grand Challenges

Traversing the scales at all levels in understanding the brain from molecular and cellular to systems – neurons (100 Billion)/synapses (150 Trillion), and neural signaling

Human Connectome Project = mapping neural networks with >1 million times more connections than the genome has letters of DNA, and linking all this to other life experience data sets

ENCODE: the Encyclopedia of DNA Elements – Big Data, Data Analytics, and Big Science increasingly change how we do science (Sept. 2012)

The Plasticity of IPR/Open Science Meanings – and lots of rethinking in different domains about IPR, Openness and Scientific Research

• IPR and Competing Visions of Openness

Open Science (Public domain; BioBricks library/BBF) v. Open Source (IPR-driven; GPL, BSD, CC) v. Open Standards v. Open Development v. Open Access (including reuse and sharing public-funded data) v. Open Innovation (depends on strong, well-functioning IPR system)

• Innovative New Thinking – e.g., Semi-commons as a new lens to view Data – interacting common and private uses that are dynamic/scalable over the same resources and that can adjust through contracting and other mechanisms

• Knowledge Networks and Markets (KNMs) and Knowledge-based Capital (KBC) – major OECD initiatives ongoing

• Growing Counter-intuitive View that Role of IPR Increasingly Important as a Tool to Promote Openness, Transparency, and Diffusion, e.g., Algorithms, Data Exchanges, Tools and Re-use

Growing Linkage of Data-intensive Science, IPR, and New Models of Innovation: Big Data Analytics Intersect with Open Innovation, Multi-directional S&T, University-Industry Partnering, New Business Models, Forward-looking IPR, and New Public-Private Collaborative Mechanisms to Enable Cutting-edge Research and Innovation

The Fourth Paradigm, the Internet of Things, Automated Data Extraction Methods, and Big Data Analytics – the Need for a New Generation of Scientific Computing Tools and Platforms to manage, visualize and analyze Big Data for Research (Gray 2009)

Wide Range of New Data Analytic Convergence Challenges with Policy Implications (Gray 2009)

Risks to Scientific Research from (Bad) Data Analytics?

- Jeopardize reproducibility

- Retard pace of research

- Produce poorly written code/bad algorithms on which science relies

- Create serious errors in scientific outcomes, and the interpretations of them

New Day-to-day Science Research Implications of Big Data: Data Analytics Challenges

• Which data to keep – in what format? for how long?

• What about “emergent properties”? – resulting from elaborate networks of interactions and data patterns

• How to deal with data distributed across many locations, formats, scales, etc., and merge them?

• How to model large complex data, and derive valuable knowledge from analytics/models?

• How to infuse data into complex computations to enable simulations of predictive value?

• How to deal with different kinds of big data (temporal, spatial, dimensional, heterogeneous)
  – Massive data
  – High-dimensional data
  – Multi-modal data
  – Real-time and Streaming data

In a data-driven science era, should we still fund, “incentivize” and value Empirical, Theoretical, Model-based Approaches to Scientific Discovery? Is Popper’s scientific method paradigm outdated?

• “I believe that math is trumping science. What I mean by that is you don't really have to know why, you just have to know that if a and b happen, c will happen.” Vivek Ranadivé, entrepreneur and CEO, financial-data software company TIBCO (2011)

• “With enough numbers, the data speak for themselves” Chris Anderson, Editor-in-Chief, Wired, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” (2008)

• “All models are wrong, and increasingly you can succeed without them.” Peter Norvig, Director of Research, Google

• “The numbers have no way of speaking for themselves…. Data-driven predictions can succeed – and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.” Nate Silver, The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t (2012)

• “The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.” Stephen Jay Gould, American evolutionary biologist (1981)
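The tension between these quotes can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not an analysis from the talk: all variables are independent random noise, the sample size and seed are arbitrary, and the function names are hypothetical. It shows that the strongest correlation found by chance can only stay the same or grow as more unrelated variables are screened against a target, which is one reason correlations mined from high-dimensional data need a model or a hypothesis behind them.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(n_variables, n_samples=20, seed=0):
    """Largest |r| between a random target and n_variables unrelated noise series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        # Each candidate is pure noise, generated independently of the target.
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    for n_variables in (10, 100, 1000, 10000):
        r = strongest_chance_correlation(n_variables)
        print(f"{n_variables:>6} unrelated variables: strongest |r| = {r:.2f}")
```

Because the same seed is reused, each larger run screens a superset of the previous run's noise variables, so the reported maximum never decreases as the search widens. None of these correlations reflects a causal or even a real statistical relationship.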

Thank you!

Contact Information -- Richard A. Johnson

CEO, Global Helix LLC

richard.johnson@globalhelix.net

MIT

johnsri@alum.mit.edu
