Deep-Time Data Infrastructure: A DCO Legacy Program Robert M. Hazen—Geophysical Lab, Carnegie Institution DCO Data Science Day—RPI—June 5, 2014
Dec 14, 2015

Page 1

Deep-Time Data Infrastructure: A DCO Legacy Program

Robert M. Hazen, Geophysical Lab, Carnegie Institution

DCO Data Science Day, RPI, June 5, 2014

Page 2

Conclusions

Vast, largely untapped data resources inform our view of Earth’s dynamic history over 4.5 billion years.

Combining those deep-time data resources into a single infrastructure represents an opportunity for accelerated “abductive” discovery.

Page 3

Deep-Time Data Collaborators

Carnegie Institution: Robert Hazen, Xiaoming Liu, Anat Shahar
Rutgers: Paul Falkowski
RPI: Peter Fox
Univ. of Arizona: Robert Downs, Mihei Ducea, Grethe Hystad, Barbara Lafuente, Hexiong Yang, Alex Pires, Joaquin Ruiz, Joshua Golden, Melissa McMillan, Shaunna Morrison
CalTech: Ralph Milliken
Univ. of Maine: Edward Grew
Smithsonian Inst.: Timothy McCoy
Univ. of Manitoba: Andrey Bekker
MINDAT.ORG: Jolyon Ralph
Colorado State: Holly Stein, Aaron Zimmerman
Univ. of Tennessee: Linda Kah
Univ College London: Dominic Papineau
George Mason Univ.: Stephen Elmore
Johns Hopkins Univ.: Dimitri Sverjensky, Charlene Estrada, John Ferry, Namhey Lee
Harvard University: Andrew Knoll
Indiana University: David Bish
Univ. of Michigan: Rodney Ewing
Univ. of Maryland: James Farquhar, John Nance
Univ. of Wisconsin: John Valley
Geol. Survey Canada: Wouter Bleeker

Page 4

Deep-Time Data Resources

Mineralogy and petrology data:

Mineral species and assemblages

Compositions (including isotopes)

Age (ages)

Geographic location; tectonic setting

Crystal size; morphology; twinning

Solid and fluid inclusions; defects; magnetic domains; zoning; exsolution

Surface properties; grain boundaries

Page 5

Mineralogy and petrology data

Paleobiology data

Fossil species and assemblages

Age

Biominerals; isotopic composition

Molecular biomarkers

Host lithology

Geological/tectonic context

Deep-Time Data Resources

Page 6

Mineralogy and petrology data

Paleobiology data

Proteomics data

Enzyme structure and function

Age (from phylogenetics)

Active site composition

Microbial context

Deep-Time Data Resources

Page 7

Mineralogy and petrology data

Paleobiology data

Proteomics data

Geochemistry data and modeling

Thermochemical data

Equilibrium and reaction path models

Deep-Time Data Resources

Page 8

Mineralogy and petrology data

Paleobiology data

Proteomics data

Geochemistry data and modeling

Paleotectonic & Paleomagnetic Data

Age

Deep-Time Data Resources

Page 9

This is the IMA Mineral Database website, with a direct link to the Mineral Evolution Database.

Page 10

This map displays the localities. The popup demonstrates metadata for a given locality.

Page 11

The Premise: Rocks, minerals, fossils, and life’s biochemistry hold clues to significant changes in Earth’s near-surface environment through 4.5 billion years of history.

The Potential of Deep-Time Data

Page 12

The Rise of Atmospheric Oxygen

Lyons et al. (2014) Nature 506, 307-314.

D.E. Canfield (2014) Oxygen. Princeton Univ. Press.

Page 13

The Rise of Atmospheric Oxygen

Kump (2008) Nature 451, 277-278.

?

Page 14

The Rise of Atmospheric Oxygen

D.E. Canfield (2014) Oxygen. Princeton Univ. Press.

Lyons et al. (2014) Nature 506, 307-314.

Page 15

[Figure legend: major metal element; major non-metal element; trace element]

The Rise of Oxygen: Evidence from redox-sensitive elements

Page 16

log fO2 ~ -72

Geochemical modeling is key.

The Rise of Subsurface Oxygen

Page 17

Siderite (FeCO3)

log fO2 < -68

The Rise of Subsurface Oxygen

Page 18

Azurite & Malachite

log fO2 > -43

The Rise of Subsurface Oxygen

Page 19

Reaction path calculations reveal changes in mineralogy as fluids and rocks not in equilibrium react with each other. Data from Sverjensky et al. (in prep)

The Rise of Subsurface Oxygen: Basalt weathering before/after the GOE

Page 20

Reaction path calculations reveal changes in mineralogy as fluids and rocks not in equilibrium react with each other. Data from Sverjensky et al. (in prep)

The Rise of Subsurface Oxygen: Basalt weathering before/after the GOE

Page 21

What minerals won’t form before the Great Oxidation Event?

598 of 643 Cu minerals

202 of 220 U minerals

319 of 451 Mn minerals

47 of 56 Ni minerals

582 of 790 Fe minerals

Piemontite

Garnierite

Xanthoxenite

Chrysocolla
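
The counts on this slide are easier to compare as percentages. The short sketch below only restates the slide's numbers; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Counts transcribed from the slide:
# (species that won't form before the GOE, total known species of that element)
counts = {
    "Cu": (598, 643),
    "U": (202, 220),
    "Mn": (319, 451),
    "Ni": (47, 56),
    "Fe": (582, 790),
}


def post_goe_percent(post_goe, total):
    """Percentage of an element's mineral species that won't form before the GOE."""
    return 100.0 * post_goe / total


# Round to one decimal place for display.
percentages = {el: round(post_goe_percent(*pair), 1) for el, pair in counts.items()}

for element, pct in percentages.items():
    print(f"{element}: {pct}% of known species")
```

On these numbers, Cu and U stand out (roughly nine in ten species), while Mn and Fe are closer to seven in ten.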

Page 22

Co-evolution of the geosphere and biosphere

Biologically mediated changes in Earth’s atmospheric composition at ~2.4 to 2.2 Ga represent the single most significant factor in Earth’s mineralogical diversity.

Page 23

Enzymes reveal Earth’s geochemical history.

Ferredoxin (before the GOE)

Page 24

Nitrogenase (after the GOE)

Enzymes reveal Earth’s geochemical history.

Page 25

The Rise of Subsurface Oxygen

Page 26

Golden et al. (2013), EPSL

GOE HERE

SE HERE

The Rise of Subsurface Oxygen

Page 27

Kump (2008) Nature 451, 277-278.

The Rise of Subsurface Oxygen

Page 28

Hypothesis: There was a protracted “Great Subsurface Oxidation Interval” that postdated the GOE by a billion years. This interval was the single most significant factor in Earth’s mineralogical diversification.

The Rise of Subsurface Oxygen

Page 29

Most of what scientists do most of the time is start with a known phenomenon, and then collect relevant data and develop explanatory hypotheses.

Data-Driven Discovery

Page 30

Earth’s atmospheric oxidation influenced the partitioning of redox-sensitive elements.

Mo, Re, Ni, and Co are redox-sensitive elements.

Therefore, we deduce that atmospheric oxidation influenced the partitioning of Mo, Re, Ni, and Co.

Deduction

Page 31

RESULTS: Molybdenite (MoS2) through Time

GOE HERE

SE HERE

Golden et al. (2013) EPSL 366:1-5.

Page 32

RESULTS: Cu/Ni in carbonates vs. time

SE HERE GOE HERE

Xiaoming Liu et al. (2013)

Page 33

Each of the last 5 supercontinent cycles led to episodes of enhanced mineralization during intervals of continental convergence.

Mo, Be, B, and Hg are mineral-forming elements.

Therefore, we predict by induction that Mo, Be, B, and Hg minerals will display enhanced mineralization during intervals of continental convergence.

Induction

Page 34

The Supercontinent Cycle

Page 35

The Supercontinent Cycle

SUPERCONTINENT        STAGE      INTERVAL (Ga)    DURATION (Myr)
Kenorland (Superia)   Assembly   2.8-2.5          300
                      Stable     2.5-2.4          100
                      Breakup    2.4-2.0          400
Columbia (Nuna)       Assembly   2.0-1.8          200
                      Stable     1.8-1.6          200
                      Breakup    1.6-1.2          400
Rodinia               Assembly   1.2-1.0          200
                      Stable     1.0-0.75         250
                      Breakup    0.75-0.6         150
Pannotia              Assembly   0.6-0.56         40
                      Stable     0.56-0.54        20
                      Breakup    0.54-0.43        110
Pangaea               Assembly   0.43-0.25        180
                      Stable     0.25-0.175       75
                      Breakup    0.175-present    175
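
To test the induction above against dated samples, each age has to be assigned to a stage of the cycle. The sketch below encodes the slide's intervals, assuming they are in Ga and the durations in Myr, with "present" taken as 0 Ga. The names `STAGES`, `duration_myr`, and `stage_at` are hypothetical helpers, not part of any published tool.

```python
# (supercontinent, stage, start in Ga, end in Ga), transcribed from the slide.
# Assumption: "present" is represented as 0.0 Ga.
STAGES = [
    ("Kenorland", "Assembly", 2.8, 2.5),
    ("Kenorland", "Stable", 2.5, 2.4),
    ("Kenorland", "Breakup", 2.4, 2.0),
    ("Columbia", "Assembly", 2.0, 1.8),
    ("Columbia", "Stable", 1.8, 1.6),
    ("Columbia", "Breakup", 1.6, 1.2),
    ("Rodinia", "Assembly", 1.2, 1.0),
    ("Rodinia", "Stable", 1.0, 0.75),
    ("Rodinia", "Breakup", 0.75, 0.6),
    ("Pannotia", "Assembly", 0.6, 0.56),
    ("Pannotia", "Stable", 0.56, 0.54),
    ("Pannotia", "Breakup", 0.54, 0.43),
    ("Pangaea", "Assembly", 0.43, 0.25),
    ("Pangaea", "Stable", 0.25, 0.175),
    ("Pangaea", "Breakup", 0.175, 0.0),
]


def duration_myr(start_ga, end_ga):
    """Length of an interval in millions of years, rounded to the nearest Myr."""
    return round((start_ga - end_ga) * 1000)


def stage_at(age_ga):
    """Return (supercontinent, stage) for an age in Ga, or None if outside the table."""
    for name, stage, start, end in STAGES:
        if end < age_ga <= start:
            return name, stage
    return None


durations = [duration_myr(start, end) for _, _, start, end in STAGES]
print(durations)
print(stage_at(2.45), stage_at(0.3))
```

Counting mineral ages that fall in "Assembly" stages versus the others is then a simple tally over `stage_at` results.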

Page 36

RESULTS: The Supercontinent Cycle

The distribution of zircon crystals through time correlates with the supercontinent cycle over the past 3 billion years.

(Condie & Aster 2010; Hawkesworth et al. 2010)

Page 37

RESULTS: Mo Mineral Evolution

Temporal distribution of molybdenite (MoS2)

Golden et al. (2013) EPSL 366:1-5.

Page 38

Hg Mineral Evolution

The distribution of mercury (Hg) minerals through time correlates with the supercontinent (SC) cycle over the past 3 billion years, but there’s a gap during Rodinia assembly.

Hazen et al. (2012) Amer. Mineral. 97:1013.

Page 39

Abduction is a form of logical inference that goes from reliable data (i.e., observations) to a hypothesis that seeks to explain those data.

(Paraphrased from Wikipedia)

Abduction

Page 40

Observations lead to new hypotheses.

We have vast amounts of data on mineral species, compositions, isotopes, petrologic context, thermochemical parameters, tectonic settings, and the co-evolving biosphere through deep time.

Previously unrecognized patterns and correlations will emerge from the integration and evaluation of those data.

Abduction

Page 41

THE CHALLENGE: Recognizing statistically meaningful patterns in large data resources:

1. Correlations among many variables

Data-Driven Discovery

Page 42

Large integrated data resources can be explored with multivariate techniques (e.g., principal component analysis).

DATA-DRIVEN DISCOVERY

Search for highly correlated patterns among linear combinations of many different variables.
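
As an illustration of the multivariate search described on this slide, here is a minimal sketch of principal component analysis using NumPy's singular value decomposition. The data are synthetic (three variables driven by one hidden factor, plus one unrelated variable); nothing here comes from the deep-time databases, and the function name `pca` is just a label for this example.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the standardized (zero-mean, unit-variance) data matrix.

    Returns the sample scores, the component loadings (one row per component),
    and the fraction of total variance explained by each returned component.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], explained[:n_components]


# Synthetic table: 200 samples, 4 variables (assumed data for illustration only).
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)  # one hidden factor
X = np.column_stack([
    latent + 0.1 * rng.normal(size=n),
    2.0 * latent + 0.1 * rng.normal(size=n),
    -latent + 0.1 * rng.normal(size=n),
    rng.normal(size=n),  # unrelated variable
])

scores, components, explained = pca(X)
print("variance explained:", np.round(explained, 3))
print("first component loadings:", np.round(components[0], 3))
```

The first component should load heavily on the three correlated variables and only weakly on the unrelated one, which is the kind of pattern the slide proposes to look for in real data.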

Page 43

THE CHALLENGE: Recognizing statistically meaningful patterns in large data resources:

2. Meaningful trends in data vs. time

Data-Driven Discovery

Page 44

RESULTS: Molybdenite (MoS2) through Time

Golden et al. (2013) EPSL 366:1-5.

432 molybdenite samples

Page 45

• Analyze equal-sized bins.

• Apply statistical tests: linear regression of log Re content vs. time (Montgomery et al. 2006).

Are these trends statistically significant?

Page 46

THE CHALLENGE: Recognizing statistically meaningful patterns in large data resources:

3. Peak-to-noise problem

Data-Driven Discovery

Page 47

Peaks in ages of ~40,000 zircon crystals

Condie & Aster (2010) Precambrian Research 180:227-236.

Page 48

Condie & Aster (2010) Precambrian Research 180:227-236.

Monte Carlo Mean Kernel Density Analysis
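
The general idea behind a Monte Carlo mean kernel density analysis can be sketched as follows: perturb each age within its analytical uncertainty, compute a kernel density estimate for each perturbed set, and average the curves. This is an illustration under stated assumptions, not a reproduction of Condie & Aster's procedure. The ages, uncertainties, bandwidth, and trial count below are all made up.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at each grid point."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = samples.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def monte_carlo_mean_kde(ages, sigmas, grid, bandwidth, n_trials=100, seed=0):
    """Mean and standard deviation of KDE curves over randomly perturbed age sets.

    In each trial every age is redrawn from a normal distribution centered on
    the measured age with its 1-sigma uncertainty.
    """
    rng = np.random.default_rng(seed)
    curves = np.empty((n_trials, grid.size))
    for i in range(n_trials):
        perturbed = rng.normal(ages, sigmas)
        curves[i] = gaussian_kde_1d(perturbed, grid, bandwidth)
    return curves.mean(axis=0), curves.std(axis=0)


# Synthetic ages in Ma: two clusters over a uniform background (illustration only).
rng = np.random.default_rng(1)
ages = np.concatenate([
    rng.normal(1000.0, 40.0, 300),
    rng.normal(2700.0, 40.0, 300),
    rng.uniform(0.0, 3500.0, 400),
])
sigmas = np.full(ages.size, 20.0)      # assumed 1-sigma age uncertainty
grid = np.linspace(0.0, 3500.0, 701)   # 5 Myr spacing

mean_density, spread = monte_carlo_mean_kde(ages, sigmas, grid, bandwidth=30.0)
peak_age = grid[np.argmax(mean_density)]
print("highest peak near", peak_age, "Ma")
```

Comparing the mean curve with its trial-to-trial spread gives one way to judge whether a peak rises above the noise, which is the problem posed on the preceding slides.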

Page 49

THE CHALLENGE: Recognizing statistically meaningful patterns in large data resources:

4. Visualization opportunities

Data-Driven Discovery

Page 50

Element abundances versus numbers of mineral species (Hazen, Grew, Downs et al.)

Why Do We See the Minerals We See?

Too few species: Ga, Rb, Hf

Too many species: As, Hg, Sb, U

Page 51

Island area versus numbers of biological species (MacArthur and Wilson, 1967)

Why Do We See the Minerals We See?

Page 52

What percentage of minerals incorporating element X also incorporate element Y? (Hazen, Fox, Downs et al.)

Cobalt minerals that also incorporate arsenic

Why Do We See the Minerals We See?
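
The element co-occurrence question on this slide reduces to a simple count once each mineral is represented by its set of elements. The sketch below uses five well-known cobalt minerals as a toy table; the resulting percentage describes only this tiny subset, not the full mineral database. The function name `percent_cooccurrence` is hypothetical.

```python
# Toy table: element sets for a few cobalt minerals (a small illustrative subset).
MINERALS = {
    "cobaltite": {"Co", "As", "S"},        # CoAsS
    "skutterudite": {"Co", "As"},          # CoAs3
    "erythrite": {"Co", "As", "O", "H"},   # Co3(AsO4)2·8H2O
    "linnaeite": {"Co", "S"},              # Co3S4
    "spherocobaltite": {"Co", "C", "O"},   # CoCO3
}


def percent_cooccurrence(minerals, x, y):
    """Percentage of minerals containing element x that also contain element y."""
    with_x = [elements for elements in minerals.values() if x in elements]
    if not with_x:
        return 0.0
    with_both = sum(1 for elements in with_x if y in elements)
    return 100.0 * with_both / len(with_x)


co_as = percent_cooccurrence(MINERALS, "Co", "As")
co_s = percent_cooccurrence(MINERALS, "Co", "S")
print(f"Co minerals that also contain As: {co_as:.0f}% (toy subset)")
print(f"Co minerals that also contain S: {co_s:.0f}% (toy subset)")
```

Running the same function over every element pair yields the matrix that a visualization of the full dataset would display.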

Page 53

Frequency distributions of 4933 mineral species: 22% of mineral species are known from only one locality.

Why Do We See the Minerals We See?

Page 54

Frequency distributions of 4933 mineral species: 22% of mineral species are known from only one locality.

Therefore:

(1) Numerous additional minerals exist on Earth but as yet remain undescribed.

(2) Numerous other plausible minerals do not now exist on Earth, but might have in the past, or might occur on other Earth-like planets.

(3) If we “played the tape over again,” then the first 4933 minerals to be found would likely differ by ~1000 mineral species.

Why Do We See the Minerals We See?

Page 55

Conclusions

Vast, largely untapped data resources inform our view of Earth’s dynamic history over 4.5 billion years.

Combining those deep-time data resources into a single infrastructure represents an opportunity for accelerated “abductive” discovery.

Page 56

CONCLUSIONS

We are poised to make fundamental discoveries about our planetary home through development, integration, and exploration of deep-time data resources.

Data-Driven Discovery

Page 57

Please join this effort:

• Archive your data

• Release “dark data”

• Help us build this resource

Page 58

Statistical tests: linear regression of log Re content vs. time (Montgomery et al. 2006):

$\log(\mathrm{Re}) = \beta_0 + \beta_1 t + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$

[$t$ = time; $\beta_i$ = regression parameters; $x_i$ = indicator variables]

$\beta_0 = 0$; $\beta_1 = 0.0059(8)$; $\beta_2 = 4.6(7)$; $\beta_3 = 12(2)$; $\beta_4 = 15(2)$; $\beta_5 = 18(2)$; $\beta_6 = 19(2)$

Are these trends statistically significant?
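
A regression of this form, with a time term plus 0/1 indicator variables, can be fitted by ordinary least squares. The sketch below uses NumPy on synthetic data: the group assignments, ages, noise level, and "true" coefficients are all invented for illustration, and what the indicator variables encode in the original analysis is not stated on the slide.

```python
import numpy as np


def design_matrix(t, group, n_groups):
    """Columns: intercept, time, and one 0/1 indicator per group except the first."""
    columns = [np.ones_like(t), t]
    for g in range(1, n_groups):
        columns.append((group == g).astype(float))
    return np.column_stack(columns)


def fit_ols(X, y):
    """Least-squares coefficient estimates and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = residuals @ residuals / dof
    covariance = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(covariance))


# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(42)
n, n_groups = 600, 6
group = rng.integers(0, n_groups, size=n)   # hypothetical group labels 0..5
t = rng.uniform(0.0, 3000.0, size=n)        # hypothetical ages in Ma
true_beta = np.array([0.5, 0.002, 1.0, 2.0, 3.0, 4.0, 5.0])

X = design_matrix(t, group, n_groups)
y = X @ true_beta + rng.normal(0.0, 0.3, size=n)

beta_hat, std_err = fit_ols(X, y)
t_stats = beta_hat / std_err
for i, (b, se, ts) in enumerate(zip(beta_hat, std_err, t_stats)):
    print(f"beta_{i} = {b:.4f} +/- {se:.4f} (t = {ts:.1f})")
```

A coefficient whose estimate is many standard errors away from zero is the usual evidence that the corresponding trend is statistically significant. The parenthesized digits on the slide, such as 0.0059(8), presumably report that uncertainty in the last digit.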

Page 59

Enzymes reveal Earth’s geochemical history.

David & Alm (2011) “Rapid evolutionary innovation during an Archean genetic expansion.” Nature 469, 93-96.