Data Management & Intro to R · people.tamu.edu/~alawing/materials/ESSM689/DataManagement.pdf
ESSM 689 Quantitative Methods in Ecology, Evolution and Biogeography
Ecosystem Science and Management | Texas A&M University (c) 2015, A. Michelle Lawing
Data Management & Intro to R
A. Michelle Lawing, Ecosystem Science and Management, Texas A&M University, College Station, TX 77843, alawing@tamu.edu
Michener and Jones 2011
Barriers to Synthesis
• Data not preserved
– A tiny proportion of ecological and evolutionary data is readily available
• Dispersed, isolated repositories
– Each community has its own; disconnected; underutilized
• Lack of software interoperability
• Heterogeneous data
– Many data formats, metadata formats, and varying semantics
Dispersed data
Global Biodiversity Information Facility (GBIF), downloaded 01/26/2015
Data diversity
• Biological
– e.g., gene, organism, population, species, community, biome, ecosystem
• Environmental
– e.g., atmospheric, chemical, ecological, hydrological, oceanographic, physical
• Social
– e.g., land use, human population
• Economic
– e.g., trade, ecosystem services, resource extraction
Biodiversity data heterogeneity (figure panels: Space, Time, Taxa)
“Dark” data in the long tail
Heidorn 2008
Data Heterogeneity
[Figure: heterogeneity (high to low) plotted against volume (low to high)]
• Tight coupling • Simple subsetting • Explicit semantics
• Loose coupling • Hard subsetting • Limited semantics
Knowledge Network for Biocomplexity Data Distribution
Data until: 07 Oct 2011 Total: 25,191 data sets
Software diversity
GMN
Solutions
• Preserve data
• Adopt standards
• Create networks
• Create interoperable software
Metadata and data heterogeneity
• Every community has
– many data schemas
  • one for each project and person
– many data formats
  • ASCII, NetCDF, HDF, GeoTIFF, ...
– many metadata schemas
  • Biological Data Profile, Darwin Core, Dublin Core, Ecological Metadata Language (EML), Open GIS schemas, ISO schemas, ...
• Accepting this heterogeneity is critical
Column metadata
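As a rough illustration of what column-level metadata can capture, the sketch below describes each column of a small table with a name, definition, type, and unit, then checks rows against that description. The column names, units, and the `ColumnMeta` and `validate_row` helpers are hypothetical and are not taken from the slides; real attribute metadata in a standard such as EML carries considerably more detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnMeta:
    """Minimal description of one table column (hypothetical structure)."""
    name: str
    definition: str
    dtype: type
    unit: Optional[str] = None  # None for unitless or categorical columns


# Hypothetical columns for a small field-survey table.
COLUMNS = [
    ColumnMeta("site_id", "Identifier of the sampling site", str),
    ColumnMeta("air_temp", "Air temperature at time of sampling", float, "celsius"),
    ColumnMeta("count", "Number of individuals observed", int, "number"),
]


def validate_row(row: dict, columns: list) -> list:
    """Return a list of problems; an empty list means the row matches the metadata."""
    problems = []
    for col in columns:
        if col.name not in row:
            problems.append(f"missing column: {col.name}")
        elif not isinstance(row[col.name], col.dtype):
            found = type(row[col.name]).__name__
            problems.append(f"{col.name}: expected {col.dtype.__name__}, got {found}")
    return problems


# Made-up example rows, one valid and one with problems.
good = {"site_id": "A1", "air_temp": 21.5, "count": 4}
bad = {"site_id": "A1", "air_temp": "warm"}

print(validate_row(good, COLUMNS))
print(validate_row(bad, COLUMNS))
```

Keeping the definition and unit next to the column name is what lets a later user interpret the values without contacting whoever collected them.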
Morpho
Wizard to create metadata
Morpho highlights
• Create metadata in EML format
• Manage data in EML packages
• Save, publish, and share data
• Search for data
• Multi-language
– English, Spanish, Chinese, French, Portuguese, Japanese
• Export data and metadata
• Cross-platform and open source
Morpho
Kepler
DMP-Tool
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
How do we harness the long tail?
• Efficient data federation
– Focus on individual contributors
• Late binding in informatics systems
– Loose coupling
– Schema-less storage
• Central search for discovery
• Interoperable software
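One way to picture late binding over schema-less storage is the minimal sketch below. Each contributor's records are stored exactly as submitted, and field names are translated to shared terms only when a search runs. The source names, local field names, and values are invented for illustration. The shared terms borrow two Darwin Core names (`scientificName`, `decimalLatitude`), and the `bind` and `search` helpers are assumptions rather than part of any tool named in the slides.

```python
# Schema-less store: each contributor's records are kept exactly as submitted.
STORE = [
    {"source": "lab_a", "record": {"species": "Sceloporus undulatus", "lat": 30.6}},
    {"source": "lab_b", "record": {"taxon_name": "Crotalus atrox", "latitude_dd": 29.4}},
]

# Hypothetical per-source mappings from local field names to shared terms.
MAPPINGS = {
    "lab_a": {"species": "scientificName", "lat": "decimalLatitude"},
    "lab_b": {"taxon_name": "scientificName", "latitude_dd": "decimalLatitude"},
}


def bind(entry, mappings):
    """Translate one stored record into shared terms at read time (late binding)."""
    mapping = mappings.get(entry["source"], {})
    # Fields without a mapping keep their local name instead of being dropped.
    return {mapping.get(key, key): value for key, value in entry["record"].items()}


def search(store, mappings, term, predicate):
    """Central search: bind each record on the fly, keep those matching the predicate."""
    results = []
    for entry in store:
        bound = bind(entry, mappings)
        if term in bound and predicate(bound[term]):
            results.append(bound)
    return results


north = search(STORE, MAPPINGS, "decimalLatitude", lambda value: value > 30.0)
print(north)
```

Because the mapping is applied at query time, a contributor can submit data in whatever layout they already use, and a mapping can be added or corrected later without rewriting the stored records.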