Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition
How Darwin Core Archives have changed the landscape of biodiversity data publishing
John Wieczorek ([email protected])
Information Architect, Museum of Vertebrate Zoology, UC Berkeley
Buenos Aires (Argentina), 28 September 2011
Darwin Core (pre-standard v. 1.2, 47 versions)
• 48 concepts, specimens
• XML
• Shared via DiGIR
Darwin Core (pre-standard v. 1.4)
• 46 concepts (plus extensions), specimens
• XML
• Shared via TAPIR
Darwin Core (TDWG Standard)
• 172 concepts (156 in Simple Darwin Core), biodiversity data
• CSV, XML, RDF, JSON, …
• Shared via text files, TAPIR, Darwin Core Archive, …
Darwin Core Archive: Complete Package
[Diagram: primary biodiversity data, taxonomic data, and metadata packaged together as one Darwin Core Archive, e.g. http://www.someplace.org/data.zip]
• Standard Darwin Core terms in a single, self-contained dataset
• Taxon records or occurrence records
• Data set metadata in EML
• Simple format (text files)
• Efficient harvesting (single file)
• Efficient storage (compressed)
• Easy access (no special software required)
• Extensible (related files in one archive)
Darwin Core Archive: Benefits
Preferred format for publishing data in the GBIF network
Darwin Core Archive: Anatomy
Archives always have a metadata file as EML
Ecological Metadata Language (EML)
• Title and Abstract
• Citation and Attribution
• Contact and Authors
• Geographic Scope
• Sampling Methods
• Bibliography
• and more…
For describing data sets – even unpublished ones
Darwin Core Archive: Anatomy
Archives always have a core data file as text
Core data file types
Records based on taxa – one species per row
OR
Records based on species occurrences – one occurrence per row
Darwin Core Archive: Anatomy
Archives always have a core data file as text
Core contains a “core ID” column, unique for every record in the file
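The uniqueness requirement on the core ID can be checked before packaging. The following is a minimal sketch, assuming a tab-delimited core file whose ID column is named `taxonID`; the sample rows and the helper name `duplicate_core_ids` are made up for illustration.

```python
import csv
import io
from collections import Counter


def duplicate_core_ids(rows, id_column="taxonID"):
    """Return the core ID values that appear more than once, sorted."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical core file content; the last row reuses an ID on purpose.
sample = (
    "taxonID\tscientificName\n"
    "1\tPuma concolor\n"
    "2\tLynx rufus\n"
    "2\tLynx canadensis\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
print(duplicate_core_ids(rows))
```

An empty list means every record in the core file has its own ID.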
Darwin Core Archive: Anatomy
Columns are matched to Darwin Core terms
Darwin Core Archive: Anatomy
Columns that do not match a Darwin Core term may be included, but are ignored
“Wingspan” is not a Darwin Core term
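To see which columns a consumer would keep and which it would ignore, the header can be compared against a list of term names. This is only a sketch: `KNOWN_TERMS` holds a handful of Darwin Core term names chosen for illustration, not the full standard, and `split_columns` is a hypothetical helper.

```python
# A small, illustrative subset of Darwin Core term names (not the full list).
KNOWN_TERMS = {"taxonID", "scientificName", "family", "country", "occurrenceID"}


def split_columns(header):
    """Split column names into those matching a known term and the rest."""
    matched = [name for name in header if name in KNOWN_TERMS]
    ignored = [name for name in header if name not in KNOWN_TERMS]
    return matched, ignored


matched, ignored = split_columns(["taxonID", "scientificName", "Wingspan"])
print("matched:", matched)
print("ignored:", ignored)
```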
Darwin Core Archive: Anatomy
Two ways to match columns to Darwin Core terms
1) Rename columns in the text file
2) Match columns to terms in a separate meta.xml file
Darwin Core Archive: Anatomy
meta.xml matches the columns in the core data file (species.txt)
More on how to make the meta.xml file later…
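As a rough illustration of the column matching, the sketch below parses a small meta.xml and prints which column index maps to which term. The sample descriptor is an assumption written from the Darwin Core text guidelines as I understand them (element and attribute names such as `core`, `id`, `field`, `index`, `term` should be checked against the specification), and `column_mapping` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Darwin Core text descriptors.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Illustrative descriptor for a core file named species.txt (assumed content).
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon"
        fieldsTerminatedBy="\\t" ignoreHeaderLines="1">
    <files><location>species.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/family"/>
  </core>
</archive>"""


def column_mapping(meta_xml):
    """Map column index -> short term name for the core file."""
    core = ET.fromstring(meta_xml).find("dwc:core", NS)
    # The <id> element marks the core ID column.
    mapping = {int(core.find("dwc:id", NS).get("index")): "id"}
    for field in core.findall("dwc:field", NS):
        # Keep only the last path segment of the term URI.
        mapping[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]
    return mapping


print(column_mapping(META_XML))
```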
Darwin Core Archive: Anatomy
Archives can include extension files
Example: Species.txt (core), Common_names.txt (extension)
Extensions allow multiple records to be linked to a core record.
Extensions link to the core through the core ID
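The one-to-many link between a core record and its extension records can be sketched by grouping extension rows on the core ID. The rows below are invented sample data, the ID column is assumed to be `taxonID`, and `group_by_core_id` is a hypothetical helper.

```python
from collections import defaultdict

# Invented rows standing in for Species.txt (core) ...
core = [
    {"taxonID": "1", "scientificName": "Puma concolor"},
    {"taxonID": "2", "scientificName": "Lynx rufus"},
]
# ... and for Common_names.txt (extension); several rows may share a core ID.
common_names = [
    {"taxonID": "1", "vernacularName": "cougar"},
    {"taxonID": "1", "vernacularName": "mountain lion"},
    {"taxonID": "2", "vernacularName": "bobcat"},
]


def group_by_core_id(extension_rows, id_column="taxonID"):
    """Collect extension rows under the core ID they point to."""
    grouped = defaultdict(list)
    for row in extension_rows:
        grouped[row[id_column]].append(row)
    return grouped


names_by_id = group_by_core_id(common_names)
for record in core:
    linked = names_by_id.get(record["taxonID"], [])
    print(record["scientificName"], [row["vernacularName"] for row in linked])
```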
Darwin Core Archive: Anatomy
GBIF hosts extension definitions
http://rs.gbif.org/extension/
Multiple extension files can be linked to the core
Darwin Core Archive: Anatomy
All files are stored in a single folder
Darwin Core Archive: Anatomy
The folder is zipped.
This is a Darwin Core Archive
• Data files
• Column matching file
• Data set documentation
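The zipping step can be done with any zip tool. As a minimal sketch in Python, assuming the three kinds of files sit side by side in one archive: the file name `eml.xml` and the placeholder file contents are assumptions for illustration, and `build_archive` is a hypothetical helper.

```python
import io
import zipfile


def build_archive(files):
    """Zip a {file name: text} mapping into a flat archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buffer.getvalue()


archive_bytes = build_archive({
    "species.txt": "taxonID\tscientificName\n1\tPuma concolor\n",  # data file
    "meta.xml": "<archive/>",  # placeholder for the column matching file
    "eml.xml": "<eml/>",       # placeholder for the data set documentation
})

# Reopen the result to confirm what the archive contains.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
    names = sorted(zf.namelist())
print(names)
```

Writing `archive_bytes` to a file such as `my_data.zip` would give a single file that can be placed on a web server.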
Darwin Core Archive: Anatomy
http://www.organisation.org/my_data.zip
Archives on a web server can be accessed by a URL. Share this URL to “publish” your data!
Darwin Core Archive: Publishing
Darwin Core Archive: Publishing Options
GBIF Spreadsheet Templates
Integrated Publishing Toolkit
Data Hosting Centers
Darwin Core Mapping Assistant
Metafile
http://tools.gbif.org/dwca-assistant/
• GBIF Darwin Core Archive Spreadsheet Templates:
  • data in a spreadsheet already
  • simple archive authoring
• IPT:
  • creating/managing archives for multiple data sets
  • managing archives for multiple organisations
  • metadata as GBIF Metadata Profile of EML
• Make Your Own:
  • automating archive generation
  • customisation
• Hosting center:
  • economy of scale
  • infrastructure and support
• Combinations…
Darwin Core Archive: Publishing Options