Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition How Darwin Core Archives have changed the landscape of biodiversity data publishing John Wieczorek ([email protected]) Information Architect Museum of Vertebrate Zoology, UC Berkeley Buenos Aires (Argentina) 28 September 2011


Dec 29, 2015

Transcript
Page 1: Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition How Darwin Core Archives have changed the landscape.

Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition

How Darwin Core Archives have changed the landscape of biodiversity data publishing

John Wieczorek ([email protected]), Information Architect, Museum of Vertebrate Zoology, UC Berkeley

Buenos Aires (Argentina), 28 September 2011

Page 2

Background: Data Exchange

ABCD (TDWG Standard)
• More than 1200 concepts
• XML
• Shared via BioCASe, TAPIR

Darwin Core (pre-standard v. 1.2, 47 versions)
• 48 concepts, specimens
• XML
• Shared via DiGIR

Darwin Core (pre-standard v. 1.4)
• 46 concepts (plus extensions), specimens
• XML
• Shared via TAPIR

Darwin Core (TDWG Standard)
• 172 concepts (156 in Simple Darwin Core), biodiversity data
• CSV, XML, RDF, JSON, …
• Shared via text files, TAPIR, Darwin Core Archive, …

Page 3

Darwin Core Archive

[Figure: Primary Biodiversity Data, Taxonomic Data, Metadata]

http://www.someplace.org/data.zip

Page 4

Darwin Core Archive: Complete Package

• Standard Darwin Core terms in a single, self-contained dataset

• Taxon records or occurrence records

• Data set metadata in EML

Page 5

Darwin Core Archive: Benefits

• Simple format (text files)
• Efficient harvesting (single file)
• Efficient storage (compressed)
• Easy access (no special software required)
• Extensible (related files in one archive)

Preferred format for publishing data in the GBIF network

Page 6

Darwin Core Archive: Anatomy

Archives always have a metadata file in EML

Page 7

Ecological Metadata Language (EML)

• Title and Abstract
• Citation and Attribution
• Contact and Authors
• Geographic Scope
• Sampling Methods
• Bibliography
• and more…

For describing data sets, even unpublished ones
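To make the EML file less abstract, here is a minimal sketch that builds a bare-bones EML document with Python's standard library. It assumes EML 2.1.1 (namespace `eml://ecoinformatics.org/eml-2.1.1`), and the helper name `minimal_eml` and the `system` value are hypothetical. The output has not been validated against the EML schema or the GBIF Metadata Profile, and it covers only a title, an abstract, a creator and a contact.

```python
import xml.etree.ElementTree as ET

# Assumed EML namespace (version 2.1.1); confirm the version your
# publishing tool or the GBIF Metadata Profile expects.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"


def minimal_eml(package_id: str, title: str, abstract: str,
                creator_surname: str, contact_email: str) -> str:
    """Return a minimal, unvalidated EML document as a string (hypothetical helper)."""
    ET.register_namespace("eml", EML_NS)
    # packageId and system are attributes of the EML root element;
    # the "system" value used here is an assumption.
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": package_id, "system": "http://gbif.org"})
    dataset = ET.SubElement(root, "dataset")  # child elements are unqualified
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    ET.SubElement(ET.SubElement(creator, "individualName"), "surName").text = creator_surname
    ET.SubElement(ET.SubElement(dataset, "abstract"), "para").text = abstract
    contact = ET.SubElement(dataset, "contact")
    ET.SubElement(ET.SubElement(contact, "individualName"), "surName").text = creator_surname
    ET.SubElement(contact, "electronicMailAddress").text = contact_email
    return ET.tostring(root, encoding="unicode")
```

In practice the IPT or the spreadsheet templates write this file for you. The sketch only shows that the metadata is ordinary XML that lives alongside the data.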

Page 8

Darwin Core Archive: Anatomy

Archives always have a core data file in text format

Page 9

Core data file types

• Records based on taxa: one species per row

OR

• Records based on species occurrences: one occurrence per row
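The choice between the two core types is recorded in the archive's column-matching file (meta.xml) as a row type URI. The sketch below assumes the Darwin Core class URIs `http://rs.tdwg.org/dwc/terms/Taxon` and `http://rs.tdwg.org/dwc/terms/Occurrence`; the helper `core_row_type` is hypothetical.

```python
# Row type URIs for the two core file types; assumed to be the Darwin Core
# class terms. Check them against the Darwin Core text guide.
CORE_ROW_TYPES = {
    "taxon": "http://rs.tdwg.org/dwc/terms/Taxon",
    "occurrence": "http://rs.tdwg.org/dwc/terms/Occurrence",
}


def core_row_type(kind: str) -> str:
    """Return the rowType URI for a 'taxon' or 'occurrence' core (hypothetical helper)."""
    try:
        return CORE_ROW_TYPES[kind.lower()]
    except KeyError:
        # A core file holds either taxa or occurrences, never a mix.
        raise ValueError(f"core must be 'taxon' or 'occurrence', not {kind!r}") from None
```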

Page 10

Darwin Core Archive: Anatomy

Archives always have a core data file in text format

Page 11

Darwin Core Archive: Anatomy

The core file contains a “core ID” column whose value is unique for every record in the file
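Because extensions rely on the core ID, it helps to check that the ID really is unique before publishing. Here is a minimal sketch, assuming a tab-delimited core file with a header row. The function name and the `taxonID` column used in the usage example are illustrative choices.

```python
import csv
import io
from collections import Counter


def duplicate_core_ids(core_text: str, id_column: str, delimiter: str = "\t") -> list[str]:
    """Return core ID values that occur more than once (empty if all are unique)."""
    reader = csv.DictReader(io.StringIO(core_text), delimiter=delimiter)
    counts = Counter(row[id_column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)
```

For example, a core with rows `1 Puma concolor`, `2 Lynx rufus`, `2 Felis catus` under the header `taxonID scientificName` would report `["2"]`.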

Page 12

Darwin Core Archive: Anatomy

Columns are matched to Darwin Core terms

Page 13

Darwin Core Archive: Anatomy

Columns that do not match a Darwin Core term may be included, but are ignored

“Wingspan” is not a Darwin Core term
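A consumer separates recognised columns from unrecognised ones roughly as follows. This is a sketch only. `KNOWN_DWC_TERMS` is a small hypothetical subset of the real Darwin Core term list, which TDWG maintains and which is far longer.

```python
# Hypothetical subset of Darwin Core term names, for illustration only.
KNOWN_DWC_TERMS = {
    "occurrenceID", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "country",
}


def split_columns(header: list[str]) -> tuple[list[str], list[str]]:
    """Split a header into (Darwin Core terms, columns a consumer would ignore)."""
    matched = [name for name in header if name in KNOWN_DWC_TERMS]
    ignored = [name for name in header if name not in KNOWN_DWC_TERMS]
    return matched, ignored
```

With the header `occurrenceID, scientificName, Wingspan`, the function keeps the first two columns and sets `Wingspan` aside, mirroring the slide.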

Page 14

Darwin Core Archive: Anatomy

Two ways to match columns to Darwin Core terms

1) Rename the columns in the text file to Darwin Core term names
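The first approach changes only the header row and leaves the data untouched. Here is a minimal sketch, assuming tab-delimited text. The local column names in `COLUMN_TO_TERM` are hypothetical, while the values on the right are Darwin Core term names.

```python
import csv
import io

# Hypothetical mapping from a local spreadsheet's headers to Darwin Core terms.
COLUMN_TO_TERM = {
    "Species": "scientificName",
    "Date collected": "eventDate",
    "Lat": "decimalLatitude",
    "Long": "decimalLongitude",
}


def rename_header(text: str, mapping: dict[str, str], delimiter: str = "\t") -> str:
    """Rename header columns that have a mapping; leave other columns and all data rows as they are."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Unmapped columns such as `id` keep their names. Under the rule on the previous slide, a consumer would ignore them unless they happen to match a term.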

Page 15

Darwin Core Archive: Anatomy

Two ways to match columns to Darwin Core terms

2) Match columns to terms in a separate meta.xml file

Page 16

Darwin Core Archive: Anatomy

meta.xml maps the columns in the core data file (species.txt) to Darwin Core terms

More on how to make the meta.xml file later…
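Here is a minimal sketch of writing such a meta.xml for a single core file, using the element and attribute names from the Darwin Core text guide as I understand them (`archive`, `core`, `files/location`, `id`, `field`, namespace `http://rs.tdwg.org/dwc/text/`). The helper `build_meta_xml` is hypothetical. It also assumes a tab-delimited, UTF-8 core with one header line and an accompanying `eml.xml`.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(core_file: str, row_type: str, id_index: int,
                   fields: dict[int, str]) -> str:
    """Describe one tab-delimited core file; fields maps column index -> term URI."""
    ET.register_namespace("", DWC_TEXT_NS)  # serialise as the default namespace

    def q(tag: str) -> str:
        return f"{{{DWC_TEXT_NS}}}{tag}"

    archive = ET.Element(q("archive"), {"metadata": "eml.xml"})
    core = ET.SubElement(archive, q("core"), {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # literal backslash-t, as written in meta.xml
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": row_type,
    })
    files = ET.SubElement(core, q("files"))
    ET.SubElement(files, q("location")).text = core_file
    ET.SubElement(core, q("id"), {"index": str(id_index)})  # the core ID column
    for index, term in sorted(fields.items()):
        ET.SubElement(core, q("field"), {"index": str(index), "term": term})
    return ET.tostring(archive, encoding="unicode")
```

For example, `build_meta_xml("species.txt", DWC + "Taxon", 0, {1: DWC + "scientificName"})` describes a file whose first column is the core ID and whose second column holds scientific names.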

Page 17

Darwin Core Archive: Anatomy

Archives can include extension files

[Figure: core file Species.txt linked to extension file Common_names.txt]

Extensions allow multiple records to be linked to a single core record.

Extensions link to the core through the core ID
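The one-to-many link works like a join on the core ID. Here is a minimal sketch, assuming both files are tab-delimited with header rows and that the ID columns are found by name. A real reader would locate them by index through meta.xml. The function name and the column names in the example are hypothetical.

```python
import csv
import io


def link_extension(core_text: str, ext_text: str,
                   core_id_col: str, ext_id_col: str) -> dict[str, list[dict[str, str]]]:
    """Group extension rows under the core record whose core ID they carry.

    Extension rows whose ID matches no core record are dropped in this sketch.
    """
    core_rows = csv.DictReader(io.StringIO(core_text), delimiter="\t")
    linked: dict[str, list[dict[str, str]]] = {row[core_id_col]: [] for row in core_rows}
    for row in csv.DictReader(io.StringIO(ext_text), delimiter="\t"):
        if row[ext_id_col] in linked:
            linked[row[ext_id_col]].append(row)
    return linked
```

With a species core holding taxa `1` and `2`, and a common-names extension holding two names for taxon `1`, the result maps `"1"` to both name rows. This is the "multiple records per core record" idea from the slide.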

Page 18

GBIF hosts extension definitions

http://rs.gbif.org/extension/

Page 19

Darwin Core Archive: Anatomy

Multiple extension files can be linked to the core
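When several extension files point at the same core, a useful pre-publication check is that every extension row refers to a core ID that exists. Here is a minimal sketch, assuming tab-delimited extension text with a header row. The ID column name `coreid` and the function name are hypothetical.

```python
import csv
import io


def orphan_extension_rows(core_ids: set[str], extensions: dict[str, str],
                          id_column: str = "coreid") -> dict[str, list[str]]:
    """For each extension (file name -> text), list IDs with no matching core record."""
    report: dict[str, list[str]] = {}
    for name, text in extensions.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        report[name] = sorted({row[id_column] for row in reader} - core_ids)
    return report
```

An empty list for a file means all of its rows link back to the core.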

Page 20

Darwin Core Archive: Anatomy

All files are stored in a single folder

Page 21

Darwin Core Archive: Anatomy

The folder is zipped.

This is a Darwin Core Archive:
• Data files
• Column matching file (meta.xml)
• Data set documentation (EML)
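Zipping can be scripted with the standard library. This sketch stores each file at the top level of the zip, with no folder prefix. I am assuming that is the layout most archive readers expect, so check it against the tool that will consume your archive. The function name `zip_archive` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_archive(folder: Path, zip_path: Path) -> list[str]:
    """Zip every regular file in folder into zip_path and return the stored names."""
    folder = Path(folder)
    names: list[str] = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.iterdir()):
            if path.is_file():
                zf.write(path, arcname=path.name)  # store without the folder prefix
                names.append(path.name)
    return names
```

Called on a folder holding `eml.xml`, `meta.xml` and `species.txt`, it produces a single compressed file that is ready to put on a web server.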

Page 22

Darwin Core Archive: Publishing

http://www.organisation.org/my_data.zip

Archives on a web server can be accessed by a URL. Share this URL to “publish” your data!
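On the consumer side, "no special software required" means a script can read a published archive with nothing but the standard library. Here is a minimal sketch that reads the core file of an archive held in memory. It simplifies heavily: meta.xml sits at the top of the zip, and the core is one tab-delimited UTF-8 file without quoting. It ignores the delimiter settings declared in meta.xml and skips extensions. The function name `read_core` is hypothetical.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by meta.xml.
NS = {"d": "http://rs.tdwg.org/dwc/text/"}


def read_core(archive_bytes: bytes) -> list[dict[str, str]]:
    """Return core records keyed by term URI, plus "id" for the core ID column."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("d:core", NS)
        location = core.find("d:files/d:location", NS).text
        text = zf.read(location).decode("utf-8")
    skip = int(core.get("ignoreHeaderLines", "0"))
    # Only fields with an index point at a column; unmapped columns are ignored.
    fields = {int(f.get("index")): f.get("term")
              for f in core.findall("d:field", NS) if f.get("index") is not None}
    id_el = core.find("d:id", NS)
    id_index = int(id_el.get("index")) if id_el is not None else None

    records = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    for row in rows[skip:]:
        if not row:
            continue  # skip blank lines
        record = {term: row[i] for i, term in fields.items() if i < len(row)}
        if id_index is not None:
            record["id"] = row[id_index]
        records.append(record)
    return records
```

To read a published archive, fetch the bytes first, for example with `urllib.request.urlopen("http://www.organisation.org/my_data.zip").read()`, and pass them to `read_core`.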

Page 23

Darwin Core Archive: Publishing Options

Page 24

GBIF Spreadsheet Templates

Page 25

Integrated Publishing Toolkit

Page 26

Data Hosting Centers

Page 27

Darwin Core Mapping Assistant

[Figure: Metafile]

http://tools.gbif.org/dwca-assistant/

Page 28

Darwin Core Mapping Assistant

Page 29

Darwin Core Archive: Publishing Options

• GBIF Darwin Core Archive Spreadsheet Templates:
    • data already in a spreadsheet
    • simple archive authoring
• IPT:
    • creating/managing archives for multiple data sets
    • managing archives for multiple organisations
    • metadata as GBIF Metadata Profile of EML
• Make Your Own:
    • automating archive generation
    • customisation
• Hosting center:
    • economy of scale
    • infrastructure and support
• Combinations…
