BIS TDWG Conference, New Orleans, 2011 GBIF: Issues in providing federated access to digital information related to biological specimens David Remsen Senior Programme Officer Global Biodiversity Information Facility (GBIF)

Dec 16, 2015


Gervais Berry
Transcript
Page 1:

BIS TDWG Conference, New Orleans, 2011

GBIF: Issues in providing federated access to digital information related to biological specimens

David Remsen
Senior Programme Officer
Global Biodiversity Information Facility (GBIF)

Page 2:

3 issues

Issue #1: The consequences of scale

Issue #2: Geospatial integration

Issue #3: Taxonomic integration

Page 3:

Issue #1: The consequences of scale

Goal – Provide timely access to a large federated network of biodiversity databases

Page 4:

About GBIF

• 341 publishers
• 9290 datasets
• 310M records

The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development.

• 57 countries
• 45 organisations

Page 5:

“Wrapper” Software

Your database:
• Insect Collection
• Bird Observations
• Herbarium

Install one of these ‘wrappers’:
• PyWrapper (Python)
• TAPIR Link (PHP)
• DiGIR (PHP)

Data:
• DarwinCore
• ABCD

Page 6:

The promise of federation

Insect Collection, Herbarium, Bird Observations, Herbarium

Any specimens from Thailand?

GBIF Data Portal

I will ask!

I do! I do! I do! Nope!

GBIF Data Portal as a Gateway

Page 7:

The challenge of federation

Insect Collection, Herbarium, Bird Observations, Herbarium

Hello?

Server Not Available

Server Not Available

GBIF Data Portal

Hi!

Page 8:

The rise of Indexing

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

Send me a copy of your data

GBIF Data Portal (now with Data!)

GBIF Data Portal as a Data Index

Page 9:

The wrong tools for the job

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

Send me a copy of your data once per month

Here is page one.

If I go offline, start again

Not too fast!

You ask the same questions every time

GBIF Data Portal (now with Data!)

Page 10:

TAPIR request example

• dataset of 260,000 specimens

• 200 records retrieved per request

• requires 1300 request/response pairs

• over 9 hours to complete

• 500 MB of XML data is transferred

• becomes 32 MB text file in the GBIF server

• 32 MB is compressible to 3 MB zip file
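
The figures in this example can be checked with a little arithmetic. The sketch below uses only the numbers on the slide. The one assumption is that "over 9 hours" is treated as exactly 9 hours, so the derived time per request is a lower bound. The function and constant names are illustrative, not part of any GBIF tool.

```python
import math

# Figures taken from the slide.
TOTAL_RECORDS = 260_000
RECORDS_PER_REQUEST = 200
XML_TRANSFERRED_MB = 500
TEXT_FILE_MB = 32
ZIP_FILE_MB = 3
# Assumption: "over 9 hours" is treated as exactly 9 hours (a lower bound).
HARVEST_HOURS = 9


def request_count(total_records: int, page_size: int) -> int:
    """Number of request/response pairs needed to page through a dataset."""
    return math.ceil(total_records / page_size)


def summarise() -> dict:
    """Derive per-request time and size ratios from the slide's figures."""
    requests = request_count(TOTAL_RECORDS, RECORDS_PER_REQUEST)
    return {
        "requests": requests,
        "seconds_per_request": HARVEST_HOURS * 3600 / requests,
        "xml_to_text_ratio": XML_TRANSFERRED_MB / TEXT_FILE_MB,
        "xml_to_zip_ratio": XML_TRANSFERRED_MB / ZIP_FILE_MB,
    }


if __name__ == "__main__":
    for key, value in summarise().items():
        print(f"{key}: {value:.1f}")
```

Under these assumptions each request/response pair takes roughly 25 seconds, the XML on the wire is about 15 times larger than the resulting text file, and more than 160 times larger than the zipped text.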

Page 11:

Darwin Core Archives

A text-based solution to publishing biodiversity data

Page 12:

A Refined Approach

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

This is fast!

GBIF Data Portal (now with Data!)

This is easy

URL URL URL URL

- index very large data sets

- reduce latency

Page 13:

Growth

Timeline: 2007, 2008, 2009, 2010, Today

Values shown: 70 million, 147 million, 180 million, 201 million, 302 million

Need for a new standard identified

Page 14:

Issue #2: Geospatial Integration

Goal – Provide accurate reporting of nationally-bound data

Challenge – Inaccurate recording of geospatial coordinates

Page 15:

Geo-referenced USA data

Verbatim data as shared on the network

Page 16:

Issue #2: Geospatial Integration

Remediation includes:
• Use of country boundary shapefiles to verify that coordinates fall within them
– Including EEZ boundaries
– Including islands
• Outliers identified
• Nature of the error qualified (e.g., “coordinates inverted”)
• Offending records marked and omitted from display
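
The remediation steps above can be sketched as a point-in-polygon test followed by a few simple "what if" transformations that try to explain an outlier. This is an illustration only, not GBIF's implementation. The rectangle `USA_BOX` is a hypothetical stand-in for real country, EEZ and island polygons loaded from shapefiles, and the labels other than "coordinates inverted" are invented for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# Hypothetical, highly simplified boundary: a rectangle roughly covering the
# contiguous USA. A real check would use polygons read from shapefiles.
USA_BOX: List[Point] = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]


def point_in_polygon(lon: float, lat: float, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray heading east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify(lon: float, lat: float, polygon: List[Point]) -> str:
    """Return 'ok', a guess at the nature of the error, or 'outside boundary'."""
    if point_in_polygon(lon, lat, polygon):
        return "ok"
    # Try common recording mistakes and see whether any lands inside the boundary.
    candidates = [
        ("longitude sign flipped", (-lon, lat)),
        ("latitude sign flipped", (lon, -lat)),
        ("coordinates inverted", (lat, lon)),
    ]
    for label, (fixed_lon, fixed_lat) in candidates:
        if point_in_polygon(fixed_lon, fixed_lat, polygon):
            return label
    return "outside boundary"
```

For example, a New Orleans record stored as longitude 29.95, latitude -90.07 would be classified as "coordinates inverted", while any record returning something other than "ok" is a candidate for being marked and omitted from display.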

Page 17:

Geo-referenced USA data

Data following interpretation
- Coastal regions recognised
- Offshore islands recognised

Page 18:

Issue #3: Taxonomic Integration

• Goal – Provide access to biodiversity data according to taxonomic groups and concepts
• Challenge
– Heterogeneous and sometimes inaccurate classification
  • Same taxon appearing in different classifications
– Presence of homonyms that complicate reconciling the above
– Misspellings
– Wide range of orthographies for the same name

Page 19:

Enabling authoritative taxonomic data to be published through GBIF

Page 20:

Trochilidae (Hummingbirds) (today)

Misinterpretations (Hummingbirds are restricted to the Americas)

Page 21:

Trochilidae (Hummingbirds) (next month)

Improved interpretation

Page 22:

Search for Oenanthe (water dropwort plant or wheatear bird)

Today: Difficult for user to interpret

Next month: Accurate search results, resolution of homonyms

Page 23:

Improved means to match names to authority files
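
One simple way to match misspelled or inconsistently formatted names to an authority file is approximate string matching. The sketch below uses Python's standard `difflib` module. It is not the matching method GBIF uses, `AUTHORITY_NAMES` is a tiny hypothetical authority file, and the 0.85 cutoff is an arbitrary illustrative threshold. It does not resolve homonyms, which need extra context such as the higher classification.

```python
import difflib
from typing import List, Optional

# Hypothetical authority file: a short list of accepted names for illustration.
AUTHORITY_NAMES: List[str] = ["Trochilidae", "Oenanthe oenanthe", "Oenanthe crocata"]


def normalise(name: str) -> str:
    """Collapse runs of whitespace and lowercase the name before comparison."""
    return " ".join(name.split()).lower()


def match_name(verbatim: str, authority: List[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the closest authority name, or None if nothing is similar enough."""
    lookup = {normalise(name): name for name in authority}
    hits = difflib.get_close_matches(
        normalise(verbatim), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None
```

A call such as `match_name("Trochilidea", AUTHORITY_NAMES)` returns the authority spelling for a transposed-letter misspelling, while a name with no close counterpart in the list returns `None` and can be flagged for review.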

Page 24:

In summary
• GBIF has had to deploy different data access strategies in order to scale effectively
• Darwin Core Archive offers a scalable solution that has led to rapid growth in data published through GBIF
• Geospatial filtering via shapefiles provides a basis for more accurate national reporting
– Basis for additional services later (e.g., ecosystem shapefiles, protected areas, etc.)
• Heterogeneous taxonomy inherent to collections data is nearly impossible to consolidate into a taxonomically accurate structure.
– Comprehensive authoritative taxonomic data is a key organisational component of collections data

Page 25:

Thank you