Page 1: Global Biodiversity Information Facility

What is DiGIR? What is ABCD?

Presentation: Wouter Addink – ETI

Most slides made by: Giorgos Ksouris - GBIF Secretariat

Utrecht, 14 January 2004

www.gbif.org

Page 2: “Primary Biodiversity Data” Network

GBIF is concerned with primary biodiversity data:

  Specimens
  Observations
  Names
  Species
  Literature
  Metadata on the above

How will the data be contributed to the GBIF Network?

Page 3: GBIF “Data Providers (Nodes)”

Data providers are responsible for providing, through standard web exchange interfaces, metadata describing themselves and the data services they offer, and free access to biodiversity data.

They should use a common data exchange format with a fixed structure that clearly defines how the information is to be shared.

Data should be exchanged in a way that makes it as simple as possible to compare and merge information from different resources.

GBIF therefore needs a simple model that allows institutions to share their data using structured formats, regardless of the formats they use in their own databases.

Page 4: Data exchange standards

Models that allow data on individual specimens or observations to be structured and shared as XML documents that can be transmitted across the Internet:

Darwin Core V2 (limited set of core data elements)
(http://tsadev.speciesanalyst.net/documentation/ow.asp?DarwinCoreV2)

ABCD V1.2 (complete set of all possible data elements in specimen and observation data)
(http://www.bgbm.org/TDWG/CODATA/Schema/default.htm)

Page 5: Data exchange protocols

A data exchange protocol defines request and response message formats for standardized communication between provider and portal.

DiGIR protocol
  Uses open protocols and standards, such as HTTP, XML, and UDDI
  De-couples protocol, software and semantics
  Automates the establishment of a new data provider as much as possible
  In use with Darwin Core in a few projects like MaNIS, but cannot be used with a complex XML Schema like ABCD

BioCASE protocol
  Based on DiGIR, but with a few improvements, like the capability to use ABCD
  Not compatible with DiGIR
  Still under development
  De-couples protocol, software and semantics better than DiGIR, but establishment of a new data provider is more complex

SOAP
  Generic protocol using HTTP, XML and UDDI, not focussed on specimen and observation data exchange

Page 6: Data exchange format: ABCD

Complex XML schema
Coverage of the complete specimen and observation data domain
Schema is used in the BioCASE project
Hundreds of concepts (data elements)

Schema includes:

  Meta-data: Information about the source, from the name of the holding institution to copyright statements for the whole dataset.

  Unit-data: Information regarding the records: specific copyrights, date of last modification, facts that don't fit in any other place, etc.

  Gathering site: Information about the gathering site: gathering place, altitude, responsible person, etc.

  Taxon identification: Possible identifications for this unit. Includes the taxon part and data on the identification event, such as who identified the unit.

  Taxon name: Data about the name of the taxon. It is split into different parts for the different biological disciplines (botany, zoology, etc.), each with its own nomenclatural code. Includes data on the scientific name, higher taxon, etc.

Page 7: Data exchange format: Darwin Core2

XML schema
In use for some time already (MaNIS project)
Suitable for collections and observations data
48 concepts (data elements):

DateLastModified *      InstitutionCode *       CollectionCode *        CatalogNumber *
ScientificName *        BasisOfRecord           Kingdom                 Phylum
Class                   Order                   Family                  Genus
Species                 Subspecies              ScientificNameAuthor    IdentifiedBy
YearIdentified          MonthIdentified         DayIdentified           TypeStatus
CollectorNumber         FieldNumber             Collector               YearCollected
MonthCollected          DayCollected            JulianDay               TimeOfDay
ContinentOcean          Country                 StateProvince           County
Locality                Longitude               Latitude                CoordinatePrecision
BoundingBox             MinimumElevation        MaximumElevation        MinimumDepth
MaximumDepth            Sex                     PreparationType         IndividualCount
PreviousCatalogNumber   RelationshipType        RelatedCatalogItem      Notes

* = required element
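The five starred elements can be checked mechanically before a record is published. The following is a minimal sketch, assuming records are held as plain Python dictionaries keyed by Darwin Core2 element name; the function name missing_required is hypothetical and not part of any GBIF software.

```python
# The five elements marked "*" (required) on the slide.
REQUIRED_ELEMENTS = (
    "DateLastModified",
    "InstitutionCode",
    "CollectionCode",
    "CatalogNumber",
    "ScientificName",
)


def missing_required(record):
    """Return the required element names that are absent or empty in a record."""
    return [
        name
        for name in REQUIRED_ELEMENTS
        if not str(record.get(name) or "").strip()
    ]


# Sample values taken from the example response later in this presentation.
complete = {
    "DateLastModified": "19930717T225000Z",
    "InstitutionCode": "bioshare.com",
    "CollectionCode": "pyy",
    "CatalogNumber": "4",
    "ScientificName": "Diarsia mendica",
}
incomplete = {"InstitutionCode": "bioshare.com", "ScientificName": ""}

print(missing_required(complete))
print(missing_required(incomplete))
```

An empty list means the record carries all required elements; optional elements are not inspected.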

Page 8: Software for GBIF “Data Providers (Nodes)”

GBIF has chosen to use DiGIR software and Darwin Core2 because:
  The provider software is stable
  It is easy to install and easy to use
  It is already used in the MaNIS network and some other projects
  Collection database models are rather easy to map against Darwin Core2 (but data providers will often miss data elements that are important for their database)

However, BioCASE software and ABCD will also be supported in the near future because:
  They will be in use in the BioCASE network
  BioCASE software has some improvements compared with DiGIR (but is still less easy to install and use)
  ABCD has more potential for the future than Darwin Core2

Page 9: Data Provider within GBIF Architecture

[Architecture diagram. Components: Portal, Request Manager, Query Engine, Index, Cache, Metadata, Accounting, Metadata and logs; UDDI Registry (Institutions, Services (Providers), Access Points); Data provider (Provider Services, Resource Metadata); Name provider (Provider Services, Resource Metadata; Synonyms, GUIDs). Flows: Publish availability, Available providers, Provider query, Metadata and name query, Metadata response, Data query, Data response. Protocols: SOAP, DiGIR, HTTP.]

Page 10: Web exchange interface: DiGIR

Distributed Generic Information Retrieval (DiGIR) is a client/server protocol for retrieving information from distributed resources.

It uses HTTP as the transport mechanism and XML for encoding messages sent between client and server. There are three types of messages:

  Metadata: get metadata about the provider and the resource(s) that it serves.

  Search: find specimen and observation records based on search criteria, for example the name of a species and/or a rectangle defining an area on the earth’s surface and/or …

  Inventory: get the set of distinct values associated with a single concept, for instance Species.

Maps database models of collections to Darwin Core2 (suitable for exchange of specimen and observation data).
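To make the Inventory message more concrete, here is a small sketch of its semantics only: the set of distinct values held for one concept. It works on an in-memory list of dictionaries and does not reproduce the DiGIR request or response XML; the function name inventory is an assumption made for illustration.

```python
def inventory(records, concept):
    """Return the sorted distinct non-empty values of one concept."""
    return sorted({r[concept] for r in records if r.get(concept)})


# Sample values taken from the example response later in this presentation.
records = [
    {"ScientificName": "Diarsia mendica", "CollectionCode": "pyy"},
    {"ScientificName": "Lycia lapponaria", "CollectionCode": "pyy"},
    {"ScientificName": "Plutella maculipennis", "CollectionCode": "pyy"},
]

print(inventory(records, "CollectionCode"))
print(inventory(records, "ScientificName"))
```

A real provider would answer such a request from its database rather than from a Python list.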

Page 11: DiGIR: Advantages

Provides a single point of access to one to many distributed information resources.
  Resource: a collection of data objects that conform to a common schema.

Enables search and retrieval of structured data.

Makes the location and technical characteristics of the native resource transparent to the user.

Not the only available software (BioCASE with the ABCD Schema is another candidate), but stable enough to be launched.

Page 12: DiGIR Provider: How it Works

[Diagram. Labels: several Resources; a Server running the web server and the DiGIR software; Provider Metadata; Resource Metadata; XML over HTTP; Metadata message; Search/Inventory message.]

Page 13: GBIF’s DiGIR Provider Package

Encompasses the DiGIR Provider software, the Apache2 web server and PHP libraries.

Requires only basic knowledge of the operating system from the user.

Two available releases (http://circa.gbif.net/Public/irc/gbif/ict/library?l=/digir_provider_package):
  Linux (RedHat 7.3, 8, 9)
  MS Windows (2000, XP)

Supported databases:
  MySQL
  PostgreSQL
  MS SQL Server
  MS Access (only the MS Windows package)

Offers automatic registration with the GBIF UDDI Registry (http://registry.gbif.net)

Other features:
  Caching (cleanup from the startup script)
  Rotation of log files (web server, DiGIR provider)

Page 14: DiGIR Provider Installation

Completed in 4 steps:

1. Installation of GBIF’s DiGIR Provider package.
2. Definition of the provider’s metadata. (For a unique RecordIdentifier in the GBIF network, use the format ParticipantCode:InstitutionCode:CollectionCode.)
3. Definition of resource(s).
4. Registration with the GBIF UDDI registry.
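The RecordIdentifier format from step 2 is easy to get wrong by hand, so a small helper may be useful. This is a sketch under assumptions: the function name and the placeholder codes "XX", "INST" and "COLL" are hypothetical, and rejecting codes that themselves contain a colon is a precaution added here, not a rule stated by GBIF.

```python
def record_identifier(participant_code, institution_code, collection_code):
    """Join the three codes as ParticipantCode:InstitutionCode:CollectionCode."""
    parts = [participant_code, institution_code, collection_code]
    for part in parts:
        # Assumption: empty parts or embedded colons would make the
        # identifier ambiguous, so they are refused here.
        if not part or ":" in part:
            raise ValueError(f"invalid code: {part!r}")
    return ":".join(parts)


identifier = record_identifier("XX", "INST", "COLL")
print(identifier)
```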

Page 15: Becoming a GBIF Data provider in the Netherlands (1)

Determine which data sets you can provide in structured electronic form (like a database) and whether these data sets contain specimen data, observation data, species data or other biodiversity data. The data also need to be maintained.

Determine which data may be made available for public use. GBIF has decided to make all data in the network publicly available (this may change in the future). To avoid extra complexity, there will be no user restrictions such as password protection for data. Data that should not be available for public use should not be provided. For example: do not provide exact information about locations of endangered species that could be of use to hunters or illegal traders.

Define an IPR (Intellectual Property Rights) policy for each data set.

Information about the data sets (metadata) should be sent to NLBIF and will be kept in a central metadatabase. This information will also be available in the BioCASE metadata network.

Page 16: Becoming a GBIF Data provider in the Netherlands (2)

Required metadata (the minimum metadata needed is the metadata required for DiGIR and for the BioCASE NoDIT database):

  A name, address, description and unique code for your organisation (see the GBIF website for codes already taken)

  A name, description and unique code for each dataset in your organisation

  The unique identifier to identify a specimen

  A last modified date for the dataset

  At least one contact name, email address and phone number

Page 17: Becoming a GBIF Data provider in the Netherlands (3)

Check whether you can make your data available in one of the following database formats:
  MySQL
  PostgreSQL
  MS SQL Server
  MS Access (only the MS Windows package)

Check whether you have a computer with internet access available that runs:
  Linux (RedHat 7.3, 8, 9) or
  MS Windows (2000, XP)

If this is the case, congratulations: you can maintain your own data node that uses the standard GBIF DiGIR provider software.

In all other cases, please contact NLBIF. NLBIF can also provide data storage space for your datasets. With DiGIR you might also be able to use DB2, Interbase, Frontbase, Informix, Visual FoxPro, PostgreSQL, Sybase or another ODBC-compliant database. However, this is currently not supported by GBIF.

Page 18: NLBIF Assistance

The complete distribution of the DiGIR provider provided by GBIF (including PHP, the Apache web server and automatic GBIF UDDI registration) is recommended. The GBIF helpdesk or NLBIF (ETI) can help you with technical installation problems.

To use your data source with DiGIR, you need to map the data fields you want to publish 1:1 to Darwin Core V2 Schema elements (the software does not contain translator functions). For this you will probably need to create a view for some of the fields (if your database supports this) or a separate database with the needed fields. Contact NLBIF if you need assistance with conversions.

Because the GBIF network uses caching, it will take a few hours before your data is visible in the network.

If you want a custom search interface on your dataset, please also contact NLBIF. NLBIF is developing several web modules for this purpose that will be used for collections like those from ZMA.

You may use BioCASE and ABCD instead of DiGIR, for instance if you want to provide data that does not fit in Darwin Core, but it is recommended to start with the DiGIR provider.
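The 1:1 mapping through a database view mentioned above could look roughly like the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in so that the example runs anywhere; SQLite is not one of the databases listed as supported by the GBIF package, and the table name specimens, its column names and the view name darwin_core are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical in-house table with its own column names.
    CREATE TABLE specimens (
        id       INTEGER PRIMARY KEY,
        inst     TEXT,
        coll     TEXT,
        taxon    TEXT,
        modified TEXT
    );
    INSERT INTO specimens
    VALUES (4, 'bioshare.com', 'pyy', 'Diarsia mendica', '19930717T225000Z');

    -- The view renames each column to the Darwin Core2 element it maps to,
    -- one column per element, without touching the original table.
    CREATE VIEW darwin_core AS
    SELECT modified         AS DateLastModified,
           inst             AS InstitutionCode,
           coll             AS CollectionCode,
           CAST(id AS TEXT) AS CatalogNumber,
           taxon            AS ScientificName
    FROM specimens;
    """
)

cursor = conn.execute("SELECT * FROM darwin_core")
columns = [description[0] for description in cursor.description]
row = cursor.fetchone()
print(dict(zip(columns, row)))
```

The provider would then be pointed at the view instead of the original table, so the in-house schema can stay as it is.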

Page 19: GBIF network growth

The global network started at the end of November 2003.

Currently about 28 data providers worldwide are already connected, with about 8.5 million specimen and observation records.

The Netherlands is currently connected with 7 collections containing 46,981 records.

With your help this can be … million records next year?!!

Page 21: Darwin Core2 Elements (1)

DateLastModified: ISO 8601 compliant stamp indicating the date and time in UTC (GMT) when the record was last modified. Example: the instant "November 5, 1994, 8:15:30 am, US Eastern Standard Time" would be represented as "1994-11-05T13:15:30Z".

InstitutionCode: A "standard" code identifier that identifies the institution to which the collection belongs. No global registry exists for assigning institutional codes. Use the code that is "standard" in your discipline.

CollectionCode: A unique alphanumeric value which identifies the collection within the institution.

CatalogNumber: A unique alphanumeric value which identifies an individual record within the collection. It is recommended that this value provides a key by which the actual specimen can be identified. If the specimen has several items, such as various types of preparation, this value should identify the individual component of the specimen.

ScientificName: The full name of the lowest-level taxon of which the Catalogued Item can be identified as a member; includes genus name, specific epithet, and subspecific epithet (zool.) or infraspecific rank abbreviation and infraspecific epithet (bot.). Use the name of a suprageneric taxon (e.g., family name) if the Catalogued Item cannot be identified to genus, species, or infraspecific taxon.

BasisOfRecord: An abbreviation indicating whether the record represents an observation (O), living organism (L), specimen (S), germplasm/seed (G), etc.

Kingdom: The kingdom to which the organism belongs.

Phylum: The phylum (or division) to which the organism belongs.

Class: The class name of the organism.

Order: The order name of the organism.

Family: The family name of the organism.

Genus: The genus name of the organism.

Species: The specific epithet of the organism.

Subspecies: The sub-specific epithet of the organism.

ScientificNameAuthor: The author of a scientific name. Author string as applied to the accepted name. Can be more than one author (concatenated string). Should be formatted according to the conventions of the applicable taxonomic discipline.
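The DateLastModified example above can be reproduced with Python's standard datetime module. This is a minimal sketch: the helper name to_date_last_modified is hypothetical, and US Eastern Standard Time is modelled as a fixed UTC-5 offset, which ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for US Eastern Standard Time (UTC-5); daylight saving is ignored.
EST = timezone(timedelta(hours=-5), "EST")


def to_date_last_modified(local_time):
    """Convert a timezone-aware datetime to an ISO 8601 UTC stamp."""
    if local_time.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return local_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# November 5, 1994, 8:15:30 am, US Eastern Standard Time
stamp = to_date_last_modified(datetime(1994, 11, 5, 8, 15, 30, tzinfo=EST))
print(stamp)  # 1994-11-05T13:15:30Z
```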

Page 22: Darwin Core2 Elements (2)

IdentifiedBy: The name(s) of the person(s) who applied the currently accepted Scientific Name to the Catalogued Item.

YearIdentified: The year portion of the date when the Collection Item was identified; as four digits [-9999..9999], e.g., 1906, 2002.

MonthIdentified: The month portion of the date when the Collection Item was identified; as two digits [01..12].

DayIdentified: The day portion of the date when the Collection Item was identified; as two digits [01..31].

TypeStatus: Indicates the kind of nomenclatural type that a specimen represents. In particular, the type status may not apply to the name listed in the scientific name, i.e. the current identification. In rare cases, a single specimen may be the type of more than one name.

CollectorNumber: An identifying "number" (really a string) applied to specimens (in some disciplines) at the time of collection. Establishes links between different parts/preparations of a single specimen and between field notes and the specimen.

FieldNumber: A "number" (really a string) created at collection time to identify all material that resulted from a collecting event.

Collector: The name(s) of the collector(s) responsible for collecting the specimen or taking the observation.

YearCollected: The year (expressed as an integer) in which the specimen was collected. The full year should be expressed (e.g. 1972 must be expressed as "1972", not "72").

MonthCollected: The month of the year in which the specimen was collected from the field. Possible values range from 01..12 inclusive.

DayCollected: The day of the month on which the specimen was collected from the field. Possible values range from 01..31 inclusive.

JulianDay: The ordinal day of the year, i.e. the number of days counted from January 1 of the same year, with January 1 as Julian Day 1.
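JulianDay can be derived from YearCollected, MonthCollected and DayCollected rather than stored by hand. A minimal sketch using the standard library follows; the function name julian_day is an assumption for illustration.

```python
from datetime import date


def julian_day(year_collected, month_collected, day_collected):
    """Ordinal day of the year, with January 1 counted as day 1."""
    return date(year_collected, month_collected, day_collected).timetuple().tm_yday


first = julian_day(1972, 1, 1)
# 1972 is a leap year: 31 days in January + 29 in February + 1 = 61.
march_first = julian_day(1972, 3, 1)
print(first, march_first)
```

Because the date is validated on construction, impossible combinations such as day 31 in month 02 raise a ValueError instead of producing a wrong ordinal.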

Page 23: Darwin Core2 Elements (3)

TimeOfDay: The time of day a specimen was collected, expressed as decimal hours from midnight local time (e.g. 12.0 = midday, 13.5 = 1:30 pm).

ContinentOcean: The continent or ocean from which a specimen was collected.

Country: The country or major political unit from which the specimen was collected. ISO 3166-1 values should be used. Full country names are currently in use. A future recommendation is to use ISO 3166-1 two-letter codes or the full name when searching.

StateProvince: The state, province or region (i.e. the next political region smaller than Country) from which the specimen was collected.

County: The county (or shire, or next political region smaller than State/Province) from which the specimen was collected.

Locality: The locality description (place name plus, optionally, a displacement from the place name) from which the specimen was collected. Where a displacement from a location is provided, it should be in un-projected units of measurement.

Longitude: The longitude of the location from which the specimen was collected. This value should be expressed in decimal degrees with a datum such as WGS-84.

Latitude: The latitude of the location from which the specimen was collected. This value should be expressed in decimal degrees with a datum such as WGS-84.

CoordinatePrecision: An estimate of how tightly the collecting locality was specified; expressed as a distance, in meters, that corresponds to a radius around the latitude-longitude coordinates. Use NULL where precision is unknown, cannot be estimated, or is not applicable.

BoundingBox: This access point provides a mechanism for performing searches using a bounding box. A Bounding Box element is not typically present in the database, but rather is derived from the Latitude and Longitude columns by the data provider.

MinimumElevation: The minimum distance in meters above (positive) or below sea level of the collecting locality.

MaximumElevation: The maximum distance in meters above (positive) or below sea level of the collecting locality.
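Two of the elements above are derived values: TimeOfDay from a clock time, and a BoundingBox match from Latitude and Longitude. The sketch below illustrates both under stated assumptions: the function names are hypothetical, the box is given as south/west/north/east in decimal degrees, the coordinates are approximate illustrative values, and boxes that cross the 180° meridian are not handled.

```python
from datetime import time


def time_of_day(clock_time):
    """Express a local clock time as decimal hours from midnight."""
    return clock_time.hour + clock_time.minute / 60 + clock_time.second / 3600


def in_bounding_box(latitude, longitude, south, west, north, east):
    """True if a point in decimal degrees lies inside the box (edges included).

    Assumption: the box does not cross the 180 degree meridian.
    """
    return south <= latitude <= north and west <= longitude <= east


midday = time_of_day(time(12, 0))
half_past_one = time_of_day(time(13, 30))

# Approximate coordinates for Utrecht and a rough box around the Netherlands.
inside = in_bounding_box(52.09, 5.12, south=50.7, west=3.3, north=53.6, east=7.3)
outside = in_bounding_box(48.85, 2.35, south=50.7, west=3.3, north=53.6, east=7.3)
print(midday, half_past_one, inside, outside)
```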

Page 24: Darwin Core2 Elements (4)

MinimumDepth: The minimum distance in meters below the surface of the water at which the collection was made; all material collected was at least this deep. Positive below the surface, negative above (e.g. collecting above sea level in tidal areas).

MaximumDepth: The maximum distance in meters below the surface of the water at which the collection was made; all material collected was at most this deep. Positive below the surface, negative above (e.g. collecting above sea level in tidal areas).

Sex: The sex of a specimen. The domain should be a controlled set of terms (codes) based on community consensus. Proposed values: M=Male; F=Female; H=Hermaphrodite; I=Indeterminate (examined but could not be determined); U=Unknown (not examined); T=Transitional (between sexes; useful for sequential hermaphrodites).

PreparationType: The type of preparation (skin, slide, etc.). Probably best to add this as a record element rather than an access point. Should be a list of preparations for a single collection record.

IndividualCount: The number of individuals present in the lot or container. Not an estimate of abundance or density at the collecting locality.

PreviousCatalogNumber: The previous (fully qualified) catalogue number of the Catalogued Item if the item was earlier identified by another Catalogue Number, either in the current catalogue or in another institution or catalogue. A fully qualified Catalogue Number is preceded by Institution Code and Collection Code, with a space separating each subelement. Referencing a previous Catalogue Number does not imply that a record for the referenced item is or is not present in the corresponding catalogue, or even that the referenced catalogue still exists. This access point is intended to provide a way to retrieve this record by a previously used identifier, which may be used in the literature. In future versions of this schema this attribute should be set-valued.

RelationshipType: A named or coded value that identifies the kind of relationship between this Collection Item and the referenced Collection Item. Named values include: "parasite of", "epiphyte on", "progeny of", etc. In future versions of this schema this attribute should be set-valued.

RelatedCatalogItem: The fully qualified identifier of a related Catalogue Item (a reference to another specimen): Institution Code, Collection Code, and Catalogue Number of the related Catalogued Item, where a space separates the three subelements.

Notes: Free text notes attached to the specimen record.
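The fully qualified identifier used by PreviousCatalogNumber and RelatedCatalogItem is three subelements separated by a space. A small sketch of building and splitting such a value follows; the function names are hypothetical, and the splitting step assumes that none of the three codes contains a space itself, which the slide does not guarantee.

```python
def qualify(institution_code, collection_code, catalog_number):
    """Build 'InstitutionCode CollectionCode CatalogNumber'."""
    return " ".join([institution_code, collection_code, catalog_number])


def split_qualified(identifier):
    """Split a fully qualified identifier back into its three subelements.

    Assumption: the codes themselves contain no spaces.
    """
    parts = identifier.split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected three subelements, got {len(parts)}")
    return tuple(parts)


# Sample values taken from the example response in this presentation.
related = qualify("bioshare.com", "pyy", "4")
print(related)
print(split_qualified(related))
```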

Page 25: DiGIR & Darwin Core2: An Example

<?xml version="1.0" encoding="utf-8" ?>
<responseWrapper>
  <response xmlns='http://digir.net/schema/protocol/2003/1.0'>
    <header>
      <version>$Revision: 1.10 $</version>
      <sendTime>11-09-2003 16:33:53+0200</sendTime>
      <source resource="biotella">http://giorgos.gbif.org:80/digir/DiGIR.php</source>
      <destination>192.38.103.181</destination>
    </header>
    <content xmlns:darwin='http://digir.net/schema/conceptual/darwin/2003/1.0'
             xmlns:xsd='http://www.w3.org/2001/XMLSchema'
             xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
      <record>
        <darwin:DateLastModified>19930717T225000Z</darwin:DateLastModified>
        <darwin:InstitutionCode>bioshare.com</darwin:InstitutionCode>
        <darwin:CollectionCode>pyy</darwin:CollectionCode>
        <darwin:CatalogNumber>4</darwin:CatalogNumber>
        <darwin:ScientificName>Diarsia mendica</darwin:ScientificName>
      </record>
      <record>
        <darwin:DateLastModified>19950526T220000Z</darwin:DateLastModified>
        <darwin:InstitutionCode>bioshare.com</darwin:InstitutionCode>
        <darwin:CollectionCode>pyy</darwin:CollectionCode>
        <darwin:CatalogNumber>6</darwin:CatalogNumber>
        <darwin:ScientificName>Lycia lapponaria</darwin:ScientificName>
      </record>
      <record>
        <darwin:DateLastModified>19950526T220000Z</darwin:DateLastModified>
        <darwin:InstitutionCode>bioshare.com</darwin:InstitutionCode>
        <darwin:CollectionCode>pyy</darwin:CollectionCode>
        <darwin:CatalogNumber>7</darwin:CatalogNumber>
        <darwin:ScientificName>Plutella maculipennis</darwin:ScientificName>
      </record>
    </content>
    <diagnostics>
      <diagnostic code="MATCH_COUNT" severity="info">42763</diagnostic>
      <diagnostic code="RECORD_COUNT" severity="info">3</diagnostic>
      <diagnostic code="END_OF_RECORDS" severity="info">false</diagnostic>
    </diagnostics>
  </response>
</responseWrapper>
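A client could read such a response with Python's standard xml.etree.ElementTree module. The sketch below embeds a shortened copy of the response above (one record, no header) and extracts the records and diagnostics; the function name parse_response and the dictionary layout are assumptions made for illustration, not part of the DiGIR software.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example response.
NS = {
    "digir": "http://digir.net/schema/protocol/2003/1.0",
    "darwin": "http://digir.net/schema/conceptual/darwin/2003/1.0",
}

# Shortened copy of the example response: one record, header omitted.
RESPONSE = """<responseWrapper>
  <response xmlns='http://digir.net/schema/protocol/2003/1.0'>
    <content xmlns:darwin='http://digir.net/schema/conceptual/darwin/2003/1.0'>
      <record>
        <darwin:InstitutionCode>bioshare.com</darwin:InstitutionCode>
        <darwin:CollectionCode>pyy</darwin:CollectionCode>
        <darwin:CatalogNumber>4</darwin:CatalogNumber>
        <darwin:ScientificName>Diarsia mendica</darwin:ScientificName>
      </record>
    </content>
    <diagnostics>
      <diagnostic code="MATCH_COUNT" severity="info">42763</diagnostic>
      <diagnostic code="RECORD_COUNT" severity="info">3</diagnostic>
      <diagnostic code="END_OF_RECORDS" severity="info">false</diagnostic>
    </diagnostics>
  </response>
</responseWrapper>"""


def parse_response(xml_text):
    """Return (records, diagnostics) from a DiGIR-style response document."""
    root = ET.fromstring(xml_text)
    records = []
    # <record> inherits the default (protocol) namespace of <response>.
    for record in root.iterfind(".//digir:record", NS):
        # Strip the "{namespace}" prefix ElementTree puts on each tag.
        records.append(
            {child.tag.split("}", 1)[1]: (child.text or "").strip() for child in record}
        )
    diagnostics = {
        d.get("code"): (d.text or "").strip()
        for d in root.iterfind(".//digir:diagnostic", NS)
    }
    return records, diagnostics


records, diagnostics = parse_response(RESPONSE)
print(records[0]["ScientificName"])
print(diagnostics)
```

In the example, MATCH_COUNT (42763) is much larger than RECORD_COUNT (3) and END_OF_RECORDS is false, which suggests the response is one page of a larger result set.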

Page 26: Management of Resources – A Training DB

Getting familiar with the training MS Access database:

Biotella: one of many available observation and specimen database tools
  http://www.bioshare.net/biotella
  “Open source” Microsoft Access Basic application
  Can export ABCD and DwC formats to the GBIF Data Repository Tool (in an upcoming version)
  Can act as a resource to the DiGIR Provider

Training database populated with sample Lepidoptera data

Page 27: Biotella Observation Database Schema

Main Tables

Page 28: Mapping the Database against Darwin Core2

Alternatives:
  Mapping within the database (faster queries with indexing, conversion of value domains, available in Biotella)
  Mapping at the DiGIR Provider (no database work needed)

Conversion of value domains:
  Big issue, let’s leave it as is for the time being

Page 29: Registration with GBIF UDDI Registry

Universal Description, Discovery and Integration (UDDI) is a special directory that provides methods for publishing and finding business and service information and specifications.

UDDI is based on existing standards, such as XML and SOAP.

Four primary data types:

  businessEntity: represents basic business information, e.g. contact information, categorization, descriptions, etc.

  businessService: describes a service provided by the business

  bindingTemplate: contains an optional description of the service, the URL of its access point, and a reference to one or more tModels

  tModel: abstract description of a particular specification or behaviour to which the Web service conforms

[Diagram: a businessEntity with its businessServices; each businessService has bindingTemplates, which reference tModels.]
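The containment between the four UDDI data types can be sketched as plain data classes. This is only an illustration of the structure described above, not a UDDI client: the class and field names are simplified assumptions, the names "Example institution" and "Example provider" are placeholders, and the access point is the placeholder URL pattern used in this presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TModel:
    """Abstract description of a specification the service conforms to."""
    name: str
    description: str = ""


@dataclass
class BindingTemplate:
    """Access point of a service plus the tModels it refers to."""
    access_point: str
    tmodels: List[TModel]
    description: str = ""


@dataclass
class BusinessService:
    """A service provided by the business."""
    name: str
    description: str = ""
    bindings: List[BindingTemplate] = field(default_factory=list)


@dataclass
class BusinessEntity:
    """Basic information about the organisation."""
    name: str
    description: str = ""
    services: List[BusinessService] = field(default_factory=list)


def access_points(entity):
    """Collect every access point URL offered by a business entity."""
    return [b.access_point for s in entity.services for b in s.bindings]


digir_tmodel = TModel(name="DiGIR")
entity = BusinessEntity(
    name="Example institution",
    services=[
        BusinessService(
            name="Example provider",
            bindings=[
                BindingTemplate(
                    access_point="http://your.server.name:port/digir/DiGIR.php",
                    tmodels=[digir_tmodel],
                )
            ],
        )
    ],
)
print(access_points(entity))
```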

Page 30: Registration with GBIF UDDI Registry (2)

Several steps make data useful in a UDDI registry:

1. Companies, organisations or standards bodies define tModels relevant to an industry, business or science, and register them in UDDI (for GBIF: the DiGIR tModel).

2. Companies or organisations (business entities) register descriptions of themselves (for GBIF: the Data Node) and define the services they offer (for GBIF: the DiGIR provider).

3. UDDI taxonomies are used for describing business entities (for GBIF: the connection between a GBIF Participant Node and its Data Nodes).

4. Marketplaces, search engines and business applications (for GBIF: the GBIF portal and the portals of GBIF Participant Nodes) query the registry to discover services of interest at other companies.

Page 31: Registration with GBIF UDDI Registry (3)

Automatic registration with the GBIF UDDI registry uses the values of the elements defined as metadata of the provider (plus some extra information).

Business Entity
  business name: {the <name> of the <host> institution}
  description: {the location (URL) pointing to <host> institution <related information>}

Business Service
  service name: {the common <name> of the provider} (your.server.name)
  description: {the <abstract> information of <host>}

Binding Template
  access point: http://your.server.name:port/digir/DiGIR.php
  description: Access point of {<host> <abstract>}
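The way the registration values are derived from the provider metadata can be sketched as a simple mapping. This is an illustration only: the function name, its parameter names and the dictionary layout are assumptions, the sample values are placeholders, and the actual registration performed by the GBIF package is not reproduced here.

```python
def registration_entries(host_name, host_related_info_url, provider_name,
                         host_abstract, server, port):
    """Derive the three UDDI entries from the provider metadata values."""
    return {
        "businessEntity": {
            "name": host_name,
            "description": host_related_info_url,
        },
        "businessService": {
            "name": provider_name,
            "description": host_abstract,
        },
        "bindingTemplate": {
            # URL pattern as shown on the slide.
            "accessPoint": f"http://{server}:{port}/digir/DiGIR.php",
            "description": f"Access point of {host_abstract}",
        },
    }


# Placeholder metadata values, for illustration only.
entries = registration_entries(
    host_name="Example institution",
    host_related_info_url="http://your.server.name/about",
    provider_name="Example provider",
    host_abstract="Example collections",
    server="your.server.name",
    port=80,
)
print(entries["bindingTemplate"]["accessPoint"])
```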

Demonstration

Registration of trainees’ DiGIR Providers

Page 32: Exploration of GBIF UDDI Registry

Page 33: Exploration of GBIF UDDI Registry (2)

Find all business entities that correspond to Data Nodes under a Participant Node:

1. Access the URL http://registry.gbif.net.

2. Click on the Browse link under the Taxonomies subtree.

3. Click on the gbif:nodes link.

4. Click on the Sweden link in the Categories box.

5. Press the Find business button.

Page 34: Use of a Search Portal

Page 35: Use of a Search Portal (1)

Find all records of a database resource where the Darwin Core2 concept Genus contains the word Colias:

1. Access the URL http://192.168.7.173:8080/pres/PresentationServlet?action=home and press the Build query button.

2. Click on one of the available resources in the Select data providers section.

3. Select Genus from the Select a concept selection list in the Select query conditions section.

4. Select like from the Select a comparator selection list and type Colias in the adjacent text box.

5. Press the Submit query box.
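The query built in the portal, Genus like Colias, amounts to a "contains" filter on one concept. The sketch below shows that filter on an in-memory list of records; it does not talk to the portal or to a provider, the function name like is borrowed from the comparator label, the sample records are illustrative, and matching case-insensitively is an assumption about how the comparator behaves.

```python
def like(records, concept, word):
    """Return the records whose value for a concept contains the word.

    Assumption: the comparison ignores case.
    """
    needle = word.lower()
    return [r for r in records if needle in str(r.get(concept, "")).lower()]


# Illustrative sample records (Lepidoptera).
records = [
    {"Genus": "Colias", "ScientificName": "Colias crocea"},
    {"Genus": "Diarsia", "ScientificName": "Diarsia mendica"},
    {"Genus": "Colias", "ScientificName": "Colias hyale"},
]

matches = like(records, "Genus", "Colias")
print([r["ScientificName"] for r in matches])
```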