Outcomes of the GBIF LSID-GUID Task Group
Global Biodiversity Information Facility (www.gbif.org)
Greg Riccardi, Co-chair
9 November 2009
Dec 23, 2015
Overview
Task Group Overview
The Characteristics of Effective Identifiers
Benefits and Opportunities
Recommendations
Discussion Session: Thursday, 12 Nov, 1400-1530
GUID Goals from GBIF Strategic Plans
The GBIF strategic plans document includes the following goals:
To consolidate the underlying enabling infrastructure and standardisation for global connectivity of biodiversity data and information
To develop a system of globally unique identifiers and encourage their use throughout biodiversity informatics
To use TDWG standards to allow all data objects to be identified using standard actionable globally unique identifiers
To provision GBIF web services and user interfaces to allow users to locate and view any data object with a standard globally unique identifier
Call to the Task Group
GBIF convened a task group, the “LSID GUID Task Group” (LGTG), to explore the issues and offer recommendations on the way forward, with particular reference to the GBIF network, that will enable GBIF to provide architecture leadership and best practices for implementation.
The principal objective of the group is to provide recommendations and guidelines on the deployment of identifiers on the GBIF network, with particular reference to the potential role of GBIF as a stable, long-term provider of identifier resolution services.
Members
Phil Cryer (Missouri Botanical Garden)
Roger Hyam (Natural History Museum and PESI)
Chuck Miller (Missouri Botanical Garden)
Nicola Nicolson (Royal Botanic Gardens, Kew)
Éamonn Ó Tuama (GBIF)
Rod Page (University of Glasgow)
Jonathan Rees (Science Commons)
Greg Riccardi (co-chair, Florida State University)
Kevin Richards (Landcare Research, New Zealand)
Richard White (co-chair, Cardiff University)
Results
Report document
  Draft written at the August 2009 workshop at GBIF
  Revised for distribution in October 2009
Contents of report
  Overview of definitions and technology
  Recommendations for the GBIF secretariat and for the biodiversity community
Report delivered to GBIF Science Committee
  Response of committee (at end of talk)
Preliminary Definition
An identifier is a character string associated with an object.
Identifiers are used in informatics to refer to objects in data sets, documents and repositories.
Some identifiers are useful
Some are more useful
Characteristics of Effective Identifiers
Two use cases that make identifiers effective for users:
Uniqueness of reference to a single object
  An identifier can be used to aggregate information about the identified object.
  For example, information received from multiple sources associated with a single identifier is information about a single object.
Actions may be carried out using the identifier
  An identifier can be used to find further information about the object, concept or data to which it refers.
  This information might be interpreted directly or used to support services.
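The first use case, aggregation, can be sketched in a few lines of Python. This is only an illustration: the identifiers, source names and field names below are made up and do not come from the task group report.

```python
from collections import defaultdict

# Hypothetical records from two sources; identifiers and fields are invented.
records = [
    {"id": "http://example.org/specimen/1", "source": "museum-a", "locality": "Cuba"},
    {"id": "http://example.org/specimen/2", "source": "museum-a", "locality": "Haiti"},
    {"id": "http://example.org/specimen/1", "source": "aggregator-b", "collector": "J. Doe"},
]


def aggregate_by_identifier(records):
    """Merge the fields of all records that share the same identifier."""
    merged = defaultdict(dict)
    for record in records:
        # Everything except the identifier and the source name is treated as data.
        fields = {k: v for k, v in record.items() if k not in ("id", "source")}
        merged[record["id"]].update(fields)
    return dict(merged)


combined = aggregate_by_identifier(records)
print(combined["http://example.org/specimen/1"])
```

Because both sources use the same identifier for specimen 1, their fields end up in one merged record without any matching on names or localities.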
Problems with terminology
The task group struggled with terms
GUID is problematic
  Used in IT to refer to the way that Microsoft uses 128-bit UUIDs
  Used in biodiversity to refer to …
Persistent, actionable identifier
  The Task Group recommendation for terminology
  Two required characteristics: persistent and actionable
Persistent Identifier
Persistence: The property that an identifier always refers to a specific object.
  All information associated with a persistent identifier is about the same object.
  The properties of the object are subject to change, but once a persistent identifier is assigned to one object, it cannot be reused to refer to a different object.
Example
  ITIS TSNs are integers that are persistent identifiers for taxa
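The persistence rule can be illustrated with a toy registry. This is a minimal sketch, not how ITIS or any real registry is implemented: the class name, the integer identifier and the object keys are all hypothetical.

```python
class PersistentRegistry:
    """Toy registry: an assigned identifier can never be pointed at another object."""

    def __init__(self):
        self._objects = {}     # identifier -> key of the object it identifies
        self._properties = {}  # identifier -> mutable properties of that object

    def assign(self, identifier, object_key):
        existing = self._objects.get(identifier)
        if existing is not None and existing != object_key:
            raise ValueError(f"{identifier!r} already identifies {existing!r}")
        self._objects[identifier] = object_key

    def update_properties(self, identifier, **properties):
        """Properties of the object may change; the identifier-object link may not."""
        if identifier not in self._objects:
            raise KeyError(identifier)
        self._properties.setdefault(identifier, {}).update(properties)

    def describe(self, identifier):
        return self._objects[identifier], dict(self._properties.get(identifier, {}))


registry = PersistentRegistry()
registry.assign(1001, "taxon-a")                     # made-up integer identifier
registry.update_properties(1001, status="accepted")
registry.update_properties(1001, status="synonym")   # allowed: properties change

try:
    registry.assign(1001, "taxon-b")                 # rejected: reuse for another object
except ValueError as error:
    print("refused:", error)
```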
Actionable Identifiers
An identifier is actionable if there is a service that, given the identifier, provides information about the object identified.
  E.g., a resolution service maps an identifier into a Web service that provides information about the identifier and its associated object
Example
  An HTTP URI is actionable. The HTTP system provides mechanisms for clients to access information about a data object from its associated identifier.
  ITIS TSNs are actionable because ITIS supports services that provide information for TSNs.
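A resolution service of the kind described above can be sketched as a lookup from identifier to service address. Everything specific here is an assumption made for illustration: the `tsn:` prefix notation, the `resolver.example.org` URL template and the function name are hypothetical and are not ITIS or GBIF interfaces.

```python
from urllib.parse import quote

# Hypothetical URL templates, keyed by an invented identifier prefix.
RESOLVERS = {
    "tsn": "http://resolver.example.org/tsn/{value}",
}


def resolve(identifier):
    """Return a URL that provides information about the identifier, or None."""
    if identifier.startswith(("http://", "https://")):
        # An HTTP URI is actionable as it stands: a client can dereference it.
        return identifier
    scheme, separator, value = identifier.partition(":")
    template = RESOLVERS.get(scheme.lower())
    if not separator or template is None:
        return None  # no known service: the identifier is not actionable here
    return template.format(value=quote(value, safe=""))


print(resolve("tsn:12345"))
print(resolve("http://data.example.org/specimen/1"))
print(resolve("field-notebook-7"))
```

In this sketch an identifier is actionable exactly when `resolve` returns a URL, which mirrors the definition: actionability depends on a service existing, not on the form of the string alone.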
Good Identifier Technologies
HTTP URI: A fundamental technology of WWW
  Persistence assured using DNS
  Actionable through HTTP protocol
LSID: Life Science Identifiers
  Persistence assured by convention
  Actionable according to the LSID services model
  May be mapped into HTTP URI by resolution services
Recommendation: Both are important to biodiversity and should be supported by GBIF
UUID
  Persistence assured by random assignment
  Not independently actionable
  Can be an effective part of HTTP URI and LSID technologies
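The relationship between the three technologies can be sketched as follows. The LSID form `urn:lsid:authority:namespace:object[:revision]` is the standard syntax; the proxy address, the base URI and the example LSID are assumptions chosen for illustration only.

```python
import uuid
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


# Hypothetical HTTP proxy that resolves LSIDs on behalf of web clients.
LSID_PROXY = "http://lsid-proxy.example.org/"


def lsid_to_http_uri(text):
    """Map an LSID into an HTTP URI by prefixing a resolution service address."""
    parse_lsid(text)  # validate before building the URI
    return LSID_PROXY + text


def mint_http_uri(base="http://data.example.org/specimen/"):
    """Use a random (version 4) UUID as the local part of an HTTP URI."""
    return base + str(uuid.uuid4())


example = "urn:lsid:example.org:names:12345"  # made-up LSID
print(parse_lsid(example))
print(lsid_to_http_uri(example))
print(mint_http_uri())
```

The UUID supplies a unique local name but nothing that a client can act on; it only becomes actionable once it is embedded in an HTTP URI or an LSID that some service resolves.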
Example Benefits of IDs
Tracking citation and impact
  The association among objects might be contained in a blog post: Joe writes “I searched the GBIF repository for all frogs from Cuba. The objects that I found useful are in the collection [ID1]. I plotted the locations of the records [ID2] and reported the results in my paper [ID3].”
  Such an association provides feedback and is used by search engines in rankings and ratings
Management and disambiguation of taxon names
  Disambiguation of taxon names requires services that support tests of difference as well as of equality.
  Different identifiers do not necessarily refer to different objects.
  Tests of inequality for objects must rely on evaluation of metadata or of the objects themselves.
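The asymmetry between tests of equality and tests of difference can be sketched as a three-valued comparison. This is an assumption-laden illustration: the choice of distinguishing fields, the field names and the fictional taxon names are hypothetical, and a real service would need far richer evidence.

```python
# Hypothetical metadata fields assumed to distinguish one taxon name from another.
DISTINGUISHING_FIELDS = ("scientific_name", "authorship")


def compare(id_a, metadata_a, id_b, metadata_b):
    """Return 'same', 'different' or 'undetermined' for two identified objects."""
    if id_a == id_b:
        # Equality test: one persistent identifier always means one object.
        return "same"
    # Difference test: the identifiers alone say nothing, so inspect the metadata.
    for field in DISTINGUISHING_FIELDS:
        a, b = metadata_a.get(field), metadata_b.get(field)
        if a is not None and b is not None and a != b:
            return "different"
    return "undetermined"


name_1 = {"scientific_name": "Aus bus", "authorship": "Smith, 1900"}
name_2 = {"scientific_name": "Aus bus", "authorship": "Jones, 1950"}
name_3 = {"scientific_name": "Aus bus"}

print(compare("id:1", name_1, "id:1", name_1))
print(compare("id:1", name_1, "id:2", name_2))
print(compare("id:1", name_1, "id:3", name_3))
```

The last call returns "undetermined": two different identifiers with compatible metadata may or may not refer to the same object.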
Opportunity
Integrating identifiers with the Semantic Web and the Linked Data model
  Linked Data (http://linkeddata.org) is a vision of a web of interconnected data, to be consumed by machines
  HTTP URIs are used as identifiers, and the data is described using RDF
  If we use HTTP URIs for identifiers, we will be part of Linked Data
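As a rough sketch of what describing data with RDF looks like, the snippet below writes two statements about a taxon record in N-Triples syntax using only string formatting. The subject URI, the fictional name and the choice of the Darwin Core `scientificName` and RDFS `seeAlso` terms are assumptions made for this example, not a model proposed by the task group.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one statement; obj must already be wrapped by uri() or literal()."""
    return f"{uri(subject)} {uri(predicate)} {obj} ."


taxon = "http://data.example.org/taxon/42"  # hypothetical HTTP URI identifier
lines = [
    triple(taxon, DWC + "scientificName", literal("Aus bus")),
    triple(taxon, RDFS + "seeAlso", uri("http://other.example.org/name/7")),
]
print("\n".join(lines))
```

The second statement is what makes the data linked: its object is another HTTP URI that a machine can follow to find more statements.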
Potential Linked Data Model
Recommendations: GBIF Should
Take the leadership role in driving the application and use of identifiers in biodiversity informatics,
Provide materials such as an executive summary targeted to administrative leadership explaining the costs and benefits of implementing persistent identifiers,
Educate the community in general persistent identifier principles and practices,
Encourage, support and advise on the use of appropriate identifier technologies, in particular LSIDs and HTTP URIs, but not impose a requirement for one at the expense of the other, and provide specific advice for the issuing and use of LSIDs and of HTTP URIs,
Support a promotional programme,
Demonstrate good practice in its data portal,
Assist providers that are not currently maintaining their own persistent identifiers to do so: this includes both education and technology,
Make data more inter-connected,
Start a programme to become an RDF consumer and encourage data providers to deploy RDF services,
Provide services to support identifier resolution, redirection, metadata hosting, and caching,
Provide additional services, including persistent identifier monitoring services,
Extend the role of its data portal by hosting resources related to the use of identifiers, such as the TDWG vocabularies,
Assist with the availability of software for data and service providers, and
Continue to be funded to provide support to data providers for the foreseeable future.
Response of the GBIF Science Committee
The SC reviewed and endorsed the report of the LSID GUID TG (LGTG).
The SC recommends that an additional full case study be developed in the document to highlight the new quality control mechanisms that can be established to have users report and receive feedback on the quality of data being served.
Additionally, the LGTG report makes excellent “obligatory reading material” for the Biodiversity Informatics community in general and for GBIF Participants in particular.
The SC strongly recommends that all participants read it and be aware of the impact that the implementation of tools such as IPT and GBRDS will have in their local contexts as well as globally.
How to contact GBIF:
Web site: www.gbif.org
Data portal: data.gbif.org
GBIF Secretariat
Universitetsparken 15
2100 Copenhagen
Denmark
E-mail: [email protected]
Tel: +45 3532 1470
Fax: +45 3532 1480