Globally Unique Identifiers in Biodiversity Informatics Kevin Richards Landcare Research NZ TDWG 2008
Jan 20, 2016
Introduction
GUID (Globally Unique IDentifier)
– What, Why, Which, How
– LSIDs
– Issues
What are GUIDs?
• Globally Unique IDentifier
• A short name for a complex entity on the web
• Each name identifies only one entity
• Examples:
– UUID eg 3E9D6B68-A08C-4F15-BC8A-1265F15D30E2
– DOI eg doi:10.1006/jmbi.1998.2354
– Handle eg hdl:123.456/abc
– LSID eg urn:lsid:indexfungorum.org:names:213645
– PURL eg http://purl.oclc.org/abc/123
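The example identifiers above have distinct syntaxes, so a consumer can often tell which scheme an identifier belongs to just by looking at it. The following is a minimal Python sketch, assuming simplified patterns of our own rather than the formal grammars of each scheme; `guid_scheme` and `PATTERNS` are hypothetical names.

```python
import re

# Simplified, illustrative patterns (assumptions), not the official grammars.
UUID_CORE = r"[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}"

PATTERNS = [
    # Bare UUID, or UUID wrapped in braces (as in some database exports).
    ("UUID", re.compile(r"^(?:" + UUID_CORE + r"|\{" + UUID_CORE + r"\})$")),
    ("DOI", re.compile(r"^doi:10\.\d+/\S+$", re.IGNORECASE)),
    ("Handle", re.compile(r"^hdl:[^/\s]+/\S+$", re.IGNORECASE)),
    # urn:lsid:authority:namespace:object[:version]
    ("LSID", re.compile(r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?$", re.IGNORECASE)),
    ("PURL", re.compile(r"^https?://purl\.[^/\s]+/\S+$", re.IGNORECASE)),
]


def guid_scheme(identifier: str) -> str:
    """Return the first scheme whose pattern matches the identifier, else 'unknown'."""
    for name, pattern in PATTERNS:
        if pattern.match(identifier.strip()):
            return name
    return "unknown"
```

Recognising the scheme says nothing about whether the identifier is persistent or resolvable; a plain database key such as `50163035` simply comes back as `unknown`.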
What is a GUID?
– Properties:
• Persistent
• Opaque
• Resolvable (sometimes), which is useful for locating information about the entity
Why use GUIDs?
Without identifiers:
Data at Provider 1: BOOK “The three little pigs”, 3 copies
Data at Provider 2: BOOK “Three little pigs”, 2 copies
Data Consumer sees two different BOOKS:
“Three little pigs” … (2)
“The three little pigs” … (3)
With identifiers:
Data at Provider 1 (ID = P1): BOOK “The three little pigs”, ID (eg ISBN) = A123, 3 copies
Data at Provider 2 (ID = P2): BOOK “Three little pigs”, ID (eg ISBN) = A123, 2 copies
Data Consumer: BOOKS: ID A123: “The three little pigs” … (5)
… but with GUIDs …
BOOK Titles:
ID A123 : Provider P1 : “The three little pigs”
ID A123 : Provider P2 : “Three little pigs”
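In the book example above, the consumer merges records by their shared identifier rather than by their inconsistent titles. The following is a minimal Python sketch of that merge. The record layout and the `merge_by_id` name are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical provider records mirroring the book example.
records = [
    {"provider": "P1", "id": "A123", "title": "The three little pigs", "copies": 3},
    {"provider": "P2", "id": "A123", "title": "Three little pigs", "copies": 2},
]


def merge_by_id(records):
    """Group records by shared identifier instead of by title string."""
    merged = defaultdict(lambda: {"copies": 0, "titles": {}})
    for rec in records:
        entry = merged[rec["id"]]
        entry["copies"] += rec["copies"]
        # Keep each provider's own title so the differences stay visible.
        entry["titles"][rec["provider"]] = rec["title"]
    return dict(merged)
```

Grouping on `title` instead would produce two entries (2 and 3 copies). Grouping on `id` produces a single entry with 5 copies, and each provider's title is still kept.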
Example in our domain
ConsensusId : urn:lsid:compositae.org:names:45240C9B-D419-4B6F-93A5-D0A6DEAB4C81
Name : Anthemis gaudium-solis Velen.
Provider Id Taxon Name
IPNI urn:lsid:ipni.org:names:177325-1:1.1 Anthemis gaudium-solis Vel.
Tropicos 50163035 Anthemis goudium-solis Velen.
Euro+Med 133202 Anthemis gaudium-solis Velen.
Govaerts {29FFBEDC-19F5-4899-BCB3-05EE2C7816C8} Anthemis gaudiumsolis Velen.
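In the table above, each provider spells the name slightly differently, so matching on the name string fails. A crosswalk from each (provider, provider id) pair to the consensus LSID links the records regardless of spelling. The following is a minimal Python sketch. It assumes the crosswalk has already been compiled, for example by hand. `crosswalk` and `consensus_for` are hypothetical names.

```python
CONSENSUS_ID = "urn:lsid:compositae.org:names:45240C9B-D419-4B6F-93A5-D0A6DEAB4C81"

# (provider, provider's own id) -> name string exactly as that provider supplies it
provider_records = {
    ("IPNI", "urn:lsid:ipni.org:names:177325-1:1.1"): "Anthemis gaudium-solis Vel.",
    ("Tropicos", "50163035"): "Anthemis goudium-solis Velen.",
    ("Euro+Med", "133202"): "Anthemis gaudium-solis Velen.",
    ("Govaerts", "{29FFBEDC-19F5-4899-BCB3-05EE2C7816C8}"): "Anthemis gaudiumsolis Velen.",
}

# Hypothetical crosswalk: every provider id maps to the single consensus id.
crosswalk = {key: CONSENSUS_ID for key in provider_records}


def consensus_for(provider: str, local_id: str):
    """Return the consensus LSID for a provider's record, or None if unmapped."""
    return crosswalk.get((provider, local_id))
```

The four name strings are all different, so string comparison finds no matches. The crosswalk maps all four records to one identifier.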
GUIDs are vital to TDWG architecture
Which GUID
• GUID Subgroup Recommendations:
• Use LSIDs for identifying biodiversity data
• Reuse GUIDs where they already exist
– GUID type
– Existing assignments
• See GUID Report - http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1
• Also see the Canberra LSID Workshop report - http://www.tdwg.org/fileadmin/subgroups/guid/LSID_policy_workshop_Report_Canberra.pdf
What is an LSID?
• Life Science IDentifier
• Developed by the Object Management Group (OMG) & the Interoperable Informatics Infrastructure Consortium (I3C)
• Implemented by the team at IBM
• Used for data objects, datasets, images, files
LSID Format
urn:lsid:bioguid.org:taxon:1122:v1
• Prefix (urn) - indicates that this is a URN
• URN type (lsid) - indicates that it’s an LSID-type URN
• Authority (bioguid.org) - the authority that issued the LSID
• Namespace (taxon) - internal to that authority
• Object identifier (1122) - unique within that namespace
• Version (v1) - optional
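The components listed above are separated by colons, so an LSID can be split into its parts without contacting the authority. The following is a minimal Python sketch. It assumes the simple colon-separated form shown here and does not implement the specification's full character and escaping rules. `parse_lsid` and `LSID` are illustrative names and are not taken from any LSID library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into authority, namespace, object id and optional version."""
    parts = text.strip().split(":")
    # "urn" and the URN type "lsid" are matched case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts[2:]):
        raise ValueError(f"empty component in {text!r}")
    return LSID(*parts[2:])
```

For example, `parse_lsid("urn:lsid:ipni.org:names:177325-1:1.1")` yields authority `ipni.org`, namespace `names`, object id `177325-1` and version `1.1`.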
LSID Rules
• Data doesn’t change: the data returned for an LSID must stay byte-identical (metadata may be updated)
• Always available for resolution
– Hand over to another authority if necessary
• At least some basic metadata
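The rule that LSID data never changes can be enforced where LSIDs are issued. If the bytes change, the changed data gets a new LSID (for example a new version) instead of overwriting the old one. The following is a minimal in-memory Python sketch. `LSIDStore` and the `example.org` authority are hypothetical, and a real resolver would persist the data.

```python
import hashlib


class LSIDStore:
    """Hypothetical in-memory store enforcing 'data never changes' for each LSID."""

    def __init__(self):
        self._data = {}

    def assign(self, lsid: str, data: bytes) -> None:
        existing = self._data.get(lsid)
        if existing is not None and existing != data:
            # Changed bytes need a new LSID (e.g. a new version), not an overwrite.
            raise ValueError(f"data for {lsid} cannot change")
        self._data[lsid] = data

    def get_data(self, lsid: str) -> bytes:
        return self._data[lsid]

    def checksum(self, lsid: str) -> str:
        # A consumer can keep this digest to confirm later that the bytes are identical.
        return hashlib.sha256(self._data[lsid]).hexdigest()
```

Re-assigning identical bytes is allowed. Assigning different bytes to `urn:lsid:example.org:taxon:1:v1` raises an error, and the new bytes go under `...:v2` instead.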
Pros of LSIDs
• Not tied to physical addresses (as URLs are)
• Comparison can be done without resolving the ID
– eg for cases like “does object a = object b”
• Do not require any central registration or central service
• Quick to adopt
• Encourage thought and planning before they are allocated
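Because an LSID is not tied to a physical address, the question “does object a = object b” can be answered by comparing the identifiers, with no network call. The following is a minimal Python sketch. It assumes two conventions: the `urn` and `lsid` parts are compared case-insensitively, and every other part is compared exactly, which is a conservative simplification. It also assumes that LSIDs differing only in their version identify versions of the same object. `same_object` is a hypothetical name.

```python
def same_object(a: str, b: str, ignore_version: bool = False) -> bool:
    """Decide 'does object a = object b' from the identifiers alone."""

    def key(lsid: str):
        parts = lsid.strip().split(":")
        # "urn" and "lsid" are case-insensitive; other parts are compared exactly here.
        head = [p.lower() for p in parts[:2]]
        # Optionally drop the version so different versions of one object compare equal.
        body = parts[2:5] if ignore_version else parts[2:]
        return head + body

    return key(a) == key(b)
```

With `ignore_version=True`, `urn:lsid:ipni.org:names:177325-1:1.1` and `urn:lsid:ipni.org:names:177325-1:1.2` compare equal. Without it they are treated as different objects.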
Cons of LSIDs
However …
• Requires a DNS SRV record for the authority
• Requires specialised software to resolve an LSID (not built into most software)
• The restriction that “LSID data cannot change” can be difficult to meet
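The DNS SRV requirement exists because LSID resolution starts by looking up an SRV record derived from the authority part of the LSID (`_lsid._tcp.<authority>`). The following is a minimal Python sketch that only builds that query name. `srv_query_name` is a hypothetical helper. The actual lookup would need a DNS library, for example dnspython's `dns.resolver.resolve(name, "SRV")`, and is not shown.

```python
def srv_query_name(lsid: str) -> str:
    """Return the DNS SRV name a resolver would look up for this LSID's authority."""
    parts = lsid.strip().split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    # The authority (third component) is a DNS name such as "ubio.org".
    return f"_lsid._tcp.{parts[2]}"
```

For `urn:lsid:ubio.org:namebank:11815` this gives `_lsid._tcp.ubio.org`. An authority that does not publish this record cannot be resolved by a standard LSID client.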
How
• What data/objects to apply Ids to
• Decide on
– Authority
– Namespace
– Local ids (new vs existing)
• Issue LSIDs
• Set up a resolver
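Once the authority, namespace and local ids are decided, issuing an LSID is mostly string assembly. An existing local id can be reused where one exists, and a new opaque one can be minted otherwise. The following is a minimal Python sketch. It assumes a UUID is an acceptable new local id. `mint_lsid` and the `example.org` authority are hypothetical.

```python
import uuid
from typing import Optional


def mint_lsid(authority: str, namespace: str,
              local_id: Optional[str] = None,
              version: Optional[str] = None) -> str:
    """Build an LSID, reusing an existing local id or minting a new opaque one."""
    if local_id is None:
        # New, opaque local id (assumption: a UUID is acceptable to the authority).
        local_id = str(uuid.uuid4()).upper()
    for part in (authority, namespace, local_id, version or ""):
        if ":" in part:
            raise ValueError("LSID components must not contain ':'")
    lsid = f"urn:lsid:{authority}:{namespace}:{local_id}"
    return f"{lsid}:{version}" if version else lsid
```

Calling `mint_lsid("example.org", "names", "12345")` reuses an existing database key. Calling `mint_lsid("example.org", "names")` mints a new UUID-based id.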
LSID Code
• Current Code Stacks
– Open Source (sourceforge.net)
– Java, C++, Perl (IBM)
– Microsoft .NET (Myself)
– TAPIR LSID configuration
LSID Tools
• IBM LSID Launchpad
• Firefox LSID Browser
• LSID Tester (Rod Page)
• Web-based resolver - http://lsid.tdwg.org/
– http://lsid.tdwg.org/urn:lsid... to get LSID metadata
– http://lsid.tdwg.org/summary/urn:lsid... to get summary info of the LSID object
• Example LSID servers:
– Index Fungorum - urn:lsid:indexfungorum.org:names:213649
– IPNI - urn:lsid:ipni.org:names:30000959-2:1.1.2.1
– uBio - urn:lsid:ubio.org:namebank:11815
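The web-based resolver is used by appending the LSID to one of the two URL patterns listed above. The following is a minimal Python sketch that builds both URLs for a given LSID. The patterns are taken from this slide; the service dates from 2008 and may no longer be online. `resolver_urls` is a hypothetical name.

```python
TDWG_RESOLVER = "http://lsid.tdwg.org/"


def resolver_urls(lsid: str) -> dict:
    """Build the metadata and summary URLs for an LSID using the slide's patterns."""
    return {
        "metadata": TDWG_RESOLVER + lsid,
        "summary": TDWG_RESOLVER + "summary/" + lsid,
    }
```

For the Index Fungorum example above, the summary URL is `http://lsid.tdwg.org/summary/urn:lsid:indexfungorum.org:names:213649`.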
Issues to think about
• Who assigns new LSIDs?
• Who maintains LSID resolvers?
• What to assign LSIDs to:
– Physical or Digital
– Granularity
– Only objects that need to be resolved / identified externally
– Is there any data, or only metadata?
• When to resolve LSIDs
– Every time an LSID is encountered, or only when a client requests it?
• TDWG standards for metadata
– Which ones?
– Consistent application
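One answer to the “when to resolve” question is to resolve only when a client asks and then cache the result. Caching the data is safe because LSID data never changes, while metadata may be updated and would need refreshing. The following is a minimal Python sketch. `LazyResolver` is hypothetical, and `fetch` is a stand-in for whatever real LSID client is used.

```python
from typing import Callable, Dict


class LazyResolver:
    """Resolve LSID data only on request and cache it (safe: LSID data never changes)."""

    def __init__(self, fetch: Callable[[str], bytes]):
        # `fetch` is a hypothetical callable standing in for a real LSID client.
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def data(self, lsid: str) -> bytes:
        if lsid not in self._cache:
            self._cache[lsid] = self._fetch(lsid)
        return self._cache[lsid]
```

Encountering an LSID in a record costs nothing. The first `data()` call triggers one fetch, and later calls for the same LSID are served from the cache.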
References
• LSID SourceForge - http://lsids.sourceforge.net/
• LSID .NET SourceForge - http://sourceforge.net/projects/lsid-dotnet
• LSID Tutorial - http://www-128.ibm.com/developerworks/opensource/library/os-lsid/
• LSID Specification - http://www.omg.org/cgi-bin/doc?dtc/04-05-01
• LSID Tester - http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/
• LSID Launchpad - http://www-124.ibm.com/developerworks/downloads/detail.php?group_id=124&what=rele&id=553
• GUID Subgroup - http://www.tdwg.org/activities/guid/
• GUID Subgroup Reports
– http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1
– http://wiki.tdwg.org/twiki/pub/TIP/TipDocuments/GUID1Report.pdf
• Firefox LSID developer site - http://lsid.mozdev.org/