IDs in and out of the database Entomological Collection Network (ECN) 2012 November 10 – 11, Knoxville, TN Debbie Paul, Greg Riccardi

Nov 22, 2014

Page 1: Paul2 ecn 2012

IDs in and out of the database

Entomological Collection Network (ECN) 2012 November 10 – 11, Knoxville, TN

Debbie Paul, Greg Riccardi

Page 2: Paul2 ecn 2012

Overview

• What good is identification?
• How are identifiers used by consumers
• Providing IDs
• Resolving IDs in a server
  – Strategies for storing IDs in databases
• Linked Data
• Annotations ~ all sorts
• Feedback

Page 3: Paul2 ecn 2012

What good is identification?

• Aggregation
  – If you get info from 2 sources that are about the same object, you can combine the info
• Resolution (finding information about object)
  – Types of resolution
    • Determine where to get information
    • Determine how to get information
• Providing information
  – How to create IDs
  – How to publish IDs
  – How to fetch database information for IDs

Page 4: Paul2 ecn 2012

HTTP URIs

• Biggest problem
  – Identification and 2 types of resolution are commingled
• Resolution
  – Where to get information
    • Look somewhere
  – How to get information
    • Fetch information using some protocol

Page 5: Paul2 ecn 2012

DOI example

• The DOI is
    • 10.3897/zookeys.209.3135
• URI (for aggregating) is
    • doi:10.3897/zookeys.209.3135
• A URL for information retrieval (proxy resolution) is
    • http://dx.doi.org/10.3897/zookeys.209.3135
• Information fetched from
  – HTML:
    • http://www.pensoft.net/journals/zookeys/article/3135/abstract/five-task-clusters-that-enable-efficient-and-effective-digitization-of-biological-collections
  – RDF:
    • http://data.crossref.org/10.3897/zookeys.209.3135
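
The three forms in the DOI example differ only by a prefix, so one can be derived from another. A minimal sketch, assuming the proxy address shown on the slide; doi_forms is a hypothetical helper, not part of any DOI library, and it does no fetching:

```python
# Sketch only: doi_forms is a hypothetical helper, not a library function.
PROXY = "http://dx.doi.org/"  # proxy resolver address shown on the slide


def doi_forms(doi: str) -> dict:
    """Return the bare DOI, its URI for aggregating, and its proxy URL."""
    bare = doi.strip()
    # Accept any of the three forms from the slide and reduce to the bare DOI.
    for prefix in ("doi:", PROXY):
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    return {
        "doi": bare,           # identification only
        "uri": "doi:" + bare,  # string to compare when aggregating
        "url": PROXY + bare,   # where to send a request (proxy resolution)
    }


forms = doi_forms("http://dx.doi.org/10.3897/zookeys.209.3135")
print(forms["uri"])  # doi:10.3897/zookeys.209.3135
```

Keeping "uri" and "url" as separate fields mirrors the point of the slide: the string used for comparison is not the same thing as the address used for retrieval.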

Page 6: Paul2 ecn 2012

What’s in an ID?

• For consumer:
  – NOTHING! No information
  – Might as well be UUID
    • Can’t type it, remember it, parse it, resolve it
  – Useful for comparison and aggregation
    • Equal strings (persistence)
    • Different strings about the same object
  – fetching information
    • Send the ID somewhere for info
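
For a consumer, aggregation comes down to comparing identifier strings for equality. A rough sketch of that idea follows; the two sources, their field names, and their values are invented for illustration, and only the identifier string is taken from the DOI example:

```python
from collections import defaultdict

# Hypothetical records from two sources; fields and values are illustrative.
source_a = [{"id": "doi:10.3897/zookeys.209.3135", "title": "Five task clusters"}]
source_b = [{"id": "doi:10.3897/zookeys.209.3135", "journal": "ZooKeys"}]


def aggregate(*sources):
    """Merge records whose identifier strings are exactly equal."""
    merged = defaultdict(dict)
    for records in sources:
        for record in records:
            # The ID is treated as an opaque string; it is never parsed.
            merged[record["id"]].update(record)
    return dict(merged)


combined = aggregate(source_a, source_b)
print(combined["doi:10.3897/zookeys.209.3135"])
```

Two different strings for the same object (say, the bare DOI in one source and the doi: URI in the other) would not merge here, which is the "different strings about the same object" problem named above.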

Page 7: Paul2 ecn 2012

What’s in an ID?

• For Provider/resolver:
  – Use ID to find local storage of information
  – E.g.
    • parse out the DwC triplet
    • Extract the database table and primary key
    • Look up the ID in a table of IDs
    • Look up ID in a URI field of a database table
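
The first two provider strategies can be sketched as string parsing. This is an illustration only: both function names are hypothetical, the example values are the FLMNH identifiers used later in the talk, and the assumption that a URI path has the shape /<table>/<primary key> is ours, not something a provider is obliged to follow.

```python
from urllib.parse import urlparse


def parse_dwc_triplet(triplet: str) -> dict:
    """Split institution:collection:catalog number, e.g. flmnh:herb:bbbrc000123."""
    institution, collection, catalog_number = triplet.split(":", 2)
    return {
        "institution": institution,
        "collection": collection,
        "catalog_number": catalog_number,
    }


def table_and_key(uri: str) -> tuple:
    """Assume (hypothetically) that the URI path is /<table>/<primary key>."""
    table, key = urlparse(uri).path.strip("/").split("/", 1)
    return table, key


print(parse_dwc_triplet("flmnh:herb:bbbrc000123"))
print(table_and_key("http://ids.flmnh.ufl.edu/herb/bbbrc000123"))
```

Parsing only works for IDs that carry structure. For opaque IDs such as UUIDs, the provider has to fall back on the lookup strategies in the last two bullets.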

Page 8: Paul2 ecn 2012

What’s in an id for the provider?

• record id 112234

• uuid 954c8760-e1a6-4b4b-ab82-6bf7311c25f3

• lsid urn:lsid:example.org:specimen:22545

• uri

• ezid http://n2t.net/ark:/99999/fk42b9hdf

• doi doi:10.1038/ng0609-637
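
A provider that accepts several of these ID types needs some way to tell them apart before looking anything up. A heuristic sketch based on syntax alone; id_kind is a hypothetical name, the labels it returns are ours, and the EZID example is reported as "ark" because that is the scheme visible in the URL:

```python
import uuid


def id_kind(identifier: str) -> str:
    """Guess the kind of identifier from its syntax alone (a heuristic)."""
    text = identifier.strip()
    lower = text.lower()
    if lower.startswith("urn:lsid:"):
        return "lsid"
    if lower.startswith("doi:"):
        return "doi"
    if "/ark:/" in lower:
        return "ark"  # the slide lists this one under "ezid"
    if lower.startswith(("http://", "https://")):
        return "uri"
    if text.isdigit():
        return "record id"
    try:
        uuid.UUID(text)  # raises ValueError if text is not a UUID
        return "uuid"
    except ValueError:
        return "unknown"


for example in (
    "112234",
    "954c8760-e1a6-4b4b-ab82-6bf7311c25f3",
    "urn:lsid:example.org:specimen:22545",
    "http://n2t.net/ark:/99999/fk42b9hdf",
    "doi:10.1038/ng0609-637",
):
    print(example, "->", id_kind(example))
```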

Page 9: Paul2 ecn 2012

What about Specimen identifiers?

• identifier on the specimen?

– readable text

– encoded data

– barcode is a contextual identifier

• identifier in the database?

– http://ids.usms.edu/herb/0014097

– http://ids.usms.edu/herb/0303134303937

Page 10: Paul2 ecn 2012

How do providers identify?

Look at online databases and at your own database, and find the identifiers of the various objects

Some identifiers are local (e.g. primary key)

Some identifiers are globally unique

Some identifiers are URIs

Page 11: Paul2 ecn 2012

Identification in the field

• wireless or workbench
• data collected and uploaded

Page 12: Paul2 ecn 2012

Storing IDs in databases

• your contextual ids?, your guids?
• What to use for IDs?
  – record id
  – uuid
  – lsid
  – uri
• what’s in your wallet database?
• Morphbank Example

Page 13: Paul2 ecn 2012

IDs in Morphbank

• Morphbank Example
• http://www.morphbank.net/818505

Page 14: Paul2 ecn 2012

IDs in Morphbank

• Morphbank Example
• http://www.morphbank.net/643261

Page 15: Paul2 ecn 2012

Sharing data with IDs

• into a publication

• uploaded to the web

• data shared with a database integrator / aggregator

– GBIF

– iDigBio

– VertNet

– Morphbank

• what is it exactly in the publication?

– an id?, a guid? a link to more information?

– what will be cited? searched for?

Page 16: Paul2 ecn 2012

Feedback with IDs

• Annotations
  – Target of annotation
    • http://www.morphbank.net/818505
  – filtered PUSH
• linked data ~ the semantic web
  – (benefits – in a minute)
• updating the database
  – be(a)ware
  – Remember previous IDs
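
One way to read "Remember previous IDs": an annotation points at its target by ID, so if the ID changes after a database update, the old ID has to keep leading to the same object. A small sketch under that reading; the Annotation class, the alias table, the example.org URI, and the annotation text are all invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    target: str  # ID of the annotated object
    body: str    # the comment or correction


# Hypothetical alias table: previous ID -> current ID (old URI is invented).
PREVIOUS_IDS = {"http://example.org/old-id/818505": "http://www.morphbank.net/818505"}


def current_id(identifier: str) -> str:
    """Follow a remembered previous ID to the current one, if there is one."""
    return PREVIOUS_IDS.get(identifier, identifier)


def annotations_for(identifier: str, annotations: list) -> list:
    """Select annotations whose target resolves to the same object."""
    wanted = current_id(identifier)
    return [a for a in annotations if current_id(a.target) == wanted]


notes = [
    Annotation("http://example.org/old-id/818505", "made before the ID changed"),
    Annotation("http://www.morphbank.net/818505", "made after the ID changed"),
]
print(len(annotations_for("http://www.morphbank.net/818505", notes)))  # 2
```

Without the alias table, the first annotation would be orphaned by the update.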

Page 17: Paul2 ecn 2012

What’s coming up next?

• expect guids for all sorts of objects
  – collection objects (example: specimen)
  – georeferences
  – taxon concepts
  – determinations
  – people

Page 18: Paul2 ecn 2012

GUIDs are key

• 1 to many IDs known for a given object
• store and share the ones you know about

Specimen RecordID                     19537
Specimen Previous Catalog Number      212345
Specimen Catalog Number / bar code    bbbrc000123
Darwin Core Triplet (DwC)             flmnh:herb:bbbrc000123
DwC Occurrence URI                    urn:catalog:flmnh:herb:bbbrc000123
Specimen GUID of type lsid            urn:lsid:biocol.org:flmnh:bbbrc000123
Specimen Opaque Identifier (UUID)     424854d7-baec-42cf-a142-805b64117b9f
URI for UUID                          urn:uuid:424854d7-baec-42cf-a142-805b64117b9f
Specimen GUID of type HTTP-URI        http://ids.flmnh.ufl.edu/herb/bbbrc000123

*Cannot enforce single identifier per object
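
Since a single identifier per object cannot be enforced, one possible storage strategy is a separate table of identifiers with one row per known ID, all pointing at the same local record. The sketch below uses an in-memory SQLite database; the table and column names are assumptions, and the values are the ones listed above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (record_id INTEGER PRIMARY KEY);
-- One row per known identifier; many rows may point at one specimen.
CREATE TABLE specimen_identifier (
    identifier TEXT PRIMARY KEY,
    id_type    TEXT NOT NULL,
    record_id  INTEGER NOT NULL REFERENCES specimen(record_id)
);
""")
conn.execute("INSERT INTO specimen VALUES (19537)")
conn.executemany(
    "INSERT INTO specimen_identifier VALUES (?, ?, 19537)",
    [
        ("212345", "previous catalog number"),
        ("bbbrc000123", "catalog number"),
        ("flmnh:herb:bbbrc000123", "dwc triplet"),
        ("urn:catalog:flmnh:herb:bbbrc000123", "dwc occurrence uri"),
        ("urn:lsid:biocol.org:flmnh:bbbrc000123", "lsid"),
        ("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f", "uuid uri"),
        ("http://ids.flmnh.ufl.edu/herb/bbbrc000123", "http uri"),
    ],
)


def resolve(identifier: str):
    """Return the local record id for any known identifier, or None."""
    row = conn.execute(
        "SELECT record_id FROM specimen_identifier WHERE identifier = ?",
        (identifier,),
    ).fetchone()
    return row[0] if row else None


print(resolve("urn:lsid:biocol.org:flmnh:bbbrc000123"))  # 19537
```

A new kind of GUID then becomes one more row rather than one more column, and the previous catalog number keeps resolving after a renumbering.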

Page 19: Paul2 ecn 2012

caring for guids

• store them

– database adjustments

– tweaking current standard practices

• share them

– data standards

– 3 ways to modify darwin core

• reap the benefits

Page 20: Paul2 ecn 2012

caring for guids – reap the benefits

• Data quality feedback
• Dialog based on annotation
• Tracking objects through analysis and use
• Maintaining attribution to provider
• Find related objects
• Find a way to take advantage of efforts of many smart dedicated people
  – BHL, biscicol, filtered PUSH, GNA, TNRS, SGR,…

Page 21: Paul2 ecn 2012

Thanks from iDigBio

42!