An LSID resolver for specimens and a digression into issues raised by the use of GUIDs

Steve Perry (smperry@ku.edu)

© 2006 University of Kansas


©2006 KU BRC Apr 21, 2023

LSID Resolver for Specimens GUID-2

Part 1: Building an LSID resolver for specimens


How it Works

[Diagram: a getMetadata() request reaches the LSID Authority. The DiGIR2LSIDMetadataService, set up from a config file, sends a SPARQL describe query to the SPARQL Service of DiGIR2. The RDF response comes back as the getMetadata() response.]
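The request flow on this slide can be sketched roughly as below. This is an illustration only, not the DiGIR2LSIDMetadataService code: the names `build_describe_query` and `get_metadata` are hypothetical, and `run_sparql` is an assumed callable standing in for the SPARQL service.

```python
import re

# urn:lsid:<authority>:<namespace>:<object>[:<revision>]
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(:[^:\s]+)?$", re.IGNORECASE
)


def build_describe_query(lsid: str) -> str:
    """Return a SPARQL DESCRIBE query for the resource named by the LSID."""
    if not LSID_PATTERN.match(lsid):
        raise ValueError(f"not a well-formed LSID: {lsid!r}")
    return f"DESCRIBE <{lsid}>"


def get_metadata(lsid: str, run_sparql) -> str:
    """Answer a getMetadata() request by delegating to a SPARQL service.

    run_sparql is a hypothetical callable: it takes a query string and
    returns the RDF response as text.
    """
    return run_sparql(build_describe_query(lsid))
```

Keeping the query builder separate from the transport makes it possible to test the LSID handling without a running SPARQL endpoint.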


Details of Prototype Implementation

• Classes of Data
– Specimens

• Metadata Representation
– RDF in DarwinCore-inspired RDF-Schema

• Data Representation
– N/A

• Experience with Stack
– IBM Java toolkit
– Great documentation (developerWorks article and Javadoc)
– Very easy to implement and test (4 hours)

• Concerns
– Integration of LSID client into existing software
– SOAP not friendly to non-professional programmers


Conclusion: Resolution Is Easy

Other issues to resolve:

– Developing ontologies
– Mapping databases into RDF
– Finding data to link to
– Repatriating links into existing databases
– Versioning
– Duplicate detection
– Long term archival storage and access
– Data aggregation and caching
– Querying across data from multiple providers
– Annotating someone else’s data without causing contradictions
– Trust


Part 2: A digression into issues raised by the use of GUIDs


DiGIR2 :: A Semantic Web Publishing System

• Not a protocol, a general-purpose RDF data provider

• Synchronizer converts source data into RDF which is stored in a triple store

• Multiple services including SPARQL and OAI-PMH allow access to RDF data

[Diagram labels: Data Source, Synchronizer, Triple Store, DiGIR2 Server, SPARQL Service, Harvest Service, LSID Authority, Public Web Services]


Synchronizer

• Synchronizes the triple store with the database

• Builds RDF using:
– a data source
– a data model (RDF-Schema, OWL ontology)
– a mapping program

• Can perform transformations while mapping

• Can perform resource description tracking and versioning

• Standardizes mapping for better support of thematic networks


Synchronizer :: Mapping and Transformation

[Diagram: a primaryResourceMapFunction expression tree applied to one source record]

• Resource identifier: concatenate(“urn:lsid:nhm.ku.edu:Herps:”, getVar catalogNumber)

• preparation: if equalsIgnoreCase(getVar prep, “E”) then “ETOH” else “Skeleton”

• decimalLatitude: add(getVar latitude_deg, divide(getVar latitude_min, 60), divide(getVar latitude_sec, 3600))

• Source record: catalogNumber = 32, latitude_deg = 30, latitude_min = 9, latitude_sec = 42, prep = E
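The mapping tree on this slide can be read as three small functions over one source record. A minimal sketch, assuming the record arrives as a dict keyed by the column names shown; `map_specimen` is a hypothetical name, not part of the Synchronizer.

```python
def map_specimen(record: dict) -> dict:
    """Map one source record to the properties of an RDF resource."""
    # concatenate: fixed LSID prefix + catalog number
    subject = "urn:lsid:nhm.ku.edu:Herps:" + str(record["catalogNumber"])

    # if / equalsIgnoreCase: expand the preparation code
    preparation = "ETOH" if str(record["prep"]).lower() == "e" else "Skeleton"

    # add / divide: degrees + minutes/60 + seconds/3600
    decimal_latitude = (
        record["latitude_deg"]
        + record["latitude_min"] / 60
        + record["latitude_sec"] / 3600
    )

    return {
        "subject": subject,
        "preparation": preparation,
        "decimalLatitude": decimal_latitude,
    }


# Source record with the values shown on the slide.
sample = {"catalogNumber": 32, "latitude_deg": 30, "latitude_min": 9,
          "latitude_sec": 42, "prep": "E"}
mapped = map_specimen(sample)
```

Doing the transformation while mapping means the source database can keep its own column layout (degrees, minutes, seconds; one-letter codes) while every provider publishes the same RDF properties.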


Synchronizer :: Versioning and Tracking

[Diagram, built up over three slides: the Synchronizer output is compared with the contents of the triple store]

• Synchronizer output: urn:lsid:nhm.ku.edu:Herps:32, labeled dc:DarwinCoreSpecimen, with dc:decimalLatitude “30.145”^^xsd:double and the literal “ETOH”^^xsd:String

• Contents of triple store: urn:lsid:nhm.ku.edu:Herps:32, labeled dc:DarwinCoreSpecimen, with dc:decimalLatitude “270.234”^^xsd:double and the literal “ETOH”^^xsd:String

• A new version?


Synchronizer :: Versioning and Tracking

What to do with new versions of resource descriptions?

• First, track them. Record outside of the RDF subsystem that a resource has been created, updated, or deleted (CRUD’d) at a particular date and time

• After that, there are several ways to handle versioning
– No versioning
– Non-persistent versioning
– Persistent versioning

• Each of these affects how clients do searches and how descriptions should be cached and stored remotely.
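Tracking, as described in the first bullet, could look roughly like this. It is a sketch only: `ChangeRecord` and `ChangeLog` are hypothetical names, and a real provider would keep the log in a database table rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    guid: str
    operation: str      # "create", "update", or "delete"
    timestamp: datetime


class ChangeLog:
    """Keeps change history outside the RDF store (here, in a plain list)."""

    OPERATIONS = {"create", "update", "delete"}

    def __init__(self):
        self._records = []

    def record(self, guid, operation, timestamp=None):
        """Note that a resource was created, updated, or deleted."""
        if operation not in self.OPERATIONS:
            raise ValueError(f"unknown operation: {operation!r}")
        when = timestamp or datetime.now(timezone.utc)
        entry = ChangeRecord(guid, operation, when)
        self._records.append(entry)
        return entry

    def history(self, guid):
        """Return all recorded changes for one resource, oldest first."""
        return sorted(
            (r for r in self._records if r.guid == guid),
            key=lambda r: r.timestamp,
        )
```

The log is independent of the versioning policy: even a provider that overwrites descriptions in place can still say when each one last changed.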


Versioning Schemes :: No versioning

• New version replaces old

• No new GUID assigned

• Simplest scheme

• Lose ability to retrieve old versions

• Must have application-level rules to find and remove effective-duplicates

[Diagram: Contents of Triple Store. Two descriptions of urn:lsid:nhm.ku.edu:Herps:32 are shown, both labeled dc:DarwinCoreSpecimen with the literal “ETOH”^^xsd:String: one with dc:decimalLatitude “30.145”^^xsd:double, the other with dc:decimalLatitude “270.234”^^xsd:double.]


Versioning Schemes :: Non-persistent versioning

• New GUID assigned

• Contents of old description removed

• New and old descriptions related to each other by predicates

• Do not have problems of old versions matching in cache search

• Given old, can find new (inefficient)

• Cannot retrieve old data

[Diagram: Contents of Triple Store. urn:lsid:nhm.ku.edu:Herps:76, labeled dc:DarwinCoreSpecimen, has dc:decimalLatitude “30.145”^^xsd:double and the literal “ETOH”^^xsd:String, and links to urn:lsid:nhm.ku.edu:Herps:32. urn:lsid:nhm.ku.edu:Herps:32 carries only pub:replacedBy urn:lsid:nhm.ku.edu:Herps:76.]


Versioning Schemes :: Persistent versioning

• New GUID assigned

• Old description maintained

• New and old descriptions related to each other by predicates

• Old versions can end up in triple store together

• Given old, can find new (inefficient)

• Can retrieve old

• Lots of triples!

[Diagram: Contents of Triple Store. urn:lsid:nhm.ku.edu:Herps:76, labeled dc:DarwinCoreSpecimen, has dc:decimalLatitude “30.145”^^xsd:double and the literal “ETOH”^^xsd:String. urn:lsid:nhm.ku.edu:Herps:32, labeled dc:DarwinCoreSpecimen, has dc:decimalLatitude “270.234”^^xsd:double and the literal “ETOH”^^xsd:String. Each description links to the other.]
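The three versioning schemes can be compared side by side on a toy store. A sketch under stated assumptions: the store is a plain dict of GUID to properties, the function names are hypothetical, only the forward link pub:replacedBy is modeled (the back link from new to old is left out), and using pub:replacedBy for the persistent scheme is an assumption, since the slides name that predicate only for the non-persistent case.

```python
REPLACED_BY = "pub:replacedBy"


def no_versioning(store, guid, new_description):
    """New version replaces old under the same GUID; old data is lost."""
    store[guid] = dict(new_description)
    return guid


def non_persistent(store, old_guid, new_guid, new_description):
    """New GUID; the old description is emptied except for a forward link."""
    store[new_guid] = dict(new_description)
    store[old_guid] = {REPLACED_BY: new_guid}
    return new_guid


def persistent(store, old_guid, new_guid, new_description):
    """New GUID; the old description is kept and gains a forward link."""
    store[new_guid] = dict(new_description)
    store[old_guid] = {**store[old_guid], REPLACED_BY: new_guid}
    return new_guid


OLD = "urn:lsid:nhm.ku.edu:Herps:32"
NEW = "urn:lsid:nhm.ku.edu:Herps:76"
old_desc = {"dc:decimalLatitude": 270.234}
new_desc = {"dc:decimalLatitude": 30.145}

store = {OLD: dict(old_desc)}
persistent(store, OLD, NEW, new_desc)
# store now holds both descriptions, linked by pub:replacedBy
```

The trade-off shows up in the size of the store: no versioning keeps one entry, non-persistent keeps two with one of them nearly empty, and persistent keeps two full descriptions.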


Versioning :: Mixed versioning

• At GUID-1, it was stated that different types of information require different versioning policies.

• If implemented, this results in a mix of versioning schemes in the global graph

• Mixed versioning shifts the burden from providers that don’t version to clients (caches, portals, etc.) which have to figure out whether they are getting only current versions or a mix of new and old (effective duplicates)
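The client-side burden can be illustrated with a cache that tries to keep only current versions. A sketch only, assuming the cache is a dict of GUID to properties and that versioning providers publish a pub:replacedBy link; `latest_version` and `current_only` are hypothetical helpers.

```python
REPLACED_BY = "pub:replacedBy"


def latest_version(store, guid):
    """Follow pub:replacedBy links until reaching an unreplaced description.

    One lookup per hop, which is why "given old, can find new" is
    inefficient. Raises ValueError if the links form a cycle.
    """
    seen = set()
    while REPLACED_BY in store.get(guid, {}):
        if guid in seen:
            raise ValueError(f"replacedBy cycle at {guid}")
        seen.add(guid)
        guid = store[guid][REPLACED_BY]
    return guid


def current_only(store):
    """Drop descriptions that have been superseded by a newer version."""
    return {g: d for g, d in store.items() if REPLACED_BY not in d}
```

This filter only helps for providers that version and publish links. For a provider that overwrites in place there is no link to follow, so a stale cached copy can only be detected by harvesting again.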


Versioning :: Some thoughts on identity

• Do GUIDs name things or identify the descriptions of things?

• Non-versioned changes to metadata always change the semantic meaning of the description (regardless of whether or not identity is changed)

• To paraphrase Heraclitus, “Different waters flow in the same river”

• When deciding that a change in a description does not require a change in version, you’re constraining use of your data (you’re interested in the river, I’m interested in the water).


Caching

• Lots of use cases for caching
– Aggregation for inference
– Aggregation as solution to distributed query problem
– Quality of service (response time)
– Redundancy

• Caches should clearly communicate to clients whether the cache holds multiple historical versions of the same description so clients can avoid retrieving effective-duplicates

• To support caching, data providers should support a harvesting mechanism


Incremental Harvesting

• Incremental harvesting is more efficient than bulk harvesting because it sends only recent changes

• “Give me all metadata changes since X”

• To support incremental harvesting we need to track type and date of changes (regardless of the versioning policy)

• This adds another set of requirements on data providers

• OAI Protocol for Metadata Harvesting (OAI-PMH)
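A “give me all metadata changes since X” request can be answered from a change log alone. A minimal sketch with made-up log entries; `changes_since` is a hypothetical helper, not an OAI-PMH implementation (in OAI-PMH the comparable request is ListRecords with a from argument).

```python
from datetime import datetime, timezone

# Hypothetical change log: (guid, operation, timestamp) tuples.
CHANGES = [
    ("urn:lsid:nhm.ku.edu:Herps:32", "create",
     datetime(2006, 1, 10, tzinfo=timezone.utc)),
    ("urn:lsid:nhm.ku.edu:Herps:76", "create",
     datetime(2006, 2, 1, tzinfo=timezone.utc)),
    ("urn:lsid:nhm.ku.edu:Herps:32", "update",
     datetime(2006, 2, 1, tzinfo=timezone.utc)),
]


def changes_since(changes, since):
    """Return {guid: (operation, timestamp)} for the latest change to each
    GUID that happened strictly after `since`."""
    latest = {}
    for guid, operation, timestamp in sorted(changes, key=lambda c: c[2]):
        if timestamp > since:
            latest[guid] = (operation, timestamp)
    return latest
```

Reporting the operation as well as the date lets a harvester tell a deletion from an update, which a timestamp alone cannot do.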


The Open World

[Diagram, built up over five slides: three organizations make statements about the same GUID, and the statements are merged into one graph]

• Organization A: urn:lsid:A:ns:1 color “red”

• Organization B: urn:lsid:A:ns:1 size “large”

• Organization C: urn:lsid:A:ns:1 color “blue”

• Merged Graph: urn:lsid:A:ns:1 has color “red”, size “large”, and color “blue”
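The merge on these slides is a plain set union of statements, which is why the contradiction appears without any error. A small sketch using tuples in place of an RDF library; `merge` and `multi_valued` are hypothetical helper names.

```python
from collections import defaultdict

# (subject, predicate, object) statements as shown on the slides.
ORG_A = [("urn:lsid:A:ns:1", "color", "red")]
ORG_B = [("urn:lsid:A:ns:1", "size", "large")]
ORG_C = [("urn:lsid:A:ns:1", "color", "blue")]


def merge(*graphs):
    """Merge graphs by taking the set union of their triples."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def multi_valued(graph):
    """Return {(subject, predicate): values} wherever more than one value
    is asserted for the same subject and predicate."""
    values = defaultdict(set)
    for s, p, o in graph:
        values[(s, p)].add(o)
    return {key: vals for key, vals in values.items() if len(vals) > 1}
```

Merging A and B is harmless. Adding C leaves the resource with two colors, and nothing in the merged graph says which organization asserted which one.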


The Open World

Solutions to this problem

• Close the world
– Ignore assertions about GUIDs that don’t originate from the authority

• Narrow the world
– Only allow certain assertions about GUIDs that don’t originate from the authority
– Accept/reject foreign authority notifications

• Treat everything as an assertion and record who makes it and what they intend by it
– Named graphs and semantic web publishing warrants
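Closing and narrowing the world both come down to filtering assertions by who made them. A sketch under assumptions: each assertion is a dict that already records its source in a hypothetical `asserted_by` field, and an organization counts as the authority when its name equals the authority component of the LSID.

```python
def lsid_authority(lsid):
    """Return the authority component of an LSID (urn:lsid:<authority>:...)."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[2]


def close_the_world(assertions):
    """Keep only assertions made by the authority that issued the GUID."""
    return [
        a for a in assertions
        if a["asserted_by"] == lsid_authority(a["subject"])
    ]


def narrow_the_world(assertions, allowed_foreign_predicates):
    """Keep the authority's assertions, plus foreign assertions that use
    one of the explicitly allowed predicates."""
    return [
        a for a in assertions
        if a["asserted_by"] == lsid_authority(a["subject"])
        or a["predicate"] in allowed_foreign_predicates
    ]


ASSERTIONS = [
    {"subject": "urn:lsid:A:ns:1", "predicate": "color", "object": "red",
     "asserted_by": "A"},
    {"subject": "urn:lsid:A:ns:1", "predicate": "size", "object": "large",
     "asserted_by": "B"},
    {"subject": "urn:lsid:A:ns:1", "predicate": "color", "object": "blue",
     "asserted_by": "C"},
]
```

With these inputs, closing the world keeps only A's statement, while narrowing the world with `{"size"}` also admits B's annotation and still rejects C's conflicting color. Both filters need to know who made each assertion, which is the information the third option records.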


Provenance, Attribution, and Trust

• Assign GUIDs to resources

• Assign GUIDs to the graphs that contain concise bounded descriptions, resulting in named “description” graphs

• For each description graph, create another named graph that contains information about the assertions made in it

• Second named graph is a “warrant” graph

• Warrant graph contains meta-metadata: an instance of a Warrant class with attributes such as assertedBy

• Carroll and Bizer presented “Semantic Web Publishing using Named Graphs” at ISWC2004 Trust Workshop
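The description-graph and warrant-graph pairing can be sketched with quads, that is, triples tagged with the name of the graph they belong to. Everything below is illustrative: the graph and warrant GUIDs are made up, `aboutGraph` is a hypothetical predicate linking a warrant to the graph it covers, and no claim is made that this matches the vocabulary in Carroll and Bizer's paper.

```python
# Quads: (graph_name, subject, predicate, object). All GUIDs are made up.
DESCRIPTION_GRAPH = "urn:lsid:B:graphs:1"
WARRANT_GRAPH = "urn:lsid:B:graphs:2"
WARRANT = "urn:lsid:B:warrants:1"

QUADS = [
    # Description graph: what is being said about the resource.
    (DESCRIPTION_GRAPH, "urn:lsid:A:ns:1", "size", "large"),
    # Warrant graph: who asserts the description graph.
    (WARRANT_GRAPH, WARRANT, "rdf:type", "Warrant"),
    (WARRANT_GRAPH, WARRANT, "assertedBy", "Organization B"),
    (WARRANT_GRAPH, WARRANT, "aboutGraph", DESCRIPTION_GRAPH),
]


def asserted_by(quads, description_graph):
    """Return the parties whose warrants cover the given description graph."""
    warrants = {
        s for _, s, p, o in quads
        if p == "aboutGraph" and o == description_graph
    }
    return {o for _, s, p, o in quads if p == "assertedBy" and s in warrants}


def statements_from(quads, party):
    """Return the triples of every description graph asserted by the party."""
    warrants = {s for _, s, p, o in quads if p == "assertedBy" and o == party}
    graphs = {
        o for _, s, p, o in quads if p == "aboutGraph" and s in warrants
    }
    return [(s, p, o) for g, s, p, o in quads if g in graphs]
```

Because the attribution lives in its own graph, a client can decide whom to trust after merging, instead of having to drop foreign statements before they enter the store.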


Issues with LSIDs and RDF

– Developing ontologies
– Mapping databases into RDF
– Finding data to link to
– Repatriating links into existing databases
– Versioning
– Duplicate detection
– Long term archival storage and access
– Data aggregation and caching
– Querying across data from multiple providers
– Annotating someone else’s data without causing contradictions
– Trust