Provenance of scientific information as experienced in DRIVER. 6th e-Infrastructure Concertation Event, Lyon, 24th November 2008. Wolfram Horstmann, Bielefeld University / DRIVER

Dec 20, 2015

Page 1

Provenance of scientific information

as experienced in DRIVER

6th e-Infrastructure Concertation Event

Lyon, 24th November 2008

Wolfram Horstmann, Bielefeld University / DRIVER

Page 2

Notions of Provenance

• Where do data objects* originate from?
  – Scientific work (examples)
    • Instrumentation techniques
      – Manufacturers of hardware and software
    • Methodologies
      – Processes, e.g. gene sequencing
  – Technical/local (examples)
    • (Web) identifiers
    • Database, repository name

* Primary data, documents, metadata …

Page 3

Why Provenance?

• Quoting / citing / referencing as a global scientific principle
  – “Reproducible research”
• Giving credit to authors / creators in distributed environments
• Original location / context has to be known
• Experienced in grid environments [1]

Page 4

Provenance & Interoperability

• Re-use / sharing: “Addressing/Accessing”
  – Common view, common use
  – Unidirectional: no change of data objects!
• Federation: “Discovering in Context”
  – Remote representation of distributed DOs
• Aggregation: “Contextualizing”
  – Add unchanged object in a context
• Processing/annotation: “Changing”
  – Uni- vs. bidirectional: change of DOs and remote representation vs. back-storage (e.g. CVS)

Page 5

Scenarios in DRIVER

Page 6

Digital Scientific Data

Page 7

Digital Object Collections


Page 8

Digital Object Repositories


Page 9

Digital Information Space

Page 10

Conventional Web Data

Page 11

“Simple” Applications

Page 12

Metadata Infrastructure

Page 13

Basic Provenance Settings

• Indicate production situation
  – Metadata
    • Author, instrumentation etc.
• Remote representation
  – Indicate place of origin in remote systems
    • Metadata as digital objects / first-order citizens
  – Allow lineage representation
    • Credits in remote environments / versioning

Page 14

Orders of Provenance

• 1st order: metadata
  – Provenance attached to data
  – Minimal “knowledge” required in application
  – Allow remote handling of data objects
  – Require metadata infrastructure
  – Metadata introduce 2 objects: requires linkage
• 2nd order: context / compounds
  – Express multiple relations between objects
  – May introduce semantic model
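The two orders can be illustrated with a small data-structure sketch. This is only one possible reading of the slide, not DRIVER code: the class names, field names, predicate strings and example.org identifiers are all hypothetical. It shows why a 1st order metadata record needs an explicit link to the data object it describes, and how a 2nd order context holds several named relations between objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataObject:
    """A data object held at its place of origin (identifier is a made-up URI)."""
    identifier: str


@dataclass(frozen=True)
class MetadataRecord:
    """1st order provenance: a second object that describes a data object."""
    identifier: str        # the record is an object of its own ...
    describes: str         # ... so it must link to the data object it describes
    author: str
    instrumentation: str


@dataclass
class Context:
    """2nd order provenance: multiple named relations between objects."""
    relations: list = field(default_factory=list)  # (subject, predicate, object)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        self.relations.append((subject, predicate, obj))

    def related_to(self, subject: str) -> list:
        """Return (predicate, object) pairs recorded for the given subject."""
        return [(p, o) for s, p, o in self.relations if s == subject]


# Hypothetical example values, for illustration only.
data = DataObject("http://repo.example.org/data/42")
record = MetadataRecord(
    identifier="http://repo.example.org/meta/42",
    describes=data.identifier,
    author="A. Researcher",
    instrumentation="sequencer model X",
)

context = Context()
context.relate(record.identifier, "describes", data.identifier)
context.relate("http://repo.example.org/doc/7", "isBasedOn", data.identifier)

print(context.related_to("http://repo.example.org/doc/7"))
```

In this sketch the linkage problem is visible in the `describes` field: if that link is lost, the metadata record and the data object can no longer be matched.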

Page 15

Provenance in DRIVER #1

• Simple objects: OAI-PMH [2]
  – 1st order provenance
    • Metadata: minimum OAI-DC
  – 2nd order provenance
    • DRIVER explicit identifiers for repositories
    • OAI-PMH: inline representation (“about”)
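As an illustration of the inline “about” representation, the sketch below reads a provenance container of the kind described in the OAI-PMH provenance guidelines [2], using only the Python standard library. The sample record, its identifiers and the example.org base URLs are invented for this example, and the function name `origin_chain` is hypothetical. Nested `originDescription` elements are read here as successive earlier harvesting steps.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH provenance container.
PROV_NS = {"p": "http://www.openarchives.org/OAI/2.0/provenance"}

# Invented sample: an aggregator harvested the record from an intermediate
# service, which had itself harvested it from the originating repository.
SAMPLE_ABOUT = """
<about xmlns="http://www.openarchives.org/OAI/2.0/">
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
    <originDescription harvestDate="2008-11-20T10:00:00Z" altered="true">
      <baseURL>http://intermediate.example.org/oai</baseURL>
      <identifier>oai:intermediate.example.org:abc</identifier>
      <datestamp>2008-11-10</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      <originDescription harvestDate="2008-11-05T08:30:00Z" altered="false">
        <baseURL>http://origin.example.org/oai</baseURL>
        <identifier>oai:origin.example.org:1234</identifier>
        <datestamp>2008-11-01</datestamp>
        <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      </originDescription>
    </originDescription>
  </provenance>
</about>
"""


def origin_chain(about_xml: str) -> list:
    """Return one dict per originDescription, most recent harvest first."""
    root = ET.fromstring(about_xml.strip())
    chain = []
    node = root.find("p:provenance/p:originDescription", PROV_NS)
    while node is not None:
        chain.append({
            "baseURL": node.findtext("p:baseURL", namespaces=PROV_NS),
            "identifier": node.findtext("p:identifier", namespaces=PROV_NS),
            "harvestDate": node.get("harvestDate"),
            "altered": node.get("altered") == "true",
        })
        # A nested originDescription describes the previous harvesting step.
        node = node.find("p:originDescription", PROV_NS)
    return chain


chain = origin_chain(SAMPLE_ABOUT)
for step in chain:
    print(step["baseURL"], step["identifier"], step["harvestDate"])
```

The last entry of the returned list is the innermost description, which in this reading is the place of origin an aggregator can link back to.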

Page 16

Semantic/Compound Data

Page 17

“Semantic” Applications

Page 18

Provenance in DRIVER #2

• “Enhanced Publications”
  – Research project in DRIVER-II
  – Representation of data/document packages
  – Use of OAI-ORE

Page 19

Provenance in OAI-ORE

• OAI-ORE: Object Re-Use and Exchange [4]
  – Uses Resource Maps (a form of Named Graphs)
  – Uses “lineage” to represent explicit provenance
  – Future: explicit provenance model [7]?
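A minimal sketch of how the lineage relation can express provenance, assuming the ORE vocabulary terms `ore:aggregates`, `ore:describes`, `ore:proxyFor`, `ore:proxyIn` and `ore:lineage`. Triples are kept as plain Python tuples instead of using an RDF library; all example.org URIs and the helper names are hypothetical. A data set is aggregated at its origin and re-used in a second aggregation, whose proxy points back to the original proxy via `ore:lineage`.

```python
ORE = "http://www.openarchives.org/ore/terms/"
AGGREGATES = ORE + "aggregates"
DESCRIBES = ORE + "describes"
PROXY_FOR = ORE + "proxyFor"
PROXY_IN = ORE + "proxyIn"
LINEAGE = ORE + "lineage"

# Hypothetical URIs, for illustration only.
DATASET = "http://origin.example.org/data/42"
REM_ORIGIN = "http://origin.example.org/rem/1"
AGG_ORIGIN = "http://origin.example.org/aggregation/1"
PROXY_ORIGIN = "http://origin.example.org/proxy/42"
REM_REUSE = "http://aggregator.example.org/rem/9"
AGG_REUSE = "http://aggregator.example.org/aggregation/9"
PROXY_REUSE = "http://aggregator.example.org/proxy/42"

TRIPLES = [
    # Resource Map and Aggregation at the place of origin.
    (REM_ORIGIN, DESCRIBES, AGG_ORIGIN),
    (AGG_ORIGIN, AGGREGATES, DATASET),
    (PROXY_ORIGIN, PROXY_FOR, DATASET),
    (PROXY_ORIGIN, PROXY_IN, AGG_ORIGIN),
    # Re-use of the same, unchanged data set in another Aggregation.
    (REM_REUSE, DESCRIBES, AGG_REUSE),
    (AGG_REUSE, AGGREGATES, DATASET),
    (PROXY_REUSE, PROXY_FOR, DATASET),
    (PROXY_REUSE, PROXY_IN, AGG_REUSE),
    # Provenance: the re-using proxy names the proxy it was discovered through.
    (PROXY_REUSE, LINEAGE, PROXY_ORIGIN),
]


def first_object(triples, subject, predicate):
    """Return the first object matching (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def trace_lineage(triples, proxy):
    """Follow ore:lineage from a proxy; return the aggregations passed through."""
    path, seen = [], set()
    while proxy is not None and proxy not in seen:
        seen.add(proxy)  # guard against accidental cycles
        path.append(first_object(triples, proxy, PROXY_IN))
        proxy = first_object(triples, proxy, LINEAGE)
    return path


print(trace_lineage(TRIPLES, PROXY_REUSE))
```

The returned list starts at the re-using aggregation and ends at the aggregation of origin, which is the trace-back an aggregator needs in order to give credit.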

Page 20

Summary

• Provenance essential for …
  – Indicating origin in distributed data spaces
    • Accessing / addressing
    • Federation / aggregation
    • Processing / annotation
  – Document and data citation / trace-back
  – 1st order: describing data > metadata
  – 2nd order: describing context > semantic data

Page 21

Lessons learnt in DRIVER

• Use web-enabled identification (URI/UDDI etc.)
  – “Dark” databases don't interoperate
• 1st order provenance at place of origin
  – Requires metadata to describe origin
  – Enables a metadata infrastructure
  – Introduces linkage problem
• 2nd order provenance in contexts
  – Requires data provider identification in federators / aggregators in order to link back
  – May require semantic model for context
  – Would benefit from a semantic infrastructure

Page 22

Resources

[1] On provenance in the eScience / grid environment
  – http://www.sigmod.org/sigmod/record/issues/0509/p31-special-sw-section-5.pdf
  – In gLite
    • http://www.cesnet.cz/doc/techzpravy/2007/glite-job-provenance/
    • http://twiki.ipaw.info/bin/view/Challenge

[2] On provenance in OAI-PMH
  – http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm

[3] On provenance in OAI-ORE (referred to as ore:lineage)
  – http://www.openarchives.org/ore/meetings/Soton/ore_beyond_basics.pdf (general)
  – http://www.openarchives.org/ore/1.0/vocabulary (definition)

[4] Named Graphs, Provenance and Trust (Carroll et al.)
  – http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/carroll-ISWC2004.pdf

[5] W3C: On provenance in RDF
  – http://www.w3.org/2001/12/attributions/

[6] Open Provenance Model
  – http://eprints.ecs.soton.ac.uk/14979/1/opm.pdf

[7] DRIVER: Digital Repository Infrastructure for European Research
  – http://www.driver-community.eu