DIACHRON Preservation: Evolution Management for Preservation

Post on 01-Jul-2015

DESCRIPTION

By Giorgos Flouris (FORTH), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop: prelida.eu

Transcript

Evolution Management for Preservation

PRELIDA Consolidation Workshop 17.10.2014

Giorgos Flouris (FORTH), fgeo@ics.forth.gr

Evolution Management Problem

Preservation ↔ Evolution

Change Detection

• Change detection for evolution management

– Identifying changes between versions

• Challenges (in DIACHRON)

1. Diverse data models

2. Dynamic datasets

3. Recoverable versions

4. Changes as first-class citizens

5. Cross-snapshot queries

Evolution in DIACHRON

[Figure: Pilot dataset / DIACHRON, Version 1 and Version 2]

Change Types: Motivation

What a naïve diff will report

Add (Rec, diachron:subject, EFO_001927)

Add (Rec, diachron:hasRecordAttribute, rAtt1)

Add (rAtt1, diachron:predicate, rdfs:subClassOf)

Add (rAtt1, diachron:object, ObsoleteClass)

What the pilot expects

Add_SuperClass (EFO_001927, ObsoleteClass)

Change Hierarchy: Low-level (1/3)

• Low-level changes

– DIACHRON model, for internal use

– Fixed: Add, Delete

– Just additions and deletions of triples

– Simple set difference
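
The "simple set difference" above can be sketched in a few lines of Python. This is an illustrative sketch only, not DIACHRON code: the `Triple` type, the `low_level_delta` function, and the toy versions `v1` and `v2` are all assumptions introduced for this example.

```python
from typing import NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """A single RDF-style statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


def low_level_delta(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, deleted): the two low-level change types, Add and Delete."""
    added = new - old      # triples present only in the new version
    deleted = old - new    # triples present only in the old version
    return added, deleted


# Hypothetical toy versions, invented for illustration.
v1 = {Triple("ex:A", "rdfs:label", "a")}
v2 = {Triple("ex:A", "rdfs:label", "a"), Triple("ex:A", "rdfs:subClassOf", "ex:B")}

added, deleted = low_level_delta(v1, v2)
```

Here `added` holds the one new `rdfs:subClassOf` triple and `deleted` is empty.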

Change Hierarchy: Simple (2/3)

• Pilot terminology:

– Add_SuperClass, Add_Dimension

• Fixed, pre-defined

• Composed of low-level changes

• Partitioning is perfect

– Complete and unambiguous
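
One way to picture how a simple change is composed of low-level changes is to group the four `Add` triples from the motivation slide into a single `Add_SuperClass`. The sketch below is a plain-Python illustration under stated assumptions: the slides say detection in DIACHRON is done with SPARQL queries, so this is not the project's implementation. The function name `detect_add_superclass` is hypothetical, and the sketch assumes all four triples of the pattern appear among the additions.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def detect_add_superclass(added: Set[Triple]) -> Tuple[List[Tuple[str, str, str]], Set[Triple]]:
    """Group added triples into Add_SuperClass changes.

    Returns the detected changes and the low-level triples they consume,
    so a caller can check which additions remain unexplained.
    """
    changes: List[Tuple[str, str, str]] = []
    consumed: Set[Triple] = set()
    for rec, pred, att in sorted(added):
        if pred != "diachron:hasRecordAttribute":
            continue
        # Assumption: the record's subject triple is also newly added.
        subjects = [o for (s, p, o) in added if s == rec and p == "diachron:subject"]
        is_subclass = (att, "diachron:predicate", "rdfs:subClassOf") in added
        supers = [o for (s, p, o) in added if s == att and p == "diachron:object"]
        if len(subjects) == 1 and is_subclass and len(supers) == 1:
            changes.append(("Add_SuperClass", subjects[0], supers[0]))
            consumed |= {
                (rec, "diachron:subject", subjects[0]),
                (rec, pred, att),
                (att, "diachron:predicate", "rdfs:subClassOf"),
                (att, "diachron:object", supers[0]),
            }
    return changes, consumed


# The four low-level additions shown on the motivation slide.
naive_diff = {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}

changes, consumed = detect_add_superclass(naive_diff)
leftover = naive_diff - consumed
```

With this input, `changes` contains the single `Add_SuperClass (EFO_001927, ObsoleteClass)` and `leftover` is empty, meaning every low-level change is accounted for.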

Change Hierarchy: Complex (3/3)

• Pilot terminology:

– Add_Synonym, Mark_As_Obsolete

• Totally custom, pilot-specific (defined at run-time)

Using Changes for Evolution Management

• DIACHRON data model contains all versions

• Detection based on SPARQL queries

– Provided at deployment time (for simple)

– Generated at creation time (for complex)

• Recoverability

– Allows moving back and forth between versions
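
Recoverability can be illustrated with low-level deltas: given one version and the added and deleted triples, the neighbouring version can be rebuilt in either direction. The following is a minimal sketch with hypothetical function names and toy data; it does not reflect how DIACHRON stores versions.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def apply_forward(old: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Rebuild the new version from the old one and the delta."""
    return (old - deleted) | added


def apply_backward(new: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Rebuild the old version from the new one by undoing the delta."""
    return (new - added) | deleted


# Hypothetical toy versions, invented for illustration.
v1 = {("ex:A", "rdfs:label", "a"), ("ex:A", "ex:status", "active")}
v2 = {("ex:A", "rdfs:label", "a"), ("ex:A", "rdfs:subClassOf", "ex:Obsolete")}

added = v2 - v1
deleted = v1 - v2

rebuilt_v2 = apply_forward(v1, added, deleted)
rebuilt_v1 = apply_backward(v2, added, deleted)
```

Both reconstructions match the original versions, which is the property that "moving back and forth between versions" relies on.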

Representation Requirements

• Interesting queries

– Return the simple changes that dataset X underwent between versions V1 and V2

– Return the changes that resource X underwent in the first semester of 2014

– Give me all resources of type X that underwent change Y

– Return all countries for which the unemployment rate of their capital city increased at a rate higher than the average increase of the country as a whole, between versions V1 and V2

• Access to both the changes and the data is required

– Changes are first-class citizens

– Allowing preservation
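
A query such as "all resources of type X that underwent change Y" needs the change records and the data side by side. The sketch below shows the idea over in-memory Python objects. The `Change` record, the `resources_with_change` function, and the type assignments are hypothetical; in DIACHRON such queries would run against the stored data and changes rather than Python objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Change:
    """A detected change kept as a queryable record (hypothetical shape)."""
    change_type: str
    old_version: str
    new_version: str
    parameters: Tuple[str, ...]


def resources_with_change(
    changes: List[Change],
    resource_types: Dict[str, str],
    wanted_type: str,
    change_type: str,
) -> List[str]:
    """Return resources of `wanted_type` that appear in a change of `change_type`."""
    hits = set()
    for change in changes:
        if change.change_type != change_type:
            continue
        for resource in change.parameters:
            # Join the change record with the data: look up the resource's type.
            if resource_types.get(resource) == wanted_type:
                hits.add(resource)
    return sorted(hits)


# Toy data: the change comes from the slides, the types are invented.
changes = [Change("Add_SuperClass", "V1", "V2", ("EFO_001927", "ObsoleteClass"))]
resource_types = {"EFO_001927": "ex:Disease", "ObsoleteClass": "owl:Class"}

result = resources_with_change(changes, resource_types, "ex:Disease", "Add_SuperClass")
```

Here `result` is `["EFO_001927"]`. The lookup only works because the change record and the resource types are reachable from the same place.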

[Figure: DIACHRON data and changes ontology, drawn across two levels (Data level, Schema level). Node labels: C1, Add_SuperClass, V1, V2, asc_p1, asc_p2, Simple_Change, Change, prov:Activity, EFO_001927, ObsoleteClass, diachron:Entity, Add_Synonym, Complex_Change, …. Edge labels: old_version, new_version.]

Conclusion

• Main DIACHRON message

– (Linked) data preservation is related to evolution management

• DIACHRON challenges

1. Diverse data models

2. Dynamic datasets

3. Recoverable versions

4. Changes as first-class citizens

5. Cross-snapshot queries

• Solutions

– DIACHRON data model (#1)

– Appropriate change definition and detection (#2, #3)

– Changes and data represented at the same level (#4, #5)
