Open Provenance Model Tutorial Session 4: Use cases from data.gov.uk Jun Zhao University of Oxford [email protected]

2010 09 opm_tutorial_01-jun-usecase-datagovuk

Jun 18, 2015


Jun Zhao

Provenance use cases from the data.gov.uk project. Part of the OPM tutorial for FIS'2010@Berlin.
Transcript
Page 1: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

Open Provenance Model Tutorial Session 4: Use cases from data.gov.uk

Jun Zhao, University of Oxford

[email protected]

Page 2: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

Outline

• Background about data.gov.uk
• The use cases
  – XML serialization
  – Data transformation on the fly
  – Complex and nested processes

Page 3: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

data.gov.uk

• Linking UK government data
• Aims:
  – Provide a set of best practices for government agencies
  – Provide the minimum set of tooling and specification to facilitate the publication of data
  – Encourage “responsible” data publishing

Page 4: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

XML -> RDF

[Diagram: an XSLT Processor with an input and an output RDF File. Other labeled elements: XSLT Parameter Binding, XSLT Stylesheet, XSLT Template. Annotation, shown twice: "Who, when, which version, how".]

Contributed by Jeni Tennison
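The "who, when, which version, how" questions on this slide can be read as edges in an OPM-style graph. The following is a minimal Python sketch under stated assumptions, not the data.gov.uk implementation: the `ProvenanceGraph` helper and every node identifier (`artifact:source.xml`, `agent:publisher`, and so on) are hypothetical. Only the edge names `used`, `wasGeneratedBy`, and `wasControlledBy` come from OPM itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    kind: str    # OPM edge type, e.g. "used" or "wasGeneratedBy"
    effect: str  # the node the edge starts from (the effect)
    cause: str   # the node the edge points to (the cause)
    role: str = ""


@dataclass
class ProvenanceGraph:
    edges: list = field(default_factory=list)

    def add(self, kind, effect, cause, role=""):
        self.edges.append(Edge(kind, effect, cause, role))

    def causes_of(self, node, kind):
        """Return the causes linked to `node` by edges of type `kind`."""
        return [e.cause for e in self.edges
                if e.effect == node and e.kind == kind]


def describe_xslt_run():
    """Record one hypothetical XSLT run that turns XML into RDF."""
    graph = ProvenanceGraph()
    run = "process:xslt-run"
    inputs = [
        ("input", "artifact:source.xml"),
        ("stylesheet", "artifact:transform.xsl"),
        ("parameters", "artifact:parameter-binding"),
    ]
    for role, artifact in inputs:
        graph.add("used", run, artifact, role)          # how
    graph.add("wasGeneratedBy", "artifact:output.rdf", run, "output")
    graph.add("wasControlledBy", run, "agent:publisher", "operator")  # who
    return graph
```

Timestamps ("when") and version strings ("which version") would be attached as annotations on the process and artifact nodes; they are left out here to keep the sketch short.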

Page 5: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

[Diagram: an XSLT Processor with an input and an output RDF File, plus XSLT Parameter Binding, XSLT Stylesheet, and XSLT Template. Additional labels: "Downloaded from; Unzipped from, etc" and "Made accessible". Annotation, shown twice: "Who, when, which version, how".]

Contributed by Jeni Tennison

Page 6: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

On-the-fly Transformation

[Diagram: a Data transformation wrapper and the URI http://mytransportatio.db/j10. Annotation, shown twice: "Who, when, which version, how".]

Contributed by Stuart Williams
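One way to picture an on-the-fly transformation wrapper is as a function that returns the transformed data together with a provenance record created at request time. This is a hedged Python sketch: `with_provenance`, `to_upper`, the record keys, and the injected `clock` parameter are all assumptions made for illustration, not part of the wrapper shown on the slide.

```python
from datetime import datetime, timezone


def with_provenance(transform, *, agent, version, clock=None):
    """Wrap `transform` so each call also yields a provenance record.

    `clock` is a hypothetical hook that returns the current time; it is
    injectable so the record can be made deterministic in tests.
    """
    clock = clock or (lambda: datetime.now(timezone.utc))

    def wrapped(source_uri, payload):
        result = transform(payload)
        record = {
            "source": source_uri,            # what was transformed
            "who": agent,                    # who ran the wrapper
            "when": clock().isoformat(),     # when it ran
            "version": version,              # which version of the wrapper
            "how": transform.__name__,       # how: the transformation applied
        }
        return result, record

    return wrapped


def to_upper(payload):
    """Stand-in transformation; a real wrapper would convert data formats."""
    return payload.upper()
```

Because the result is computed per request, the record has to be generated per request as well; a provenance file written once at publication time would not capture "when" or "which version" for later calls.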

Page 7: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

Complex Data Creation Pipeline

[Diagram, stages as labeled: GATE Pipeline, GateXMLRegressionTransformation, GateXMLRdfaTransformation, RdfaRdfXmlTransformation]

Courtesy of Paul Appleby from TSO (Data Enrichment Service)

Page 8: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

Complex Data Creation Pipeline

[Diagram, stages as labeled: GATE Pipeline, GateXMLRegressionTransformation, GateXMLRdfaTransformation, RdfaRdfXmlTransformation]

[Diagram, components as labeled:]
• Document Reset PR
• ANNIE English Tokeniser
• ANNIE English Splitter
• ANNIE POS Tagger
• Data.gov.uk Morphological Analyzer
• Data.gov.uk Flexible Roof Gazetteer
• Data.gov.uk Generic Gazetteer
• GATE Noun Phrase Chunker
• Data.gov.uk Generic Transducer
• TSO Coreference

Courtesy of Paul Appleby from TSO (Data Enrichment Service)

Page 9: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

[Diagram: provenance recorded at two levels of detail.]

• Layers:
  – Level 1: Provenance of execution at higher level
  – Level 0: Provenance of execution at detailed level
  – Services used by executions
  – Artifacts
• Node label: A data collection
• Edge labels: wasGeneratedBy, hasParentProcess, iterationOfProcess, followed, wasDerivedFrom, wasTriggeredBy, accessedService
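The two levels can be connected by following hasParentProcess links upward. The Python sketch below assumes that a detailed (Level 0) process points to its enclosing (Level 1) process through hasParentProcess, and that the ANNIE components are sub-steps of the GATE Pipeline; both are readings of the diagrams, not confirmed by the slides. The artifact names are hypothetical.

```python
# hasParentProcess: detailed process -> enclosing process (assumed direction)
has_parent_process = {
    "ANNIE English Tokeniser": "GATE Pipeline",
    "ANNIE POS Tagger": "GATE Pipeline",
}

# wasGeneratedBy at the detailed level: artifact -> process
was_generated_by = {
    "artifact:tokens": "ANNIE English Tokeniser",
    "artifact:pos-tags": "ANNIE POS Tagger",
    "artifact:rdf-xml": "RdfaRdfXmlTransformation",
}


def top_level_process(process, parents):
    """Follow hasParentProcess links until a process has no parent."""
    seen = set()
    while process in parents:
        if process in seen:
            raise ValueError(f"cycle in hasParentProcess at {process!r}")
        seen.add(process)
        process = parents[process]
    return process


def high_level_provenance(generated_by, parents):
    """Roll detailed wasGeneratedBy edges up to the highest-level process."""
    return {artifact: top_level_process(process, parents)
            for artifact, process in generated_by.items()}
```

Keeping the detailed edges and deriving the higher-level view on demand means the coarse account never drifts out of step with the fine-grained one.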

Page 10: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

Non-digital Data Objects

• Organizations
  – Organizational structure changes over time
  – Origin organization, resulting organization
• Boundary
• Legislation

An organization ontology: http://www.epimorphics.com/public/vocabulary/org.html
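The "origin organization, resulting organization" pairing suggests recording each restructuring as a change event, so that an organization's history can be traced backwards. A small Python sketch under that assumption follows; the `ChangeEvent` class, its field names, and the department names are hypothetical and are not taken from the organization ontology linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    year: int
    origins: tuple   # organizations that existed before the change
    results: tuple   # organizations that exist after the change


def predecessors(org, events):
    """Return every organization that `org` directly or indirectly came from."""
    found = []
    frontier = [org]
    while frontier:
        current = frontier.pop()
        for event in events:
            if current in event.results:
                for origin in event.origins:
                    if origin not in found:
                        found.append(origin)
                        frontier.append(origin)
    return found


# Hypothetical history: two departments merge, then the merged body is renamed.
events = [
    ChangeEvent(2001, ("Dept A", "Dept B"), ("Dept AB",)),
    ChangeEvent(2008, ("Dept AB",), ("Dept C",)),
]
```

With this history, `predecessors("Dept C", events)` walks back through both events and reaches the two original departments.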

Page 11: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

The Challenges

• Data in different representations, physical forms, and granularities
• No tooling support
• Provenance across different types of systems
  – Identification
  – Different terminologies

Page 12: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

The Gaps

• A vocabulary able to describe the provenance of all types of data, from different systems
• A vocabulary that still provides enough terms to describe provenance accurately

Page 13: 2010 09 opm_tutorial_01-jun-usecase-datagovuk

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)