culturegraph.org: Building a Hub for Linked Library Data (Aufbau eines Hubs für Linked Library Data)
Markus M. Geipel, Adrian Pohl

Apr 01, 2015


Table of Contents

1. The Linked Data Challenge
2. Culturegraph Platform
   1. Resolving & Lookup
   2. Process & Technology
   3. RDF Modelling
3. Current State


Paradigm shift in modeling knowledge/data

From isolated tables to a network beyond organizational boundaries


From isolated Tables to a Semantic Network

A naïve Approach

1. Transform from Marc21/Mab2/Pica to RDF

2. Put everything into a Triplestore

3. SPARQL and Reasoner do the magic

What is wrong with this approach?


Format is not Content!

If you pour water into a wine glass, does it turn into wine?

How can you expect old Marc21 data to change into a semantically rich, reasoner-ready piece of information just by changing the data format to RDF?
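To make the point concrete, here is a minimal Python sketch of the naive conversion. The record layout, the tag-to-predicate mapping and the example.org URIs are assumptions made for illustration; they are not taken from any real catalogue or crosswalk.

```python
# Hypothetical flat record, loosely modelled on MARC 21 field tags.
record = {
    "001": "123456",                       # local control number
    "100": "Goethe, Johann Wolfgang von",  # author as a character string
    "245": "Faust",
    "250": "2. Aufl.",
}

# Illustrative tag-to-predicate mapping (an assumption, not a standard crosswalk).
FIELD_TO_PREDICATE = {
    "100": "http://purl.org/dc/terms/creator",
    "245": "http://purl.org/dc/terms/title",
    "250": "http://purl.org/ontology/bibo/edition",
}

def naive_triples(rec, base="http://example.org/record/"):
    """Emit one (subject, predicate, literal) triple per mapped field."""
    subject = base + rec["001"]
    return [
        (subject, FIELD_TO_PREDICATE[tag], value)
        for tag, value in rec.items()
        if tag in FIELD_TO_PREDICATE
    ]

triples = naive_triples(record)

# Every object is still a plain string: the creator is a name, not a link
# to a person entity, so nothing connects this record to any other record.
all_literals = all(not obj.startswith("http") for _, _, obj in triples)
```

The output is syntactically a set of triples, but it carries exactly the information of the original record: the format changed, the content did not.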


Connections don’t come for free
Some challenges …

1. No universally unique ID
2. Often no references to entities, just character strings
3. No controlled vocabulary
   - Example: 1.3 million different values for the edition field
4. Changing cataloging practices
5. Mistakes, typos


Culturegraph as a signpost
A coherent picture of bibliographic data

[Figure: hidden duplicates, different services and different interfaces raise a question (?); Culturegraph is shown as the answer (!)]


Culturegraph as a Platform to interlink Bibliographic Data

1. Open Tools
   - Open algorithms and code; reuse
2. Integration into existing Workflows
   - Synchronization of data
   - Integration of results into original data sources
3. Publication of Results
   - Connections and views, not the entire aggregated data
   - Linked Open Data/RDF
4. Persistence of Results
   - Integration into URN resolving infrastructure
5. Tracking provenance


First Project: Resolving & Lookup
Universally Unique and Persistent IDs

– Input: 6 main German bibliographic catalogues
– Objective: Bundling of manifestations
– Service:
  - Publication of bundles
  - Minting of URNs for approved bundles
  - Search bundles using established identifiers
– Part of the DDB Eco-System
  - Support for Data Aggregation


The Process

1. Translate into internal format
   1. Mapping of fields to properties
   2. Normalization, cleaning, regexp matching, etc., defined in XML
2. Database ingest
   - More than 80 million records
   - More than one billion properties
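Step 1 can be sketched roughly as follows. The slides state that the real mapping and cleaning rules are defined in XML; this Python version is only an illustration of the idea, and the field tags, property names and regular expressions are all assumptions.

```python
import re

# Hypothetical mapping from source field tags to internal property names.
FIELD_MAP = {"020": "isbn", "245": "title", "260c": "year"}

def normalize_isbn(raw):
    """Keep only digits and a trailing X, dropping hyphens and qualifiers."""
    match = re.search(r"[0-9][0-9\- ]{8,}[0-9Xx]", raw)
    return re.sub(r"[^0-9Xx]", "", match.group()).upper() if match else None

def normalize_year(raw):
    """Extract the first four-digit year, e.g. from 'c1999.' or '[2001]'."""
    match = re.search(r"(?<![0-9])(1[0-9]{3}|20[0-9]{2})(?![0-9])", raw)
    return match.group(1) if match else None

# Properties without a dedicated normalizer are only stripped of whitespace.
NORMALIZERS = {"isbn": normalize_isbn, "year": normalize_year}

def translate(raw_record):
    """Map source fields to internal properties and clean their values."""
    internal = {}
    for tag, value in raw_record.items():
        prop = FIELD_MAP.get(tag)
        if prop is None:
            continue  # unmapped fields are skipped
        cleaned = NORMALIZERS.get(prop, str.strip)(value)
        if cleaned:
            internal[prop] = cleaned
    return internal

example = translate({
    "020": "3-16-148410-X (pbk.)",
    "245": " Faust ",
    "260c": "c1999.",
    "999": "local note",
})
```

Keeping the rules in data (here dictionaries, in the project XML) rather than in code means a cataloguing expert can adjust a mapping without touching the processing pipeline.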


The Process

3. Generate unique properties: more than 50 million*
   - Combinations of properties defined in XML
4. Group by unique properties
5. Merge equivalent groups: approx. 18 million records* in groups

* For a first simple matching algorithm
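Steps 3 to 5 can be illustrated with a small Python sketch. It assumes that "equivalent groups" means groups that share at least one record, and it merges them with a union-find structure; the slides do not say how the project implements this. Record IDs, property names and key definitions are hypothetical, and title plus year is far too weak a criterion for real data.

```python
from collections import defaultdict

# Hypothetical records in the internal format: id plus normalized properties.
records = [
    {"id": "dnb:1", "isbn": "316148410X", "year": "1999", "title": "faust"},
    {"id": "hbz:7", "isbn": "316148410X", "year": "1999"},
    {"id": "bvb:3", "title": "faust", "year": "1999"},
    {"id": "gbv:9", "isbn": "3499225131", "year": "2004"},
]

# Each tuple names the properties that are combined into one unique key.
KEY_DEFINITIONS = [("isbn", "year"), ("title", "year")]

def unique_keys(rec):
    """Step 3: build one key per definition whose properties are all present."""
    for props in KEY_DEFINITIONS:
        if all(p in rec for p in props):
            yield "|".join(props) + "=" + "|".join(rec[p] for p in props)

def bundle(recs):
    # Step 4: group record ids by unique key.
    groups = defaultdict(list)
    for rec in recs:
        for key in unique_keys(rec):
            groups[key].append(rec["id"])

    # Step 5: merge groups that share a record, using union-find.
    parent = {rec["id"]: rec["id"] for rec in recs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ids in groups.values():
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    merged = defaultdict(set)
    for rec_id in parent:
        merged[find(rec_id)].add(rec_id)
    # Only groups with more than one record count as bundles.
    return [ids for ids in merged.values() if len(ids) > 1]

bundles = bundle(records)
```

Here `dnb:1` and `hbz:7` share an ISBN/year key, `dnb:1` and `bvb:3` share a title/year key, so all three end up in one bundle, while `gbv:9` stays unbundled.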


The Process (next steps)

6. Check quality & mint persistent IDs

7. Publication as Linked Data


Representing bundles of bibliographic records in RDF


Namespaces for Internal Bibliographic Description

rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

bibo: <http://purl.org/ontology/bibo/>

dcterms: <http://purl.org/dc/terms/>

frbr: <http://purl.org/vocab/frbr/core#>

foaf: <http://xmlns.com/foaf/0.1/>

cg: <http://culturegraph.org/vocab#> (not established yet)

...& others


Matching & Bundling

Different matching criteria to be discussed

Example: sameness of ISBN & year

Matching algorithms can be created and modified easily

Matched resources are bundled, and the underlying algorithm is indicated

Bundle Ontology: http://purl.org/net/bundle
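One way to picture exchangeable matching criteria with the algorithm recorded on each bundle is the following Python sketch. The criterion names, the `Bundle` class and the record layout are hypothetical; the sketch does not use terms from the Bundle Ontology, since the slides do not list them.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Bundle:
    algorithm: str  # name of the matching criterion that produced the bundle
    key: tuple      # the shared property values
    members: tuple  # ids of the matched records

# Registry of matching criteria; adding a criterion is one new entry.
# The names and property choices are assumptions for illustration.
CRITERIA = {
    "isbn-year": ("isbn", "year"),
    "isbn-only": ("isbn",),
}

def match(records, algorithm):
    """Bundle records that agree on every property of the chosen criterion."""
    props = CRITERIA[algorithm]
    groups = defaultdict(list)
    for rec in records:
        if all(p in rec for p in props):
            groups[tuple(rec[p] for p in props)].append(rec["id"])
    return [
        Bundle(algorithm, key, tuple(ids))
        for key, ids in groups.items()
        if len(ids) > 1
    ]

records = [
    {"id": "a", "isbn": "316148410X", "year": "1999"},
    {"id": "b", "isbn": "316148410X", "year": "1999"},
    {"id": "c", "isbn": "316148410X", "year": "2003"},
]
strict = match(records, "isbn-year")
loose = match(records, "isbn-only")
```

Because every bundle names the criterion that produced it, a consumer can decide which bundles to trust: the stricter "isbn-year" criterion keeps the 2003 record apart, the looser "isbn-only" criterion bundles all three.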


Minting Über-Identifiers

In the last step, IDs for bibliographic resources may be minted

urn:nbn:de:cg-12345678
http://culturegraph.org/urn:nbn:de:cg-12345678

Based on reliable, agreed-upon algorithm

Record-resource linking by foaf:isPrimaryTopicOf
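The identifier pattern and the record-resource link can be sketched in Python as follows. The URN and URI patterns follow the example on the slide; how numbers are allocated and the example.org record URI are assumptions. The property `foaf:isPrimaryTopicOf` points from a thing to a document about it, so the minted resource is the subject and the catalogue record is the object.

```python
FOAF_IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"
RESOLVER_BASE = "http://culturegraph.org/"  # as in the URI pattern on the slide

def mint_urn(number):
    """Format a URN in the urn:nbn:de:cg- pattern; number allocation is assumed."""
    return f"urn:nbn:de:cg-{number:08d}"

def resource_uri(urn):
    """Build the HTTP URI under which the URN would be resolvable."""
    return RESOLVER_BASE + urn

def link_triples(urn, record_uris):
    """One N-Triples line per record that describes the minted resource."""
    subject = resource_uri(urn)
    return [
        f"<{subject}> <{FOAF_IS_PRIMARY_TOPIC_OF}> <{record}> ."
        for record in record_uris
    ]

urn = mint_urn(12345678)
# The record URI below is a placeholder, not a real catalogue address.
lines = link_triples(urn, ["http://example.org/record/dnb-1"])
```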


Future prospects

– Workflow integration
  Share, enrich and reuse metadata right from the start
– New features/projects
  From concrete to visionary…
  1. Integration of GND references (from BEACON files and other sources)
  2. Computation of links to further resources (subject headings, geo coordinates, person names, Wikipedia)
  3. Authority file for works
  4. Crowdsourcing (enrich and correct descriptions of titles, works, persons, etc.)


Markus M. Geipel | culturegraph.org | 5 October 2011

Summary

– Culturegraph will
  - match the main German library catalogues
  - give each bibliographic resource a persistent ID
– State
  - Basic infrastructure up and running with good performance (80 million records matched in one hour)
  - All source code published on Sourceforge
  - First demonstrator web portal at www.culturegraph.org
– Soon to come
  - January:
    - Operational web portal
    - Publication of first matching results (HTML, RDF, etc.)
  - Next year:
    - Persistent IDs


Appendix: Project Team

– Daniel Schäfer (DNB): project lead
– Katja Mecklinger (DNB): deputy project lead, public relations
– Markus Geipel (DNB): head of architecture and development
– Adrian Pohl (hbz): public relations, ontology
– Pascal Christoph (hbz): architecture
– Julia Hauser (DNB): ontology
– Lars Svensson (DNB): ontology
– Jürgen Kett (DNB): project steering, public relations