Mapping Cultural Heritage Information to CIDOC-CRM
Maria Theodoridou
Foundation for Research and Technology – Hellas
Institute of Computer Science
BM meet-up “Semantics and Cultural Heritage”, London, September 12, 2014
Overview
X3ML: An interface for sustainable management of data mapping process
Use Case: Mapping the dFMRÖ coin database to CIDOC-CRM
X3ML
An interface for sustainable management of data mapping process
Haridimos Kondylakis, Martin Doerr FORTH-ICS
Gerald de Jong Delving B.V.
Dominic Oldman British Museum
Cultural Diversity and Data Standards
Cultural information is more than a domain:
Collection description (art, archaeology, natural history, ...)
Archives and literature (records, treaties, letters, artistic works, ...)
Administration, preservation, conservation of material heritage
Science and scholarship – investigation, interpretation
Presentation – exhibition making, teaching, publication
But how to make a documentation standard?
Each aspect needs its methods, forms, communication means
Data overlap, but do not fit in one schema
“One model to rule them all”
The CIDOC CRM
The CIDOC Conceptual Reference Model
A collaboration with the International Council of Museums
An ontology of 86 classes and 137 properties for culture and more
With the capacity to explain hundreds of (meta)data formats
Accepted by ISO TC46 in September 2000
International standard since 2006 - ISO 21127:2006
Serving as:
An intellectual guide for creating schemata, formats, and profiles
A language for analyzing existing sources for integration/mediation: “Identify elements with common meaning”
A transport format for data integration / migration / Internet
Mappings
Mapping Rule
“a sufficient specification for the transformation of each instance of a source schema into an instance of a target schema while preserving as much as possible its initial ‘meaning’”
In practice, mappings are produced manually by domain and IT experts. This process is:
Labor-intensive
Error-prone
Time-consuming
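A mapping rule in the sense defined above can be pictured as a small data structure that pairs a source field with a target class, property, and range. The sketch below is only an illustration: `MappingRule` and `apply_rule` are hypothetical names, not part of X3ML or any CIDOC-CRM tooling, the record and the `coin/42` identifier are invented, and the COIN.ID to E41 Appellation correspondence is borrowed from the use case later in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """Hypothetical container for one source-to-target correspondence."""
    source_domain: str  # table or element name in the source schema
    source_path: str    # source field linking the domain to its value
    target_domain: str  # target class for the source domain
    target_path: str    # target property
    target_range: str   # target class for the field value


def apply_rule(rule, record, subject_id):
    """Turn one source record into (subject, predicate, object) triples."""
    value = record.get(rule.source_path)
    if value is None:
        return []  # the source field is absent, so nothing is produced
    obj_id = f"{subject_id}/{rule.source_path}"
    return [
        (subject_id, "rdf:type", rule.target_domain),
        (subject_id, rule.target_path, obj_id),
        (obj_id, "rdf:type", rule.target_range),
        (obj_id, "rdfs:label", str(value)),
    ]


rule = MappingRule("COIN", "ID", "E22 Man-Made Object",
                   "P1 is identified by", "E41 Appellation")
triples = apply_rule(rule, {"ID": 42}, "coin/42")
```

Writing even this toy rule by hand for every field of every source schema shows why the manual process is labor-intensive and error-prone.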
[Figure: source databases DB1, DB2, …, DBn connected to CIDOC-CRM through mappings]
Existing Mapping Approaches
A large body of work already exists:
Relational databases to RDF/S and OWL models
Files/XML to RDF/S
However, previous approaches lack understanding of:
the borders between semantics and programming
the semantic heterogeneity cases between models
the business process that should be part of it
X3ML Workflow
[Figure: X3ML workflow diagram. Labels: source databases (DB2), CIDOC-CRM, Domain Experts, Schema Matching, Schema Matching Definition file, IT Experts, URI generation specification, Terminology Mapping]
X3ML Mapping format
X3ML – Additional Nodes
X3ML - Intermediate Paths
X3ML – Variables, Conditions, Info & Comments Blocks
Conditions:

    <if>
      <exists>[xpath]</exists>
    </if>

    <if>
      <not>
        <if><exists>[xpath]</exists></if>
      </not>
    </if>

    <if>
      <equals value="[value-for-comparison]">[xpath]</equals>
    </if>

    <if>
      <not>
        <if><equals value="[value-for-comparison]">[xpath]</equals></if>
      </not>
    </if>
Info and comments:

    <x3ml>
      <info>
        ... various fields describing the mapping ...
      </info>
      <namespaces/>
      <mappings>
        <mapping>
          <domain>
            <comments>
              ... various notes about the domain ...
            </comments>
          </domain>
          <link>
            <path>
              <comments>
                ... various notes about the path ...
              </comments>
            </path>
            <range>
              <comments>
                ... various notes about the range ...
              </comments>
            </range>
          </link>
        </mapping>
      </mappings>
      <comments>
        ... various notes about the mappings ...
      </comments>
    </x3ml>
Variables:

    <entity variable="p1">
      [generate the value]
    </entity>
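To make the condition forms on this slide concrete (exists, equals, and not wrapping a nested if), here is a minimal sketch of how they could be evaluated against a source record. It is not the X3ML engine: `evaluate` is a hypothetical helper, the sample record and its MATERIAL and MINT fields are invented, and Python's ElementTree supports only a small XPath subset, so the paths are plain child names.

```python
import xml.etree.ElementTree as ET


def evaluate(condition, context):
    """Evaluate an <if> element against a source XML node.

    Covers only the three forms shown on the slide: <exists>,
    <equals value="..."> and <not> wrapping a nested <if>.
    """
    inner = list(condition)[0]  # the single child of <if>
    if inner.tag == "exists":
        return context.find(inner.text.strip()) is not None
    if inner.tag == "equals":
        found = context.find(inner.text.strip())
        if found is None:
            return False
        return (found.text or "").strip() == inner.get("value")
    if inner.tag == "not":
        return not evaluate(list(inner)[0], context)
    raise ValueError(f"unsupported condition: {inner.tag}")


# Invented sample record; the field names are illustrative only.
coin = ET.fromstring("<COIN><ID>42</ID><MATERIAL>AU</MATERIAL></COIN>")

has_id = ET.fromstring("<if><exists>ID</exists></if>")
is_gold = ET.fromstring('<if><equals value="AU">MATERIAL</equals></if>')
no_mint = ET.fromstring("<if><not><if><exists>MINT</exists></if></not></if>")

results = [evaluate(c, coin) for c in (has_id, is_gold, no_mint)]
```

Because a not wraps a complete nested if, the evaluator can simply recurse. That is one plausible reason for the nesting in the format, though the slide does not state it.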
Use Case: Mapping the dFMRÖ coin database to CIDOC-CRM
Martin Doerr, Maria Theodoridou FORTH-ICS
Edeltraud Aspöck, Klaus Vondrovec ÖAW
ARIADNE: Advanced Research Infrastructure for Archaeological Dataset Networking in Europe
FP7-INFRASTRUCTURES-2012-1 EU project, no. 313193
http://www.ariadne-infrastructure.eu/
Primary goals
To integrate existing archaeological research infrastructures
To enable the use of distributed datasets and services
To develop new and powerful technologies as an integral component of the archaeological research methodology
CIDOC-CRM was chosen as ARIADNE’s integration platform since its primary role is to enable information exchange and integration between heterogeneous sources of cultural heritage information.
During the first year of ARIADNE, several mapping activities were initiated to convert existing schemata of archaeological data to CIDOC-CRM.
Content providers were supported by FORTH.
ÖAW worked on the mapping of four databases:
dFMRÖ, a relational database of ancient Roman coin finds from Austria and Romania
UK Material Pool Database (Site DB)
UK Thunau Database (Image DB)
Franzhausen Kokoron Database (Cemetery DB)
dFMRÖ: digitale FundMünzen der Römischen Zeit in Österreich (Digital Coin Finds of the Roman Period in Austria)
Austrian Academy of Sciences Numismatic Commission
Klaus Vondrovec
Access DB since 1999; MySQL DB online since 2007
http://www.oeaw.ac.at/numismatik/projekte/dfmroe/dfmroe.html
Tables
dFMRÖ coin db: mapping Coin
Source Domain: //COIN
Target Domain: E22 Man-Made Object
Two approaches for defining Coin:
Introduce a specialization of E22 Man-Made Object: Exx Coin, subclass of E22 Man-Made Object
Define the type of the E22 Man-Made Object: E22 Man-Made Object. P2 has type: E55 Type = “Coin”
To choose, we need to answer the question: does the new class Coin have new properties that are not available in E22?
[Diagram label: E55 Type “Coin”]
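The two approaches differ only in which statements a coin record produces. The following sketch spells both out as plain (subject, predicate, object) tuples. The function names and the `type/coin` and `coin/42` identifiers are hypothetical, and `Exx_Coin` is the slide's own placeholder for a class that does not exist in the CRM.

```python
def coin_as_subclass(coin_uri):
    """Approach 1: a dedicated class declared as a subclass of E22."""
    return [
        ("Exx_Coin", "rdfs:subClassOf", "E22_Man-Made_Object"),
        (coin_uri, "rdf:type", "Exx_Coin"),
    ]


def coin_as_typed_object(coin_uri, type_uri="type/coin"):
    """Approach 2: a plain E22 instance classified through P2 has type."""
    return [
        (coin_uri, "rdf:type", "E22_Man-Made_Object"),
        (coin_uri, "P2_has_type", type_uri),
        (type_uri, "rdf:type", "E55_Type"),
        (type_uri, "rdfs:label", "Coin"),
    ]


subclass_triples = coin_as_subclass("coin/42")
typed_triples = coin_as_typed_object("coin/42")
```

Approach 2 leaves the class hierarchy untouched, so it seems the natural choice when Coin needs no properties beyond those of E22, which is exactly the question the slide poses.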
dFMRÖ coin db: Identifiers
Source Domain: //COIN
Source Path: ID
Source Range: ID
Target Domain: E22 Man-Made Object
Target Path: P1 is identified by
Target Range: E41 Appellation
[Diagram label: E55 Type “Coin”]
Guideline: We map local identifiers in relational database tables explicitly only if these identifiers are visible in the user interface and used in other documents as well. Alternatively, we use the local database identifiers only for generating URIs for the record instance, here the coin instance, and do NOT map the COIN.ID at all.
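The guideline amounts to a simple switch: the local ID always feeds URI generation, but it becomes an E41 Appellation only when it is publicly visible. A minimal sketch, assuming a placeholder namespace (`http://example.org/dfmroe/` is not the project's real base URI) and hypothetical helper names:

```python
from urllib.parse import quote

BASE = "http://example.org/dfmroe/"  # placeholder namespace (assumption)


def coin_uri(local_id):
    """Build a URI for a coin record from its local database identifier."""
    return f"{BASE}coin/{quote(str(local_id), safe='')}"


def map_coin(record, id_is_public):
    """Map one COIN record; emit the identifier only when it is public."""
    subject = coin_uri(record["ID"])
    triples = [(subject, "rdf:type", "E22_Man-Made_Object")]
    if id_is_public:
        appellation = subject + "/appellation"
        triples += [
            (subject, "P1_is_identified_by", appellation),
            (appellation, "rdf:type", "E41_Appellation"),
            (appellation, "rdfs:label", str(record["ID"])),
        ]
    return triples


internal_only = map_coin({"ID": 42}, id_is_public=False)
public_id = map_coin({"ID": 42}, id_is_public=True)
```

In both cases the coin keeps a stable URI, so the record stays addressable even when COIN.ID itself is not mapped.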
Mapping joins
Source Domain: //COIN
Source Path: COUNTRY_ID == COUNTRY_ID
Source Range: //COUNTRY
Target Domain: E22 Man-Made Object
Target Path: P108i was produced by → E12 Production (p1) → P10 falls within
Target Range: E4 Period
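The join mapping can be sketched in plain Python: match COIN rows to COUNTRY rows on COUNTRY_ID, then expand each match into the composite target path through an intermediate E12 Production node. The function name, the URI patterns, and the sample rows (including the NAME field) are assumptions for illustration, not the dFMRÖ schema or the X3ML engine.

```python
def map_production_period(coins, countries):
    """Join COIN to COUNTRY on COUNTRY_ID and emit the composite path."""
    by_id = {c["COUNTRY_ID"]: c for c in countries}
    triples = []
    for coin in coins:
        country = by_id.get(coin.get("COUNTRY_ID"))
        if country is None:
            continue  # no matching COUNTRY row, so no path is generated
        coin_uri = f"coin/{coin['ID']}"
        production = f"{coin_uri}/production"  # intermediate node (p1)
        period = f"period/{country['COUNTRY_ID']}"
        triples += [
            (coin_uri, "P108i_was_produced_by", production),
            (production, "rdf:type", "E12_Production"),
            (production, "P10_falls_within", period),
            (period, "rdf:type", "E4_Period"),
        ]
    return triples


# Invented rows: coin 2 points at a country that does not exist.
coins = [{"ID": 1, "COUNTRY_ID": 7}, {"ID": 2, "COUNTRY_ID": 99}]
countries = [{"COUNTRY_ID": 7, "NAME": "sample country"}]
triples = map_production_period(coins, countries)
```

The production node does not correspond to any source column. It is created by the mapping, which is why a single foreign key turns into a two-step path in the target model.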
Mixing categorical and factual info
Need to separate categorical and factual data
Inconsistent information:
Find spot -> for a specific coin
Historical facts -> for a category of coins
Categorical production
Need to extend the model in order to support categorical production (similar to FRBR R26 produced things of type and R7 is example of).
Type can take values such as "AU from Rome, mint ..." which characterize the "edition" of the mint that can be recognized to be the outcome of the same minting process. Typically we would assume that a unique stamp is used.
[Diagram labels: E22 Man-Made Object (MyCoin); E12 Production (p1); E55 Type (AU from Rome); properties P108i was produced by, PC1 produced things of type, PC2 is example of]
Mixing categorical and factual info
[Diagram labels: E22 Man-Made Object (MyCoin); E12 Production (p1); E7 Activity (ia1); E55 Type (AU from Rome, mint …); E55 Type “AU” (DENOMINATION); E55 Type (Issuing); properties P108 has produced, P2 has type, PC1 produced things of type, P17 was motivated by (needs specialization “gave order”)]
Issuer
Source Domain: //COIN
Source Path: ISSUER_ID == PR_ID
Source Range: //ISSUER
Target Domain: E22 Man-Made Object
Target Path: P108i was produced by, P14 carried out by
Target Range: E39 Actor
"Issuer" is an accidental role: it does not characterize an actor independently from particular contexts of activity. Therefore the Actor does not have the type "Issuer"; only the activity has the type "Issuing".
[Diagram labels: E12 Production (p1); P17 was motivated by; E7 Activity (ia1); E55 Type (Issuing)]
dFMRÖ coin db
CIDOC CRM Mapping Repository
Published schema matching definitions are available at:
http://139.91.183.3:9080/mapping_technology/
The schema matching definition format (Version 1.0) is available at:
http://139.91.183.3:9080/mapping_technology/xsd/x3ml/x3ml_v1.0.xsd
The Mapping Memory Manager (3M) is available at:
http://139.91.183.3:9080/3M/
Domain experts are able to easily understand & edit X3ML mapping files
You are kindly invited to send us your schema matching definition.
Lessons from mapping experiences
Semantic Interoperability can be defined by the capability of mapping
Mapping for epistemic networks is relatively simple:
Specialist/primary information databases frequently employ a flat schema, reducing complex relationships into simple fields
Source fields frequently map to composite paths under the CRM, making semantics explicit using a small set of primitives
Intermediate nodes are postulated or deduced (e.g., “production” from “coin”, “birth” from “person”). They are the hooks for integration with complementary sources
Cardinality constraints must not be enforced: sources may contain alternative or incomplete knowledge
Domain experts easily learn schema mapping
IT experts may not understand the meaning, may underestimate it, or may be bored by it!
Intuitive tools for domain experts are needed:
Separate identifier matching from schema mapping
Separate terminology mediation from schema mapping
Thank you!