Publishing research information as Linked Data: Proposal of Recommendations
EuroCRIS meeting, February 2012
Miguel-Ángel Sicilia
Jan 25, 2016
ROADMAP
Introduction & Motivation
Stakeholders
Example Architecture
Basic Principles of the LD Exposure
CERIF Ontology
Recipes for the CERIF LD Exposure
CERIF Model Extension
Key Use Case
Demo
Bootstrapping
Issues and challenges
Conclusions
INTRODUCTION & MOTIVATION
A POINT OF DEPARTURE?
“CERIF and Linked Data are similar, complementary approaches. However, there are significant differences in the way they encode relationships. EXRI-UK reviewed these approaches against higher education needs and recommended that CERIF should be the basis for the exchange of research information in the UK. CERIF is currently better able to encode the rich information required to communicate research information, and has the organisational backing of EuroCRIS, ensuring it is well-managed and sustainable.”
EXRI-UK final report, http://www.jiscinfonet.ac.uk/infokits/research/exri-uk
AN EXAMPLE OF USING LINKED DATA IN RIS
XML DATA INTERCHANGE
[Figure: two RIS databases (CERIF) exchange data as XML. One side generates and sends the XML; the other receives and parses it.]
LIMITATIONS (FROM A LINKED DATA PERSPECTIVE)
[Figure: an aggregator (harvester or query client) collects data from four sources, A to D, each behind its own Web API. Adapted from: Christian Bizer: The Web of Linked Data (26/07/2009)]
Shortcomings
1. APIs provide proprietary interfaces (even though CERIF XML standardizes the interchange format).
2. Aggregators are based on a fixed set of data sources (not necessarily, but they require some registry of providers).
3. You cannot set hyperlinks between the descriptions of RIS entities (projects, people, organizations, publications), nor from them to other data or terminologies.
THE LINKED DATA APPROACH
[Figure: data sources A to D publish RDF and are connected to each other, to DBpedia and to a terminology server through RDF links (rel) and classifications (cls). Adapted from: Christian Bizer: The Web of Linked Data (26/07/2009)]
Use RDF to provide CERIF metadata based on the XML mapping.
Add links using different kinds of relations (rel) (a mapping of CERIF link entities?).
Connect to terminologies using some classification (cls) (an extension of keywords in CERIF?).
Link to other LOD datasets instead of repeating information.
BROWSING & INTEGRATING
[Figure: data sources A to E publish research information (RI) and terminologies (Term) connected by typed links. Adapted from: Christian Bizer: The Web of Linked Data (26/07/2009)]
Consumers shown in the figure:
Browser
Data integrator (combines information for a given cfPers, cfProj or cfOrgUnit)
Data integrator (combines information of several cfPers, cfProj or cfOrgUnit, e.g. for analyzing country or call outcomes)
RELEVANT RECOMMENDATIONS
CERIF COMPONENTS
[Figure: CERIF components: SQL, XML, Semantics, Linked Data]
STAKEHOLDERS
A PROPOSAL
Higher Education Institutions (HEI) or R&D institutions
Funding bodies (FB)
Research Authorities (RA)
Researchers
Research information Enterprises (RIE)
General public
Enterprises
…what are their critical use cases and their “killer apps”?
EXAMPLE ARCHITECTURE
STRATEGIES FOR PUBLISHING LINKED DATA
ALTERNATIVES FOR THE EXPOSURE OF LINKED DATA
Providing an endpoint for enquiries
Serving static RDF files
Serving RDF embedded in HTML files
Serving LD from RDF triple stores
Serving LD by wrapping Web APIs
Serving LD from relational databases
FACTORS AFFECTING THE DECISION
How much data do you want to serve?
How is your data currently stored?
How often does your data change?
RIS ARCHITECTURE
[Figure: a web browser (URL: http://cris.myorganization.org) shows HTML pages served by the RIS Application Server, which reads from the RIS Database (CERIF).]
RIS-LD ARCHITECTURE
[Figure: a D2R Server sits next to the RIS Application Server on top of the RIS Database (CERIF). The browser still receives HTML pages, while the D2R Server publishes RDF descriptions such as the following one.]
<http://cris.myOrganization.org:2020/resource/projects/Organic.Edunet>
a cerif:Project ;
rdfs:label "Multilingual Federation of Learning Repositories"@en-uk ;
cerif:acronym "Organic.Edunet" ;
cerif:endDate "2010-09-30"^^xsd:date ;
cerif:internalIdentifier "ff808181300cf99e01300d1a355f0003" ;
cerif:isLinkedByOrganisationUnit ...
URI SCHEME PUBLISHED BY D2R
http://cris.myorg.org/resource/RESOURCE_ID: LD identifier for a given resource
http://cris.myorg.org/data/RESOURCE_ID: description of a given resource in RDF (N3)
http://cris.myorg.org/page/RESOURCE_ID: description of a given resource in HTML
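The three URI forms differ only in one path segment, so the RDF and HTML document addresses can be derived from the resource identifier. The following is a minimal sketch of that derivation; the function name `d2r_views` is hypothetical, and it assumes the server keeps the `/resource/`, `/data/` and `/page/` segments exactly as listed above.

```python
from urllib.parse import urlsplit, urlunsplit


def d2r_views(resource_uri: str) -> dict:
    """Derive the /data/ (RDF) and /page/ (HTML) URIs of a /resource/ URI."""
    parts = urlsplit(resource_uri)
    # Split the path around the first "/resource/" segment.
    base, sep, resource_id = parts.path.partition("/resource/")
    if not sep:
        raise ValueError(f"not a resource URI: {resource_uri}")

    def with_kind(kind: str) -> str:
        path = f"{base}/{kind}/{resource_id}"
        return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

    return {
        "resource": resource_uri,
        "data": with_kind("data"),   # RDF description
        "page": with_kind("page"),   # HTML description
    }


if __name__ == "__main__":
    for kind, uri in d2r_views("http://cris.myorg.org/resource/projects/VOA3R").items():
        print(kind, uri)
```

A client would normally just dereference the resource URI and let the server redirect it; the sketch only makes the naming pattern explicit.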
OPENING OUR CERIF DATASETS
[Figure: a third-party mashup (URL: http://mashup.org) consumes the RDF description of the Organic.Edunet project published by the RIS-LD server.]
BENEFITS OF OUR ARCHITECTURE
Exposure of Linked Data without altering the current research information system (non-intrusive)
Linked Data interface: RDF descriptions of individual resources stored in the DB, served over HTTP
SPARQL endpoint (the SQL of Linked Data)
Traditional HTML interface: web pages describing resources
Simple way of interchanging data on the Web
Creation of new third-party applications using open linked data from RIS systems
BASIC PRINCIPLES OF THE LD EXPOSURE
GENERAL PRINCIPLES OF THE LOD APPROACH
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
RE-USE OF WELL-KNOWN TERMS
We need an ontology for the CERIF model elements
"Do not reinvent the wheel"
Data can be consumed by applications that may be tuned to well-known vocabularies
Foster interoperability between different datasets
SELF-DESCRIBED AND CONSISTENT TERMS
Logical entities are translated into RDF classes and their attributes into RDF properties
CF prefixes are not necessary for ontology terms; URI namespaces are used instead
Properties and classes are self-described:
rdfs:label (title-case capitalized version of the property/class)
rdfs:comment (a plain text description of the
URI DESIGN
Essential to enable interoperability and understanding
Create human-readable and memorable URIs
Avoid using artificial primary keys
Discover URIs using similarity heuristics
Follow a similar schema/pattern for URIs: http://cris.myorg.org/resource/ENTITYNAME/ENTITYID
Example of an identifier for the EU project “Virtual Open Access ...” hosted at the University of Athens: http://cris.aua.gr/resource/projects/VOA3R
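The URI pattern can be sketched as a small minting function. This is an illustration under assumptions, not the mechanism used by any particular server: `urlify` and `mint_uri` are hypothetical helpers, and the rule "spaces become underscores, everything else is percent-encoded" is assumed here because it reproduces identifiers such as Miguel-Angel_Sicilia used later in the examples.

```python
from urllib.parse import quote


def urlify(value: str) -> str:
    """Replace spaces with underscores, then percent-encode what remains."""
    return quote(value.strip().replace(" ", "_"), safe="")


def mint_uri(base: str, entity_name: str, entity_id: str) -> str:
    """Build BASE/resource/ENTITYNAME/ENTITYID from a human-readable id."""
    return f"{base.rstrip('/')}/resource/{entity_name}/{urlify(entity_id)}"


if __name__ == "__main__":
    print(mint_uri("http://cris.aua.gr", "projects", "VOA3R"))
    print(mint_uri("http://cris.myorg.org", "persons", "Miguel-Angel Sicilia"))
```

Because the identifier comes from a descriptive attribute instead of an artificial key, two institutions that follow the same pattern produce URIs that are easy to compare.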
WHERE DO RI DATASETS LIVE?
Higher education or R&D institutions maintain repositories centred on cfPers, cfOrgUnit (internal) and sometimes cfProj, with an emphasis on cfResPubl and cfResPat.
Funding bodies are centred around cfProj, cfOrgUnit (mostly legal bodies, not internal) and cfFundProg and related.
Bibliographic and citation databases focus on cfResPubl, cfResPat and in general provide poor support for cfPers and cfOrgUnit.
DISTRIBUTED DATASETS
Research information is distributed
Frequently, there is duplicated information in different RIS systems.
ID for the VOA3R project in the University of Athens dataset*: http://cris.aua.gr/resource/projects/VOA3R
ID for the VOA3R project in the University of Alcalá dataset*: http://cris.uah.es/resource/projects/VOA3R
No problem: in Linked Data the same concept can be identified by different URIs, which are connected using the owl:sameAs predicate
* Assuming that there is a corporate RIS available at http://cris.....
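As a sketch of how such duplicates could be connected, the helper below emits owl:sameAs statements in N-Triples syntax for a list of URIs that denote the same thing. The function name `same_as_links` is hypothetical; choosing the first URI as the subject is an arbitrary simplification.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_links(uris):
    """Yield N-Triples lines linking the first URI to each of the others."""
    first, *others = uris
    for other in others:
        yield f"<{first}> <{OWL_SAME_AS}> <{other}> ."


if __name__ == "__main__":
    voa3r = [
        "http://cris.aua.gr/resource/projects/VOA3R",
        "http://cris.uah.es/resource/projects/VOA3R",
    ]
    for line in same_as_links(voa3r):
        print(line)
```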
CERIF ONTOLOGY
[Figure: the CERIF Ontology (http://eurocris.org/cerif), the CERIF Semantic Vocabulary (http://eurocris.org/semcerif) and other vocabularies]
EUROCRIS WEBSITE FOR PUBLISHING ONTOLOGIES
Current version at http://spi-fm.uca.es/neologism/
THE CERIF ONTOLOGY ON THE WEB
VISUAL REPRESENTATION OF THE ONTOLOGIES
Current version at http://spi-fm.uca.es/neologism/cerif
ONTOLOGY TERMS
RECIPES FOR THE CERIF LD EXPOSURE
RECIPES FOR THE CERIF LD EXPOSURE: MULTIPLE LANGUAGE FEATURES
CERIF MULTIPLE LANGUAGE FEATURES
Predicate | Object
rdfs:label, foaf:name, cerif:name | *.cfName
rdfs:label, dc:title, cerif:title | *.cfTitle
dc:description | *.cfDescr
cerif:keyword, dc:subject | *.cfKeyw
dcterms:abstract, cerif:abstract | *.cfAbstract
cerif:researchActivities | cfOrgUnit.cfResAct
cerif:researchInterests | cfPers.cfResInt
dcterms:alternative | cfResPublSubtitle.cfSubtitle
foaf:name | cfResPublNameAbbrev.cfNameAbbrev
bibo:annotates | cfResPublBiblNote.cfBiblNote
RECIPES FOR THE CERIF LD EXPOSURE: SEMANTIC FEATURES
CERIF SEMANTICS DOCUMENT
From a PDF document with the CERIF semantics…
CERIF SEMANTIC VOCABULARY
Current version at http://spi-fm.uca.es/neologism/semcerif
…to an RDF vocabulary with the roles and classification terms
CERIF USING EXTERNAL VOCABULARIES.
The predicates cerif:classification and cerif:role make it possible to use external vocabularies to enrich our data.
[Figure: the CERIF Ontology is connected to the CERIF Semantic Vocabulary and to other vocabularies (1...N) through cerif:classification and cerif:role.]
RECIPES FOR THE CERIF LD EXPOSURE: ADDITIONAL FEATURES
CERIF ADDITIONAL FEATURES
The current CERIF model contains Dublin Core and Formalised Dublin Core entities and attributes.
We will use external vocabularies through the cerif:role and cerif:classification properties, avoiding the need to store and publish entities related to any terminology.
RECIPES FOR THE CERIF LD EXPOSURE: BASE ENTITIES
CERIF BASE ENTITY PROJECT
The project acronym (cfProj.cfAcro) will be part of the resource identifier (ID): http://cris.myorganization.org/resource/projects/ID
Predicate | Object
rdf:type | “cerif:Project”
cerif:internalIdentifier | cfProj.cfProjId
cerif:startDate | cfProj.cfStartDate
cerif:endDate | cfProj.cfEndDate
cerif:acronym | cfProj.cfAcro
cerif:uri, foaf:homepage | cfProj.cfURI
PREFIXES USED IN EXAMPLES
# Built-in prefixes
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
# External vocabularies
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
# CERIF
@prefix cerif: <http://eurocris.org/cerif#> .
@prefix semcerif: <http://eurocris.org/semcerif#> .
DESCRIPTION OF A CERIF PROJECT (I)
<http://cris.myOrganization.org/resource/projects/VOA3R>
a cerif:Project ;
rdfs:label "Repositorio de Agricultura y Acuicultura de acceso abierto virtual"@es-es , "Virtual Open Access Agriculture & Aquaculture Repository"@en-uk ;
dc:title "Repositorio de Agricultura y Acuicultura de acceso abierto virtual"@es-es , "Virtual Open Access Agriculture & Aquaculture Repository"@en-uk ;
cerif:title "Repositorio de Agricultura y Acuicultura de acceso abierto virtual"@es-es , "Virtual Open Access Agriculture & Aquaculture Repository"@en-uk ;
cerif:internalIdentifier "ff8080812ddb916a012ddb9170b60001" ;
cerif:acronym "VOA3R" ;
DESCRIPTION OF A CERIF PROJECT (II)
dcterms:abstract "The general objective of the VOA3R project is to improve the spread of European agriculture and aquaculture research results by using an innovative approach to sharing open access research products. "@en-uk ;
cerif:abstract "The general objective of the VOA3R project is to improve the spread of European agriculture and aquaculture research results by using an innovative approach to sharing open access research products. "@en-uk ;
foaf:homepage <http://voa3r.eu/> ;
cerif:uri <http://voa3r.eu/> ;
cerif:startDate "2010-06-01"^^xsd:date ;
cerif:endDate "2013-05-31"^^xsd:date ;
CERIF BASE ENTITY ORGANISATION UNIT
The organisation acronym (cfOrgUnit.cfAcro) will be part of the resource identifier (ID): http://cris.myorganization.org/resource/organisationUnits/ID
Predicate | Object
rdf:type | “cerif:OrganisationUnit”
cerif:internalIdentifier | cfOrgUnit.cfOrgUnitId
cerif:headcount | cfOrgUnit.cfHeadcount
cerif:turnover | cfOrgUnit.cfTurn
cerif:turnoverCurrencyCode | cfOrgUnit.cfCurrCode
cerif:acronym | cfOrgUnit.cfAcro
foaf:homepage | cfOrgUnit.cfURI
DESCRIPTION OF A CERIF ORGANISATION UNIT (I)
<http://cris.myOrganization.org/resource/organisationUnits/UAH>
a cerif:OrganisationUnit ;
rdfs:label "Universidad de Alcala"@es-es , "University of Alcala"@en-uk ;
foaf:name "Universidad de Alcala"@es-es , "University of Alcala"@en-uk ;
cerif:name "Universidad de Alcala"@es-es , "University of Alcala"@en-uk ;
cerif:internalIdentifier "ff8081812f0d51ed012f2faa5bfe0003" ;
cerif:acronym "UAH" ;
DESCRIPTION OF A CERIF ORGANISATION UNIT (II)
foaf:homepage <http://www.uah.es> ;
cerif:uri <http://www.uah.es> ;
cerif:headcount "990" ;
cerif:turnover "200800" ;
CERIF BASE ENTITY PERSON
The person’s full name [1] (encoded according to URL rules) will be part of the resource identifier (ID): http://cris.myorganization.org/resource/persons/ID
Predicate | Object
rdf:type | “cerif:Person”
cerif:internalIdentifier | cfPers.cfPersId
cerif:birthdate [2] | cfPers.cfBirthdate
foaf:gender | cfPers.cfGender
foaf:homepage | cfPers.cfURI
foaf:firstName | cfPersName.cfFirstNames
foaf:familyName | cfPersName.cfFamilyNames
foaf:name | $FULLNAME$
rdfs:label | $FULLNAME$
[1] The full name is built by concatenating the attributes cfFirstNames, cfOtherNames and cfFamilyNames of the cfPersName entity.
[2] We have chosen to create a new property cerif:birthdate, since the property foaf:birthday does not support the birth year.
DESCRIPTION OF A CERIF PERSON (I)
<http://cris.myOrganization.org/resource/persons/Miguel-Angel_Sicilia>
a cerif:Person ;
rdfs:label "Miguel-Angel Sicilia" ;
foaf:name "Miguel-Angel Sicilia" ;
cerif:name "Miguel-Angel Sicilia" ;
foaf:familyName "Sicilia" ;
foaf:firstName "Miguel-Angel" ;
foaf:givenName "Miguel-Angel" ;
DESCRIPTION OF A CERIF PERSON (II)
foaf:gender "m" ;
cerif:gender "m" ;
foaf:homepage <http://www.cc.uah.es/msicilia/> ;
cerif:uri <http://www.cc.uah.es/msicilia/> ;
cerif:researchInterests "Ontologies, learning technology" ;
cerif:internalIdentifier "ff8081812ec8ae9e012f0b02353d0008" ;
cerif:birthdate "1973-10-10"^^xsd:date ;
RECIPES FOR THE CERIF LD EXPOSURE: RESULT ENTITIES
CERIF ENTITY RESULT PUBLICATION
The original title of the publication (cfResPublTitle.cfTitle), encoded according to URL rules, will be part of the resource identifier (ID): http://cris.myorganization.org/resource/publications/ID
Later, we will link our bibliographic resources to bibliographic records in other LD systems, using the predicate owl:sameAs.
Predicate | Object
rdf:type | “cerif:Publication”
cerif:internalIdentifier | cfResPubl.cfResPublId
dc:date | cfResPubl.cfResPublDate
foaf:homepage | cfResPubl.cfURI
CERIF ENTITY RESULT PUBLICATION (II)
Predicate | Object
bibo:volume | cfResPubl.cfVol
bibo:edition | cfResPubl.cfEdition
bibo:issue | cfResPubl.cfIssue
bibo:pageStart | cfResPubl.cfStartPage
bibo:pageEnd | cfResPubl.cfEndPage
bibo:isbn | cfResPubl.cfISBN
bibo:issn | cfResPubl.cfISSN
bibo:number | cfResPubl.cfNum
bibo:numPages | cfResPubl.cfTotalPages
dcterms:isPartOf [1] | “myorg:series/” + cfResPubl.cfSeries [2]
[1] Publications are linked to dynamically generated instances of type bibo:Series.
[2] The prefix myorg refers to the http://cris.myorganization.org/resource/ namespace.
Bibliographic metadata in RDF
CERIF ENTITY RESULT PATENT
The register number of the patent (cfResPat.cfPatentNum) will be part of the resource identifier (ID): http://cris.myorganization.org/resource/patents/ID
Predicate | Object
rdf:type | “cerif:Patent”
cerif:internalIdentifier | cfResPat.cfResPatId
cerif:approvalDate | cfResPat.cfApprovDate
cerif:registrationDate | cfResPat.cfRegistrDate
cerif:patentNumber | cfResPat.cfPatentNum
cerif:countryCode | cfResPat.cfCountryCode
foaf:homepage | cfResPat.cfURI
CERIF ENTITY RESULT PRODUCT
The internal identifier of the product (cfResProd.cfResProdInternId) will be part of the resource identifier (ID): http://cris.myorganization.org/resource/products/ID
Predicate | Object
rdf:type | “cerif:Product”
cerif:internalIdentifier | cfResProd.cfResProdId
cerif:productNumber | cfResProd.cfResProdInternId
cerif:registrationDate | cfResProd.cfRegistrDate
foaf:homepage | cfResProd.cfURI
RECIPES FOR THE CERIF LD EXPOSURE: LINK ENTITIES
CERIF ENTITY LINK RELATIONSHIP (I)
[Figure: Entity 1 and Entity 2 (CERIF Ontology) both point to a Relationship resource (CERIF Ontology) through cerif:linksToEntity and cerif:isLinkedByEntity. The Relationship carries cerif:startDate and cerif:endDate (xsd:dateTime), cerif:fraction (xsd:float) and cerif:role, which points to an rdf:Resource in an external vocabulary.]
CERIF ENTITY LINK RELATIONSHIP (II)
ENTITYLINK refers to the name of the link entity (e.g. cfOrgUnit_ResPubl). VOCAB-TERM-URI is a URI pointing to a given term of an external vocabulary.
The entity link identifier (ID) must be retrieved or generated from the database: http://cris.myorganization.org/resource/ENTITYLINK/ID
Predicate | Object
rdf:type | “cerif:Relationship”
rdfs:label | “Association ” + ID
cerif:startDate | ENTITYLINK.cfStartDate
cerif:endDate | ENTITYLINK.cfEndDate
cerif:fraction | ENTITYLINK.cfFraction
cerif:role | VOCAB-TERM-URI
RELATING A PROJECT WITH AN ORGANISATION UNIT
<http://cris.myOrganization.org/resource/projects/VOA3R>
...
cerif:isLinkedByOrganisationUnit
<http://cris.myOrganization.org/resource/proj_orgunit/VOA3R-UAH-uuid> ,
….
<http://cris.myOrganization.org/resource/proj_orgunit/VOA3R-GRNET-uuid> ;
<http://cris.myOrganization.org/resource/organisationUnits/UAH>
...
cerif:linksToProject <http://cris.myOrganization.org/resource/proj_orgunit/VOA3R-UAH-uuid> ;
….
<http://cris.myOrganization.org/resource/proj_orgunit/Organic.Edunet-UAH-uuid> ;
DESCRIPTION OF AN ENTITY LINK RELATIONSHIP
<http://cris.myOrganization.org/resource/proj_orgunit/VOA3R-UAH-uuid>
a cerif:Relationship;
rdfs:label "Association between VOA3R (Project) and UAH (Organisation Unit)" ;
cerif:role <http://eurocris.org/semcerif#participant> ;
cerif:startDate "1901-01-01 00:00:00.0" ;
cerif:endDate "2099-12-31 23:59:59.0" ;
cerif:fraction "0.75" .
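Because the project and the organisation unit both point to the relationship resource, a consumer has to take one extra hop to get from one entity to the other. The sketch below walks that hop over an in-memory list of (subject, predicate, object) tuples. The triples restate the VOA3R and UAH example; the function name `organisations_of` and the prefixed-name strings are illustrative choices, not part of the CERIF ontology tooling.

```python
PROJECT = "http://cris.myOrganization.org/resource/projects/VOA3R"
ORG_UNIT = "http://cris.myOrganization.org/resource/organisationUnits/UAH"
LINK = "http://cris.myOrganization.org/resource/proj_orgunit/VOA3R-UAH-uuid"

TRIPLES = [
    (PROJECT, "cerif:isLinkedByOrganisationUnit", LINK),
    (ORG_UNIT, "cerif:linksToProject", LINK),
    (LINK, "cerif:role", "http://eurocris.org/semcerif#participant"),
]


def organisations_of(project, triples):
    """Return (organisation unit, roles) pairs reachable through link entities."""
    # Hop 1: relationship resources attached to the project.
    links = {o for s, p, o in triples
             if s == project and p == "cerif:isLinkedByOrganisationUnit"}
    result = []
    # Hop 2: organisation units that point to the same relationship resources.
    for s, p, o in triples:
        if p == "cerif:linksToProject" and o in links:
            roles = [r for ls, lp, r in triples if ls == o and lp == "cerif:role"]
            result.append((s, roles))
    return result


if __name__ == "__main__":
    for org, roles in organisations_of(PROJECT, TRIPLES):
        print(org, roles)
```

The price of the extra hop is a slightly longer query; the benefit is that role, dates and fraction have a resource to hang on.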
CERIF ENTITY LINK CLASSIFICATION (I)
[Figure: an Entity (CERIF Ontology) points to a Classification resource (CERIF Ontology) through cerif:isClassifiedBy. The Classification carries cerif:startDate and cerif:endDate (xsd:dateTime), cerif:fraction (xsd:float) and cerif:classification, which points to an rdf:Resource in an external vocabulary.]
CERIF ENTITY LINK CLASSIFICATION (II)
ENTITYLINK refers to the name of the link entity (e.g. cfOrgUnit_Class). VOCAB-TERM-URI is a URI pointing to a given term of an external vocabulary.
The entity link identifier (ID) must be retrieved or generated from the database: ENTITY-URI/class/ID
Predicate | Object
rdf:type | “cerif:Classification”
rdfs:label | “Classification ” + ID
cerif:startDate | ENTITYLINK.cfStartDate
cerif:endDate | ENTITYLINK.cfEndDate
cerif:fraction | ENTITYLINK.cfFraction
cerif:classification | VOCAB-TERM-URI
CLASSIFYING A CERIF ORGANISATION UNIT
<http://cris.myOrganization.org/resource/organisationUnits/UAH>
...
cerif:isClassifiedBy
<http://cris.myOrganization.org/resource/organisationUnits/UAH/class/uuid>;
DESCRIPTION OF AN ENTITY LINK CLASSIFICATION
<http://cris.myOrganization.org/resource/organisationUnits/UAH/class/uuid>
a cerif:Classification ;
rdfs:label "Classification for UAH as a University" ;
cerif:classification <http://eurocris.org/semcerif#University> ;
cerif:startDate "1901-01-01 00:00:00.0" ;
cerif:endDate "2099-12-31 23:59:59.0" ;
cerif:fraction "1" .
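The diagrams type cerif:startDate and cerif:endDate as xsd:dateTime and cerif:fraction as xsd:float, while the example descriptions carry plain strings in a database timestamp layout. A minimal sketch of a conversion step follows, assuming the database returns values shaped like "1901-01-01 00:00:00.0"; the helper names are hypothetical.

```python
from datetime import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"


def to_xsd_datetime(db_value: str) -> str:
    """Turn 'YYYY-MM-DD hh:mm:ss.f' into a typed xsd:dateTime literal."""
    # strptime also rejects impossible dates, which catches bad source data.
    parsed = datetime.strptime(db_value.strip(), "%Y-%m-%d %H:%M:%S.%f")
    return f'"{parsed.isoformat()}"^^<{XSD}dateTime>'


def to_xsd_float(db_value: str) -> str:
    """Validate a numeric string and return it as a typed xsd:float literal."""
    float(db_value)  # raises ValueError if the value is not numeric
    return f'"{db_value.strip()}"^^<{XSD}float>'


if __name__ == "__main__":
    print(to_xsd_datetime("1901-01-01 00:00:00.0"))
    print(to_xsd_float("0.75"))
```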
RECIPES FOR THE CERIF LD EXPOSURE: OTHER CERIF ENTITIES
OTHER CERIF ENTITIES
Shared Entities [not exposed as Linked Data resources]: Currency, Country, Language
Infrastructure [exposed as Linked Data resources]: Equipment, Facility, Service
Second Level [exposed as Linked Data resources]: Funding, Event, PrizeAward, Metrics, Cite, CurriculumVitae, ExpertiseAndSkill, Qualification, ElectronicAddress, PostalAddress
CERIF SHARED ENTITIES
Country, Language and Currency are globally shared entities; therefore, it is not necessary to expose them in our CERIF-LD datasets.
Instead, we should use the countries, currencies and languages available on DBpedia:
http://dbpedia.org/ontology/Currency
http://dbpedia.org/ontology/Country
http://dbpedia.org/ontology/Language
CERIF 2ND LEVEL ENTITY FUNDING
The original name [1] associated with the funding will be part of the resource identifier (ID): http://cris.myorganization.org/resource/funding/ID
Predicate | Object
rdf:type | “cerif:Funding”
cerif:internalIdentifier | cfFund.cfFundId
cerif:startDate | cfFund.cfStartDate
cerif:endDate | cfFund.cfEndDate
cerif:amount | cfFund.cfAmount
cerif:currencyCode | cfFund.cfCurrCode
foaf:homepage | cfFund.cfURI
[1] Taking into account the condition [cfTrans=o]
CERIF MODEL EXTENSION
CERIF LINKED OPEN DATA ENTITY
Creation of a new entity structure: cfCERIF-LOD
internal: cfEntity, cfInstanceId
predicate: cfPredicateClassId, cfPredicateClassSchemeId, cfStartDate, cfEndDate
external: cfxObjectURI, cfxSourceURI, cfxURIKind (opt), cfClassId (opt), cfClassSchemeId (opt)
CERIF LINKED OPEN DATA ENTITY
Example 1 (cfCERIF-LOD)
cfEntity | cfPerson
cfInstanceId | miguel-angel-sicilia-uuid
cfPredicateClassId | professor
cfPredicateClassSchemeId | CERIF Semantics 2011
cfStartDate | 2000-06-01 00:00:00.0
cfEndDate | 2004-02-29 23:59:59.0
cfxObjectURI | http://otherRIS.org/resource/organisationUnit/CarlosIII
cfxSourceURI | http://otherRIS.org/
cfxURIKind (opt) | absolute
cfClassId (opt) | cfOrgUnit
cfClassSchemeId (opt) | cfCERIF-2011-Entities-Collection
CERIF LINKED OPEN DATA ENTITY
Example 2 (cfCERIF-LOD)
cfEntity | cfPerson
cfInstanceId | miguel-angel-sicilia-uuid
cfPredicateClassId | cfsameAs-uuid
cfPredicateClassSchemeId | cflinkedopendata-2008-1.3-uuid
cfStartDate |
cfEndDate |
cfxObjectURI | http://dblp.l3s.de/d2r/resource/authors/Miguel-Angel_Sicilia
cfxSourceURI | http://dblp.l3s.de/d2r/sparql
cfxURIKind (opt) |
cfClassId (opt) |
cfClassSchemeId (opt) |
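One way to read the proposed structure is as a stored RDF statement whose subject is a local CERIF instance and whose object is an external URI. The sketch below models a record with the attribute names from the slides and renders Example 2 as an N-Triples line. The `PREDICATE_URIS` lookup and the way the subject URI is supplied are assumptions made for illustration; the proposal does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from CERIF predicate class ids to RDF predicates.
PREDICATE_URIS = {
    "cfsameAs-uuid": "http://www.w3.org/2002/07/owl#sameAs",
}


@dataclass
class CerifLodRecord:
    # internal part
    cfEntity: str
    cfInstanceId: str
    # predicate part
    cfPredicateClassId: str
    cfPredicateClassSchemeId: str
    # external part
    cfxObjectURI: str
    cfxSourceURI: str
    # optional attributes
    cfStartDate: Optional[str] = None
    cfEndDate: Optional[str] = None
    cfxURIKind: Optional[str] = None
    cfClassId: Optional[str] = None
    cfClassSchemeId: Optional[str] = None


def to_ntriple(record: CerifLodRecord, subject_uri: str) -> str:
    """Render the record as one N-Triples statement about subject_uri."""
    predicate = PREDICATE_URIS[record.cfPredicateClassId]
    return f"<{subject_uri}> <{predicate}> <{record.cfxObjectURI}> ."


if __name__ == "__main__":
    example_2 = CerifLodRecord(
        cfEntity="cfPerson",
        cfInstanceId="miguel-angel-sicilia-uuid",
        cfPredicateClassId="cfsameAs-uuid",
        cfPredicateClassSchemeId="cflinkedopendata-2008-1.3-uuid",
        cfxObjectURI="http://dblp.l3s.de/d2r/resource/authors/Miguel-Angel_Sicilia",
        cfxSourceURI="http://dblp.l3s.de/d2r/sparql",
    )
    print(to_ntriple(example_2, "http://cris.uah.es/resource/persons/Miguel-Angel_Sicilia"))
```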
KEY USE CASE
Miguel-Angel Sicilia is a researcher who has worked for several organizations in his career. Now his research information is spread across several RIS located at different universities. Miguel-Angel would like to have all his information integrated…
STEP 1: RETRIEVE LOCAL IDENTIFIERS
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?researcher
WHERE {
?researcher a <http://eurocris.org/cerif#Person> .
?researcher foaf:firstName "Miguel-Angel" .
?researcher foaf:familyName "Sicilia" .
}
ORDER BY ?researcher
What is Miguel-Angel Sicilia’s identifier at the University of Alcalá? And at Carlos III University of Madrid?
The same query is sent to the endpoints of all the registered datasets.
STEP 2: OBTAINING ORGANISATION NAMES
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX cerif: <http://eurocris.org/cerif#>
SELECT DISTINCT ?organisationName
WHERE {
<http://cris.uah.es/resource/persons/Miguel-Angel_Sicilia> cerif:linksToOrganisationUnit ?ENTITYLINK .
?organisation cerif:isLinkedByPerson ?ENTITYLINK .
?organisation a <http://eurocris.org/cerif#OrganisationUnit> .
?organisation foaf:name ?organisationName .
}
ORDER BY ?organisationName
In which organizations has Miguel-Angel worked? Using the identifiers from the previous step, we send this query to the endpoint of each matching dataset.
STEP 3: JOINING AND FORMATTING RESULTS
Miguel-Angel_Sicilia has worked in the following universities:
University of Alcalá
Carlos III University of Madrid
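The three steps can be sketched as a small client that fans the queries out and merges the answers. To keep the sketch free of network code, the actual SPARQL call is passed in as `run_query(endpoint, query)`, a hypothetical callable that returns the values of the single selected variable as a list of strings. The function names and the endpoint registry are assumptions made for illustration.

```python
def person_query(first_name: str, family_name: str) -> str:
    """Step 1 query: local identifiers of a researcher."""
    return f"""PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?researcher
WHERE {{
  ?researcher a <http://eurocris.org/cerif#Person> .
  ?researcher foaf:firstName "{first_name}" .
  ?researcher foaf:familyName "{family_name}" .
}}
ORDER BY ?researcher"""


def organisation_query(person_uri: str) -> str:
    """Step 2 query: names of the organisation units linked to a person."""
    return f"""PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX cerif: <http://eurocris.org/cerif#>
SELECT DISTINCT ?organisationName
WHERE {{
  <{person_uri}> cerif:linksToOrganisationUnit ?link .
  ?organisation cerif:isLinkedByPerson ?link .
  ?organisation a cerif:OrganisationUnit .
  ?organisation foaf:name ?organisationName .
}}
ORDER BY ?organisationName"""


def work_history(endpoints, first_name, family_name, run_query):
    """Steps 1 to 3: query every registered endpoint and join the results."""
    names = set()
    for endpoint in endpoints:
        # Step 1: the same person query goes to every endpoint.
        for person_uri in run_query(endpoint, person_query(first_name, family_name)):
            # Step 2: follow up only where the person was found.
            names.update(run_query(endpoint, organisation_query(person_uri)))
    # Step 3: de-duplicated, sorted list ready for display.
    return sorted(names)
```

A real `run_query` would send the query to the endpoint over HTTP and parse the SPARQL results; values interpolated into the query text should be escaped before use.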
DEMO
FRONT-END FOR CERIF LD DATASET
http://voa3r.cc.uah.es:443/dataset/
SOME ENTITIES IN VOA3R
http://voa3r.cc.uah.es:443/dataset/resource/persons/Miguel-Angel_Sicilia
http://voa3r.cc.uah.es:443/dataset/resource/projects/VOA3R
http://voa3r.cc.uah.es:443/dataset/resource/organisationUnits/UAH
BOOTSTRAPPING
BOOTSTRAPPING: Installing and configuring the Linked Data Server
CONFIGURATION FILE OF D2R SERVER
# Class map: one cerif:Project resource per row of cfProj
map:Projects a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "projects/@@cfProj.cfAcro|urlify@@";
    d2rq:class cerif:Project;
    .
# Property bridge: cfProj.cfProjId becomes cerif:internalIdentifier
map:Projects_cfProjId a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Projects;
    d2rq:property cerif:internalIdentifier;
    d2rq:column "cfProj.cfProjId";
    .
# Property bridge: cfProj.cfAcro becomes cerif:acronym
map:Projects_cfAcro a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Projects;
    d2rq:property cerif:acronym;
    d2rq:column "cfProj.cfAcro";
    .
REQUIRED ACTIONS IN THE D2R CONFIGURATION
Adaptation to multiple languages
D2R does not support dynamic language tags. Therefore, it is necessary to replicate the d2rq:PropertyBridge definitions, one per language.
Data normalization
Several CERIF attributes should be normalized before being exposed as Linked Data:
Non-atomic attributes, e.g. cfKeyw, cfResInt and cfResAct.
Publication of the cfPers.cfGender attribute.
Attributes with NULL values require special treatment for some databases.
Normalization can happen at publication time (via the D2R configuration file) or within the RIS system.
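Since the per-language bridges differ only in the language code, they can be generated instead of copied by hand. The sketch below prints one d2rq:PropertyBridge per language with a fixed d2rq:lang tag and a d2rq:condition on the language column. The table and column names (cfProjTitle, cfTitle, cfLangCode), the join expression and the bridge naming scheme are assumptions about the local CERIF database and should be adapted.

```python
def language_bridges(class_map, prop, table, column, lang_column, join, languages):
    """Return D2RQ mapping text with one property bridge per language."""
    blocks = []
    for lang in languages:
        blocks.append(
            f"map:{class_map}_{column}_{lang} a d2rq:PropertyBridge;\n"
            f"    d2rq:belongsToClassMap map:{class_map};\n"
            f"    d2rq:property {prop};\n"
            f"    d2rq:column \"{table}.{column}\";\n"
            f"    d2rq:join \"{join}\";\n"
            # Only rows stored in this language feed this bridge.
            f"    d2rq:condition \"{table}.{lang_column} = '{lang}'\";\n"
            f"    d2rq:lang \"{lang}\";\n"
            f"    ."
        )
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(language_bridges(
        class_map="Projects",
        prop="cerif:title",
        table="cfProjTitle",
        column="cfTitle",
        lang_column="cfLangCode",
        join="cfProj.cfProjId = cfProjTitle.cfProjId",
        languages=["en", "es"],
    ))
```

Generating the bridges keeps the configuration file consistent when a language is added, at the cost of one more build step.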
BOOTSTRAPPING: Linking our dataset with external data
DISCOVERING LINKS WITH EXTERNAL RESOURCES
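LD basic principle: set RDF links pointing into other data sources on the Web
[Figure: the Silk Link Discovery Framework connects the CERIF dataset (published from the CERIF db) with external datasets (LOD cloud diagram, http://richard.cyganiak.de/2007/10/lod/) through links such as owl:sameAs and cerif:country.]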
BOOTSTRAPPING: Publishing metadata
METADATA FOR LINKED DATA RESOURCES DESCRIPTIONS
Additional metadata is needed for the RDF resource descriptions published in CERIF-LD datasets:
To assure consumers of the origin of the data and to enable them to assess its quality
To enable external applications to use CERIF data on a secure legal basis
To aid the discovery and indexing of our data by crawlers
ADDITIONAL METADATA SERVED BY D2R
d2r:documentMetadata [
# General metadata
rdf:type foaf:Document ;
rdfs:comment "This resource description is published according to CERIF LD.";
# Provenance metadata
dc:creator <http://www.myOrganization.org> ;
dc:publisher <http://www.myOrganization.org> ;
dc:date "2011-01-01"^^xsd:date;
# License metadata
dc:rights <http://creativecommons.org/licenses/by-nc/3.0> ;
# Dataset metadata
void:inDataset <http://cris.myOrganization.org/dataset> ;
];
BOOTSTRAPPING: Publishing CERIF-LD datasets
DESCRIBING CERIF-LD DATASETS WITH VOID
<http://cris.myOrganization.org/dataset>
a void:Dataset ;
rdfs:label "CERIF Research Information Dataset of MyOrganization" ;
dc:title "CERIF Research Information Dataset of MyOrganization" ;
dc:description "Dataset describing CERIF resources from the corporate current research information system";
foaf:homepage <http://cris.myOrganization.org/dataset.html> ;
foaf:isPrimaryTopicOf <http://cris.myOrganization.org/dataset.rdf> ;
void:sparqlEndpoint <http://cris.myOrganization.org/sparql>;
void:vocabulary <http://xmlns.com/foaf/0.1/>;
void:vocabulary <http://eurocris.org/cerif>;
void:vocabulary <http://eurocris.org/semcerif>;
void:exampleResource <http://cris.myorganization.org/resource/projects/VOA3R> ;
DISSEMINATING OUR DATASET
Include the dataset in open data registries (linked or non-linked)
Using VoID, CERIF-LD datasets can be made discoverable, which:
Enables further interconnection with other datasets
Fosters the development of new web apps
ISSUES AND CHALLENGES
OPEN ISSUES: GENERAL
Link entities introduce “hops” between linked resources in our datasets. This makes navigation via SPARQL a little more complex. Add syntactic sugar?
The D2R configuration is heavy and repetitive, encouraging intensive use of copy/paste, which is prone to errors. Any proposal to further automate the generation of the D2R file?
How to achieve a synchronized and sustainable evolution of the CERIF model components? Namely: data model, SQL scripts, XML Schema, RDFS ontologies and the D2R pre-configuration file.
OPEN ISSUES: URI DESIGN
Uniqueness of identifiers: the attributes selected to be part of the resource identifiers (acronyms, titles, full names, etc.) must be unique within the local dataset. Alter the CERIF database model by introducing unique keys?
Resource identifiers (URIs) must not change over time. What happens if you change the title of a publication? Alter the CERIF model by introducing a new attribute LDIdentifier for all entities?
Link entities need short unique identifiers. Their primary key is composed of many attributes.
OPEN ISSUES: LINKING EXTERNAL RESOURCES
How to publish links between our data and external resources? Extend the CERIF model with a new entity for storing RDF statements (subject/predicate/object triples)?
How to establish the mapping between the terms and classifications of the CERIF database and the external LD vocabularies? Reuse the attribute cfURI of the cfClass entity?
OPEN ISSUES: LINKING BIBLIOGRAPHIC DATABASES
We have used the Bibliographic Ontology to describe result publications as linked data. There is another proposal, named SWRC (Semantic Web for Research Communities).
The major bibliographic databases are not yet publishing linked data. Elsevier Developer Network is addressing it now.
CONCLUSIONS
A proposal of recommendations for the exposure of Linked Data according to the CERIF Ontology: a basic, official RDF(S) mapping of CERIF.
An example architecture (not the only one) for publishing linked data from RIS databases.
A set of recipes for translating CERIF model entities into RDF classes and properties, and a guided approach for bootstrapping.
There are a number of issues which require further treatment.
Linked Data as a way of interchanging data between the different stakeholders involved in research.
THANKS
Miguel-Angel Sicilia
Iván Ruiz Rube