Managing Metadata for Science, Technology and Innovation Studies: The RISIS Case
Al Koudous Idrissou, Ali Khalili, Rinke Hoekstra and Peter van den Besselaar
Vrije Universiteit Amsterdam/University of Amsterdam

Managing Metadata for Science and Technology Studies: the RISIS case

Jan 09, 2017

Rinke Hoekstra
Transcript
Page 1: Managing Metadata for Science and Technology Studies: the RISIS case

Managing Metadata for Science, Technology and Innovation Studies: The RISIS Case

Al Koudous Idrissou, Ali Khalili, Rinke Hoekstra and Peter van den Besselaar
Vrije Universiteit Amsterdam/University of Amsterdam

About RISIS

• Started in January 2014
• 4 years
• 13 partners from 10 countries
  • 7 universities
  • 6 public research organizations
• Goal: promote a distributed research infrastructure to advance science & innovation studies

Page 2: Managing Metadata for Science and Technology Studies: the RISIS case

Science & Technology Studies

• Study the dynamics of scientific ideas.

• Interaction between academia, business and government.

• Highly interdisciplinary: social sciences, economics, political science, humanities

• Highly heterogeneous data: structured vs. unstructured, qualitative vs. quantitative

Page 3: Managing Metadata for Science and Technology Studies: the RISIS case

The RISIS Project

• "an explosion of experimental datasets since 2000 … mostly thanks to EC supported project"

• A distributed research infrastructure to advance science & innovation studies

• Serving research:
  • consolidate and integrate existing datasets
  • complement with new datasets on key issues currently not covered
  • develop software platforms to support research (extract, integrate, structure and treat semantic web data)

• Serving society: a radically improved evidence base for research & innovation policies

Page 4: Managing Metadata for Science and Technology Studies: the RISIS case

Six Use Cases

1. Where do what types of firms innovate, how do they develop, where do they grow fastest?

2. How stable and large are EU-promoted networks? How do joint funding and emerging science & technologies affect Europe?

3. What is the quality and extent of public sector research? Build registers at a European level, integrated views of excellence (Leiden Ranking etc.)

4. Track the careers of researchers across borders

5. Effect and impact of research & innovation studies

6. Develop integrated data and tools for researchers in the field

Page 6: Managing Metadata for Science and Technology Studies: the RISIS case

The Types of Data in SMS
Data Integration

[Diagram. Entity types shown: Organization, Higher Education, Firm, Funding Body, Person, Product, Agreement, Policy, Evaluation, Location, Publication, Patent, Project, Investment, Funding Program. Datasets shown: CIB, ETER, EUPRO, JOREP, Leiden-Ranking, MORE I, Nano, Profile, SIPER, VICO.]

Page 7: Managing Metadata for Science and Technology Studies: the RISIS case

Semantically Mapping Science (SMS)

[Architecture diagram. Labels recoverable from the figure:
• Data sources: DB (four databases), RISIS Private Data, RISIS Public Data, [Linked] Open Data; RDF and VOID annotations on the sources
• Core: [Linked Data] API, Data Cache (Triple Store), Access Control Service
• Services: Meta-data Services, Basic Geo Services, Innovative Geo Services, Category Services, Domain Adaptation Service, Identifier Management Service, Identity Resolution Service, Named Entity Recognition
• Applications: Data Viz. & Exploration views, Interoperability with corTEXT, Integration with local datasets, Integration with public datasets, Integration with social data, Apps
• Public Data Access Methods (SPARQL, API, RSS, …)]

Page 8: Managing Metadata for Science and Technology Studies: the RISIS case

But wait a moment… hasn't this been done before?

• … solve a similar data integration problem: pharma (OpenPHACTS), socio-economic history, linguistics, media (CLARIAH, CEDAR, etc.)

• … solve a similar data search, indexing and cataloguing problem: datahub.io, lodlaundromat.org

• … solve similar metadata representation problems: DCAT, VOID, etc.

Page 13: Managing Metadata for Science and Technology Studies: the RISIS case

data privacy
data licensing
data paywall
physical location

Page 14: Managing Metadata for Science and Technology Studies: the RISIS case

Semantically Mapping Science (SMS)

[Architecture diagram, revised. Labels recoverable from the figure:
• Data sources: DB (four databases), RISIS Private Data, RISIS Public Data, [Linked] Open Data; VOID annotations on the sources
• Conversion layer: "convert" and "metadata" arrows from the sources to RDF metadata and an RDF store; Access Control Points
• Core: SMS [Linked Data] API, Data Cache (Triple Store)
• Services: Meta-data Services, Basic Geo Services, Innovative Geo Services, Category Services, Domain Adaptation Service, Identifier Management Service, Identity Resolution Service, Named Entity Recognition
• Applications: Data Viz. & Exploration views, Interoperability with corTEXT, Integration with local datasets, Integration with public datasets, Integration with social data, Apps]

Page 15: Managing Metadata for Science and Technology Studies: the RISIS case

How can we still provide an integrated view on this data?

Page 16: Managing Metadata for Science and Technology Studies: the RISIS case

How can we still provide an integrated view on this data?

Do existing vocabularies suffice?

Page 18: Managing Metadata for Science and Technology Studies: the RISIS case

How do experts assess the suitability of a dataset?

• Knowledge acquisition & elicitation with experts: interviews -> first design -> user experiences -> revise & adapt

• Distinguish between private, publicly accessible, and other public data.

1. User-friendly web interface for viewing dataset metadata;
2. Show conditions under which the data can be used;
3. Provide detailed information about the dataset, to
4. Enable users to gain an in-depth understanding of the data;
5. Facilitate trust (quality assessment);
6. Allow for both simple and advanced search (background knowledge)

Page 19: Managing Metadata for Science and Technology Studies: the RISIS case

Operationalisation

1. User interface: categorisation of different types of metadata, non-technical terms, hints

2. Usage conditions: legal aspects, access conditions, but also technical (data format, size, model)

3. Information & Understanding: overview, content description, temporal aspects, structure of the data

4. Trust: provenance and origin of data, when, how, and by whom it was created

5. Search: all of the above + the use of external knowledge sources to show connections

Page 20: Managing Metadata for Science and Technology Studies: the RISIS case

Fig. 2. RISIS metadata coverage overview through knowledge type categorization.

Technical aspects. RISIS's metadata provides information about the dataset model used. This indicates whether the dataset follows the traditional tabular model (relational, spreadsheet, etc.) or the graph model (RDF). It also covers other information such as the format and the size of the dataset.

Legal aspects. The legal aspects of a dataset are covered by the RISIS metadata through: a license, which explicitly determines the terms under which a dataset can be used; rights, which provide information such as the property and intellectual rights associated with the data; terms of use, which describe non-binding conditions; access conditions and visit conditions, which respectively describe the conditions under which end-users can access or visit a dataset; and a non-disclosure agreement, which specifies the conditions of access to confidential information that would require signing a non-disclosure agreement with the dataset holder(s).

Access. To inform the data consumer on how to access or query the data, the metadata provides information such as the opening status, which indicates whether the data is open for visit, and the access type, which specifies whether the data can be visited, requested or both, or whether the data is freely accessible. In addition, it provides the access URL, which points to the landing page, feed, SPARQL endpoint or other type of resource that grants access to the distribution of the dataset, and the data download address, which gives the location of the dataset for download.
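To make the access fields above more concrete, the following is a minimal Python sketch, not the RISIS implementation. The class name `AccessMetadata`, the function `access_options`, and the allowed string values ("open", "visit", "request", "both", "free") are all assumptions chosen for illustration; the excerpt names the fields but not their encoding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessMetadata:
    """Hypothetical container for the access fields named in the text."""
    opening_status: str                 # assumed values: "open" or "closed"
    access_type: str                    # assumed values: "visit", "request", "both", "free"
    access_url: Optional[str] = None    # landing page, feed, SPARQL endpoint, ...
    download_url: Optional[str] = None  # location of the dataset for download


def access_options(meta: AccessMetadata) -> List[str]:
    """List the ways a data consumer could reach the dataset."""
    options: List[str] = []
    if meta.access_type == "free":
        # Freely accessible data: point to whatever addresses are recorded.
        if meta.download_url:
            options.append(f"download from {meta.download_url}")
        if meta.access_url:
            options.append(f"query or browse at {meta.access_url}")
        return options
    if meta.opening_status != "open":
        return options  # restricted data that is not open: nothing to offer
    if meta.access_type in ("visit", "both"):
        options.append("visit the dataset holder")
    if meta.access_type in ("request", "both"):
        options.append("request access from the dataset holder")
    return options
```

A search portal could use such a function to show, next to each dataset description, which routes to the data are available without exposing the data itself.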

Data quality. All the above information could be used to assess the quality of a dataset. However, to specifically assess the work done by dataset providers

Page 21: Managing Metadata for Science and Technology Studies: the RISIS case

In more detail

and generic domain of the problem, SMS is intended to be useful not only for STIS but also for the humanities and social sciences.

5 Conclusions & Future Work

This paper presents an approach for managing metadata in the field of science, technology and innovation studies. The approach was developed and applied in the context of the RISIS-SMS project with the goal of supporting data integration, discovery and search across datasets, maintaining privacy, and obtaining user trust while focussing on data that are not directly accessible. A contribution of this work is the set of requirements elicited by interviewing the stakeholders. The requirement analysis guided the design of a new vocabulary, together with a review of existing metadata vocabularies that helped us fill in part of the metadata needed to accommodate the domain needs. Additionally, to meet the requirements, we designed and implemented a user-friendly interface which allows non-expert users to easily author metadata in RDF.

As future work, we envisage extending our vocabulary to cover aspects related to the quality and provenance of data. We also plan to conduct a usability evaluation with end-users of the system to ensure that our user interface and metadata specifications fulfil the user needs.

References

1. P. Ciccarese, S. Soiland-Reyes, K. Belhajjame, A. J. Gray, C. Goble, and T. Clark. PAV ontology: provenance, authoring and versioning. Journal of Biomedical Semantics, 4(1):1–22, 2013.

2. C. Daraio, M. Lenzerini, C. Leporelli, H. F. Moed, P. Naggar, A. Bonaccorsi, and A. Bartolucci. Data integration for research and innovation policy: an ontology-based data management approach. Scientometrics, pages 1–15, 2015.

3. P. Groth, A. Loizou, A. J. Gray, C. Goble, L. Harland, and S. Pettifer. API-centric linked data integration: The Open PHACTS discovery platform case study. Web Semantics: Science, Services and Agents on the World Wide Web, 29:12–18, 2014.

4. E. J. Hackett, O. Amsterdamska, M. Lynch, and J. Wajcman. The handbook of science and technology studies. The MIT Press, 2008.

5. A. Khalili, A. Loizou, and F. van Harmelen. Adaptive linked data-driven web components: Building flexible and reusable semantic web interfaces. Semantic Web Conference (ESWC) 2016, 2016.

6. J. P. McCrae, P. Labropoulou, J. Gracia, M. Villegas, V. Rodríguez-Doncel, and P. Cimiano. One ontology to bind them all: The META-SHARE OWL ontology for the interoperability of linguistic datasets on the web. In The Semantic Web: ESWC 2015 Satellite Events, pages 271–282. Springer, 2015.

7. A. Meroño-Peñuela, A. Ashkpour, M. Van Erp, K. Mandemakers, L. Breure, A. Scharnhorst, S. Schlobach, and F. Van Harmelen. Semantic technologies for historical research: A survey. Semantic Web, 6(6):539–564, 2014.

8. P. Van den Besselaar. The cognitive and the social structure of STS. Scientometrics, 51(2):441–460, 2001.

Fig. 4. The RISIS Ontology and the vocabularies it reuses.

for RDF data". Figure 3 illustrates the mapping between the RISIS requirements and existing shared vocabularies. Yet, reusing all the above vocabularies does not entirely satisfy the RISIS need for describing a dataset. This forced the creation of new vocabulary terms such as risis:usecase or risis:accessConditions (see Figure 3) for concepts that are not covered by any of the selected vocabularies.
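As an illustration of how reused and newly created terms can sit side by side in one dataset description, here is a minimal Python sketch that writes a Turtle snippet using only the standard library. It is not the RISIS tooling. The `dcterms` and `void` namespace URIs are the published ones; the `risis` namespace URI, the function names, and the example values are placeholders assumed for this sketch. Property names are passed in as prefixed strings and are not validated.

```python
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
    # Placeholder only: the real RISIS namespace is not given in the text.
    "risis": "http://example.org/risis#",
}


def literal(value: str) -> str:
    """Quote a plain string as a Turtle literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def dataset_to_turtle(uri: str, properties: dict) -> str:
    """Describe one dataset as a void:Dataset with the given literal properties."""
    header = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    pairs = ["a void:Dataset"]
    pairs += [f"{prop} {literal(val)}" for prop, val in properties.items()]
    body = f"<{uri}> " + " ;\n    ".join(pairs) + " ."
    return "\n".join(header + ["", body])


# Hypothetical example values, for illustration only.
example = dataset_to_turtle(
    "http://example.org/dataset/demo",
    {
        "dcterms:title": "Demo dataset",
        "risis:accessConditions": "On-site visit after approval",
        "risis:usecase": "Tracking researcher careers",
    },
)
print(example)
```

The sketch keeps standard terms (`dcterms:title`, `void:Dataset`) wherever they exist and falls back to a project-specific prefix only for concepts the shared vocabularies do not cover, which mirrors the reuse-first approach described above.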

6 User-friendly authoring of metadata

As already mentioned in Section 2, the RISIS metadata about datasets is modeled in RDF. The Resource Description Framework allows metadata to be shared and facilitates integration in a structured and semantically machine-interpretable way across different applications exploiting the metadata. The adoption of RDF as a data model for the RISIS project raised the problem that the auto-generated metadata need to be manipulated by data owners who are not familiar with Semantic Web technologies and the ways to generate standard and valid RDF. To tackle this issue, we created a graphical user interface (UI) to enable non-expert RISIS data owners to generate and update their dataset metadata.

To design the RISIS graphical UI for editing RDF metadata, we followed a user-centered approach where we first collected the UI requirements by interviewing the potential end-users of the RISIS platform. We summarize here the set of features which needed to be supported by the metadata editor: (1) render metadata properties in different categories; (2) avoid presenting technical metadata properties to the user (e.g. RDF dump, byte size); (3) accompany metadata properties with a hint to help understand the meaning of the property; (4) support the user with human-readable information, for example by avoiding displaying full URIs; (5) facilitate inserting metadata values which follow a certain pattern (e.g. DateTime values, URLs, etc.).
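Features (4) and (5) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the editor's actual code: the function names `human_label`, `is_valid_datetime` and `is_valid_url` are hypothetical, ISO 8601 is assumed as the DateTime pattern, and only http/https URLs are accepted.

```python
import re
from datetime import datetime
from urllib.parse import urlparse


def human_label(uri: str) -> str:
    """Feature (4): show a readable label instead of the full property URI."""
    # Keep only the part after the last '#' or '/'.
    local = re.split(r"[#/]", uri.rstrip("#/"))[-1]
    # Split camelCase, e.g. "accessConditions" -> "access Conditions".
    words = re.sub(r"([a-z])([A-Z])", r"\1 \2", local)
    return words.replace("_", " ").capitalize()


def is_valid_datetime(value: str) -> bool:
    """Feature (5): accept values in ISO 8601 form (an assumed pattern)."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_valid_url(value: str) -> bool:
    """Feature (5): accept absolute http(s) URLs only (an assumed restriction)."""
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

An editor form could call `human_label` when rendering a field name and run the matching validator before a value is written to the RDF store, so that data owners never need to see or type raw URIs.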

Page 22: Managing Metadata for Science and Technology Studies: the RISIS case

Fig. 3. RISIS's Ontology. A view over mapped vocabularies reused.

respectively The Dublin Core Metadata Element Set⁹, which is a "vocabulary of fifteen properties for use in resource description"; The PROV Ontology¹⁰, which is used to provide provenance descriptions; the Vocabulary of Interlinked Datasets (VoID)¹¹, which is a data-model-specific vocabulary for expressing metadata about RDF datasets; and the Friend of a Friend (FOAF) vocabulary¹², for describing persons. Although provenance is not shown in Figure 3, we discuss it here as it has been extensively used behind the scenes for describing data manipulations.

Other reused vocabularies that cover fewer of the RISIS requirements include DCAT, which is primarily a "vocabulary designed to facilitate interoperability between data catalogs published on the Web"; DISCO¹³, which is a vocabulary for documenting research and survey data; WAIVER¹⁴, which is a vocabulary for waivers of rights; the Provenance, Authoring and Versioning ontology (PAV) [1], which is a "lightweight ontology for capturing just enough descriptions essential for tracking the provenance, authoring and versioning of web resources"; the Simple Knowledge Organization System (SKOS)¹⁵, which is "a common data model for sharing and linking knowledge organization systems via the Semantic Web"; and RDF Schema¹⁶, which is a "data-modeling vocabulary

⁹ http://dublincore.org/documents/dces/
¹⁰ https://www.w3.org/TR/2013/REC-prov-o-20130430/
¹¹ https://www.w3.org/TR/void/
¹² http://xmlns.com/foaf/spec/
¹³ http://rdf-vocabulary.ddialliance.org/discovery.html
¹⁴ http://vocab.org/waiver/terms
¹⁵ https://www.w3.org/TR/swbp-skos-core-spec
¹⁶ https://www.w3.org/TR/rdf-schema/

Page 23: Managing Metadata for Science and Technology Studies: the RISIS case

Ali Khalili, Antonis Loizou and Frank van Harmelen. Adaptive Linked Data-driven Web Components: Building Flexible and Reusable Semantic Web Interfaces

Page 26: Managing Metadata for Science and Technology Studies: the RISIS case

Discussion

• Science & innovation studies thrives on diverse and heterogeneous data

• Existing platforms do not take access restrictions into account, or

• … they do not provide sufficiently descriptive metadata to support research

• We performed a requirements analysis for minimal metadata needs

• Resulting in a vocabulary that integrates and connects existing standards, and

• … drives a Linked Data driven data search portal.