Page 1: euclid_linkedup WWW tutorial (Besnik Fetahu)

Online Learning and Linked Data: Lessons Learned and Best Practices

Dataset Profiling

7. April 2023, Besnik Fetahu

Page 2: euclid_linkedup WWW tutorial (Besnik Fetahu)

LinkedUp: Data Catalog Features

34 linked datasets of educational relevance (http://datahub.io/dataset?organization=linked-education)

VoID representations of datasets include the following information:

Manual dataset schema alignments

Accessibility information, i.e. SPARQL endpoint URL

http://purl.org/ontology/bibo/Thesis owl:equivalentClass http://purl.org/ontology/bibo/Thesis

http://swrc.ontoware.org/ontology#Article owl:equivalentClass http://purl.org/ontology/bibo/AcademicArticle

http://data.linkededucation.org/linkedup/dataset/data-open-ac-uk void:sparqlEndpoint http://data.open.ac.uk/query

Co-occurrence graph of data types in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties

Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.

Page 3: euclid_linkedup WWW tutorial (Besnik Fetahu)

LinkedUp: Data Catalog Features

34 linked datasets of educational relevance (http://datahub.io/dataset?organization=linked-education)

VoID representations of datasets include the following information:

Datasets’ resources type graph

Datasets’ Topic Extraction (Dataset Profiling)

morelab

OpenCourseWare 

Page 4: euclid_linkedup WWW tutorial (Besnik Fetahu)

LinkedUp: Data Catalog Features

34 linked datasets of educational relevance (http://datahub.io/dataset?organization=linked-education)

VoID representations of datasets include the following information:

Federated query interface:

PREFIX void: <http://rdfs.org/ns/void#>
PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>

SELECT DISTINCT ?endpoint
WHERE {
  ?ds void:sparqlEndpoint ?endpoint .
  {
    { ?ds void:classPartition [ void:class aiiso:School ] }
    UNION
    { ?ds void:subset [ void:classPartition [ void:class aiiso:School ] ] }
  }
}
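The logic of the federated lookup can be mimicked in memory. The following is a minimal Python sketch, not the LinkedUp implementation: the catalog entries, the example.org endpoints, the dict layout, and the function name `endpoints_for_class` are all assumptions made for illustration.

```python
AIISO_SCHOOL = "http://purl.org/vocab/aiiso/schema#School"

# Toy VoID-like descriptions (hypothetical datasets and endpoints).
CATALOG = [
    {
        "dataset": "example-a",
        "endpoint": "http://example.org/a/sparql",
        "class_partitions": [AIISO_SCHOOL],
        "subsets": [],
    },
    {
        "dataset": "example-b",
        "endpoint": "http://example.org/b/sparql",
        "class_partitions": [],
        # The class partition is declared on a void:subset instead.
        "subsets": [{"class_partitions": [AIISO_SCHOOL]}],
    },
    {
        "dataset": "example-c",
        "endpoint": "http://example.org/c/sparql",
        "class_partitions": ["http://xmlns.com/foaf/0.1/Person"],
        "subsets": [],
    },
]


def endpoints_for_class(catalog, class_uri):
    """Return the distinct endpoints of datasets that declare class_uri.

    Mirrors the two UNION branches of the query: the class partition may sit
    on the dataset itself or on one of its direct subsets.
    """
    found = set()
    for ds in catalog:
        direct = class_uri in ds["class_partitions"]
        via_subset = any(class_uri in s["class_partitions"] for s in ds["subsets"])
        if direct or via_subset:
            found.add(ds["endpoint"])
    return sorted(found)


if __name__ == "__main__":
    for endpoint in endpoints_for_class(CATALOG, AIISO_SCHOOL):
        print(endpoint)
```

The returned endpoints are the ones a client would then query for instances of the class.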

Page 5: euclid_linkedup WWW tutorial (Besnik Fetahu)

LinkedUp: Why dataset profiling?

A few characteristics of linked datasets (from the Linked Open Data Cloud):

Growing number of datasets: 227 datasets

Data represented as triples: 31 billion triples

Multi-lingual content: 18 languages

Broad set of topics covered

Inter-dataset links

Domain                  Number of datasets  Triples         %        (Out-)Links
Media                   25                  1,841,852,061   5.82 %   50,440,705
Geographic              31                  6,145,532,484   19.43 %  35,812,328
Government              49                  13,315,009,400  42.09 %  19,343,519
Publications            87                  2,950,720,693   9.33 %   139,925,218
Cross-domain            41                  4,184,635,715   13.23 %  63,183,065
Life sciences           41                  3,036,336,004   9.60 %   191,844,090
User-generated content  20                  134,127,413     0.42 %   3,449,143
Total                   295                 31,634,213,770           503,998,829

Domains covered by “lod-cloud” datasets

Page 6: euclid_linkedup WWW tutorial (Besnik Fetahu)

LinkedUp: Why dataset profiling?

How do I find information about “renewable energy”?

31 billion resources

18 languages

180 organisations

How can we do that?

Check datasets that cover such a topic? 38 out of 228 datasets contain topic coverage information.

Use a SPARQL FILTER clause? A regex(*) filter clause needs to check all triples for a specific keyword.

What are all possible forms of renewable energy? Solar energy, wind energy, geothermal…...
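To make the limits of keyword filtering concrete, here is a small Python sketch of what a regex filter amounts to: a scan over every literal. The triples are hypothetical toy data and `keyword_scan` is a name introduced for this illustration only.

```python
import re

# Toy triples (hypothetical data, for illustration only).
TRIPLES = [
    ("ex:plant1", "rdfs:label", "Offshore wind farm"),
    ("ex:plant2", "rdfs:comment", "A renewable energy provider"),
    ("ex:plant3", "rdfs:label", "Geothermal power station"),
    ("ex:doc1", "dc:title", "Annual budget report"),
]


def keyword_scan(triples, keyword):
    """Return subjects whose literal value matches keyword (case-insensitive).

    Every triple is inspected, which is what a regex FILTER amounts to.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [s for s, _p, o in triples if pattern.search(o)]


if __name__ == "__main__":
    print(keyword_scan(TRIPLES, "renewable energy"))
```

In this toy data the scan finds only the resource that literally mentions the keyword; the wind farm and the geothermal station are missed, although both are forms of renewable energy.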

Page 7: euclid_linkedup WWW tutorial (Besnik Fetahu)

LinkedUp: How to profile Linked Data?

What is a linked data profile?

A linked dataset profile consists of structured information describing the dataset's topic coverage. A profile is represented as a graph whose vertices are datasets, resources, and topics. Edges connect the pairs ‹dataset, resource› and ‹resource, topic›. The edges between resources and topics are weighted, conveying the relevance of a topic for a dataset.

Profile Definition

<resource_uri_1> ?predicate_x value

<resource_uri_1> ?predicate_y value

<resource_uri_1> ?predicate_z value

A dataset consists of a set of resource instances.

A resource is represented by a set of triples.

A topic is equivalent to a DBpedia category associated with one of the resource values.

<resource_uri_1>

<resource_uri_2>

……

<resource_uri_n>
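The profile definition on this slide can be read as a small data structure. The sketch below is one possible in-memory representation in Python, not the authors' implementation: the class name `ProfileGraph`, its methods, and the choice to aggregate topic weights per dataset by summing are assumptions for illustration.

```python
from collections import defaultdict


class ProfileGraph:
    """Dataset -> resource -> topic graph with weighted resource-topic edges."""

    def __init__(self):
        self.resources = defaultdict(set)        # dataset -> {resource}
        self.topic_weights = defaultdict(dict)   # resource -> {topic: weight}

    def add_resource(self, dataset, resource):
        """Add an unweighted <dataset, resource> edge."""
        self.resources[dataset].add(resource)

    def add_topic(self, resource, topic, weight):
        """Add a weighted <resource, topic> edge."""
        self.topic_weights[resource][topic] = weight

    def dataset_topics(self, dataset):
        """Aggregate topic weights over a dataset's resources.

        Summing is an assumption of this sketch; topics are returned
        sorted by descending weight, then by name.
        """
        totals = defaultdict(float)
        for resource in self.resources[dataset]:
            for topic, weight in self.topic_weights[resource].items():
                totals[topic] += weight
        return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    g = ProfileGraph()
    g.add_resource("ds1", "r1")
    g.add_resource("ds1", "r2")
    g.add_topic("r1", "Category:Renewable_energy", 0.5)
    g.add_topic("r2", "Category:Renewable_energy", 0.25)
    g.add_topic("r2", "Category:Wind_power", 0.5)
    print(g.dataset_topics("ds1"))
```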

Page 8: euclid_linkedup WWW tutorial (Besnik Fetahu)

Linked-Up: Profiling Linked Data

i. Metadata extraction

ii. Sampling of resource instances

iii. Entity and topic extraction

iv. Topic ranking (PageRank with Priors, HITS with Priors and K-Step Markov)

v. Weighted dataset-topic profile graphs

vi. Profiles representation

A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles. Besnik Fetahu, Stefan Dietze, Bernardo Pereira Nunes, Marco Antonio Casanova, Davide Taibi, and Wolfgang Nejdl. In Proceedings of the 11th Extended Semantic Web Conference, Springer, 2014 (to appear).

Page 9: euclid_linkedup WWW tutorial (Besnik Fetahu)

Profiling Linked Data – (I)

i. Metadata extraction:

DataHub’s CKAN API

ii. Sampling of resource instances

weighted, random, centrality

iii. Entity and topic extraction

Consider only the textual values assigned to a resource

NER: Disambiguate and extract named entities (DBpedia Spotlight)
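The sampling step (ii) could look roughly like the following Python sketch. It covers only the random and weighted strategies; the slides do not say how the weights are defined, so using the number of triples per resource as the weight is an assumption here, as are the function names. Centrality-based sampling is not sketched.

```python
import random


def random_sample(resources, k, seed=0):
    """Uniform sample of up to k resource URIs without replacement."""
    rng = random.Random(seed)
    uris = sorted(resources)
    return rng.sample(uris, min(k, len(uris)))


def weighted_sample(resources, k, seed=0):
    """Sample up to k resources without replacement, favouring heavier ones.

    `resources` maps a URI to a positive weight; in this sketch the weight
    is the number of triples describing the resource (an assumption).
    """
    rng = random.Random(seed)
    pool = dict(resources)
    chosen = []
    for _ in range(min(k, len(pool))):
        uris = sorted(pool)
        pick = rng.choices(uris, weights=[pool[u] for u in uris], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # remove so the same resource is not drawn twice
    return chosen


if __name__ == "__main__":
    triple_counts = {"r1": 10, "r2": 1, "r3": 5, "r4": 2}
    print(random_sample(triple_counts, 2))
    print(weighted_sample(triple_counts, 2))
```

A fixed seed keeps the sample reproducible, which helps when comparing ranking results across runs.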

Page 10: euclid_linkedup WWW tutorial (Besnik Fetahu)

Profiling Linked Data – (II)

iv. Topic ranking (PageRank with Priors, HITS with Priors and K-Step Markov)

Rank the topics of each dataset and compute their relevance with respect to the associated resources

v. Weighted dataset-topic profile graph

The topic weights computed for each dataset are the weights of the edges <dataset, topic>

vi. Profiles representation (Vocabulary of Interlinked Datasets (VoID) and Vocabulary of Links (VoL))

VoID: Captures information about a Linked Dataset as a set of links

VoL: Defines a link (of entity or topic type), along with the provenance information and the relevance score of that link
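As an illustration of the topic ranking step, the sketch below runs a power iteration in the spirit of PageRank with Priors on a tiny entity-topic graph. The slides do not give the graph construction, the priors, or the restart probability, so the toy graph, the uniform priors over entities, `beta=0.3`, and the function name are all assumptions.

```python
def pagerank_with_priors(graph, priors, beta=0.3, iterations=50):
    """Power iteration for PageRank with a prior (restart) distribution.

    graph:  node -> list of neighbour nodes (every node needs >= 1 neighbour)
    priors: node -> prior probability, summing to 1
    beta:   probability of jumping back to the prior distribution
    """
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Restart mass goes to the prior nodes.
        nxt = {n: beta * priors.get(n, 0.0) for n in nodes}
        # The remaining mass is spread evenly over each node's neighbours.
        for u in nodes:
            share = (1.0 - beta) * rank[u] / len(graph[u])
            for v in graph[u]:
                nxt[v] += share
        rank = nxt
    return rank


if __name__ == "__main__":
    # Toy undirected entity-topic graph (hypothetical).
    graph = {
        "e1": ["t_energy", "t_wind"],
        "e2": ["t_energy"],
        "e3": ["t_energy"],
        "t_energy": ["e1", "e2", "e3"],
        "t_wind": ["e1"],
    }
    priors = {"e1": 1 / 3, "e2": 1 / 3, "e3": 1 / 3}
    scores = pagerank_with_priors(graph, priors)
    for topic in ("t_energy", "t_wind"):
        print(topic, round(scores[topic], 4))
```

In this toy graph the topic linked to more entities receives the higher score, which is the kind of signal that would end up as an edge weight in the profile graph.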

Page 11: euclid_linkedup WWW tutorial (Besnik Fetahu)

Profiling Linked Data: Representation Example

Dataset Profile Metadata

Dataset’s Profile and Index

Entity Type Link

extracted entity

extracted topic

Provenance information (resources) for the entity link

Provenance information (entities) for the topic link

Topic Type Link

topic relevance score

Page 12: euclid_linkedup WWW tutorial (Besnik Fetahu)

SELECT ?dataset ?link ?score ?link_1 ?entity ?resource
WHERE {
  ?dataset a void:Linkset .
  ?dataset vol:hasLink ?link .
  ?link vol:linksResource <http://dbpedia.org/resource/Category:Renewable_energy> .
  ?link vol:derivedFrom ?entity .
  ?link vol:hasScore ?score .
  ?link_1 vol:linksResource ?entity .
  ?dataset vol:hasLink ?link_1 .
  ?link_1 vol:derivedFrom ?resource
}
ORDER BY DESC(?score)

How are the profiles useful?

• “Renewable Energy” comes in different forms:
• Solar Energy
• Wind-farms
• Biogas
• Hydroelectricity etc.

http://enipedia.tudelft.nl/wiki/Windmar_Renewable_Energy

http://enipedia.tudelft.nl/data/page/eGRID/Plant/57050

http://enipedia.tudelft.nl/wiki/Us_Energy_Biogas_Corp

http://www.reegle.info/profiles/JP

How do I find information about “renewable energy”?

Page 13: euclid_linkedup WWW tutorial (Besnik Fetahu)

Profiling Linked Data: Evaluation

7. April 2023, Stefan Dietze

Profiling accuracy for the different ranking approaches using the full sample of analysed resource instances, and with NDCG score averaged over all datasets.

The correlation between ranking accuracy (averaged over all datasets and for ∆NDCG ) and ranking time.
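For readers unfamiliar with the metric, NDCG can be computed as in the short Python sketch below. The slides do not state which DCG variant or cut-off the evaluation used, so the common form with raw relevance grades and a log2 discount over the full list is an assumption here.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance grades of the topics in the order a ranker returned them.
    print(ndcg([3, 2, 1]))
    print(ndcg([1, 2, 3]))
```

A ranking that already lists topics by descending relevance scores 1.0; any other ordering of the same grades scores lower.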

Page 14: euclid_linkedup WWW tutorial (Besnik Fetahu)

Profiling Linked Data: Example use cases

Type-specific views on datasets/categories:

“Document” (foaf:Document)

“Person” (foaf:Person)

“Course” (aiiso:Course)

Available for the LinkedUp Catalog only (schema mappings are already available there).

Exploratory functionalities over the dataset profiles

Available for LinkedUp catalog and the LOD-Cloud.

Page 15: euclid_linkedup WWW tutorial (Besnik Fetahu)

Online Learning and Linked Data: Lessons Learned and Best Practices

Cite4Me and Linked Challenge

Page 16: euclid_linkedup WWW tutorial (Besnik Fetahu)

Semantic Search and Retrieval of Publications

Semantic Search

Graph Search

Paper Recommendation

In-depth Analysis

Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications. Bernardo Pereira Nunes, Besnik Fetahu, Stefan Dietze, and Marco Antonio Casanova. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, (2013)

Page 17: euclid_linkedup WWW tutorial (Besnik Fetahu)

LinkedUp: Veni Challenge

DataConf.

KnowNodes

Mismuseos

ReCredible

YourHistory

PoliMedia - 1st prize: http://www.polimedia.nl/

GlobeTown - 2nd prize: http://www.globe-town.org/

WeShare - 3rd prize / people's choice: http://seek.cloud.gsic.tel.uva.es/weshare/

Page 18: euclid_linkedup WWW tutorial (Besnik Fetahu)

Demos and Other Resources

Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications. Bernardo Pereira Nunes, Besnik Fetahu, Stefan Dietze, and Marco Antonio Casanova. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, (2013)

A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles. Besnik Fetahu, Stefan Dietze, Bernardo Pereira Nunes, Marco Antonio Casanova, Davide Taibi, and Wolfgang Nejdl. In Proceedings of the 11th Extended Semantic Web Conference, Springer, 2014 (to appear).

Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.

LinkedUp Catalog: http://data.linkededucation.org/linkedup/catalog/

DevTalk LinkedUp: http://data.linkededucation.org/linkedup/devtalk/

LOD Profile Data: http://data-observatory.org/lod-profiles/sparql

LOD Profile Explorer: http://data-observatory.org/lod-profiles/profile-explorer

Cite4Me Application: http://www.cite4me.com/

LinkedUp Challenge: http://linkedup-challenge.org/

