euclid_linkedup WWW tutorial (Besnik Fetahu)

Aug 11, 2014

  • Online Learning and Linked Data: Lessons Learned and Best Practices. Dataset Profiling. 3 April 2014, Besnik Fetahu
  • LinkedUp: Data Catalog Features
      • 34 linked datasets of educational relevance (http://datahub.io/dataset?organization=linked-education)
      • VoID representations of the datasets include the following information:
          • Manual dataset schema alignments, for example:
              http://purl.org/ontology/bibo/Thesis owl:equivalentClass http://purl.org/ontology/bibo/Thesis
              http://swrc.ontoware.org/ontology#Article owl:equivalentClass http://purl.org/ontology/bibo/AcademicArticle
          • Accessibility information, i.e. the SPARQL endpoint URL, for example:
              http://data.linkededucation.org/linkedup/dataset/data-open-ac-uk void:sparqlEndpoint http://data.open.ac.uk/query
      • Figure: co-occurrence graph of data types in 146 datasets (144 vocabularies, 588 highly overlapping types, 719 properties). Source: d'Aquin, M., Adamou, A., Dietze, S. Assessing the Educational Linked Data Landscape. ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
  • LinkedUp: Data Catalog Features
      • VoID representations of the datasets include the following information:
          • Dataset resource type graphs
          • Dataset topic extraction (dataset profiling)
      • Figure labels: morelab, OpenCourseWare
  • LinkedUp: Data Catalog Features
      • VoID representations of the datasets include the following information:
          • Federated query interface. The following query returns the SPARQL endpoints of all datasets that declare the class aiiso:School, either directly or in a subset:
              PREFIX void: <http://rdfs.org/ns/void#>
              PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>
              SELECT DISTINCT ?endpoint WHERE {
                ?ds void:sparqlEndpoint ?endpoint .
                { ?ds void:classPartition [ void:class aiiso:School ] }
                UNION
                { ?ds void:subset [ void:classPartition [ void:class aiiso:School ] ] }
              }
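The endpoint-discovery query above can be generalised to any class. The following is a minimal sketch, assuming the standard VoID namespace and the AIISO schema namespace; the helper name `endpoints_query` is hypothetical and not part of the LinkedUp catalog. It only builds the query text and does not contact any endpoint.

```python
# Namespace IRIs assumed here: the public VoID and AIISO vocabularies.
VOID = "http://rdfs.org/ns/void#"
AIISO = "http://purl.org/vocab/aiiso/schema#"


def endpoints_query(class_iri: str) -> str:
    """Build a SPARQL query listing the endpoints of datasets that declare class_iri.

    A dataset matches if the class appears in one of its class partitions,
    or in a class partition of one of its subsets.
    """
    return f"""PREFIX void: <{VOID}>
SELECT DISTINCT ?endpoint WHERE {{
  ?ds void:sparqlEndpoint ?endpoint .
  {{ ?ds void:classPartition [ void:class <{class_iri}> ] }}
  UNION
  {{ ?ds void:subset [ void:classPartition [ void:class <{class_iri}> ] ] }}
}}"""


# Same question as on the slide: which endpoints serve aiiso:School instances?
query = endpoints_query(AIISO + "School")
print(query)
```

The resulting string can be sent to the catalog's SPARQL endpoint with any SPARQL client; the list of endpoints it returns is what a federated query would then fan out to.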
  • LinkedUp: Why dataset profiling?
      • A few characteristics of linked datasets (from the Linked Open Data Cloud):
          • Growing number of datasets: 227 datasets
          • Data represented as triples: 31 billion triples
          • Multi-lingual content: 18 languages
          • Broad set of topics covered
          • Inter-dataset links
      • Domains covered by lod-cloud datasets:
          Domain                  Datasets  Triples         %        (Out-)Links  %
          Media                   25        1,841,852,061   5.82 %   50,440,705   10.01 %
          Geographic              31        6,145,532,484   19.43 %  35,812,328   7.11 %
          Government              49        13,315,009,400  42.09 %  19,343,519   3.84 %
          Publications            87        2,950,720,693   9.33 %   139,925,218  27.76 %
          Cross-domain            41        4,184,635,715   13.23 %  63,183,065   12.54 %
          Life sciences           41        3,036,336,004   9.60 %   191,844,090  38.06 %
          User-generated content  20        134,127,413     0.42 %   3,449,143    0.68 %
          Total                   295       31,634,213,770           503,998,829
  • LinkedUp: Why dataset profiling?
      • How do I find information about renewable energy? (31 billion resources, 18 languages, 180 organisations)
      • How can we do that?
          • Check the datasets that cover this topic? Only 38 out of 228 datasets contain topic coverage information.
          • Use a SPARQL FILTER clause? A regex(*) filter clause needs to check all triples that contain a specific keyword.
          • What are all the possible forms of renewable energy? Renewable energy: solar energy, wind energy, geothermal...
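The limitation of keyword filters can be illustrated with a small sketch. The labels below are hypothetical and not taken from any dataset; the point is only that a literal match on "renewable energy" misses resources that mention a specific form of it, and that widening the pattern requires knowing every form in advance.

```python
import re

# Hypothetical resource labels, invented for illustration only.
labels = [
    "Renewable energy policy in Europe",
    "Offshore wind energy capacity",
    "Solar energy installations",
    "Geothermal power plants",
    "Coal mining statistics",
]


def keyword_matches(pattern, texts):
    """Return the texts matching the pattern, case-insensitively (like a regex filter)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [text for text in texts if rx.search(text)]


# A literal keyword only finds labels that spell the topic out.
literal = keyword_matches(r"renewable energy", labels)

# Listing known forms by hand finds more, but the list is never complete.
expanded = keyword_matches(
    r"renewable energy|solar energy|wind energy|geothermal", labels
)

print(len(literal), len(expanded))
```

A topic profile avoids the hand-written list: resources are linked to topics once, and a search then asks for the topic instead of scanning every literal.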
  • LinkedUp: How to profile Linked Data?
      • What is a linked data profile? A linked dataset profile consists of structured information describing the dataset's topic coverage.
      • Profile definition:
          • A profile is represented as a graph.
          • The vertices of the profile graph are datasets, resources, and topics.
          • The edges of the profile graph connect the pairs (dataset, resource) and (resource, topic).
          • Edges between resources and topics are weighted, conveying the relevance of a topic for a dataset.
      • A dataset consists of a set of resource instances.
      • A resource is represented by a set of triples (figure: ?predicate_x value, ?predicate_y value, ?predicate_z value).
      • A topic is equivalent to a DBpedia category associated with one of the resource values.
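The profile definition above maps naturally onto a small data structure. This is a minimal sketch, not the authors' implementation: the class name `ProfileGraph`, the identifiers, the weights, and the choice to keep the largest edge weight per topic when summarising a dataset are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProfileGraph:
    # (dataset, resource) edges: dataset -> set of resources
    dataset_resources: dict = field(default_factory=lambda: defaultdict(set))
    # weighted (resource, topic) edges: resource -> {topic: weight}
    resource_topics: dict = field(default_factory=lambda: defaultdict(dict))

    def add_resource(self, dataset, resource):
        self.dataset_resources[dataset].add(resource)

    def add_topic(self, resource, topic, weight):
        self.resource_topics[resource][topic] = weight

    def topics_of(self, dataset):
        """Topics reachable from a dataset, highest weight first.

        Assumption: when several resources link to the same topic,
        the largest edge weight is kept.
        """
        best = {}
        for resource in self.dataset_resources[dataset]:
            for topic, weight in self.resource_topics[resource].items():
                best[topic] = max(weight, best.get(topic, 0.0))
        return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical identifiers and weights.
graph = ProfileGraph()
graph.add_resource("ds:energy", "res:1")
graph.add_resource("ds:energy", "res:2")
graph.add_topic("res:1", "Category:Renewable_energy", 0.8)
graph.add_topic("res:2", "Category:Renewable_energy", 0.6)
graph.add_topic("res:2", "Category:Wind_power", 0.5)

print(graph.topics_of("ds:energy"))
```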
  • LinkedUp: Profiling Linked Data
      i. Metadata extraction
      ii. Sampling of resource instances
      iii. Entity and topic extraction
      iv. Topic ranking (PageRank with Priors, HITS with Priors and K-Step Markov)
      v. Weighted dataset-topic profile graphs
      vi. Profiles representation
      Source: A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles. Besnik Fetahu, Stefan Dietze, Bernardo Pereira Nunes, Marco Antonio Casanova, Davide Taibi, and Wolfgang Nejdl. In Proceedings of the 11th Extended Semantic Web Conference, Springer, 2014 (to appear).
  • Profiling Linked Data (I)
      i. Metadata extraction: DataHub's CKAN API
      ii. Sampling of resource instances: weighted, random, or centrality-based
      iii. Entity and topic extraction
          • Consider only the textual values assigned to a resource
          • NER: disambiguate and extract named entities (DBpedia Spotlight)
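The sampling step can be sketched as follows. The slides name the strategies but do not define them, so this is an illustration under assumptions: the weight of a resource is taken to be any positive number (for example, how many textual values it has), and weighted sampling without replacement is done with the Efraimidis-Spirakis key method. Function names and the example data are hypothetical; centrality-based sampling is not shown.

```python
import random


def random_sample(resources, k, seed=0):
    """Uniform sample of up to k resources, without replacement."""
    rng = random.Random(seed)
    return rng.sample(resources, min(k, len(resources)))


def weighted_sample(resources, weights, k, seed=0):
    """Weighted sample of up to k resources, without replacement.

    Each resource gets the key u ** (1 / w) with u uniform in [0, 1);
    the k largest keys are kept. Resources with weight 0 are never picked.
    """
    rng = random.Random(seed)
    keyed = [
        (rng.random() ** (1.0 / weight), resource)
        for resource, weight in zip(resources, weights)
        if weight > 0
    ]
    keyed.sort(reverse=True)
    return [resource for _, resource in keyed[:k]]


# Hypothetical resources; weights stand for the amount of text per resource.
resources = ["r1", "r2", "r3", "r4"]
weights = [5, 1, 0, 3]

picked_uniform = random_sample(resources, 3)
picked_weighted = weighted_sample(resources, weights, 2)
print(picked_uniform, picked_weighted)
```

Fixing the seed keeps a profile reproducible across runs, which matters when comparing ranking approaches on the same sample.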
  • Profiling Linked Data (II)
      iv. Topic ranking (PageRank with Priors, HITS with Priors and K-Step Markov): rank the topics of each dataset and compute their relevance with respect to the associated resources
      v. Weighted dataset-topic profile graph: the topic weights computed for each dataset serve as the edge weights
      vi. Profiles representation (Vocabulary of Interlinked Datasets (VoID) and Vocabulary of Links (VoL))
          • VoID: captures information about a linked dataset as a set of links
          • VoL: defines a link (of entity or topic type), along with the provenance information and the relevance score of that link
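To make the ranking step concrete, here is a minimal sketch of PageRank with priors on a toy dataset-resource-topic graph. It is not the authors' implementation: the graph, the prior placed entirely on the dataset node, the back-probability `beta = 0.15`, and the fixed iteration count are assumptions for illustration. At each step a node keeps `beta` times its prior and receives the remaining mass from its in-neighbours.

```python
def pagerank_with_priors(neighbors, priors, beta=0.15, iterations=100):
    """Iterate score(v) = beta * prior(v) + (1 - beta) * sum(score(u) / outdeg(u)).

    neighbors: node -> list of out-neighbours (each node needs at least one).
    priors: node -> prior probability; the values should sum to 1.
    """
    nodes = list(neighbors)
    score = {node: priors.get(node, 0.0) for node in nodes}
    for _ in range(iterations):
        updated = {node: beta * priors.get(node, 0.0) for node in nodes}
        for source in nodes:
            share = (1.0 - beta) * score[source] / len(neighbors[source])
            for target in neighbors[source]:
                updated[target] += share
        score = updated
    return score


# Toy undirected graph: one dataset, two resources, two topics.
neighbors = {
    "dataset": ["r1", "r2"],
    "r1": ["dataset", "topicA"],
    "r2": ["dataset", "topicA", "topicB"],
    "topicA": ["r1", "r2"],
    "topicB": ["r2"],
}
priors = {"dataset": 1.0}  # rank topics relative to this dataset

scores = pagerank_with_priors(neighbors, priors)
ranked_topics = sorted(
    (node for node in scores if node.startswith("topic")),
    key=scores.get,
    reverse=True,
)
print(ranked_topics)
```

In this toy graph topicA is reached through both resources and topicB through only one, so topicA ranks first; the resulting scores are the kind of values that would become edge weights in the profile graph.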
  • Profiling Linked Data: Representation Example
      • Figure: annotated example of a dataset profile, with the following parts:
          • Dataset profile metadata
          • Datasets profile and index
          • Entity type link: the extracted entity, with provenance information (resources) for the entity link
          • Topic type link: the extracted topic, with provenance information (entities) for the topic link and the topic relevance score
  • How are the profiles useful?
      • How do I find information about renewable energy? Renewable energy comes in different forms: solar energy, wind farms, biogas, hydroelectricity, etc.
      • Query over the dataset profiles:
          SELECT ?dataset ?link ?score ?link_1 ?entity ?resource WHERE {
            ?dataset a void:Linkset.
            ?dataset vol:hasLink ?link.
            ?link vol:linksResource .
            ?link vol:derivedFrom ?entity.
            ?link vol:hasScore ?score.
            ?link_1 vol:linksResource ?entity.
            ?dataset vol:hasLink ?link_1.
            ?link_1 vol:derivedFrom ?resource
          } ORDER BY DESC(?score)
      • Example resources:
          http://enipedia.tudelft.nl/wiki/Windmar_Renewable_Energy
          http://enipedia.tudelft.nl/data/page/eGRID/Plant/57050
          http://enipedia.tudelft.nl/wiki/Us_Energy_Biogas_Corp
          http://www.reegle.info/profiles/JP
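The join performed by the query above can be mimicked in memory, which may help in reading it: a topic link points to a topic and is derived from an entity, and an entity link points to that entity and is derived from a resource. This is only a sketch; the `Link` class, the `ex:` identifiers, and the scores are hypothetical, and it assumes that only topic links carry a score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    links_resource: str            # plays the role of vol:linksResource
    derived_from: str              # plays the role of vol:derivedFrom
    score: Optional[float] = None  # plays the role of vol:hasScore


def trace_topic(links, topic):
    """Follow topic link -> entity link -> source resource, best score first."""
    rows = []
    for topic_link in links:
        if topic_link.links_resource != topic or topic_link.score is None:
            continue
        for entity_link in links:
            if entity_link.links_resource == topic_link.derived_from:
                rows.append(
                    (topic_link.score, topic_link.derived_from, entity_link.derived_from)
                )
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Hypothetical links of one dataset profile.
links = [
    Link("ex:topic/Renewable_energy", "ex:entity/Biogas", 0.4),
    Link("ex:topic/Renewable_energy", "ex:entity/Wind_farm", 0.9),
    Link("ex:entity/Wind_farm", "ex:resource/1"),
    Link("ex:entity/Biogas", "ex:resource/2"),
]

rows = trace_topic(links, "ex:topic/Renewable_energy")
print(rows)
```

Each row names the entity that produced the topic link and the resource that produced the entity link, which is the provenance chain the profile stores.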
  • Profiling Linked Data: Evaluation (3 April 2014, Stefan Dietze)
      • Figure: profiling accuracy of the different ranking approaches, using the full sample of analysed resource instances and with the NDCG score averaged over all datasets.
      • Figure: correlation between ranking accuracy (NDCG, averaged over all datasets) and ranking time.
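For readers unfamiliar with the metric, NDCG compares the discounted gain of a ranking with that of the ideal ordering of the same items. The sketch below uses the common form with gain `rel` and discount `log2(rank + 1)`; the slides do not say which variant or cut-off was used, and the relevance judgements here are invented.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel / log2(rank + 1), ranks from 1."""
    return sum(
        rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering (0.0 if no gain)."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)[: len(ranked)]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0


# Hypothetical relevance judgements of ranked topics, one list per dataset.
per_dataset = {
    "dataset_a": [3, 2, 1],  # already in ideal order
    "dataset_b": [1, 2, 3],  # reversed
}

mean_ndcg = sum(ndcg(rels) for rels in per_dataset.values()) / len(per_dataset)
print(mean_ndcg)
```

Averaging the per-dataset values, as in the last line, corresponds to the "averaged over all datasets" figure reported on the slide.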
  • Profiling Linked Data: Example use cases
      • Type-specific views on datasets/categories, for the LinkedUp catalog only (as schema mappings are already available there):
          • Document (foaf:Document)
          • Person (foaf:Person)
          • Course (aiiso:Course)
      • Exploratory functionalities over the dataset profiles, available for the LinkedUp catalog and the LOD Cloud.
  • Online Learning and Linked Data: Lessons Learned and Best Practices. Cite4Me and the LinkedUp Challenge
  • Semantic Search and Retrieval of Publications
      • Semantic Search
      • Graph Search
      • Paper Recommendation
      • In-depth Analysis
      Source: Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications. Bernardo Pereira Nunes, Besnik Fetahu, Stefan Dietze, and Marco Antonio Casanova. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, (2013)
  • LinkedUp: Veni Challenge
      • 1st prize: PoliMedia (http://www.polimedia.nl/)
      • 2nd prize: GlobeTown (http://www.globe-town.org/)
      • 3rd prize / people's choice: WeShare (http://seek.cloud.gsic.tel.uva.es/weshare/)
      • Also shown: DataConf, KnowNodes, Mismuseos, ReCredible, YourHistory
  • Demos and Other Resources
      • Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications. Bernardo Pereira Nunes, Besnik Fetahu, Stefan Dietze, and Marco Antonio Casanova. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, (2013) A