Jan 27, 2015
Exploring Web Data & Knowledge through the Semantic Web
Dr. Stefan Dietze
L3S Research Center
27/11/13
Pluto & the seven Dwarfs?
„…solar system… #pluto“: Pluto the dwarf planet?
[Figure: the tag #pluto linked to DBpedia/YAGO entities (dbp:Pluto, dbp:Pluto(mythology), dbp:DwarfPlanetPluto, dbp:CelestialBody, dbp:SolarSystem, yago:AstronomicalObjects) via typeOf, dwarfPlanetOf, redirectOf and namedAfter relations]
“A little semantics goes a long way” (J. Hendler¹)
Semantic Web
Adding meaning through shared vocabularies and schemas (e.g. DBpedia)
W3C standards RDF & SPARQL for data & knowledge representation and querying
Persistent URIs to reference & interlink data on the Web
¹ Hendler, J., The Dark Side of the Semantic Web, IEEE Intelligent Systems, Jan/Feb 2007
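To illustrate how "a little semantics" can resolve an ambiguous tag such as #pluto, here is a minimal sketch in plain Python. It assumes a handful of triples loosely modelled on the figure (they are illustrative, not fetched from DBpedia) and a hypothetical heuristic: pick the candidate entity that shares the most direct links with entities already found in the surrounding text. A real system would use an RDF store and SPARQL rather than a Python set.

```python
# Minimal in-memory triple store. The triples are illustrative assumptions
# loosely modelled on the #pluto figure, not data fetched from DBpedia.
TRIPLES = {
    ("dbp:Pluto", "typeOf", "dbp:CelestialBody"),
    ("dbp:Pluto", "typeOf", "yago:AstronomicalObjects"),
    ("dbp:Pluto", "dwarfPlanetOf", "dbp:SolarSystem"),
    ("dbp:Pluto", "namedAfter", "dbp:Pluto(mythology)"),
    ("dbp:DwarfPlanetPluto", "redirectOf", "dbp:Pluto"),
}


def resolve(entity):
    """Follow a redirectOf link if one exists, e.g. dbp:DwarfPlanetPluto -> dbp:Pluto."""
    for s, p, o in TRIPLES:
        if s == entity and p == "redirectOf":
            return o
    return entity


def neighbours(entity):
    """All entities directly linked to `entity`, in either direction."""
    outgoing = {o for s, _, o in TRIPLES if s == entity}
    incoming = {s for s, _, o in TRIPLES if o == entity}
    return outgoing | incoming


def disambiguate(candidates, context_entities):
    """Hypothetical heuristic: choose the candidate with the most links into the context."""
    context = set(context_entities)
    return max(candidates, key=lambda c: len(neighbours(resolve(c)) & context))


# "…solar system… #pluto": assume the mention "solar system" was already linked to dbp:SolarSystem.
print(disambiguate(["dbp:Pluto(mythology)", "dbp:Pluto"], ["dbp:SolarSystem"]))  # dbp:Pluto
```

The mythological Pluto is only linked to the planet (via namedAfter), so it scores zero against the solar-system context, while dbp:Pluto is linked to dbp:SolarSystem directly.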
Semantic Web / Linked Data
[Figure: LOD Cloud with shared vocabularies such as FOAF, Gene Ontology, BIBO, Geo Ontology, DBpedia Ontology, Dublin Core and BBC Programmes]
„HTTP-accessibility“ (SPARQL, URI-dereferencing)
„Structure“ & „Semantics“ (=> shared/linked vocabularies)
„Interlinked“
„Persistent“
De-facto standard for sharing data on the Web
Vision: well-connected graph of open Web data
350+ datasets and 32 billion triples in the LOD Cloud alone
Other „incarnations“:
Google Knowledge Graph
Facebook Open Graph
http://schema.org
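The „HTTP-accessibility“ principle means data can be reached with plain HTTP, either by sending a SPARQL query to an endpoint or by dereferencing an entity URI with content negotiation. A minimal sketch using only the Python standard library is shown below; it builds the requests without sending them. The DBpedia endpoint and resource URI are assumptions chosen for illustration, and their availability is not guaranteed.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint; may be unavailable


def sparql_get_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL 1.1 JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def dereference_request(uri):
    """Build a request that dereferences a Linked Data URI, asking for Turtle."""
    return Request(uri, headers={"Accept": "text/turtle"})


query = "SELECT ?type WHERE { <http://dbpedia.org/resource/Pluto> a ?type } LIMIT 10"
req = sparql_get_request(ENDPOINT, query)
print(req.get_method(), req.full_url)
```

Either request can then be sent with `urllib.request.urlopen(req, timeout=10)`; keeping construction separate from sending makes the protocol details easy to inspect offline.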
That’s awesome, but… why are so few datasets actually used?
Data reuse and in-links focus on trusted „reference graphs“ such as DBpedia (i.e. Wikipedia)
Long tail of LD datasets that are neither reused nor linked to (the LOD Cloud alone consists of 300+ datasets)
Explanations?
„HTTP-accessibility“ (SPARQL, URI-dereferencing)
„Structure“ & „Semantics“ (=> shared/linked vocabularies)
„Interlinked“
„Persistent“
Hm, really?
Open data is more diverse than we think
SPARQL Web-Querying Infrastructure: Ready for Action? Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y. International Semantic Web Conference 2013 (ISWC2013).
SPARQL endpoint availability over time [Buil-Aranda et al. 2013]
Accessibility of datasets?
Fewer than 50% of all SPARQL endpoints are actually responsive at a given point in time
“THE” SPARQL protocol? No, but many variants & subsets
…
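The availability figure can be approximated by repeatedly probing a list of endpoints and recording the fraction that answer. The sketch below assumes a trivial `ASK` query, an HTTP 200 response within a fixed timeout as the definition of "responsive", and hypothetical endpoint URLs; this is a simplification, not the monitoring methodology of Buil-Aranda et al. The probe is passed in as a parameter so the aggregation can be tested offline.

```python
import http.client
from typing import Callable, Iterable
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def http_probe(endpoint: str, timeout: float = 5.0) -> bool:
    """True if `endpoint` answers a trivial ASK query with HTTP 200 within `timeout` seconds.

    The query and timeout are assumptions chosen for illustration.
    """
    try:
        url = endpoint + "?" + urlencode({"query": "ASK { ?s ?p ?o }"})
        req = Request(url, headers={"Accept": "application/sparql-results+json"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses; bad URLs raise ValueError.
        return False


def availability(endpoints: Iterable[str], probe: Callable[[str], bool] = http_probe) -> float:
    """Fraction of endpoints that respond right now (0.0 for an empty list)."""
    endpoints = list(endpoints)
    if not endpoints:
        return 0.0
    return sum(1 for e in endpoints if probe(e)) / len(endpoints)


# Offline usage with a stubbed probe (hypothetical endpoints and results).
observed = {
    "http://a.example/sparql": True,
    "http://b.example/sparql": False,
    "http://c.example/sparql": False,
}
print(availability(observed, probe=observed.get))
```

Running `availability` on a schedule and storing each result would give a time series comparable in spirit to the availability plot.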
Shared vocabularies & schemas, but:
…still very heterogeneous [d’Aquin, WebSci13]
…data partially messy and not conformant (RDFS, schemas) [HoganJWS2012]
…even widely used reference datasets such as DBpedia are noisy [Paulheim2013]
Co-occurrence graph of data types in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties
Assessing the Educational Linked Data Landscape. d’Aquin, M., Adamou, A., Dietze, S. ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
Type Inference on Noisy RDF Data. Paulheim, H., Bizer, C. The Semantic Web – ISWC 2013, Lecture Notes in Computer Science, Volume 8218, 2013, pp. 510–525.
An empirical survey of Linked Data conformance. Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S. Journal of Web Semantics 14, pp. 14–44, 2012.
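A type co-occurrence graph like the one described above can be computed by treating each type as a node and weighting an edge by the number of datasets in which both types appear. The sketch below shows this counting step on a hypothetical toy input; it does not reproduce the 146 datasets or the exact procedure of the cited study.

```python
from collections import Counter
from itertools import combinations


def type_cooccurrence(dataset_types):
    """Edge weights of a type co-occurrence graph: for each unordered pair of
    types, the number of datasets in which both types are used."""
    edges = Counter()
    for types in dataset_types.values():
        # Sorting gives each unordered pair a single canonical (a, b) key.
        for a, b in combinations(sorted(set(types)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy input, not data from the cited study.
DATASETS = {
    "ds1": {"foaf:Person", "bibo:Article"},
    "ds2": {"foaf:Person", "bibo:Article", "foaf:Document"},
    "ds3": {"foaf:Person", "po:Programme"},
}
print(type_cooccurrence(DATASETS).most_common(1))  # [(('bibo:Article', 'foaf:Person'), 2)]
```

Heavily weighted edges point to types that are routinely used together, which is one way to spot the overlap between vocabularies mentioned above.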
Too many/diverse datasets, too little information
Which datasets are useful & trustworthy for use case XY (e.g. „learning about the solar system“)?
Which topics (e.g. „Astronomy“) are covered by dataset X?
Which datasets describe/offer videos (slides, publications, statistics, etc.)?
Data curation and dataset profiling
[Figure: LinkedUp Dataset Catalog]
Catalog of data (LinkedUp Catalog): classification of datasets according to resource types, disciplines/topics, data quality, accessibility, etc.
Infrastructure for distributed/federated querying
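A catalog that classifies datasets by resource type, topic and accessibility can answer the questions above with simple filters. The sketch below uses a hypothetical record structure whose fields mirror the classification dimensions named on the slide; the entries, topics and accessibility flags are made up for illustration and are not the LinkedUp Catalog's actual schema or data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record; fields mirror the slide's classification dimensions."""
    name: str
    resource_types: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)
    accessible: bool = True  # e.g. whether the endpoint answered the last availability check


def datasets_offering(catalog: list[DatasetEntry], types: set[str],
                      only_accessible: bool = True) -> list[str]:
    """Names of datasets describing at least one of `types` (e.g. video types)."""
    return sorted(
        d.name for d in catalog
        if d.resource_types & types and (d.accessible or not only_accessible)
    )


def topics_of(catalog: list[DatasetEntry], name: str) -> set[str]:
    """Topics recorded for the dataset called `name` (empty set if unknown)."""
    return next((d.topics for d in catalog if d.name == name), set())


# Hypothetical entries for illustration only.
CATALOG = [
    DatasetEntry("Yovisto", {"yov:Video"}, {"db:Astronomy"}),
    DatasetEntry("BBC Programmes", {"po:Programme"}, {"db:Astronomy", "db:Physics"}),
    DatasetEntry("Dataset X", {"bibo:Article"}, {"db:Physics"}, accessible=False),
]
VIDEO_TYPES = {"yov:Video", "po:Programme", "bibo:Film"}
print(datasets_offering(CATALOG, VIDEO_TYPES))  # ['BBC Programmes', 'Yovisto']
```

Filtering on accessibility by default reflects the earlier observation that many endpoints are unavailable at any given time.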
Dataset profiling: what’s all the data about?
[Figure: dataset profiling producing Dataset Metadata for the LinkedUp Dataset Catalog via schema mappings (BIBO, AAISO, FOAF; types such as yov:Video, po:Programme and bibo:Film), entity disambiguation and topic profile extraction (e.g. db:Astronomy, db:Astro. Objects)]
BBC Programme:
<po:Programme …>
<po:Series>Wonders of the Solar System</.>
<po:Actor>Brian Cox</…>
</po:Programme…>
Yovisto Video:
<yo:Video …>
<dc:title>Pluto & the Dwarf Planets</dc:title>
…
</yo:Video…>
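Topic profile extraction can be sketched as: take the (already disambiguated) entities found in each resource, map them to categories, and score each category by the share of resources that mention it. The sketch below uses a plain frequency score, which is a simplification and not the ranking used in the cited profiling work; the entity-to-category mapping and the extracted entities are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Mapping, Sequence

# Hypothetical entity -> category mapping; in practice categories could come from DBpedia.
CATEGORIES = {
    "db:Pluto": ["db:Astronomy", "db:Astro. Objects"],
    "db:Solar_System": ["db:Astronomy"],
    "db:Brian_Cox": ["db:Physicists"],
}


def topic_profile(resource_entities: Sequence[Sequence[str]],
                  categories: Mapping[str, Sequence[str]]) -> dict[str, float]:
    """Score each category by the share of resources whose entities map to it."""
    counts = Counter()
    for entities in resource_entities:
        # Use a set so each resource counts at most once per category.
        counts.update({c for e in entities for c in categories.get(e, [])})
    n = len(resource_entities)
    return {c: k / n for c, k in counts.most_common()} if n else {}


# Hypothetical entities extracted from the two example resources.
resources = [
    ["db:Pluto", "db:Solar_System"],      # Yovisto video "Pluto & the Dwarf Planets"
    ["db:Solar_System", "db:Brian_Cox"],  # BBC programme "Wonders of the Solar System"
]
print(topic_profile(resources, CATEGORIES))
```

Here db:Astronomy covers both resources (score 1.0), while db:Astro. Objects and db:Physicists each cover one (0.5).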
LinkedUp Data Catalog in a nutshell
http://data.linkededucation.org/linkedup/catalog/
Explore & query for datasets/types & topics
Federated queries using type mappings
http://data.linkededucation.org/linkedup/categories-explorer
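Federated querying with type mappings can be sketched as query rewriting: expand the requested type with every type mapped to it, then ask each dataset's endpoint for resources of any of those types using SPARQL 1.1 `SERVICE` and `VALUES`. The mapping of yov:Video and po:Programme to bibo:Film and the endpoint URLs below are hypothetical, and the PREFIX declarations are omitted, so the generated query would need them added before execution.

```python
# Hypothetical type mapping: types treated as equivalent to bibo:Film for querying.
TYPE_MAPPINGS = {"bibo:Film": ["yov:Video", "po:Programme"]}


def expand_type(rdf_type, mappings=TYPE_MAPPINGS):
    """The requested type plus every type mapped to it (duplicates removed, order kept)."""
    return list(dict.fromkeys([rdf_type, *mappings.get(rdf_type, [])]))


def federated_type_query(rdf_type, endpoints, mappings=TYPE_MAPPINGS):
    """Build a SPARQL 1.1 query asking every endpoint (via SERVICE) for resources of any mapped type.

    PREFIX declarations for bibo:, po: and yov: are omitted and must be added before use.
    """
    values = " ".join(expand_type(rdf_type, mappings))
    blocks = [
        f"  {{ SERVICE <{ep}> {{ VALUES ?type {{ {values} }} ?resource a ?type . }} }}"
        for ep in endpoints
    ]
    return "SELECT ?resource ?type WHERE {\n" + "\n  UNION\n".join(blocks) + "\n}"


print(federated_type_query("bibo:Film", ["http://example.org/yovisto/sparql",
                                         "http://example.org/bbc/sparql"]))
```

The generated query only runs on an engine that supports SPARQL 1.1 Federated Query, and it inherits the availability problems of every endpoint it names.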
LinkedUp Challenge: using open data for learning
Open Data Competition to promote tools and applications that analyse / integrate (Linked) Web data
Organised by the LinkedUp project over 2 years (“Veni”, “Vidi”, “Vici”) with EUR 40,000 in awards
Veni Competition: 22 submissions, 8 shortlisted for presentation at the Open Knowledge Conference (17 September, Geneva, Switzerland)
http://linkedup-challenge.org
1st Place: PoliMedia (exploring political debates & events)
Cross-media exploration & analysis of political events (parliament debates and media coverage)
Automatically generated links between debate transcripts, newspaper articles and radio bulletins
(Linked) Data available at http://data.polimedia.nl
Data sources: 1) newspapers of the historical newspaper archive, 2) radio bulletins of the Dutch National Press Agency (ANP)
9000+ debates (1945 – 1995)
Over 3000 media links
Martijn Kleppe, Max Kemman, Henri Beunders (Erasmus Universiteit Rotterdam), Laura Hollink, Damir Juric (Vrije Universiteit Amsterdam), Johan Oomen, Jaap Blom (Nederlands Instituut voor Beeld en Geluid)
http://www.polimedia.nl/
Outlook: more “focused” data reuse challenges
Open Track
Scalable tools and applications using (Linked) open data for educational purposes
LinkedUp data catalog
Promotion of selected Veni submissions
Focused Track
Simplifying complex information to make it accessible (example: publications from Elsevier)
Recommender system for educational resources (courses, MOOCs) relevant to user interests
Approx. EUR 20,000 awards budget
Submission: 14 February 2014
Final events at the 11th Extended Semantic Web Conference (ESWC2014)
http://linkedup-challenge.org/
Thank you!
See also (data)
http://datahub.io/group/linked-education
http://data.linkededucation.org
http://data.linkededucation.org/linkedup/catalog/
http://lak.linkededucation.org
See also (general)
http://linkedup-project.eu
http://linkedup-challenge.org
http://linkededucation.org
http://linkeduniversities.org
REFERENCES
Assessing the Educational Linked Data Landscape. d’Aquin, M., Adamou, A., Dietze, S. ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
Generating Structured Profiles of Linked Data Graphs. Fetahu, B., Dietze, S., d’Aquin, M., Nunes, B.P. ISWC2013, 12th International Semantic Web Conference, 2013.
Combining a co-occurrence-based and a semantic measure for entity linking. Nunes, B.P., Dietze, S., Casanova, M.A., Kawase, R., Fetahu, B., Nejdl, W. ESWC 2013, 10th Extended Semantic Web Conference, May 2013.
Type Inference on Noisy RDF Data. Paulheim, H., Bizer, C. The Semantic Web – ISWC 2013, Lecture Notes in Computer Science, Volume 8218, 2013, pp. 510–525.
An empirical survey of Linked Data conformance. Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S. Journal of Web Semantics 14, pp. 14–44, 2012.
SPARQL Web-Querying Infrastructure: Ready for Action? Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y. International Semantic Web Conference 2013 (ISWC2013).