Jul 15, 2015
DBpedia (in) ALIGNED
From DBpedia to DBpedia+
Dimitris Kontokostas
AKSW Group, Leipzig University
DBpedia Association
February 9th 2015 / 3rd DBpedia Meeting in Dublin
DBpedia @ 2007
DBpedia @ 2008
DBpedia @ 2009
DBpedia @ 2010
DBpedia @ 2011
DBpedia @ 2014
RDF Stats (2014 release)
3B facts (only 580M facts in English)
● DBpedia En: 4.58M Things / 4.22M typed
● 125 Localized versions: 38.3M Things
● 50M links to other datasets
Many more stats @: dbpedia.org/Datasets2014/DatasetStatistics
Dev Stats
DBpedia Information Extraction Framework
● Java/Scala based framework
  ○ Old PHP-based framework
● 5.1K Commits
● 52K lines of code (100K/1M AT)
● 71 total contributors
Many more stats @: www.openhub.net/p/dbpedia
Aligning Problem
Lots of code & a lot more data
● Wikipedia evolves over time
  ○ Infobox templates are changed, merged, deleted
  ○ New formatting templates
  ○ Structural differences per language edition
● Code should adapt to all the changes
  ○ Hard at this (data) scale
Unit-testing to the rescue?
● Software & Data testing
● Straightforward for software (since the 70’s)
● Preliminary for (RDF) data
  ○ RDFUnit, SPIN, OWL, PelletICV, ShEx, ...
    ■ W3C Data Shapes WG
Data testing++
● Generation: manual, (semi)automatic, ...
● Linking: data & software tests
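To make "unit tests for data" concrete, here is a minimal sketch of one data test: every resource that uses a given property must be typed with the expected class. This is plain Python over an in-memory list of triples, not RDFUnit's API. The `Triple` type, `domain_violations` helper, sample data, and the pairing of `dbo:birthDate` with `dbo:Person` are assumptions made for illustration.

```python
from typing import NamedTuple

RDF_TYPE = "rdf:type"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def domain_violations(triples, prop, expected_class):
    """Return subjects that use `prop` but are not typed as `expected_class`."""
    typed = {t.subject for t in triples
             if t.predicate == RDF_TYPE and t.obj == expected_class}
    users = {t.subject for t in triples if t.predicate == prop}
    return sorted(users - typed)


# Hypothetical extraction output: the second resource is a city that
# wrongly carries a birth date, e.g. after an infobox template change.
SAMPLE = [
    Triple("dbr:Ada_Lovelace", RDF_TYPE, "dbo:Person"),
    Triple("dbr:Ada_Lovelace", "dbo:birthDate", "1815-12-10"),
    Triple("dbr:Dublin", RDF_TYPE, "dbo:City"),
    Triple("dbr:Dublin", "dbo:birthDate", "0988-01-01"),
]

violations = domain_violations(SAMPLE, "dbo:birthDate", "dbo:Person")
print(violations)  # ['dbr:Dublin']
```

A test like this passes when the returned list is empty. Each violation names a resource that can be traced back to the code, mapping, or Wikipedia article that produced it.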
RDFUnit
http://rdfunit.aksw.org
UT feedback loop
Data verification and feedback at different data extraction stages
● Three main points of failure in DBpedia:
  ○ Code
  ○ Infobox mappings
  ○ Wikipedia (!!!)
DBpedia+ Workflow
Additional feedback
We are looking into:
● Reporting
● Statistics
● Inter-Wikipedia cross-checking
● ML techniques
Thank you & Questions?
ALIGNED: Aligned, Quality-centric Software and Data Engineering