Data Interlinking together with Crowd Workers
Cristina Sarasua
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
2nd DBpedia Community Meeting, Leipzig
Jul 14, 2015
Image: http://www.w3.org/DesignIssues/diagrams/lod/597992118v2_350x350_Back.jpg
Scenario for data interlinking
Music data integration
What for?
• A: Extending the description of resources → enabling richer queries
d1:song1 owl:sameAs dbpedia:song1
d1:song1 o:wasPlayedIn dbpedia:Leipzig
D1
d1:song1 a ma:AudioTrack ;
    ma:title "UFO" ;
    ma:locator "musicexample:s1896.mp3"^^xsd:anyURI ;
    ma:hasKeyword d1:colplay ;
DBpedia
dbpedia:U.F.O._(song) a dbpedia-owl:Work ;
    a dbpedia-owl:Song ;
    dc:title "U.F.O." ;
    prop:artist dbpedia:Coldplay ;
dbpedia:UFO_(band) a dbpedia-owl:Band ;
    prop:name "U.F.O." ;
owl:sameAs ?
Automatic
• Goal: typed link to create (e.g. owl:sameAs)
• Information to analyse (i.e. attribute-values)
• Decision criterion (e.g. levenshtein < 2)
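The decision criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the configuration of any particular interlinking tool: the function names `levenshtein` and `same_as_candidate` and the default threshold are assumptions chosen to mirror the "levenshtein < 2" example, and the titles come from the music scenario in these slides.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def same_as_candidate(title_a: str, title_b: str, threshold: int = 2) -> bool:
    """Hypothetical link rule: propose owl:sameAs when the distance is below the threshold."""
    return levenshtein(title_a, title_b) < threshold


print(levenshtein("UFO", "U.F.O."))        # 3
print(same_as_candidate("UFO", "U.F.O."))  # False
print(same_as_candidate("Soon", "Soon"))   # True
```

With this rule, "UFO" and "U.F.O." are not linked because the three dots count as three edits, while any two resources titled "Soon" pass the rule regardless of whether they describe the same work. A purely string-based criterion can therefore both miss and over-generate links.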
D1
d1:song1 a ma:AudioTrack ;
    ma:title "UFO" ;
    ma:locator "musicexample:s1896.mp3"^^xsd:anyURI ;
    ma:hasKeyword d1:colplay ;
DBpedia
dbpedia:U.F.O._(song) a dbpedia-owl:Work ;
    a dbpedia-owl:Song ;
    dc:title "U.F.O." ;
    prop:artist dbpedia:Coldplay ;
dbpedia:UFO_(band) a dbpedia-owl:Band ;
    prop:name "U.F.O." ;
Human to guide the process
owl:sameAs ?
D1
d1:song1 a ma:AudioTrack ;
    ma:title "Soon" ;
    ma:locator "musicexample:s1896.mp3"^^xsd:anyURI ;
DBpedia
dbpedia:Transatlantic_KK a dbpedia-owl:Work ;
    a dbpedia-owl:Album ;
    dc:title "Soon" ;
    dbprop:artist dbpedia:Delorean_(band) ;
dbpedia:Soon_(Tanya_Tucker_song) a dbpedia-owl:Work ;
    a dbpedia-owl:MusicalWork ;
    dc:title "Soon" ;
    dbprop:artist dbpedia:Tanya_Tucker ;
Human to correct
owl:sameAs ?
D1
d1:song1 a ma:AudioTrack ;
    ma:title "UFO" ;
    ma:locator "musicexample:s1896.mp3"^^xsd:anyURI ;
    ma:hasKeyword d1:colplay ;
DBpedia
dbpedia:Leipzig a dbpedia-owl:Place ;
    rdfs:label "Leipzig" ;
Human to create new links
o:wasPlayedIn ?
Human
• Creative and proactive
• Listen / watch / search
• Process / associate / more complicated conclusions
Crowd-powered data interlinking
• Building a system that
  – Combines algorithmic and human computation
  – Systematically involves humans via microtasks
  – Considers the aforementioned types of links
  – Schema- and instance-level links
Automatic interlinking
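One way to picture the combination of algorithmic and human computation is the following sketch. It is an assumption for illustration, not the architecture of the system described here: candidate links are assumed to carry a confidence score from an automatic matcher, the thresholds `accept` and `reject` are hypothetical, and crowd answers are aggregated by simple majority vote.

```python
from collections import Counter


def route(candidates, accept=0.9, reject=0.3):
    """Split scored candidate links into accepted, rejected, and crowd-bound lists.

    candidates: iterable of (link, score) pairs, score in [0, 1].
    The thresholds are illustrative assumptions.
    """
    accepted, rejected, for_crowd = [], [], []
    for link, score in candidates:
        if score >= accept:
            accepted.append(link)
        elif score < reject:
            rejected.append(link)
        else:
            for_crowd.append(link)  # uncertain: ask humans via a microtask
    return accepted, rejected, for_crowd


def majority_vote(answers):
    """Return the most frequent answer, or None if there is no answer or a tie."""
    counts = Counter(answers).most_common(2)
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


scored = [("d1:song1 ~ dbpedia:U.F.O._(song)", 0.6),
          ("d1:song1 ~ dbpedia:UFO_(band)", 0.55),
          ("d1:song2 ~ dbpedia:Leipzig", 0.1)]
accepted, rejected, for_crowd = route(scored)
print(for_crowd)
print(majority_vote(["yes", "yes", "no"]))
```

The scores in the usage example are made up; only the two uncertain candidates would be turned into microtasks.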
It worked! Quick and inexpensive. See CrowdMAP [Sarasua et al., 2012].
Overview
A microtask
Challenge #1: It has to work with ANYONE
Challenge #2: We still want a data-independent solution
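A data-independent microtask can be approximated by rendering any pair of resource descriptions into the same question template. The sketch below is hypothetical: the `Microtask` class, the `build_microtask` helper, and the answer options are assumptions for illustration, and a real interface would need wording that any worker can follow.

```python
from dataclasses import dataclass


@dataclass
class Microtask:
    question: str
    options: tuple


def describe(resource: dict) -> str:
    """Render a property-to-value mapping as one readable line."""
    return "; ".join(f"{prop} = {value}" for prop, value in sorted(resource.items()))


def build_microtask(source: dict, target: dict, predicate: str) -> Microtask:
    """Build the same question layout for any two resources and any predicate."""
    question = (f"Resource A: {describe(source)}\n"
                f"Resource B: {describe(target)}\n"
                f"Does the relation '{predicate}' hold between A and B?")
    return Microtask(question, ("Yes", "No", "I don't know"))


task = build_microtask({"ma:title": "UFO", "ma:hasKeyword": "colplay"},
                       {"dc:title": "U.F.O.", "prop:artist": "Coldplay"},
                       "owl:sameAs")
print(task.question)
```

Because the template only relies on property names and values, the same function works for songs, places, or any other resource type.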
Picture: Icon made by Freepik from http://www.flaticon.com
Ongoing work
How to improve?
How to optimize the process?
Crowdsourcing approaches
• Additional incentives to make workers process more links, faster (e.g. display #links left)
• Let them explain to others: write the argument for the decision
• Show a similar link: decide by comparison
Challenge #3: How to decide what is an analogous link here? (danger of bias?)
(Diagram labels: predicate, rdf:type, false positive / negative)
How to optimize the process?
Data-oriented approaches
• Test and instructing links: targeted selection
• Scheduled sequences of links to process: to make more sense
• Validate vs identify microtasks
Challenge #4: How to build that programmatically?
data analysis / data + crowd / data + expert
Difficult case, rare
Easy case, common
How to optimize the process? (II)
Challenge #5: How to predict how suitable a worker will be for processing a particular link?
Which features of links influence the prediction?
Previous cross-platform experience (CrowdWorkCV)
See also [Sarasua et al., 2013]
Ranking a list of suitable links based on training links
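As a rough baseline for the ranking idea, a worker's results on training links can be grouped by a link feature and used to order new links. This is only one possible sketch under stated assumptions: the single feature (here, a resource type label), the smoothing, and the function names `accuracy_by_feature` and `rank_links` are hypothetical and not taken from the cited work.

```python
from collections import defaultdict


def accuracy_by_feature(training_results):
    """Estimate one worker's accuracy per link feature.

    training_results: iterable of (feature, was_correct) pairs.
    Uses add-one smoothing so that a single answer does not yield 0.0 or 1.0.
    """
    totals = defaultdict(lambda: [0, 0])  # feature -> [correct, seen]
    for feature, was_correct in training_results:
        totals[feature][0] += int(was_correct)
        totals[feature][1] += 1
    return {f: (correct + 1) / (seen + 2) for f, (correct, seen) in totals.items()}


def rank_links(links, accuracy, default=0.5):
    """Order (link_id, feature) pairs by the worker's estimated accuracy, best first."""
    return sorted(links, key=lambda link: accuracy.get(link[1], default), reverse=True)


training = [("Song", True), ("Song", True), ("Band", False)]
estimates = accuracy_by_feature(training)
new_links = [("l1", "Band"), ("l2", "Song"), ("l3", "Place")]
print(rank_links(new_links, estimates))
# [('l2', 'Song'), ('l3', 'Place'), ('l1', 'Band')]
```

Features the worker has never seen fall back to the neutral `default`, which places them between the types the worker handled well and those handled poorly.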
Challenge #6: How should we assess a priori whether (and approximately to what extent) we need crowdsourcing for a particular pair of data sets?
Take-away messages
• Yes, microtask crowdsourcing allows you to involve humans for processing lots of data; it is cost-effective and fast
• Research shows it is a feasible complement to data interlinking algorithms
• BUT do not underestimate microtask management
Coming soon … http://github.com/criscod
[Schmachtenberg et al., 2014]
Open question: wouldn't crowd-powered data interlinking enrich this table?
Thank you for your attention!
Contact:
Cristina Sarasua
Institute for Web Science and Technologies
Universität Koblenz-Landau
[email protected]
References
• Sarasua, C.: Crowdsourced Interlinking on the Web of Data. In: 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW), Doctoral Symposium. (2012)
• Sarasua, C., Simperl, E., Noy, N.F.: CrowdMAP: Crowdsourcing ontology alignment with microtasks. In: Proceedings of the 11th International Semantic Web Conference (ISWC). (2012)
• Sarasua, C., Thimm, M.: Microtask available, send us your CV! In: Proceedings of the International Workshop on Crowd Work and Human Computation (CrowdWork 2013). (2013)
• Schmachtenberg, M., Bizer, C., Paulheim, H.: Adoption of the Linked Data Best Practices in Different Topical Domains. In: 13th International Semantic Web Conference (ISWC 2014), RDB Track, Riva del Garda, Italy. (2014)