TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data Dimitris Kontokostas, Amrapali Zaveri, Sören Auer and Jens Lehmann KESW 2013 Oct 08, 2013

May 31, 2015


Amrapali Zaveri

Presentation of the TripleCheckMate tool (http://aksw.org/Projects/TripleCheckMate.html) at KESW 2013 (kesw.ifmo.ru/kesw2013/)
Transcript
Page 1: TripleCheckMate

TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data

Dimitris Kontokostas, Amrapali Zaveri, Sören Auer and Jens Lehmann

KESW 2013 Oct 08, 2013

Page 2: TripleCheckMate

Outline

❏ Data Quality
❏ Data Quality Assessment Methodology
❏ Evaluation Methodology - Manual
    ❏ Phase I: Quality Problem Taxonomy
    ❏ Phase II: Crowdsourcing Quality Assessment
❏ TripleCheckMate
    ❏ Architecture
    ❏ Demo
❏ Conclusion & Future Work

Page 3: TripleCheckMate

Data Quality

● Data Quality (DQ) is defined as:
    ○ fitness for a certain use case*
● On the Data Web - varying quality of information covering various domains
● High quality datasets
    ○ curated over decades - life science domain
    ○ crowdsourcing process - extracted from unstructured and semi-structured information, e.g. DBpedia

* J. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.

Page 4: TripleCheckMate

Data Quality Assessment Methodology

4 Step Methodology:

❏ Step 1: Resource selection
    ❏ Per Class
    ❏ Completely random
    ❏ Manual
❏ Step 2: Evaluation mode selection
    ❏ Manual
    ❏ Semi-automatic
    ❏ Automatic
❏ Step 3: Resource evaluation
❏ Step 4: DQ improvement
    ❏ Direct
    ❏ Indirect
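
The three resource selection options in Step 1 could be expressed as SPARQL queries. The following is a minimal sketch, not TripleCheckMate's actual code: the function name `build_selection_query`, the mode labels, and the use of `ORDER BY RAND()` for random sampling are assumptions made for illustration.

```python
def build_selection_query(mode, class_iri=None, resource_iri=None):
    """Return a SPARQL query that picks one resource to evaluate.

    Hypothetical helper: the mode names mirror the three options on the slide.
    """
    if mode == "per_class":
        if class_iri is None:
            raise ValueError("per_class mode needs a class IRI")
        # One randomly ordered instance of the chosen class.
        return "SELECT ?s WHERE { ?s a <%s> } ORDER BY RAND() LIMIT 1" % class_iri
    if mode == "random":
        # One randomly ordered subject from the whole dataset.
        # This can be slow on large endpoints.
        return "SELECT DISTINCT ?s WHERE { ?s ?p ?o } ORDER BY RAND() LIMIT 1"
    if mode == "manual":
        if resource_iri is None:
            raise ValueError("manual mode needs a resource IRI")
        # The evaluator names the resource; only check that it has triples.
        return "ASK { <%s> ?p ?o }" % resource_iri
    raise ValueError("unknown mode: %r" % mode)
```

How evenly `RAND()` samples depends on the endpoint, so a real deployment may need a different sampling strategy.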

Page 5: TripleCheckMate

Evaluation Methodology - Manual

❏ Phase I: Creation of quality problem taxonomy

❏ Phase II: Crowdsourcing quality assessment

Page 6: TripleCheckMate

Phase I: Quality Problem Taxonomy

A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality assessment methodologies for Linked Open Data: A Review. Under review, available at http://www.semantic-web-journal.net/content/quality-assessment-methodologies-linked-open-data.

Page 7: TripleCheckMate

Phase II: Crowdsourcing Quality Assessment

             | Crowdsourcing                            | Our Approach
Type         | Human Intelligence Tasks (HITs)          | Contest-based
Participants | Labor market                             | Linked Data (LD) experts
Task         | Detect quality issues in triples         | Detect & classify quality issues in resources
Reward       | Per task/triple                          | Highest number of resources evaluated
Tool         | Amazon Mechanical Turk, CrowdFlower etc. | TripleCheckMate

Page 8: TripleCheckMate

TripleCheckMate - Architecture (1/2)


Page 9: TripleCheckMate

TripleCheckMate - Architecture (2/2)

● Built on Java / GWT
    ○ GWT compiles to native cross-browser HTML/JS
● Tomcat / Jetty & MySQL as minimal backend
    ○ store/retrieve evaluation data only
● Application logic is built on the client
    ○ SPARQL executed on client
    ○ Portable
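
To illustrate what "SPARQL executed on client" involves, the sketch below builds a standard SPARQL Protocol GET request for the triples of one resource and parses a SPARQL JSON result. The real tool does this in GWT-compiled JavaScript in the browser; Python is used here only for illustration, and the function names, the DBpedia endpoint, and the sample result are assumptions, not taken from the slides.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def build_triples_request(endpoint, resource_iri):
    """Build a SPARQL Protocol GET request for all triples of one resource."""
    query = "SELECT ?p ?o WHERE { <%s> ?p ?o }" % resource_iri
    url = endpoint + "?" + urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return url, headers


def parse_bindings(results_json):
    """Turn a SPARQL JSON result document into (predicate, object) pairs."""
    doc = json.loads(results_json)
    return [(b["p"]["value"], b["o"]["value"]) for b in doc["results"]["bindings"]]


# Usage: nothing is sent over the network here.
url, headers = build_triples_request(
    "http://dbpedia.org/sparql", "http://dbpedia.org/resource/Berlin"
)
sent_query = parse_qs(urlparse(url).query)["query"][0]

# Hand-written sample response in the SPARQL 1.1 JSON results layout.
sample = (
    '{"head": {"vars": ["p", "o"]}, "results": {"bindings": ['
    '{"p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},'
    ' "o": {"type": "literal", "value": "Berlin"}}]}}'
)
pairs = parse_bindings(sample)
```

Because the client talks to the endpoint directly, the backend only has to store and retrieve evaluation data, which matches the minimal backend described above.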

Page 10: TripleCheckMate

Evaluation storage schema

● Designed to support multiple campaigns and different ontologies

● Quality taxonomy is stored in the database which makes it easy to adapt
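
A storage layout with these two properties might look like the sketch below. It is hypothetical: the slides do not show the actual schema, so every table name, column name, and sample row is an assumption, and SQLite stands in for MySQL so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: one row per campaign, a self-referencing taxonomy
# table, and one row per evaluated triple.
SCHEMA = """
CREATE TABLE campaign (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    sparql_endpoint TEXT NOT NULL
);
CREATE TABLE error_category (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES error_category(id),
    label TEXT NOT NULL
);
CREATE TABLE evaluation (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER NOT NULL REFERENCES campaign(id),
    evaluator TEXT NOT NULL,
    triple_subject TEXT NOT NULL,
    triple_predicate TEXT NOT NULL,
    triple_object TEXT NOT NULL,
    error_category_id INTEGER REFERENCES error_category(id)
);
"""


def open_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = open_store()
# Two campaigns over different datasets share the same tables.
conn.execute("INSERT INTO campaign VALUES (1, 'Campaign A', 'http://dbpedia.org/sparql')")
conn.execute("INSERT INTO campaign VALUES (2, 'Campaign B', 'http://example.org/sparql')")
# Illustrative taxonomy rows; extending the taxonomy is a plain INSERT.
conn.execute("INSERT INTO error_category VALUES (1, NULL, 'Example dimension')")
conn.execute("INSERT INTO error_category VALUES (2, 1, 'Example problem type')")
children = conn.execute(
    "SELECT label FROM error_category WHERE parent_id = 1"
).fetchall()
```

Keeping the taxonomy as rows rather than as constants in the code is what would let it change without recompiling the client.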


Page 11: TripleCheckMate

TripleCheckMate - Demo

http://tinyurl.com/TCM-Demo
http://tinyurl.com/TCM-Screencast

Page 12: TripleCheckMate

Conclusion & Future Work

● TripleCheckMate
    ○ Tool for crowdsourcing quality assessment
    ○ Linked Data quality assessment
    ○ Supports inter-rater agreement
    ○ Can be used with any Linked Dataset
● Future Work
    ○ Directly integrating semi-automatic methods
    ○ Improve efficiency of quality assessment
    ○ Include support for Patch Ontology* as output format

* M. Knuth, J. Hercher, and H. Sack. Collaboratively patching linked data. CoRR, 2012.
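
Inter-rater agreement can be quantified once two evaluators have judged the same triples. The slides do not say which statistic TripleCheckMate uses, so the sketch below assumes Cohen's kappa for two raters; the function name and the labels are illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Two evaluators judging the same four triples.
rater_1 = ["correct", "incorrect", "correct", "correct"]
rater_2 = ["correct", "incorrect", "incorrect", "correct"]
kappa = cohen_kappa(rater_1, rater_2)  # 0.5
```

With more than two evaluators per resource, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.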

Page 13: TripleCheckMate

Thank You
Questions?

http://nl.dbpedia.org:8080/TripleCheckMate-Demo/
https://github.com/AKSW/TripleCheckMate

http://aksw.org/[email protected]

Twitter: @amrapaliz