TAIPAN: Automatic Property Mapping for Tabular Data by Ivan Ermilov and Axel-Cyrille Ngonga Ngomo November 22nd, 2016

Apr 15, 2017

Page 1: TAIPAN: Automatic Property Mapping for Tabular Data

TAIPAN: Automatic Property Mapping for Tabular Data

by Ivan Ermilov and Axel-Cyrille Ngonga Ngomo

November 22nd, 2016

Page 2

Web-Scale Data Mining from Web Tables

[Figure: tables from the Web (Web Data Commons, Dresden Table Dataset, other tables) are fed into TAIPAN]

Web tables:
● Structured
● Schemaless
● Not using standards*

Semantic Web standards:
● SPARQL
● RDFS
● OWL

Page 3

TAIPAN Approach Overview

Step 1: Identify Subject Column
Step 2: Atomize a Table
Step 3: Identify Property for Each Atomic Table
Step 4: Return Mappings
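The four steps can be sketched as a small Python pipeline. This is a minimal illustration under our own assumptions, not TAIPAN's implementation: a table is a list of rows of cell strings, atomizing pairs the subject column with each remaining column, and the subject-column and property detectors are passed in as hypothetical callbacks (`identify_subject`, `identify_property`).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a table is a list of rows, each row a list of cell strings.
Table = List[List[str]]
Pairs = List[Tuple[str, str]]


def atomize(table: Table, subject_col: int) -> List[Tuple[int, Pairs]]:
    """Step 2: split the table into two-column (subject, other) tables."""
    atomic = []
    for col in range(len(table[0])):
        if col == subject_col:
            continue
        pairs = [(row[subject_col], row[col]) for row in table]
        atomic.append((col, pairs))
    return atomic


def map_table(
    table: Table,
    identify_subject: Callable[[Table], int],
    identify_property: Callable[[Pairs], Optional[str]],
) -> Dict[int, Optional[str]]:
    """Run Steps 1-4 and return {column index: property or None}."""
    subject_col = identify_subject(table)            # Step 1
    mappings: Dict[int, Optional[str]] = {}
    for col, pairs in atomize(table, subject_col):   # Step 2
        mappings[col] = identify_property(pairs)     # Step 3
    return mappings                                  # Step 4


if __name__ == "__main__":
    cities = [["Berlin", "Germany"], ["Paris", "France"]]
    # Toy property detector: a hypothetical lookup of known (subject, value) pairs.
    known = {("Berlin", "Germany"): "dbo:country", ("Paris", "France"): "dbo:country"}
    detector = lambda pairs: "dbo:country" if all(p in known for p in pairs) else None
    print(map_table(cities, lambda t: 0, detector))  # {1: 'dbo:country'}
```

Keeping the detectors as callbacks separates the fixed control flow (Steps 1, 2 and 4) from the parts where the actual intelligence lives, which the next slides describe.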


Page 4

TAIPAN Approach Overview (example)

[Figure: worked example of Steps 1 to 4]

Page 5

The Core of TAIPAN

Subject Column Identification
● Unsupervised ML
● Structural features
● Semantic features
  ○ Support of a column
  ○ Connectivity

Property Mapping
● Retrieve seed entities
● Rank entities
● Return top entity
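The slide lists support and connectivity as semantic features for subject column identification. One plausible reading, sketched below under our own assumptions: support is the share of a column's cells that match a knowledge-base entity, and connectivity is the share of other columns that some knowledge-base triple links to the column. Both definitions, the toy triple list, and the additive score in `pick_subject_column` are hypothetical; this is a hand-written baseline, not the unsupervised ML model the slide refers to.

```python
from typing import List, Set, Tuple

Table = List[List[str]]
Triple = Tuple[str, str, str]  # (subject, predicate, object) in a toy knowledge base


def support(table: Table, col: int, entities: Set[str]) -> float:
    """Share of cells in `col` that match a known entity label."""
    cells = [row[col] for row in table]
    return sum(cell in entities for cell in cells) / len(cells)


def connectivity(table: Table, col: int, triples: List[Triple]) -> float:
    """Share of other columns linked to `col` by at least one KB triple in some row."""
    linked = {(s, o) for s, _, o in triples}
    others = [c for c in range(len(table[0])) if c != col]
    if not others:
        return 0.0
    connected = sum(
        any((row[col], row[c]) in linked for row in table) for c in others
    )
    return connected / len(others)


def pick_subject_column(table: Table, triples: List[Triple]) -> int:
    """Baseline: highest support + connectivity; ties go to the leftmost column."""
    entities = {s for s, _, _ in triples} | {o for _, _, o in triples}
    scores = [
        support(table, c, entities) + connectivity(table, c, triples)
        for c in range(len(table[0]))
    ]
    return max(range(len(scores)), key=lambda c: scores[c])


if __name__ == "__main__":
    table = [["Berlin", "Germany", "3.6M"], ["Paris", "France", "2.1M"]]
    triples = [("Berlin", "dbo:country", "Germany"), ("Paris", "dbo:country", "France")]
    print(pick_subject_column(table, triples))  # 0
```

In the example, the city and country columns both have full support, but only the city column links to another column through a triple, so connectivity breaks the tie in its favour.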

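The property-mapping bullets (retrieve seed entities, rank entities, return top entity) can be read as a per-row entity-linking step. The sketch below follows that reading under our own assumptions: seed entities come from a hypothetical exact, case-insensitive label index, candidates are ranked by how many toy triples link them to the row's other cell, and the final vote over predicates that turns the top entities into one property is our addition, not something the slide states.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity, predicate, value) in a toy knowledge base


def retrieve_seed_entities(label: str, label_index: Dict[str, List[str]]) -> List[str]:
    """Candidate entities for a cell (hypothetical exact, case-insensitive lookup)."""
    return label_index.get(label.lower(), [])


def rank_entities(candidates: List[str], value: str, triples: List[Triple]) -> List[str]:
    """Order candidates by how many triples link them to the row's other cell value."""
    def score(entity: str) -> int:
        return sum(1 for s, _, o in triples if s == entity and o == value)
    return sorted(candidates, key=score, reverse=True)


def map_property(
    pairs: List[Tuple[str, str]],
    label_index: Dict[str, List[str]],
    triples: List[Triple],
) -> Optional[str]:
    """Vote over predicates that link each row's top entity to the row's value."""
    votes: Counter = Counter()
    for subject_label, value in pairs:
        ranked = rank_entities(retrieve_seed_entities(subject_label, label_index), value, triples)
        if not ranked:
            continue
        top = ranked[0]  # keep only the top-ranked entity
        for s, p, o in triples:
            if s == top and o == value:
                votes[p] += 1
    return votes.most_common(1)[0][0] if votes else None
```

For an atomic table such as `[("Paris", "France"), ("Berlin", "Germany")]`, an ambiguous label like "Paris" may return several seed entities; ranking by links to "France" keeps the city rather than a namesake, and both rows then vote for the same predicate.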

Page 6

Experimental setup

For T2K: 128 GB RAM, 4 cores, Ubuntu 14.04

For TAIPAN: 16 GB RAM, 4 cores, Ubuntu 14.04

Dataset 1: curated T2D gold standard (T2D)

Dataset 2: DBpedia table dataset (DBD)


Page 7

Subject Column Identification Experiments

Observations

● The rule-based approach achieves only 51.72% accuracy
● Using support and connectivity increases precision
● Subject column identification can be further improved using ML techniques

Page 8

Property Mapping Experiments

Observations

● TAIPAN achieves better recall, but lower precision than T2K
● On the DBD dataset, T2K could match only one property
● Overall, TAIPAN performs better than the state of the art
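Precision and recall pull in different directions, which is why the comparison with T2K needs both. The sketch below uses the standard definitions over (table, column) → property mappings; keying mappings by (table, column) and counting only non-empty predictions toward precision are our assumptions, not the evaluation protocol used in the talk.

```python
from typing import Dict, Hashable, Optional, Tuple


def evaluate(
    predicted: Dict[Hashable, Optional[str]], gold: Dict[Hashable, str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted property mappings against a gold standard."""
    made = {k: v for k, v in predicted.items() if v is not None}  # mappings actually proposed
    correct = sum(1 for k, v in made.items() if gold.get(k) == v)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A system that proposes a single correct mapping scores a precision of 1.0 while its recall is one over the number of gold mappings, so very few matches can coexist with high precision.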


Page 9

Conclusions & Future Work

Curated T2D & DBD datasets

Novel TAIPAN approach

Open Table Extraction

Table Extraction Benchmark (HOBBIT)

Integration of TAIPAN into the GEISER project

Page 10

Thank you! Follow us on Twitter :)

Ivan Ermilov <[email protected]>

@hobbit_project
