Matic Perovšek, Anže Vavpetič, Nada Lavrač Jožef Stefan Institute, Slovenia A Wordification Approach to Relational Data Mining: Early Results

Dec 13, 2015

Matic Perovšek, Anže Vavpetič, Nada Lavrač

Jožef Stefan Institute, Slovenia

A Wordification Approach to Relational Data Mining: Early Results

Overview

Introduction

Methodology

Experimental results

Conclusion

Introduction

Relational data mining algorithms aim to induce models and/or relational patterns from multiple tables

Individual-centered relational databases can be transformed into a single-table form; this transformation is called propositionalization

Motivation

Wordification inspired by text mining techniques

Large number of simple, easy-to-understand features

Greater scalability, handling large datasets

Can be used as a preprocessing step to propositional learners, as well as to declarative modeling / constraint solving (De Raedt et al., today’s invited talk)

Methodology

1. Transformation from relational database to a textual corpus

2. TF-IDF weight calculation

Transformation from relational database to a textual corpus

One individual of the initial relational database -> one text document

Features -> the words of this document

Words constructed as a combination:

Transformation from relational database to a textual corpus

For each individual, the words generated for the main table are concatenated with words generated from the secondary (BK) tables
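The transformation described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in these slides: each word is taken to have the form table_attribute_value, secondary rows are linked to the main table through a hypothetical foreign-key column `main_id`, and the toy tables and the helper names `row_to_words` and `wordify` are invented for illustration only.

```python
def row_to_words(table_name, row, skip=()):
    """Turn one table row into words of the assumed form table_attribute_value."""
    return [f"{table_name}_{attr}_{value}"
            for attr, value in row.items() if attr not in skip]


def wordify(main_name, main_rows, secondary, key="id", fk="main_id"):
    """Build one document (a list of words) per individual in the main table."""
    documents = {}
    for row in main_rows:
        # Words generated from the main table row of this individual.
        words = row_to_words(main_name, row, skip=(key,))
        # Concatenate words from every linked row of each secondary (BK) table.
        for table_name, rows in secondary.items():
            for sec_row in rows:
                if sec_row[fk] == row[key]:
                    words += row_to_words(table_name, sec_row, skip=(fk,))
        documents[row[key]] = words
    return documents


# Hypothetical toy database: one main table and one secondary table.
movies = [
    {"id": 1, "decade": "1990s"},
    {"id": 2, "decade": "2000s"},
]
background = {
    "genre": [
        {"main_id": 1, "name": "drama"},
        {"main_id": 1, "name": "crime"},
        {"main_id": 2, "name": "comedy"},
    ],
}

documents = wordify("movie", movies, background)
for individual, words in documents.items():
    print(individual, " ".join(words))
```

Each individual ends up with a single bag of words, so any text mining tool that accepts tokenized documents could, in principle, consume the result.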

Example

TF-IDF weights

No explicit use of existential variables in our features; TF-IDF weights are used instead

The weight of a word gives a strong indication of how relevant the feature is for the given individual.

The TF-IDF weights can then be used either to filter out words of low importance or directly as feature values for a propositional learner.
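A minimal sketch of the weighting and filtering step follows. The slides do not say which TF-IDF variant is used, so the common form tf(w, d) · log(N / df(w)) is assumed here, with tf taken as the relative frequency of the word in the document. The function names `tfidf` and `filter_words`, the threshold, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Compute tf * log(N / df) for every word of every document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for words in documents.values():
        df.update(set(words))
    weights = {}
    for doc_id, words in documents.items():
        counts = Counter(words)
        weights[doc_id] = {
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return weights


def filter_words(weights, threshold):
    """Keep only the words whose weight is strictly above the threshold."""
    return {
        doc_id: {w: x for w, x in word_weights.items() if x > threshold}
        for doc_id, word_weights in weights.items()
    }


# Hypothetical documents, one per individual.
docs = {
    "a": ["genre_name_drama", "genre_name_crime", "movie_decade_1990s"],
    "b": ["genre_name_comedy", "movie_decade_1990s"],
}

weights = tfidf(docs)
filtered = filter_words(weights, threshold=0.0)
print(filtered)
```

Under this variant, a word that occurs in every document gets weight zero, so it is dropped by the filter; the remaining weights could be passed to a propositional learner as a sparse feature vector per individual.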

Experimental results

Slovenian traffic accidents database

IMDB database

  Top 250 and bottom 100 movies

  Movies, actors, movie genres, directors, director genres

Applied the wordification methodology

Performed association rule learning
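As a rough illustration of what association rule learning measures on wordified data, the sketch below computes the support and confidence of a single candidate rule over documents treated as sets of words. The transactions and the rule are invented toy data, not results from the experiments, and the slides do not say which rule learner was used.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, relative to the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical wordified individuals, each reduced to a set of words.
transactions = [
    {"genre_name_drama", "movie_class_top"},
    {"genre_name_drama", "genre_name_crime", "movie_class_top"},
    {"genre_name_comedy", "movie_class_bottom"},
    {"genre_name_drama", "movie_class_bottom"},
]

rule_support = support({"genre_name_drama", "movie_class_top"}, transactions)
rule_confidence = confidence({"genre_name_drama"}, {"movie_class_top"}, transactions)
print(rule_support, rule_confidence)
```

A full rule learner would enumerate frequent itemsets rather than score one hand-picked rule; this sketch only shows the two quantities such a learner reports.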

Experimental results

Conclusion

Novel propositionalization technique called Wordification

Greater scalability

Easy-to-understand features

Further work:

  Test on larger databases

  Experimental comparison with other propositionalization techniques

  Combine with propositionalization-like approach to mining heterogeneous information networks (Grčar et al. 2012), applicable to CLP in data preprocessing

Grčar, Trdin, Lavrač: A Methodology for Mining Document-Enriched Heterogeneous Information Networks, Computer Journal 2012