
I- Extended Databases

Dec 04, 2014


Zakaria Zubi

Page 1: I- Extended Databases

I- Extended Databases

Key words: Knowledge Discovery in Databases (KDD), Data Mining (DM), Data Warehousing (DW), Query Optimization (QO).

Page 2: I- Extended Databases

By Dr. Zakaria Suliman Zubi

Assistant Professor, Computer Science Department, Faculty of Science, Al-Tahadi University, P.O. Box 727, Sirt, Libya

Page 3: I- Extended Databases

I- Extended Databases

– Abstract.
– Introduction of the Inductive Databases.
– I-Extended Databases (IE) motivation.
– I-Extended Databases (IE) and KDD processes.
– Example.
– Conclusions and Remarks.
– Questions.

Page 4: I- Extended Databases

Abstract (1)

How can we handle generalizations in a very large database using Association Rules (AR) and inclusion and Functional Dependencies (FD)?

The answer is an inductive database. An I-Extended database has properties similar to those of an inductive database.

An I-Extended database contains intensionally defined generalizations about the data.

Page 5: I- Extended Databases

Abstract (2)

It can be used in the process of Data Mining.

It was proposed in the ODBC_KDD (2) model.

Queries use normal database terminology.

The main aim of the I-Extended database is to interact with a spatial Data Mining query language called Knowledge Discovery Query Language (KDQL), described in [22].

KDQL was introduced and demonstrated as a query language in the ODBC_KDD (2) model in [22].

Page 6: I- Extended Databases

Introduction of the Inductive Databases

The KDD process contains several steps: understanding the domain, preparing the data set, discovering patterns (i.e., computing a theory), post-processing the discovered patterns, and putting the results into use.

For KDD, we need a query language that enables the user not only to select subsets of the data, but also to specify DM tasks and to select patterns from the corresponding theories.

We consider the KDQL rules operator, described in [21], as a possible query language for mining association rules in an i-extended database.

The result of a query should be an object of the same type as its arguments.

Page 7: I- Extended Databases

I-Extended Databases Motivation

The model was introduced at the Institute of Mathematics and Informatics at Debrecen University, Debrecen, Hungary, in 2002.

[Figure: diagram; the only recoverable label is "Gateway".]

Page 8: I- Extended Databases

I-Extended Databases Motivation (continued)

The schema of an I-Extended database is a pair R = (R, (P_R, e, V)), where:

– R is a database schema.
– P_R is a collection of patterns.
– V is a set of result values.
– e is the evaluation function that defines pattern semantics.

This function maps each pair (r, θi) to an element of V, where r is a database over R and θi ∈ P_R is a pattern.

An instance of the schema, an i-extended database (r, s) over the schema R, consists of a database r over the schema R and a subset s ⊆ P_R.
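The definition of an instance (r, s) together with its evaluation function e can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class name IExtendedDatabase, the method evaluate, and the toy evaluation function fraction_of_ones are hypothetical names chosen for illustration, and patterns are reduced to plain column labels. It is not the ODBC_KDD (2) implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, int]      # one tuple of the database r
Database = List[Row]      # the database r over the schema R
Pattern = str             # a pattern theta (assumption: just a column label)


@dataclass
class IExtendedDatabase:
    """An instance (r, s): data r plus a subset s of the pattern class."""
    r: Database
    s: FrozenSet[Pattern]
    e: Callable[[Database, Pattern], object]  # evaluation function e(r, theta)

    def evaluate(self) -> Dict[Pattern, object]:
        """Return the value e(r, theta) for every pattern theta in s."""
        return {theta: self.e(self.r, theta) for theta in self.s}


def fraction_of_ones(r: Database, theta: Pattern) -> float:
    """Toy evaluation function: fraction of rows with a 1 in column theta."""
    return sum(row[theta] for row in r) / len(r)


db = IExtendedDatabase(
    r=[{"A": 1, "B": 0}, {"A": 1, "B": 1}],
    s=frozenset({"A", "B"}),
    e=fraction_of_ones,
)
values = db.evaluate()
print(values)
```

Here the set of result values V is the interval [0, 1]; a different choice of e changes V without changing the structure.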

Page 9: I- Extended Databases

I-Extended Databases Motivation (continued)

Example:

If the patterns are Boolean formulae about the database, V is {true, false}, and the evaluation function e(r, θ) has value true iff the formula θ is true about r.

In practice, a user might be interested in selecting, from the intensionally defined collection of all Boolean formulae, the formulae that are true or the formulae that are false.
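The Boolean case can be illustrated with a few lines of Python. This is a sketch under assumptions: formulae are modelled as Python predicates over the whole database, the three formulae and the helper select are invented for the illustration, and the collection is a small finite dictionary rather than the full class of all Boolean formulae.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Database = List[Row]
Formula = Callable[[Database], bool]   # a Boolean formula about the database


def e(r: Database, theta: Formula) -> bool:
    """Evaluation function: true iff the formula theta is true about r."""
    return theta(r)


def select(r: Database, patterns: Dict[str, Formula], wanted: bool) -> List[str]:
    """Select the names of the formulae whose value e(r, theta) equals wanted."""
    return sorted(name for name, theta in patterns.items() if e(r, theta) == wanted)


r = [{"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 0, "B": 0}]

formulas: Dict[str, Formula] = {
    "every row with B=1 has A=1":
        lambda db: all(row["A"] == 1 for row in db if row["B"] == 1),
    "some row has A=0 and B=1":
        lambda db: any(row["A"] == 0 and row["B"] == 1 for row in db),
    "no row is all zeros":
        lambda db: all(row["A"] == 1 or row["B"] == 1 for row in db),
}

true_formulas = select(r, formulas, True)
false_formulas = select(r, formulas, False)
print(true_formulas)
print(false_formulas)
```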

Page 10: I- Extended Databases

I-Extended Databases Motivation (continued)

An I-Extended database is a database that, in addition to data, also contains intensionally defined generalizations about the data. First we illustrate the approach with Association Rules, and then we generalize it and point out key issues for query evaluation in general.

Like an inductive database, an I-Extended database can be used throughout the whole DM process thanks to the closure property of the framework.

Page 11: I- Extended Databases

I-Extended Databases Motivation (continued)

The aims of the I-Extended database are as follows:

– An i-extended database consists of a normal database associated with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data.

– An i-extended database can be queried (in principle) just by using normal relational algebra or SQL, with the added property of being able to refer to the values of the evaluation function on the patterns.

– Modeling KDD processes as a sequence of queries on an i-extended database creates opportunities for reasoning about and optimizing these processes.
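The second aim, querying with ordinary SQL while referring to the values of the evaluation function, might look as follows. This is a hedged sketch, not KDQL and not the ODBC_KDD (2) model: the table names r and patterns, the columns lhs, rhs, frequency and confidence, the toy data, and the thresholds are all assumptions made for the illustration. Evaluation values are materialized as ordinary columns so that a plain SELECT can filter on them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The normal database r: a 0/1 relation over the attributes A, B, C (toy data).
conn.execute("CREATE TABLE r (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO r VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 1, 1)],
)

# The pattern part: one row per rule lhs => rhs with its evaluation values.
conn.execute(
    "CREATE TABLE patterns (lhs TEXT, rhs TEXT, frequency REAL, confidence REAL)"
)


def support(columns):
    """Fraction of rows of r that have a 1 in every listed column."""
    cond = " AND ".join(f"{c} = 1" for c in columns)  # column names are fixed here
    (value,) = conn.execute(
        f"SELECT AVG(CASE WHEN {cond} THEN 1.0 ELSE 0.0 END) FROM r"
    ).fetchone()
    return value


for lhs, rhs in [("A", "B"), ("B", "A"), ("A", "C"), ("C", "B")]:
    freq = support([lhs, rhs])
    conf = freq / support([lhs])
    conn.execute("INSERT INTO patterns VALUES (?, ?, ?, ?)", (lhs, rhs, freq, conf))

# A normal SQL query that refers to the evaluation-function values.
selected = conn.execute(
    "SELECT lhs, rhs FROM patterns "
    "WHERE frequency >= 0.5 AND confidence >= 0.7 "
    "ORDER BY lhs, rhs"
).fetchall()
print(selected)
```

Storing the evaluation values as columns is one possible design; they could equally be computed on demand by a view or a user-defined function.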

Page 12: I- Extended Databases

I-Extended Databases (IE) and KDD processes

KDD consists of several steps; one of these steps is Data Mining.

In the Data Mining process we are concerned with a unique class of patterns for real-life mining processes, presented in a knowledge acquisition scenario of a dynamic nature.

These interesting patterns are represented in I-Extended Databases based on their captured frequency, confidence and support values.

The knowledge gathered often affects the search process, giving rise to new goals in addition to the original ones.

Page 13: I- Extended Databases

I-Extended Databases (IE) and KDD processes (continued)

KDD processes can be described by sequences of operations, i.e., queries over the relevant i-extended database.

Sequences of queries are abstract and concise descriptions of DM processes.

These descriptions can even be annotated with statistical information about the size of the selected dataset, the size of the intermediate collections of patterns, etc., providing knowledge for further use of these sequences.
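A KDD process seen as an annotated sequence of queries can be sketched as follows. The sketch assumes that every query maps a pair (r, s) to another pair (r, s), which is what makes queries composable; the helper names select_rows, select_patterns and run, the toy data, and the string encoding of rules are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Row = Tuple[int, ...]
State = Tuple[List[Row], Set[str]]    # an instance (r, s): data and patterns
Query = Callable[[State], State]      # closure: a query returns another (r, s)


def select_rows(predicate: Callable[[Row], bool]) -> Query:
    """Build a query that keeps only the rows of r satisfying predicate."""
    def query(state: State) -> State:
        r, s = state
        return [row for row in r if predicate(row)], s
    return query


def select_patterns(predicate: Callable[[str], bool]) -> Query:
    """Build a query that keeps only the patterns of s satisfying predicate."""
    def query(state: State) -> State:
        r, s = state
        return r, {p for p in s if predicate(p)}
    return query


def run(state: State, queries: List[Query]):
    """Apply the queries in order and record (rows, patterns) sizes per step."""
    log = [(len(state[0]), len(state[1]))]
    for query in queries:
        state = query(state)
        log.append((len(state[0]), len(state[1])))
    return state, log


start: State = ([(1, 1), (1, 0), (0, 1), (0, 0)], {"A=>B", "B=>A"})
process = [
    select_rows(lambda row: row[0] == 1),             # keep rows with A = 1
    select_patterns(lambda p: p.startswith("A=>")),   # keep rules with A on the left
]
final_state, log = run(start, process)
print(log)
```

The recorded sizes are the kind of statistical annotation mentioned above; they could later guide the reuse or reordering of the same query sequence.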

Page 14: I- Extended Databases

Example: Patterns in three instances of an I-Extended Database

Consider a schema R = {A1, …, An} of attributes with domain {0, 1}.

Given a relation r over R, an association rule about r is an expression of the form X ⇒ B, where X ⊆ R and B ∈ R \ X.

The intuitive meaning of the rule is that if a row of the matrix r has a 1 in each column of X, then the row tends to have a 1 also in column B.

These semantics are captured by frequency and confidence values. Given W ⊆ R, support(W, r) denotes the fraction of rows of r that have a 1 in each column of W.

The frequency of X ⇒ B in r is defined to be support(X ∪ {B}, r), while its confidence is support(X ∪ {B}, r) / support(X, r). Typically, we are interested in association rules for which the frequency and the confidence are greater than given thresholds.
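The definitions of support, frequency and confidence translate directly into code. The following is a minimal sketch on an invented four-row relation; the function names mirror the definitions above, while is_interesting, the threshold values, and the convention of returning 0.0 when support(X, r) is zero are assumptions.

```python
from typing import Dict, Iterable, List

Row = Dict[str, int]   # one row of r: attribute name -> 0 or 1


def support(W: Iterable[str], r: List[Row]) -> float:
    """Fraction of rows of r that have a 1 in each column of W."""
    columns = list(W)
    return sum(all(row[a] == 1 for a in columns) for row in r) / len(r)


def frequency(X: Iterable[str], B: str, r: List[Row]) -> float:
    """Frequency of the rule X => B: support(X ∪ {B}, r)."""
    return support(set(X) | {B}, r)


def confidence(X: Iterable[str], B: str, r: List[Row]) -> float:
    """Confidence of X => B: support(X ∪ {B}, r) / support(X, r)."""
    X = set(X)
    denominator = support(X, r)
    if denominator == 0:
        return 0.0  # assumption: treat the undefined case as 0.0
    return frequency(X, B, r) / denominator


def is_interesting(X, B, r, min_frequency: float, min_confidence: float) -> bool:
    """True if both values are greater than the given thresholds."""
    return frequency(X, B, r) > min_frequency and confidence(X, B, r) > min_confidence


r = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 1},
]

freq_ab = frequency({"A"}, "B", r)    # 2 of 4 rows have A = 1 and B = 1
conf_ab = confidence({"A"}, "B", r)   # 2 of the 3 rows with A = 1 also have B = 1
print(freq_ab, conf_ab, is_interesting({"A"}, "B", r, 0.4, 0.6))
```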

Page 15: I- Extended Databases

Conclusions and Remarks

I-Extended Databases enable the definition of a mining process as a sequence of queries, thanks to the closure property.

I-Extended Databases are a necessary step towards general-purpose query languages for KDD applications.

I-Extended Databases support pattern generation, pattern filtering and pattern combining operations.

I-Extended Databases can use standard database terminology to work with significant patterns without introducing any additional concepts.

Page 16: I- Extended Databases

Important References

[20] T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of the ACM, 39:58-64, 1996.

[21] Zakaria S. Zubi. Knowledge Discovery in Remote Access Database, Ch. 9. PhD dissertation, Debrecen University, Hungary, 2002.

[22] Zakaria S. Zubi and Fazekas Gábor. On ODBC_KDD models. 5th International Conference on Applied Informatics, 28 January - 3 February 2001, Eger, Hungary, 2001.

Page 17: I- Extended Databases

Thank you!!!
