Data Mining with Oracle using Classification and Clustering Algorithms Proposed and Presented by Nhamo Mdzingwa Supervisor: John Ebden
Jan 02, 2016

Data Mining with Oracle using Classification and Clustering Algorithms

Proposed and Presented by

Nhamo Mdzingwa

Supervisor: John Ebden

Presentation Outline

Problem Statement
Objective
Background
Expected Results
Possible Extensions
Plan of action
Timeline
Literature Survey
Questions

Problem Statement

The commercial world is reacting fast to the growth and potential of the DM area, and a wide range of tools is being marketed as DM suites.

Examples of these are:
Oracle DM
DB2’s Intelligent Miner
Informix’s Data Mine
SQL Data miner
Ghost miner
Clementine 9.0 (SPSS)
SAS
Gornish systems, etc.

Problem

It is vital to know the algorithms a DM suite uses and which algorithm to use on a particular data set.

Secondly, it is important to know how well each algorithm performs in terms of accuracy, efficiency and effectiveness within a particular DM suite, e.g. Oracle DM.

Objective

Investigate two types of algorithms (classification and clustering) available in Oracle Data Mining (ODM).

Apply the two algorithms to actual data.

Analyse and evaluate the results in terms of performance.
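As a rough illustration of what evaluating results "in terms of performance" could involve, the sketch below computes accuracy, a confusion matrix and elapsed time for a set of predictions. It is a minimal sketch, not the project's method: the function names and the label lists are hypothetical and invented for illustration only, and nothing here calls Oracle DM.

```python
import time
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of cases whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot score an empty list")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical labels, invented for illustration only.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

score, seconds = timed(accuracy, actual, predicted)
matrix = confusion_matrix(actual, predicted)

print(f"accuracy = {score:.2f}")
print(f"elapsed  = {seconds:.6f} s")
for (a, p), n in sorted(matrix.items()):
    print(f"actual={a:<3} predicted={p:<3} count={n}")
```

Accuracy covers the "accuracy" criterion and the timing wrapper gives a crude handle on "efficiency"; the confusion matrix shows which kinds of error a model makes, which a single accuracy figure hides.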

What is Data Mining? (Background)

Simply put, DM is knowledge discovery.

DM is the process of automatic discovery of [hidden] patterns and relationships within enormous amounts of data.

It is a powerful & new technology that allows businesses to make proactive, knowledge-driven decisions as it tries to predict the future.

The data (which represents knowledge) is normally stored in databases and data warehouses (typical size: terabytes).

Automatic discovery is implemented by the use of algorithms provided by DM suites

E.g. Oracle offers:

1. Adaptive Bayes Network supporting decision trees (classification)

2. Naive Bayes (classification)

3. Model Seeker (classification)

4. k-Means (clustering)

5. O-Cluster (clustering)

6. Predictive variance (attribute importance)

7. Apriori (association rules)
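To make one of the listed classification algorithms concrete, here is a minimal textbook sketch of Naive Bayes for categorical attributes with add-one (Laplace) smoothing. It is an assumption-laden illustration and not Oracle's implementation: the function names, the two attributes and the six training cases are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(label, attribute_index)][value] -> number of cases
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for add-one smoothing.
    distinct = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior for this row."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            seen = value_counts[(label, i)][value]
            # Smoothed conditional probability of value given the class.
            score += math.log((seen + 1) / (count + len(distinct[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training cases: (outlook, temperature) -> play?
rows = [
    ("sunny", "hot"), ("sunny", "mild"),
    ("rainy", "mild"), ("rainy", "cool"),
    ("overcast", "hot"), ("overcast", "cool"),
]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")))
print(predict_naive_bayes(model, ("rainy", "cool")))
```

Working in log space avoids underflow when many attributes are multiplied together, and the smoothing term keeps an unseen attribute value from forcing a class probability to zero.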

Algorithms are grouped as either supervised or unsupervised learning strategies.

DM strategies

Supervised learning (input attributes and one or more output attributes)

Classification: Naive Bayes, Model Seeker, Adaptive Bayes

Estimation

Prediction: Predictive variance

Unsupervised learning (input attributes but no output attributes)

Clustering: k-Means, O-Cluster
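The unsupervised case, where there are input attributes but no output attribute, can be illustrated with a small k-means sketch. This is the plain textbook (Lloyd's) procedure under stated assumptions, not Oracle's k-Means: the function name, the fixed starting centroids and the six 2-D points are hypothetical, and real implementations usually choose starting centroids differently.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and centroid-update steps until stable."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return centroids, assignment


# Hypothetical 2-D points forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: start from the first point of each group.
centroids, assignment = k_means(points, [points[0], points[3]])
print(assignment)
print(centroids)
```

No target label appears anywhere in the input: the cluster numbers in `assignment` are discovered from the distances alone, which is what separates clustering from the classification algorithms above.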

The data mining process involves a series of steps to define a business problem, gather and prepare the data, build and evaluate mining models, and apply the models and disseminate the new information.
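The build, evaluate and apply steps of this process can be sketched as a holdout workflow. This is only an outline under assumptions: the function names and the ten cases are hypothetical, and the "model" is a deliberately trivial majority-class placeholder standing in for a real algorithm.

```python
import random
from collections import Counter


def split_holdout(cases, test_fraction=0.3, seed=0):
    """Shuffle the prepared cases and hold back a fraction for evaluation."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def build_model(training_cases):
    """Placeholder model: remember the most frequent target value."""
    targets = Counter(target for _, target in training_cases)
    return targets.most_common(1)[0][0]


def apply_model(model, attributes):
    """Placeholder scoring: always predict the remembered target."""
    return model


def evaluate(model, test_cases):
    """Accuracy of the model on cases it did not see during building."""
    hits = sum(
        1 for attributes, target in test_cases
        if apply_model(model, attributes) == target
    )
    return hits / len(test_cases)


# Hypothetical prepared data: (attributes, target) pairs.
cases = [({"age": age}, "buyer") for age in (31, 35, 42, 47, 52, 58, 63)]
cases += [({"age": age}, "non-buyer") for age in (19, 22, 25)]

train, test = split_holdout(cases)
model = build_model(train)
print("model predicts:", model)
print("holdout accuracy:", evaluate(model, test))
print("new case scored as:", apply_model(model, {"age": 40}))
```

Keeping the evaluation cases out of the build step is the design point here: a model scored on the same cases it was built from tends to look better than it will on new data.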

Expected Results

Aim to say conclusively which algorithm is the most effective and suitable for mining a given dataset, since datasets differ.

Possible Extensions to the Project:

Testing the same algorithms with tools offered by other vendors, e.g. testing with the DM suite in SQL and checking whether the results are similar.

If the results are different, investigating why could be another extension.

Plan of Action

Carry out a literature search: mainly to obtain background knowledge and an understanding of the field.

Get to know the Oracle DM suite: do the DM tutorials provided by Oracle. The server Ora1 is the machine I’ll be working with. It already has JDeveloper, the Oracle 10g database and Oracle 9i DM installed.

Timeline

Continuation from literature and tutorials done.

Investigate Clustering & Classification algorithms (theory): 2nd term, 15 to 30 April

Find suitable computerised case studies of the use of the above algorithms, with or without Oracle: 2nd term, end of May

Search databases for testing (possibilities: AIDS data & faculty data): 2nd term, end of May

Apply algorithms to the data found, then critically analyse & assess results: second semester

Write up paper: September vacation and 3rd term

Final project write-up: due 7/11

Literature Survey

Richard J. Roiger and Michael W. Geatz, Data mining: a tutorial-based primer. Boston, Massachusetts, Addison Wesley, 2003.

This book provides the necessary background and practical knowledge required for the project research, and also presents different methodologies used in data mining that may be useful.

David Hand, Heikki Mannila and Padhraic Smyth, Principles of data mining. Cambridge, Massachusetts, MIT Press, 2001.

Jesus Mena, Data mining your website. Digital Press, 1999.

Jiawei Han and Micheline Kamber, Data mining: concepts and techniques. San Francisco, California, Morgan Kaufmann, 2001.

Robert P. Trueblood and John N. Lovett, Jnr., Data Mining and Statistical Analysis Using SQL. USA, Apress.

http://www.lc.leidenuniv.nl/awcourse/oracle/datamine.920/a95961/preface.htm

http://www.oracle.com/technology/products/oracle9i/htdocs/o9idm_faq.html

http://fas.sfu.ca/cs/research/groups/DB/sections/publication/kdd/kdd.html

Questions?

Thank you