Data Mining Techniques and Applications, 1st edition, Hongbo Du. ISBN 978-1-84480-891-5. © 2010 Cengage Learning. Chapter Two: Principles of data mining
Apr 01, 2015

Jaxon Gale
Chapter Two

Principles of data mining

Chapter Overview

• The process of data mining
• Approaches of data mining
• Categories of data mining problems
• Information patterns to be discovered
• Overview of data mining solutions
• Importance of evaluation
• Undertaking a data mining task in Weka
• Review of basic concepts in statistics and probability

Data Mining Process

[Figure: Input Data → Preparing Input Data → Mining Patterns → Post-processing Patterns → Output Patterns. Legend: a data mining stage; flow of control from one stage to the next stage; flow of control from one stage to the previous stage; repetition of the tasks at one stage.]

Data Mining Process

• Preparation
  Original data sets → Collected data set → Target data set → Pre-processed data set → Formatted data set
  – Original data sets to collected data set: integrating data; getting necessary data details
  – Collected data set to target data set: selecting relevant features; selecting relevant records
  – Target data set to pre-processed data set: data cleaning; dealing with unknown data; data transformation
  – Pre-processed data set to formatted data set: formatting data into a form acceptable to the mining tool
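The preparation steps can be illustrated with a small sketch. Everything here is an assumption made for illustration: the raw records, the field names, the `prepare` helper and the choice of a comma-separated output with "?" for unknown values are hypothetical and are not taken from the book.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"id": 1, "gender": "M", "country": "UK", "age": "22", "units": "360"},
    {"id": 2, "gender": "F", "country": "uk ", "age": "21", "units": "360"},
    {"id": 3, "gender": "M", "country": "FRANCE", "age": "", "units": "345"},
    {"id": 4, "gender": "F", "country": "UK", "age": "45", "units": "300"},
]


def prepare(records, features, keep):
    """Select, clean and format records (a sketch, not a general tool)."""
    # Selecting relevant records and relevant features.
    target = [{f: r[f] for f in features} for r in records if keep(r)]

    # Cleaning and transformation: trim text, convert digits to integers,
    # normalise case, and mark unknown values explicitly as None.
    cleaned = []
    for r in target:
        row = {}
        for key, value in r.items():
            value = value.strip()
            if value == "":
                row[key] = None
            elif value.isdigit():
                row[key] = int(value)
            else:
                row[key] = value.upper()
        cleaned.append(row)

    # Formatting: one comma-separated line per record, "?" for unknowns
    # (an assumed output convention for this sketch).
    lines = [",".join(features)]
    for row in cleaned:
        lines.append(",".join("?" if row[f] is None else str(row[f]) for f in features))
    return lines


formatted = prepare(
    raw_records,
    features=["gender", "country", "age"],
    keep=lambda r: int(r["units"]) >= 345,
)
print("\n".join(formatted))
```

In practice each step is usually revisited several times, which matches the repetition within a stage shown in the process diagram.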

Data Mining Process

• Mining
  – Determining data mining tasks
  – Assigning roles to data for certain tasks
  – Selecting data mining solution(s) for each task
  – Setting necessary parameters for the solution
  – Collecting result patterns

[Figure: Formatted data set → mining solutions with their parameter settings: Solution1 (p1, p2, …, pn), Solution2 (t1, t2, …, tr), Solution3 (w1, w2, …, wm) → Patterns]

Data Mining Process

• Post-processing
  – Pattern evaluation
  – Pattern selection
  – Pattern interpretation

[Figure: Patterns → evaluation criteria (accept / reject) → Valid patterns → selection criteria → Selected patterns → pattern interpretation → Knowledge learnt]

Data Mining Process

• Roles of participants in data mining
  – Participants include:
    • Data miners / data analysts: main participants of a DM project
    • Domain experts: main collaborators of a DM project
    • Decision makers: clients of a DM project
  – Risk of human bias in the discovery process
  – Important roles of domain experts
    • Pattern interpretation (for usefulness)
    • Pattern evaluation (for significance)
    • Mining options (for suitable tasks, limited)
    • Advisory on data pre-processing (for suitable operations, limited)
  – Balancing the strengths of human and machine

Data Mining Approaches

• Hypothesis testing approach
  – Top-down, led by a hypothesis statement
  – Procedure:
    1. Forming a hypothesis statement
    2. Collecting and selecting data of relevance
    3. Conducting data analysis and collecting patterns
    4. Interpreting the patterns to accept/reject the hypothesis

• Discovery approach
  – Bottom-up, without a hypothesis in mind
  – Procedure:
    1. Collecting and preparing data of interest
    2. Conducting data analysis and discovering possible patterns
    3. Evaluating the importance and interestingness of the patterns

Data Mining Approaches

• Discovery approach (cont'd)
  – Directed discovery (supervised learning):
    • Certain aspects of the outcome, i.e. the goal, of the discovery have been specified. The discovery is to find those patterns satisfying the goal, e.g. patterns relating to the outcome of a class variable.
  – Undirected discovery (unsupervised learning):
    • There is no specification of the goal of the discovery. The discovery is to find those patterns of some kind of significance, e.g. associative links among some attribute values.

Data Mining: Problems & Patterns

• Classification
  – Construct a classification model to determine the class of a given record

[Figure: (a) Model development phase: example data set → model construction method → classification model. (b) Model use phase: unseen data record with undetermined class (input features, class?) → classification model → data record with the determined class (input features, class Ci).]
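The two phases can be sketched in a few lines of Python. This is a deliberately simple stand-in rather than one of the methods covered in the book: the model is just the majority degree class for each major subject, built from the 12-record student table that appears later in this chapter. The function names `build_model` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict

# (major subject, degree class) pairs from the 12-record student table.
examples = [
    ("Computing", "1st Class"), ("Computing", "2nd Lower"),
    ("Psychology", "2nd Lower"), ("Accounting", "1st Class"),
    ("Psychology", "Pass"), ("History", "2nd Upper"),
    ("Computing", "1st Class"), ("Psychology", "3rd Class"),
    ("History", "2nd Upper"), ("Accounting", "1st Class"),
    ("History", "2nd Upper"), ("Law", "Pass"),
]


def build_model(examples):
    """Model development phase: majority class for each feature value."""
    counts = defaultdict(Counter)
    for feature, label in examples:
        counts[feature][label] += 1
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}


def classify(model, feature, default=None):
    """Model use phase: determine the class of an unseen record."""
    return model.get(feature, default)


model = build_model(examples)
predicted = classify(model, "Computing")

# Accuracy on the examples used to build the model (an optimistic estimate).
accuracy = sum(classify(model, f) == c for f, c in examples) / len(examples)
print(predicted, accuracy)
```

Measuring accuracy on the same records used for model construction tends to overstate how well the model performs on unseen data, which is one reason evaluation is treated as a topic in its own right.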

Data Mining: Problems & Patterns

• Various forms of classification models
  [Figure panels: instance space; neural network; decision tree; list of ordered classification rules; function (linear regression); many more …]

Data Mining: Problems & Patterns

• Cluster detection
  – Measure similarity among data objects and group them into clusters accordingly

[Figure: Input data points → clustering method → cluster memberships of data points]
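As a minimal sketch of grouping by similarity, the following applies a one-dimensional version of K-means to the ages from the student table later in this chapter. The helper name `k_means_1d` and the starting centroids (20 and 45) are assumptions chosen for illustration; absolute difference in age stands in for the similarity measure.

```python
def k_means_1d(points, centroids, max_iter=100):
    """Group 1-D points around the nearest centroid until centroids settle."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


ages = [22, 21, 24, 23, 22, 30, 35, 25, 23, 22, 20, 45]
centroids, clusters = k_means_1d(ages, centroids=[20, 45])
print(centroids, clusters)
```

Different starting centroids or a different number of clusters can give different memberships, so the result should be read as one possible grouping.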

Data Mining: Problems & Patterns

• Forms of clustering results
  [Figure panels: clusters of various shapes; ellipse-shaped clusters; hierarchical clustering results]

Data Mining: Problems & Patterns

• Association rule mining
  – Discover significant relationships between data objects

[Figure: association mining method producing rules of the form X → Y]

• Various associations
  – Between values, e.g. Apple → Coke
  – Between categories of values, e.g. Food → Magazine
  – Between values of attributes, e.g. Married:yes → OwnHouse:yes
  – Over time period, e.g. year 1: Database → year 2: Data Mining

Data Mining: Problems & Patterns

• An example

StudentID  Gender  Country  Major Subject  Age  TotalUnits  Degree Class
1          M       UK       Computing      22   360         1st Class
2          F       UK       Computing      21   360         2nd Lower
3          M       FRANCE   Psychology     24   345         2nd Lower
4          M       SPAIN    Accounting     23   360         1st Class
5          F       UK       Psychology     22   300         Pass
6          F       USA      History        30   345         2nd Upper
7          M       UK       Computing      35   360         1st Class
8          F       FRANCE   Psychology     25   360         3rd Class
9          F       GERMANY  History        23   360         2nd Upper
10         M       UK       Accounting     22   360         1st Class
11         M       SPAIN    History        20   345         2nd Upper
12         F       UK       Law            45   300         Pass

Classification model? Clusters? Association rules?

Data Mining Solutions: An Overview

• Classification solutions
  – Decision tree, e.g. ID3
  – k-nearest neighbour (kNN), e.g. PEBLS
  – Rules, e.g. Sequential Covering
  – Bayes' theorem, e.g. Naïve Bayes
  – Artificial neural network

• Clustering solutions
  – Partition-based methods, e.g. K-means
  – Hierarchical methods, e.g. agglomeration
  – Density-based methods, e.g. DBSCAN
  – Model-based methods, e.g. Expectation-Maximisation
  – Graph-based methods, e.g. Chameleon

Data Mining Solutions: An Overview

• Association rule solutions
  – Greedy methods, e.g. Apriori
  – Graph-based methods, e.g. FP-Growth
  – Methods for various associations
    • Boolean associations
    • Generalised associations (multi-level associations)
    • Quantitative associations (multidimensional associations)
    • Sequential associations (sequential patterns)

Since one type of data mining problem can be transformed into another, some solutions for one type can also be applied to another.

Evaluation of Patterns

• Importance of evaluating result patterns
  – A classification model must be accurate enough to be credible
  – Clusters must genuinely exist
  – Association rules must be strong enough to be believed
  – Data descriptions must be general enough to cover a large part of the data set

How do we evaluate the discovered patterns?

Evaluation of Patterns

• Possible measures of interestingness
  – Objective measures based on data and pattern
    • Conciseness of pattern, e.g. minimum description length
    • Coverage, e.g. coverage for classification rules
    • Reliability, e.g. accuracy of a classification model
    • Peculiarity, e.g. measures of difference from the norm
    • Diversity, e.g. tendency of clusters
  – Subjective measures based on domain knowledge
    • Novelty
    • Surprisingness
    • Usefulness
    • Applicability

Evaluation of Patterns

• Commonly used measures
  – Accuracy rate or error rate for classification models (see section 6.5.1)
    • True positive
    • False positive
    • False negative
  – Quality of clusters (see section 4.5.1)
    • Quality of a cluster
    • Overall quality of all clusters
  – Strengths of associations (see sections 8.1.2 and 8.6)
    • Support
    • Confidence
    • Lift
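The three association measures can be sketched on the student table from the earlier example. The definitions used here are the commonly quoted ones and are an assumption about what the book's sections 8.1.2 and 8.6 define: support is the fraction of records matching both sides of the rule, confidence is the fraction of antecedent matches that also match the consequent, and lift is confidence divided by the overall fraction matching the consequent. The rule Gender:M → DegreeClass:1st Class and the helper name `rule_measures` are chosen for illustration.

```python
# (gender, degree class) pairs from the 12-record student table.
records = [
    ("M", "1st Class"), ("F", "2nd Lower"), ("M", "2nd Lower"),
    ("M", "1st Class"), ("F", "Pass"), ("F", "2nd Upper"),
    ("M", "1st Class"), ("F", "3rd Class"), ("F", "2nd Upper"),
    ("M", "1st Class"), ("M", "2nd Upper"), ("F", "Pass"),
]


def rule_measures(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_consequent = sum(1 for r in records if consequent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / n
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n)
    return support, confidence, lift


support, confidence, lift = rule_measures(
    records,
    antecedent=lambda r: r[0] == "M",
    consequent=lambda r: r[1] == "1st Class",
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 suggests the antecedent and consequent occur together more often than they would if they were independent; with only 12 records, any such reading is tentative.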

Data Mining in Weka Explorer

• The roadmap

[Figure: roadmap linking the Preprocess tab page, the Cluster, Classify and Associate tab pages, and the Tree Visualiser window, with steps numbered (1) to (3)]

Data Mining in Weka Explorer

• Preprocess

[Screenshot of the Preprocess tab page, with callouts:]
  – Open data set from different sources
  – Generate random data set
  – Save data set into a file
  – Display & edit data
  – Filters for pre-processing
  – Data summary
  – Attribute display, selection & removal from the opened data set
  – Selected attribute summary
  – Selected attribute visualisation
  – Visualise all attributes
  – Feedback messages

Data Mining in Weka Explorer

• Classify (as an example)

[Screenshot of the Classify tab page, with callouts:]
  – Method selection & parameter setting
  – Test option setting
  – Task list, with a menu of options available on right click
  – Result display window

Data Mining in Weka Explorer

• Classify (as an example)

[Screenshot callouts:]
  – Method list
  – Selecting a specific method
  – Selecting & changing parameters

Data Mining in Weka Explorer

• Visualisation

[Screenshot panels: an example decision tree; scatter plot of data objects of different classes]

Probability & Statistics: A Brief Review

• Where are probability and statistics used?
  – Patterns found from data are probabilistic in nature
  – Used in various measures of evaluation, e.g. confidence measure of association rules
  – Used in the data exploration stage for better understanding, e.g. maximum, minimum, mean, variance, skewness
  – Used during the mining process to assist the discovery of patterns, e.g. information gain for decision tree induction
  – Used as a part of patterns, e.g. naïve Bayes, Gaussian mixture model
  – Used in comparison of patterns, e.g. a classification model with significantly better accuracy

Probability & Statistics: A Brief Review

• Probability and conditional probability
  – Probability of event P(E) and its meanings when: P(E) = 0, P(E) = 1 and 0 < P(E) < 1
  – Probabilities of multiple events: P(E and F), P(E or F) = P(E) + P(F) – P(E and F)
  – Mutually exclusive events: P(E and F) = 0 and P(E or F) = P(E) + P(F)
  – Conditional probability of event E given event F: P(E|F) = P(E and F)/P(F)
  – Independent events: P(E and F) = P(E)P(F), and P(E|F) = P(E)

Probability & Statistics: A Brief Review

• Probability & conditional probability (example), using the 12-record student data set from the earlier example (StudentID, Gender, Country, Major Subject, Age, TotalUnits, Degree Class)

  $P(\text{Gender} = \text{M}) = \frac{6}{12} = \frac{1}{2}$

  $P(\text{Gender} = \text{M and Gender} = \text{F}) = 0$

  $P(\text{Gender} = \text{M or Gender} = \text{F}) = 1$

  $P(\text{Gender} = \text{F} \mid \text{Country} = \text{UK}) = \frac{1}{2}$
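These probabilities can be checked by counting records. The sketch below keeps only the Gender and Country columns of the student table; the helper name `prob` is hypothetical, and exact fractions are used so the results can be compared directly with the values on the slide.

```python
from fractions import Fraction

# (gender, country) pairs from the 12-record student table.
students = [
    ("M", "UK"), ("F", "UK"), ("M", "FRANCE"), ("M", "SPAIN"),
    ("F", "UK"), ("F", "USA"), ("M", "UK"), ("F", "FRANCE"),
    ("F", "GERMANY"), ("M", "UK"), ("M", "SPAIN"), ("F", "UK"),
]


def prob(event, given=lambda s: True):
    """Relative frequency of `event` among the records satisfying `given`."""
    space = [s for s in students if given(s)]
    return Fraction(sum(1 for s in space if event(s)), len(space))


p_m = prob(lambda s: s[0] == "M")
p_m_and_f = prob(lambda s: s[0] == "M" and s[0] == "F")
p_m_or_f = prob(lambda s: s[0] == "M" or s[0] == "F")
p_f_given_uk = prob(lambda s: s[0] == "F", given=lambda s: s[1] == "UK")
print(p_m, p_m_and_f, p_m_or_f, p_f_given_uk)
```

Restricting the sample space with `given` mirrors the definition P(E|F) = P(E and F)/P(F): six students are from the UK and three of them are female.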

Probability & Statistics: A Brief Review

• Probability distribution of random variables
  – Discrete random variable: $P(X = x)$
  – Continuous random variable: $P(a \le X < b)$

[Figure: normal distribution curve with the 68% and 95% regions around the mean marked]
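The 68% and 95% figures are consistent with the usual rule of thumb for a normal distribution: roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two. Assuming that is what the figure shows, a short check with Python's standard library is:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_between(dist, a, b):
    """P(a <= X < b) for a continuous random variable with distribution `dist`."""
    return dist.cdf(b) - dist.cdf(a)


within_one_sd = prob_between(standard_normal, -1, 1)
within_two_sd = prob_between(standard_normal, -2, 2)
print(f"{within_one_sd:.4f} {within_two_sd:.4f}")
```

For a continuous variable the probability of any single value is zero, which is why probabilities are stated over intervals.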

Probability & Statistics: A Brief Review

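• Basic Statistics
  – Sample mean, median and mode: $\bar{x} = \frac{\sum x_i}{n}$
  – Variance and standard deviation: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$
  – Skewness: $\frac{3(\bar{x} - \text{Median})}{s}$

Example (Age in the student data set):

  $\overline{age} = 26$, $\text{median}_{age} = 23$, $\text{mode}_{age} = 22$

  $s^2_{age} = 53.636$, $s_{age} = 7.324$

  $\text{skewness}(age) = \frac{3(26 - 23)}{7.324} = 1.229$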
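The example values can be reproduced from the Age column of the student table with Python's `statistics` module. The skewness line uses the median-based formula 3(mean − median)/s; `statistics.variance` and `statistics.stdev` divide by n − 1, which is the convention that yields 53.636 for this data.

```python
import statistics

# Age column of the 12-record student table.
ages = [22, 21, 24, 23, 22, 30, 35, 25, 23, 22, 20, 45]

mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)
variance = statistics.variance(ages)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)       # sample standard deviation
skewness = 3 * (mean - median) / std_dev

print(mean, median, mode)
print(round(variance, 3), round(std_dev, 3), round(skewness, 3))
```

The positive skewness reflects the few older students (ages 35 and 45) pulling the mean above the median.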

Probability & Statistics: A Brief Review

• Confidence interval estimate
  – The sample mean is only an estimate of the true mean for the data population.
  – Central limit theorem: sample means follow a normal distribution in which:
    a. The mean is the true population mean $\mu_X$
    b. The standard deviation is $\sigma_X / \sqrt{n}$
  – Based on the central limit theorem and using the sample standard deviation to replace the true one, the following expression is used to estimate the interval for the true mean at confidence level of $1 - \alpha$:

  $P\left(\bar{x} - t\,\frac{s}{\sqrt{n}} \le \mu_X \le \bar{x} + t\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$

Probability & Statistics: A Brief Review

• Confidence interval estimate (example)

For this data set, n = 12, $\overline{age} = 26$ and $s_{age} = 7.324$. At confidence level of 95%, i.e. $1 - \alpha = 0.95$ and $\alpha/2 = 0.025$, n – 1 = 11, and therefore t = 2.201. The interval estimate is:

  $P\left(26 - 2.201 \times \frac{7.324}{\sqrt{12}} \le \mu_{age} \le 26 + 2.201 \times \frac{7.324}{\sqrt{12}}\right) = 0.95$

The interval is estimated as [21.347, 30.653] at confidence level of 95%.
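The arithmetic of this example can be checked with a few lines of Python. The critical value t = 2.201 is taken from the slide rather than computed, since the standard library has no t-distribution; the function name `confidence_interval` is hypothetical.

```python
import math


def confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Interval sample_mean ± t * s / sqrt(n) for the true mean."""
    margin = t_critical * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values from the slide: n = 12, mean age = 26, s = 7.324, t = 2.201.
low, high = confidence_interval(26, 7.324, 12, 2.201)
print(round(low, 3), round(high, 3))
```

With a larger sample the margin shrinks in proportion to 1/√n, so the same confidence level would give a narrower interval.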

Probability & Statistics: A Brief Review

• Hypothesis testing
  – As an introduction to statistical inference and statistical significance.
  – Procedure:
    a. Forming null and alternative hypotheses
    b. Deciding the level of significance p
    c. Determining a test statistic and calculating its value
    d. Comparing the calculated value against the known value and deciding if the null hypothesis should be rejected

Probability & Statistics: A Brief Review

• Hypothesis testing (example)
  – Assuming $\mu_{age} = 25$
  – Hypotheses:
    Null: $\overline{age} = \mu_{age}$
    Alternative: $\overline{age} \ne \mu_{age}$
  – Calculating the statistic t as:

    $t = \frac{\overline{age} - \mu_{age}}{s_{age} / \sqrt{n}} = \frac{26 - 25}{7.324 / \sqrt{12}} = 0.473$

    This is less than t = 2.201 for p/2 = 0.025 and n – 1 = 11.
  – Conclusion: the null hypothesis is not rejected, i.e. the difference between the sample mean and the population mean is insignificant.
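The test statistic in this example can be reproduced as follows. The critical value 2.201 (p/2 = 0.025, 11 degrees of freedom) is copied from the slide, and the two-sided comparison is an assumption that follows from the use of p/2; the function name `one_sample_t` is hypothetical.

```python
import math


def one_sample_t(sample_mean, hypothesised_mean, sample_sd, n):
    """t statistic for comparing a sample mean with a hypothesised mean."""
    return (sample_mean - hypothesised_mean) / (sample_sd / math.sqrt(n))


# Values from the slide: mean age = 26, assumed population mean = 25,
# s = 7.324, n = 12.
t_value = one_sample_t(26, 25, 7.324, 12)

T_CRITICAL = 2.201  # from the slide: p/2 = 0.025, n - 1 = 11
reject_null = abs(t_value) > T_CRITICAL
print(round(t_value, 3), reject_null)
```

Failing to reject the null hypothesis does not show that the population mean is exactly 25; it only means this sample gives no significant evidence against it.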

Chapter Summary

• The data mining process involves preparation of data, mining of patterns and post-processing of the patterns.
• Top-down and bottom-up approaches are both useful. The discovery approach can be directed or undirected.
• Three main streams of data mining tasks and various forms of patterns and models are introduced.
• Specific solutions are required for specific types of problems.
• The importance of evaluation of patterns must be appreciated.
• The normal procedure of conducting data mining in Weka is explained.
• Some important basic concepts in probability and statistics are reviewed.

References

Read Chapter 2 of Data Mining Techniques and Applications

Useful further references

Han, J. and Kamber, M. (2006), Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, Chapter 1

Berry, M. J. A. and Linoff, G. (2004), Data Mining Techniques: For Marketing, Sales and Customer Relationship Management, 2nd ed. Wiley Computer Publishing, Chapters 1 – 2