
Data Mining - Functionalities, Classification and Task Primitives

Jul 13, 2015


selvarunachalam
Page 1: Data Mining - Functionalities,Classification and Task Primitives

5/12/2018 Data Mining - Functionalities,Classification and Task Primitives - slidepdf.com

http://slidepdf.com/reader/full/data-mining-functionalitiesclassification-and-task-primitives 1/24

Presented by

K.Aravind (10mx03)
M.Boobalan (10mx05)
V.Boopathiraj (10mx06)
S.Kadhiresan (10mx18)
L.RoshanAli (10mx41)
A.Selvaraj (10mx46)

Data Mining: Functionalities, Classification and Task Primitives


Data Mining Functionalities

It includes:

• Characterization and Discrimination
• Mining Frequent Patterns, Associations, and Correlations
• Classification and Regression
• Cluster Analysis
• Outlier Analysis


Characterization and Discrimination

• Class/Concept Description: Characterization and Discrimination
• Data entries can be associated with classes or concepts.
• It can be useful to describe individual classes and concepts in summarized, concise, and precise terms. Such descriptions of a class or a concept are called class/concept descriptions.


Characterization and Discrimination

• Data characterization is a summarization of the general characteristics or features of a target class of data.
• Data discrimination is a comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes.
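
The two definitions can be illustrated with a minimal sketch. The customer records, the class names ("big_spender", "budget") and the helper functions characterize and discriminate are all hypothetical, chosen only to show a summary of one target class and a comparison against one contrasting class.

```python
from statistics import mean

# Hypothetical toy records: (customer_class, age, yearly_spend)
records = [
    ("big_spender", 42, 5200),
    ("big_spender", 38, 4800),
    ("big_spender", 46, 6200),
    ("budget", 23, 900),
    ("budget", 31, 1200),
    ("budget", 27, 750),
]

def characterize(rows, target):
    """Characterization: summarize the general features of one target class."""
    ages = [age for cls, age, _ in rows if cls == target]
    spend = [s for cls, _, s in rows if cls == target]
    return {"count": len(ages), "mean_age": mean(ages), "mean_spend": mean(spend)}

def discriminate(rows, target, contrast):
    """Discrimination: compare the target class summary with a contrasting class."""
    t = characterize(rows, target)
    c = characterize(rows, contrast)
    return {key: t[key] - c[key] for key in ("mean_age", "mean_spend")}

summary = characterize(records, "big_spender")
difference = discriminate(records, "big_spender", "budget")
```

Here summary describes the target class on its own, while difference reports how far the target class lies from the contrasting class on each feature.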


Mining Frequent Patterns, Association & Correlation

• Frequent patterns are patterns that occur frequently in the data.
• Kinds of frequent patterns include frequent itemsets, frequent subsequences (sequential patterns), and frequent substructures.
• Association rules
  - Single-dimensional association rules vs. multidimensional association rules


Association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.
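
A small sketch of this filtering step, assuming the usual definitions: support is the fraction of transactions that contain all items of the rule, and confidence is the fraction of transactions containing the antecedent that also contain the consequent. The baskets and the threshold values are made up for illustration.

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def count(itemset, txns):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in txns)

def support(itemset, txns):
    """Fraction of all transactions that contain itemset."""
    return count(itemset, txns) / len(txns)

def confidence(antecedent, consequent, txns):
    """Fraction of antecedent transactions that also contain the consequent."""
    return count(antecedent | consequent, txns) / count(antecedent, txns)

def is_interesting(antecedent, consequent, txns, min_sup, min_conf):
    """Keep a rule only if it meets BOTH thresholds."""
    return (support(antecedent | consequent, txns) >= min_sup
            and confidence(antecedent, consequent, txns) >= min_conf)

keep = is_interesting({"bread"}, {"milk"}, transactions, min_sup=0.5, min_conf=0.7)
drop = is_interesting({"butter"}, {"milk"}, transactions, min_sup=0.5, min_conf=0.7)
```

With these toy baskets the rule bread -> milk passes both thresholds, while butter -> milk is discarded because its support is too low.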


Classification and Regression

• Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, so that the model can be used for future prediction.
• The class label is known.
• E.g., classify countries based on climate, or classify cars based on gas mileage.
• Presentation: decision tree, classification rules, neural network
• Prediction: predict some unknown or missing numerical values.
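
As a sketch of the "cars based on gas mileage" example, the code below learns a one-level decision tree (a decision stump) from labeled data. The mileage figures, the labels "low" and "high", and the function names are assumptions made for illustration; the slides do not prescribe a particular learning method.

```python
# Hypothetical labeled training data: (miles_per_gallon, class_label)
training = [(12, "low"), (15, "low"), (18, "low"),
            (28, "high"), (32, "high"), (40, "high")]

def train_stump(rows):
    """Pick the mpg threshold that misclassifies the fewest training rows."""
    values = sorted(v for v, _ in rows)
    # Candidate thresholds are midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((v > threshold) != (label == "high") for v, label in rows)

    return min(candidates, key=errors)

def classify(mpg, threshold):
    """Apply the learned model to a car whose class label is not known."""
    return "high" if mpg > threshold else "low"

threshold = train_stump(training)
label_for_new_car = classify(25, threshold)
```

The class labels are known during training, which is what separates this from clustering; the learned threshold is then used to predict the label of new cars.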


Cluster Analysis

• Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns.
• Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
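
One common way to apply this principle is k-means, shown here as a minimal one-dimensional sketch. The choice of k-means, the house prices (in thousands) and the starting centers are assumptions for illustration only.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data with given starting centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center (high intra-class similarity).
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical house prices, no class labels given
prices = [100, 110, 120, 300, 310, 320]
centers, clusters = kmeans_1d(prices, centers=[100, 320])
```

No labels are supplied; the two groups of houses emerge from the data alone.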


Outlier Analysis

• Outlier: a data object that does not comply with the general behavior of the data.
• It can be considered as noise or an exception, but it is quite useful in fraud detection and rare-event analysis.
• The analysis of outlier data is referred to as outlier analysis or anomaly mining.
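
A minimal sketch of one simple outlier test, assuming a z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The rule, the threshold of 2.0 and the transaction amounts are illustrative assumptions, not something the slides specify.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing can stand out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; one of them is unusually large
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = z_score_outliers(amounts)
```

In a fraud detection setting the flagged value would be examined rather than thrown away as noise.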


Are all Patterns Interesting?

• What makes a pattern interesting?
• Can a data mining system generate all of the interesting patterns?
• Can a data mining system generate only interesting patterns?


Cont...

• A data mining system/query may generate thousands of patterns; not all of them are interesting.
  - Suggested approach: human-centered, query-based, focused mining
• Interestingness measures: a pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.
• Objective vs. subjective interestingness measures:
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on the user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.


Can We Find All and Only Interesting Patterns?

• Find all the interesting patterns: completeness
  - Can a data mining system find all the interesting patterns?
  - Association vs. classification vs. clustering
• Search for only interesting patterns: optimization
  - Can a data mining system find only the interesting patterns?
  - Approaches
    - First generate all the patterns and then filter out the uninteresting ones.
    - Generate only the interesting patterns: mining query optimization
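
The second approach can be sketched with a level-wise search in the spirit of the Apriori algorithm: larger itemsets are built only from itemsets already known to be frequent, so most uninteresting candidates are never generated. Apriori is named here as an assumption, since the slides do not name a method; the baskets and the minimum count are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(txns, min_count):
    """Level-wise search: only extend itemsets that are already frequent."""
    items = sorted({i for t in txns for i in t})
    level = [frozenset([i]) for i in items
             if sum(i in t for t in txns) >= min_count]
    result = list(level)
    while level:
        # Join frequent k-itemsets that differ in one item into (k+1)-candidates.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if sum(c <= t for t in txns) >= min_count]
        result.extend(level)
    return result

# Hypothetical baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = frequent_itemsets(baskets, min_count=3)
```

Because "butter" is infrequent on its own, no itemset containing it is ever counted, in contrast to the generate-everything-then-filter approach.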


Data Mining: Classification Schemes

• General functionality
  - Descriptive data mining
  - Predictive data mining
• Different views, different classifications
  - Kinds of databases to be mined
  - Kinds of knowledge to be discovered
  - Kinds of techniques utilized
  - Kinds of applications adapted


A Multi-Dimensional View of Data Mining Classification

• Databases to be mined
  - Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc.
• Knowledge to be mined
  - Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
  - Multiple/integrated functions and mining at multiple levels


Cont...

• Techniques utilized
  - Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
• Applications adapted
  - Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.


Data Mining: Task Primitives

• Data mining without user interaction is usually not helpful.
• Users may request a few "data mining primitives" to be performed on data:
  - specification of data to be mined
  - set of data in which the user is interested
  - kinds of knowledge to be mined
  - background knowledge useful in guiding the discovery process
  - specification of how knowledge should be visualized


Pieces of a Data Mining Task

• What data to mine
  - list of relevant attributes
• Kinds of knowledge to be mined
  - characterization
  - discrimination
  - association
  - classification
  - clustering
  - evolution analysis
• Background knowledge
  - concept hierarchies
• Interestingness measures
  - separate uninteresting patterns from knowledge
• Presentation and visualization of patterns
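
The five pieces can be collected into a single task description. The sketch below is a hypothetical container, not an existing API: the class name MiningTask, its field names and its default values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Kinds of knowledge listed on the slide
KNOWLEDGE_KINDS = {
    "characterization", "discrimination", "association",
    "classification", "clustering", "evolution analysis",
}

@dataclass
class MiningTask:
    """Hypothetical container for the pieces of a data mining task."""
    relevant_attributes: list                                # what data to mine
    knowledge_kind: str                                      # kind of knowledge to be mined
    concept_hierarchies: dict = field(default_factory=dict)  # background knowledge
    min_support: float = 0.0                                 # interestingness measure (assumed)
    min_confidence: float = 0.0                              # interestingness measure (assumed)
    presentation: str = "rules"                              # presentation of patterns

    def __post_init__(self):
        # Reject kinds of knowledge that the slide does not list.
        if self.knowledge_kind not in KNOWLEDGE_KINDS:
            raise ValueError(f"unknown kind of knowledge: {self.knowledge_kind}")

task = MiningTask(
    relevant_attributes=["name", "price"],
    knowledge_kind="association",
    min_support=0.022,
    min_confidence=0.6,
)
```

Keeping the primitives in one object makes it explicit which parts of a mining request come from the user rather than from the mining system.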


Task Relevant Data

• Minable "view" of the data
  - name of database or warehouse
  - name of tables or cubes
  - conditions for selecting useful data
    - type = "home entertainment"
    - type = "fruit"
  - attributes or dimensions (e.g., name and price)
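
A minimal sketch of building such a view from an in-memory table. The items table, its columns and the helper minable_view are hypothetical; a real system would run the equivalent selection against a database or warehouse.

```python
# Hypothetical "items" table as a list of rows
items = [
    {"name": "TV", "type": "home entertainment", "price": 499.0, "stock": 12},
    {"name": "Apple", "type": "fruit", "price": 0.5, "stock": 300},
    {"name": "Speaker", "type": "home entertainment", "price": 129.0, "stock": 40},
]

def minable_view(table, condition, attributes):
    """Keep rows that satisfy the condition and only the listed attributes."""
    return [{a: row[a] for a in attributes} for row in table if condition(row)]

view = minable_view(
    items,
    condition=lambda row: row["type"] == "home entertainment",
    attributes=["name", "price"],
)
```

The condition corresponds to "type = home entertainment" and the attribute list to "name and price" on the slide.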


Kind of Knowledge to be Mined

• Templates or "meta patterns" may be used to specify the form of the results:
  - P(X: customer, W) AND Q(X, Y) -> buys(X, Z)
  - age(X, "30..39") AND income(X, "40K..49K") -> buys(X, "VCR") [2.2%, 60%]
• A task might specify classifying an input file of customers as "likely to buy" or "not likely to buy".
• [2.2%, 60%] indicates that 60% confidence is to be used and that such cases should represent 2.2% of all transactions.
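
The bracketed pair can be computed for a concrete rule as follows. This sketch assumes the age range is meant to cover 30 to 39 and income 40K to 49K; the ten customer records are invented, so the resulting figures differ from the 2.2% and 60% on the slide.

```python
# Hypothetical customers: (age, income in thousands, bought_vcr)
customers = [
    (32, 45, True),
    (35, 42, True),
    (38, 48, True),
    (31, 41, False),
    (39, 49, False),
    (25, 45, True),
    (45, 60, False),
    (33, 30, False),
    (52, 44, True),
    (29, 20, False),
]

def antecedent(age, income):
    """Left-hand side of the rule: age in 30..39 AND income in 40K..49K."""
    return 30 <= age <= 39 and 40 <= income <= 49

matches = [c for c in customers if antecedent(c[0], c[1])]
both = [c for c in matches if c[2]]  # antecedent holds AND the customer buys a VCR

rule_support = len(both) / len(customers)   # share of all records covered by the rule
rule_confidence = len(both) / len(matches)  # share of matching customers who buy
```

Support is measured against all records, while confidence is measured only against the records that satisfy the left-hand side.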


Background Knowledge: Concept Hierarchies

• Concept hierarchy
  - defines a sequence of mappings from a set of low-level concepts to higher-level concepts
  - location
  - time
  - product
• Types of hierarchies
  - schema hierarchy
  - set-grouping hierarchy
  - operation-derived hierarchy
  - rule-based hierarchy


Concept Hierarchies

• Schema
  - total or partial order among attributes, usually a warehouse dimension (time, location, etc.)
• Set-grouping
  - values for a given attribute are lumped into groups of constants or range values
• Operation-derived
  - automatically derived through operations such as clustering, extraction, etc.
• Rule-based
  - hierarchy may be defined by a set of rules
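
Two of these hierarchy types can be sketched in a few lines: a schema hierarchy on location (city to country) and a set-grouping hierarchy on age. The city mapping, the age ranges and the group names are hypothetical.

```python
# Schema hierarchy for the location dimension: city < country (hypothetical mapping)
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

def age_group(age):
    """Set-grouping hierarchy: lump individual ages into named ranges."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle_aged"
    return "senior"

def roll_up(sales):
    """Generalize (city, amount) sales to the country level and sum the amounts."""
    totals = {}
    for city, amount in sales:
        country = CITY_TO_COUNTRY[city]
        totals[country] = totals.get(country, 0) + amount
    return totals

totals = roll_up([("Vancouver", 100), ("Toronto", 50), ("Chicago", 70)])
```

Rolling up replaces low-level values with their higher-level concepts, which is what lets patterns be mined and presented at a more general level.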


Interestingness Measures

• Simplicity
  e.g., (association) rule length, (decision) tree size
• Certainty
  e.g., confidence, P(A|B) = n(A and B) / n(B), classification reliability or accuracy, certainty factor, rule strength, rule quality, discriminating weight, etc.
• Utility
  potential usefulness, e.g., support (association), noise threshold (description)
• Novelty
  not previously known, surprising (used to remove redundant rules, e.g., Canada vs. Vancouver rule implication support ratio)


Presentation and visualization of patterns

• Different backgrounds/usages may require different forms of representation
  - E.g., rules, tables, crosstabs, pie/bar charts, etc.
• Concept hierarchy is also important
  - Discovered knowledge might be more understandable when represented at a high level of abstraction
  - Interactive drill up/down, pivoting, slicing and dicing provide different perspectives on the data
• Different kinds of knowledge require different representations: association, classification, clustering, etc.


Thank you!!!