1 Data Mining Processes and Knowledge Discovery (1)

Jun 02, 2018

Transcript
  • 8/10/2019 1 Data Mining Processes and Knowledge Discovery (1)

    CS359 Introduction to Data Mining


    This course introduces the fundamental concepts of data mining and knowledge discovery from databases.

    It focuses on the discussion and demonstration of common data mining methods and how data mining results become useful to businesses and organizations.

    Course objectives


    Attendance will be checked.

    No make-up quizzes

    Make-up long exam only for excused absence.

    Set schedule within a week after the exam date

    Late submissions will not be accepted (assignments, cases and project)

    Policies


    Han, J. & Kamber, M. (2006). Data Mining: Concepts and Techniques, 2nd Edition. Morgan Kaufmann Publishers, Elsevier Inc., California.

    Tan, P., Steinbach, M. & Kumar, V. (2006). Introduction to Data Mining. Addison Wesley.

    References


    Data Mining Software Links by Dr. Pang-Ning Tan: www.cse.msu.edu/~cse980/software.html

    RapidMiner: http://rapid-i.com/content/view/26/84/lang,en/

    Weka: http://www.cs.waikato.ac.nz/ml/weka/

    Software Links


    Data Mining Processes and Knowledge Discovery


    Define data mining and knowledge discovery in databases

    Discuss some business applications of data mining

    Identify the elements of the data mining process

    Discuss the steps in CRISP-DM

    Objectives


    Also known as Knowledge Discovery in Databases: the nontrivial extraction of implicit, previously unknown and potentially useful information from databases (Han et al., 1999)

    Involves the use of analysis to detect patterns and allow predictions (Olson & Shi, 2007)

    What is Data Mining?


    Exploratory data analysis

    Has its roots in the development of classical statistics, artificial intelligence and machine learning

    Looks for actionable information, or information that can be utilized in a concrete way to improve profitability

    Data Mining


    Hypothesis Testing

    A theory about the relationship between actions and outcomes is expressed and tested

    Knowledge Discovery

    A preconceived notion may not be present

    Relationships can be identified by looking into the data

    Data mining requires the identification of a problem

    General Types of Data Mining


    Retailing

    Affinity Positioning: based upon the identification of products that the same customer is likely to want

    Cross-selling: knowledge of products that go together can be used to market the complementary product

    Data Mining Applications


    Banking

    Customer Relationship Management: identify customer value, develop programs to maximize revenue

    Credit Card Management

    Identify balance surfers, or credit card holders who pay old balances with a new card

    Lift: identify effective market segments

    Churn: identify likely customer turnover

    Data Mining Applications
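
The slide names lift without defining it. One common reading, assumed here, is the ratio of a segment's response rate to the overall response rate, so a value above 1 marks a segment that responds better than average. The function name and the campaign counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def lift(segment_responders, segment_size, total_responders, total_size):
    """Ratio of the segment's response rate to the overall response rate.

    Assumed definition; the slides only name the measure.
    """
    segment_rate = segment_responders / segment_size
    overall_rate = total_responders / total_size
    return segment_rate / overall_rate


# Hypothetical campaign: 25 of 100 customers in the segment responded,
# against 125 of 1000 customers overall.
segment_lift = lift(25, 100, 125, 1000)
print(segment_lift)  # 0.25 / 0.125 = 2.0
```

A lift of 2.0 would mean the segment responds at twice the average rate, which is the sense in which it is an "effective" market segment.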


    Insurance

    Fraud detection: identify fraud claims meriting investigation

    Telecommunications

    Churn: customer turnover or switching carriers

    Medicine

    Cancer Cell Detection

    Machine Vision

    Pattern Recognition

    Data Mining Applications


    Cross-Industry Standard Process for Data Mining

    Phases

    Business Understanding

    Data Understanding

    Data Preparation

    Modeling

    Evaluation

    Deployment

    CRISP-DM Process


    Knowing what the study is for

    Identify business task

    Business Understanding


    Select the related data from many available databases to correctly describe a given business task

    Identify relevant data for the problem description

    Selected variables for the relevant data should be independent of each other, that is, they should not contain overlapping information

    Types of data: geographic, socio-graphic, transactional or quantitative and qualitative

    Data Understanding


    Also known as data preprocessing

    Clean selected data for better quality

    Filter, aggregate and fill in missing values (imputation)

    Filter: remove outliers and redundancies

    Aggregate: data is reduced to obtain aggregated information

    Filling-in or Smoothing: missing values are found and replaced with reasonable values

    Data Preparation
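
The three preparation steps on this slide can be sketched with the Python standard library. This is a minimal illustration, not a prescribed method: the records, the median fill, the "10 times the median" outlier rule, and the order of the steps are all assumptions made for the example.

```python
from statistics import mean, median

# Hypothetical records: (region, monthly_spend); None marks a missing value.
records = [
    ("north", 120.0),
    ("north", None),
    ("north", 130.0),
    ("south", 90.0),
    ("south", 95.0),
    ("south", 5000.0),  # suspiciously large, likely a data-entry error
]


def impute_missing(rows):
    """Fill in: replace missing values with the median of the observed ones."""
    fill = median(v for _, v in rows if v is not None)
    return [(k, fill if v is None else v) for k, v in rows]


def filter_outliers(rows, max_ratio=10.0):
    """Filter: drop rows whose value exceeds max_ratio times the median."""
    med = median(v for _, v in rows)
    return [(k, v) for k, v in rows if v <= max_ratio * med]


def aggregate(rows):
    """Aggregate: reduce the rows to one average value per region."""
    groups = {}
    for k, v in rows:
        groups.setdefault(k, []).append(v)
    return {k: mean(vs) for k, vs in groups.items()}


prepared = aggregate(filter_outliers(impute_missing(records)))
print(prepared)
```

The median is used for both the fill value and the outlier rule because it is less affected by the extreme value than the mean would be; other reasonable choices exist.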


    Data transformation

    Uses mathematical formulations to convert different measurements into a unified numerical scale

    Numerical to numerical scales

    Shrink or enlarge the data

    Categorical to numerical scales

    Categorical values can be ordinal (less, moderate, strong) or nominal (red, yellow, blue)

    Data Preparation
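
A small sketch of both kinds of transformation follows. The slide does not name specific formulas, so min-max scaling, rank encoding and one-hot encoding are assumed here as common choices; the function names are hypothetical, and the category values are the ones given on the slide.

```python
def min_max_scale(values):
    """Numerical to numerical: rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal, nothing to spread
    return [(v - lo) / (hi - lo) for v in values]


# Ordinal categories have a natural order, so each maps to its rank.
ORDINAL_RANK = {"less": 0, "moderate": 1, "strong": 2}


def encode_ordinal(labels):
    """Categorical to numerical for ordered categories."""
    return [ORDINAL_RANK[label] for label in labels]


def encode_nominal(labels, categories):
    """Categorical to numerical for unordered categories (one-hot)."""
    return [[1 if label == c else 0 for c in categories] for label in labels]


scaled = min_max_scale([10, 20, 40])
ranks = encode_ordinal(["strong", "less", "moderate"])
onehot = encode_nominal(["red", "blue"], ["red", "yellow", "blue"])
print(scaled, ranks, onehot)
```

Nominal values get one indicator column each rather than a single number, since assigning 0, 1, 2 to red, yellow, blue would suggest an order that does not exist.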


    Data mining software is used to generate results for various situations

    Data is divided into:

    Training set: used for the development of the model

    Test set: used to test the model that is built

    Modeling
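
Dividing data into a training set and a test set can be sketched as a seeded shuffle followed by a cut. The 30% test fraction, the seed and the function name are assumptions for illustration; the slides do not specify a ratio.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and cut it into (training set, test set)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten prepared records
train, test = train_test_split(rows)
print(len(train), len(test))  # 7 3
```

Shuffling before the cut matters when the records are stored in a meaningful order, such as by date or by customer segment.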


    Data Modeling Techniques

    Association: the relationship of a particular item in a data transaction to other items in the same transaction is used to predict patterns

    Classification: learning different functions that map each item of the selected data into one of a predefined set of classes

    Modeling
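
For association, the relationship between items in the same transaction is usually measured with support and confidence. Those two measures are not named on the slide and are assumed here as the standard ones; the transactions are made up for the example.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the share that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


pair_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(pair_support, rule_confidence)
```

Here bread and milk appear together in 2 of 4 transactions, and 2 of the 3 transactions containing bread also contain milk.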


    Clustering: takes ungrouped data and uses automatic techniques to put this data into groups

    Prediction Analysis: discover the relationship between the dependent and independent variables

    Sequential Pattern Analysis: seeks to find similar patterns in data transactions over a business period

    Modeling
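
As one example of an automatic grouping technique, the sketch below runs k-means on one-dimensional data. K-means is an assumption chosen for brevity, since the slide does not name an algorithm, and the points and starting centers are hypothetical.

```python
def k_means_1d(points, centers, iterations=10):
    """Group 1-D points around the nearest center, then re-center, repeatedly."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No class labels are supplied at any point, which is what separates clustering from the classification technique on the previous slide.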


    Data interpretation stage

    Two things to consider:

    How to recognize business value from knowledge patterns discovered

    How to visualize the results to properly interpret patterns

    Evaluation


    The results are reported to project sponsors

    The result is applied to the business task or data mining objective

    Deployment


    Data Cleaning

    Data Integration

    Data Selection

    Data Transformation

    Data Mining

    Pattern Evaluation

    Knowledge Presentation

    Knowledge Discovery Process


    Data Mining System Architecture

    [Figure: system architecture diagram. Components shown: User Interface; Pattern Evaluation; Data Mining Engine; Knowledge Base; Database or Data Warehouse Server; Data Cleaning, Integration and Selection; data sources (Database, Data Warehouse, WWW, Other Repositories)]


    Relational Databases

    Data Warehouses

    Transactional Databases

    Object-Relational Databases

    Temporal, Sequence or Time-Series Database

    Spatial Databases and Spatiotemporal Databases

    Data Mining on what data?


    Descriptive: characterize the general properties of data

    Data characterization, Data discrimination, Association, Clustering

    Predictive: perform inference on the current data in order to make predictions

    Classification and Prediction, Evolution analysis

    Data Mining - what patterns?


    NO

    A pattern is interesting if it is (1) easily understood by humans,

    (2) valid on new or test data with some degree of certainty,

    (3) potentially useful, and

    (4) novel.

    A pattern is also interesting if it validates a hypothesis that the user sought to confirm.

    Are all patterns interesting?


    Refers to the COMPLETENESS of a data mining algorithm

    It is unrealistic and inefficient for data mining systems to generate all of the possible patterns.

    A focused search which makes use of interestingness measures should be used to control pattern generation.

    Can a data mining system generate all interesting patterns?
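
A focused search can be sketched with a minimum-support threshold acting as the interestingness measure: item pairs are only considered when both items are frequent on their own, so most candidate patterns are never generated. This Apriori-style pruning is an assumed technique, not one named on the slide, and the transactions and threshold are hypothetical.

```python
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support meets min_support, pruning early."""
    n = len(transactions)
    # Pass 1: count single items and keep only the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent_items = sorted(i for i, c in counts.items() if c / n >= min_support)
    # Pass 2: build candidate pairs from frequent items only.
    result = {}
    for pair in combinations(frequent_items, 2):
        pair_support = sum(set(pair) <= t for t in transactions) / n
        if pair_support >= min_support:
            result[pair] = pair_support
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "jam"},
]
pairs = frequent_pairs(transactions, min_support=0.5)
print(pairs)  # {('bread', 'butter'): 0.5, ('bread', 'milk'): 0.5}
```

Because "jam" fails the threshold in the first pass, no pair containing it is ever counted, which is the sense in which the measure controls pattern generation.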


    1. What is the business task or data mining objective?

    2. What are the relevant data and their sources?

    3. How was the data prepared? What were the processes?

    4. What was the data mining technique used?

    5. How was the model used to address the business task?

    CASE study: Telephone company