
CogNova Technologies 1 Knowledge Discovery and Data Mining An Introduction Daniel L. Silver Copyright (c), 2003 All Rights Reserved.

Dec 18, 2015

Transcript
Page 1: CogNova Technologies 1 Knowledge Discovery and Data Mining An Introduction Daniel L. Silver Copyright (c), 2003 All Rights Reserved.

CogNova Technologies

Knowledge Discovery and Data Mining

An Introduction

Daniel L. Silver

Copyright (c), 2003 All Rights Reserved

Page 2: Agenda

• Introduction to KDD & DM
• Overview of the KDD Process
• Benefits, Costs, Status and Trends

Page 3:

“We are drowning in information, but starving for knowledge.” John Naisbitt, Megatrends, 1988

Data Analytics or KDD: Data Warehousing, Data Mining, Data Visualization

Page 4: Introduction

• Data Analytics is not a new field ...
• Since the 1990s referred to as: Data Analysis, Data Mining, Data Warehousing
• A multidisciplinary field:
  • Database and data warehousing
  • Data and model visualization methods
  • On-line Analytical Processing
  • Statistics and machine learning
  • Knowledge management

Page 5: Introduction

Page 6: Introduction

What is Data Analytics (KDD)?

• A Process
• The selection and processing of data for:
  • the identification of novel, accurate, and useful patterns, and
  • the modeling of real-world phenomena.
• Data Warehousing, Data Mining, and Data Visualization are major components.

Page 7: The KDD Process

[Figure: the KDD process. Data Sources -> Data Consolidation -> Consolidated Data (Data Warehouse) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models (e.g. p(x)=0.02) -> Interpretation and Evaluation -> Knowledge]
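The stages in the process figure can be read as a chain of functions, each consuming the output of the previous one. The following is only a minimal sketch of that idea: the record fields (`age`, `bought`), the cleaning rule, and the toy "pattern" are assumptions made up for illustration and are not taken from the slides.

```python
def consolidate(sources):
    """Data Consolidation: merge raw sources, dropping records with missing values."""
    return [rec for src in sources for rec in src if None not in rec.values()]


def select_and_preprocess(records):
    """Selection and Preprocessing: keep only the attributes the mining step needs."""
    return [(rec["age"], rec["bought"]) for rec in records]


def mine(prepared):
    """Data Mining: a toy 'pattern', the purchase rate among customers over 35."""
    over_35 = [bought for age, bought in prepared if age > 35]
    return sum(over_35) / len(over_35) if over_35 else 0.0


def interpret(rate):
    """Interpretation and Evaluation: turn the pattern into a readable statement."""
    return f"Customers over 35 bought in {rate:.0%} of cases"


# Hypothetical data sources (two small customer lists).
sources = [
    [{"age": 40, "bought": True}, {"age": 22, "bought": False}],
    [{"age": 51, "bought": False}, {"age": None, "bought": True}],
]

knowledge = interpret(mine(select_and_preprocess(consolidate(sources))))
print(knowledge)
```

In a real effort each stage would be far larger, and the process is iterative rather than a single pass.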

Page 8: Introduction – KDD In Context

[Figure: the KDD process (Data Sources, Data Consolidation, Consolidated Data / Warehouse, Selection and Preprocessing, Prepared Data, Data Mining, Patterns & Models, Interpretation and Evaluation, Knowledge) set inside “The Virtuous Cycle” of Berry & Linoff: Identify Problem or Opportunity, Act on Knowledge, Measure Effect of Action; labels: Problem, Knowledge, Strategy, Results]

Page 9: Introduction - CRISP

• Cross Industry Standard Process for Data Mining
• Developed by employees at SPSS, NCR, DaimlerChrysler
• Iterative process with 6 major steps:
  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment

Page 10: Why? … Relationship Marketing, a.k.a. Customer Relationship Management

Marketing Embraces KM, DW, DM

[Figure labels: Marketing; Traditional Marketing; MIS; Data Warehousing; Data Mining]

Page 11: What is Relationship Marketing?

• Knowing your customers on an individual basis
• Maximizing life-time value, not individual sales
• Developing and maintaining a mutually beneficial relationship
• Acquire, retain, win back desirable customers

[Figure: Arbuckle’s Market, “The Corner Store”]

Page 12: Knowledge Discovery

What can KDD do for an organization?

Impact on Marketing
• Target marketing at a credit card company
• Consumer usage analysis at a telecom provider
• Loyalty assessment at a service bureau
• Quality of service analysis at an appliance chain

Page 13: Application Areas
Private/Commercial Sector

• Marketing: segmentation, product targeting, customer value and retention, ...
• Finance: investment support, portfolio management
• Banking & Insurance: credit and policy approval
• Security: fraud detection, access control
• Science and medicine: hypothesis discovery, prediction, classification, diagnosis
• Manufacturing: process modeling, quality control, resource allocation
• Engineering: pattern recognition, signal processing
• Internet: smart search engines, web marketing

Page 14: Application Areas
Public/Gov’t Sector

• Finance: investment management, price forecasting
• Taxation: adaptive monitoring, fraud detection
• Health care: medical diagnosis, risk assessment, cost/quality control
• Education: process and quality modeling, resource forecasting
• Insurance: worker’s compensation analysis
• Security: bomb, iceberg detection
• Transportation: simulation and analysis
• Statistics: demographic analysis, municipal planning

Page 15: The Data Analytics (KDD) Process

Page 16: The KDD Process

[Figure: the KDD process. Data Sources -> Data Consolidation -> Consolidated Data (Warehouse) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models (e.g. p(x)=0.02) -> Interpretation and Evaluation -> Knowledge]

Page 17: The KDD Process

Possible results for any one effort:
• Confirmation of the obvious
• New knowledge - the data mine “nugget”
• No significant relations found (random data)

Page 18: The KDD Process
Core Problems & Approaches

• Problems:
  • identification of relevant data
  • representation of data
  • search for valid pattern or model
• Approaches:
  • top-down deduction by expert
  • interactive visualization of data/models
  • * bottom-up induction from data *

[Figure: Probability of sale against Income and Age; labels: Data Mining, OLAP]

Page 19: The KDD Process
The Architecture of a KDD System

[Figure labels: Graphical User Interface; Data Consolidation; Selection and Preprocessing; Data Mining; Interpretation and Evaluation; Data Sources; Warehouse; Knowledge]

Page 20: The KDD Process

[Figure: the KDD process stages. Data Consolidation, Warehouse, Selection and Preprocessing, Data Mining (p(x)=0.02), Interpretation and Evaluation, Knowledge]

Page 21: Data Consolidation

• Garbage in, garbage out
• The quality of results relates directly to quality of the data
• 50%-70% of KDD process effort will be spent on data consolidation, cleansing and preprocessing
• Major justification for a corporate Data Warehouse

Page 22: Data Consolidation & Warehousing
From data sources to consolidated data repository

[Figure: RDBMS, Legacy DBMS, Flat Files, and External sources feed Data Consolidation and Cleansing, which loads the Warehouse or Datamart used for Analysis and Info Sharing; flows labelled Inflow, Metaflow, Upflow, Downflow, Outflow]

Page 23: Data Warehousing – A Process

Definition: The strategic collection, cleansing, and consolidation of organizational data to meet operational, analytical, and communication needs.

• 75% of early DW projects were not completed
• Data warehousing is not a project
• It is an on-going set of organizational activities
• Must be business benefits driven

Page 24: Relationship between DW and DM?

[Figure labels: Data Warehousing (source of consolidated data); Analysis: Query/Reporting, OLAP, Data Mining (rationale for data consolidation); Strategic; Tactical]

Page 25: The KDD Process

[Figure: the KDD process stages. Data Consolidation, Warehouse, Selection and Preprocessing, Data Mining (p(x)=0.02), Interpretation and Evaluation, Knowledge]

Page 26: Selection and Preprocessing

• Generate a set of examples
  • choose sampling method
  • consider sample complexity
  • deal with volume bias issues
• Reduce attribute dimensionality
  • remove redundant and/or correlating attributes
  • combine attributes (sum, multiply, difference)
• Reduce attribute value ranges
  • group symbolic discrete values
  • quantize continuous numeric values
• OLAP and visualization tools play key role (Han calls this descriptive data mining)
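Two of the steps above, removing correlating attributes and quantizing continuous values, can be sketched in a few lines. This is a minimal illustration under stated assumptions: equal-width binning and a Pearson-correlation threshold of 0.95 are choices made here, not methods named on the slide, and the column names and values are invented.

```python
from statistics import mean, pstdev


def quantize(values, n_bins):
    """Map continuous values onto equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant column: avoid dividing by zero
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sx, sy = pstdev(xs), pstdev(ys)
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns, threshold=0.95):
    """Keep a column only if it is not strongly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical attribute columns.
ages = [22, 35, 47, 58, 63]
age_bins = quantize(ages, 3)

columns = {
    "income": [20, 40, 60, 80],
    "income_x2": [40, 80, 120, 160],  # redundant: a multiple of income
    "age": [30, 25, 60, 41],
}
reduced = drop_correlated(columns)
print(age_bins, list(reduced))
```

Which attribute of a correlated pair survives depends on column order here; a real project would choose based on data quality or interpretability.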

Page 27: OLAP: On-Line Analytical Processing

OLAP Functionality
• Dimension selection
  • slice & dice
• Rotation
  • allows change in perspective
• Filtration
  • value range selection
• Hierarchies
  • drill-downs to lower levels
  • roll-ups to higher levels

[Figure: OLAP cube with dimensions Year by Month, Product Class by Product Name, and Sales Region; cells hold Profit Values]
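The cube operations listed above amount to grouping and filtering fact records along chosen dimensions. The sketch below imitates slice, roll-up, and drill-down over a tiny in-memory table; the records and the `aggregate` helper are hypothetical, and a real OLAP server would precompute and index these aggregates rather than scan the facts.

```python
from collections import defaultdict

FIELDS = ("year", "month", "product_class", "product_name", "region", "profit")

# Hypothetical fact records matching the cube's dimensions.
FACTS = [
    (2003, "Jan", "Dairy", "Milk", "East", 120.0),
    (2003, "Jan", "Dairy", "Cheese", "West", 80.0),
    (2003, "Feb", "Dairy", "Milk", "East", 100.0),
    (2003, "Feb", "Bakery", "Bread", "East", 60.0),
]


def aggregate(facts, dims, where=None):
    """Sum profit grouped by `dims`, after filtering on the `where` field values."""
    where = where or {}
    totals = defaultdict(float)
    for fact in facts:
        row = dict(zip(FIELDS, fact))
        if all(row[key] == value for key, value in where.items()):
            totals[tuple(row[d] for d in dims)] += row["profit"]
    return dict(totals)


# Roll-up: profit at the higher level of the product hierarchy.
roll_up = aggregate(FACTS, ["product_class"])
# Drill-down: the same totals split by product name.
drill_down = aggregate(FACTS, ["product_class", "product_name"])
# Slice: fix one dimension (region) and view profit by month.
east_slice = aggregate(FACTS, ["month"], where={"region": "East"})

print(roll_up, drill_down, east_slice, sep="\n")
```

Rotation corresponds to changing the order of `dims`, and filtration to a richer `where` test such as a value range.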

Page 28: Selection and Preprocessing

• Transform data
  • decorrelate and normalize values
  • map time-series data to static representation
• Encode data
  • representation must be appropriate for the Data Mining tool which will be used
  • continue to reduce attribute dimensionality where possible without loss of information
• OLAP and visualization tools as well as transformation and encoding software
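As a small illustration of two of the transformations above, the sketch below normalizes a column to zero mean and unit variance and maps a time series onto static rows with a sliding window. Both are common choices assumed here for illustration; the slide does not say which normalization or time-series mapping was intended, and decorrelation is not shown.

```python
from statistics import mean, pstdev


def normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def windows(series, width):
    """Static representation of a time series: `width` past values -> next value."""
    return [
        (series[i:i + width], series[i + width])
        for i in range(len(series) - width)
    ]


z_scores = normalize([2.0, 4.0, 6.0])
rows = windows([1, 2, 3, 4, 5], 3)
print(z_scores)
print(rows)
```

Each row produced by `windows` can be handed to a mining tool that expects fixed-length examples, which is the point of the static representation.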

Page 29: The KDD Process

[Figure: the KDD process stages. Data Consolidation, Warehouse, Selection and Preprocessing, Data Mining (p(x)=0.02), Interpretation and Evaluation, Knowledge]

Page 30: Overview of Data Mining Methods

• Automated Exploration/Discovery
  • e.g. discovering new market segments
  • distance and probabilistic clustering algorithms
• Prediction/Classification
  • e.g. forecasting gross sales given current factors
  • regression, neural networks, genetic algorithms
• Explanation/Description
  • e.g. characterizing customers by demographics and purchase history
  • inductive decision trees, association rule systems

[Figures: examples plotted in (x1, x2) space; a curve f(x) against x; a rule of the form "if age > 35 and income < $35k then ..."]

Focus is on induction of a model from specific examples

Page 31: Data Mining Methods
Automated Exploration and Discovery

• Distance-based numerical clustering
  • metric grouping of examples (KNN)
  • graphical visualization can be used
• Bayesian clustering
  • search for the number of classes which result in best fit of a probability distribution to the data
• Unsupervised Learning

[Figure: examples plotted by Income and Age]
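Distance-based clustering of the kind plotted on the Income/Age axes can be sketched with k-means. Note the substitution: k-means is used here as a familiar distance-based method and is not named on the slide. The customer points and the naive "first k points" initialization are assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [points[i] for i in range(k)]  # naive deterministic start
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers as (age, income in $k).
customers = [(25, 30), (27, 32), (26, 28), (55, 80), (58, 85), (60, 78)]
centroids, clusters = kmeans(customers, 2)
print(centroids)
print(clusters)
```

No labels are supplied: the two groups emerge from the distances alone, which is what makes this unsupervised learning. In practice attributes on different scales would be normalized first.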

Page 32: Data Mining Methods
Prediction and Classification

• Function approximation (curve fitting)
• Classification (concept learning, pattern recognition)
• Methods:
  • Statistical regression
  • Artificial neural networks
  • Genetic algorithms
  • Nearest neighbour algorithms
• Supervised Learning

[Figures: a network with inputs I1 to I4 and outputs O1, O2; a curve f(x) against x; classes A and B in (x1, x2) space]
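The two supervised tasks above can each be shown with the simplest method from the list: a least-squares line for function approximation, and a one-nearest-neighbour rule for classification. This is a minimal sketch with invented training data, not the configuration of any particular tool.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def nearest_neighbour(train, query):
    """Classify `query` with the label of the closest training example."""
    _, label = min(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)),
    )
    return label


# Function approximation: hypothetical (x, f(x)) training pairs.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept

# Classification: hypothetical labelled points in (x1, x2) space.
train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
labels = [nearest_neighbour(train, q) for q in [(2, 1), (5, 6)]]

print(slope, intercept, prediction, labels)
```

Both are supervised because every training example carries its target value or class label.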

Page 33: Data Mining Methods
Generalization

• The objective of learning is to achieve good generalization to new cases, otherwise just use a look-up table.
• Generalization can be defined as a mathematical interpolation or regression over a set of training points:

[Figure: a curve f(x) against x through a set of training points]
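The contrast between a look-up table and a generalizing model is easy to see on a single unseen input. The sketch below uses piecewise-linear interpolation as the simplest stand-in for "interpolation over a set of training points"; the training table is invented for illustration.

```python
def lookup(table, x):
    """A look-up table only answers for inputs it has already seen."""
    return table.get(x)


def interpolate(table, x):
    """Piecewise-linear interpolation between neighbouring training points."""
    xs = sorted(table)
    for left, right in zip(xs, xs[1:]):
        if left <= x <= right:
            t = (x - left) / (right - left)
            return table[left] + t * (table[right] - table[left])
    raise ValueError("x lies outside the range of the training points")


# Hypothetical training points x -> f(x).
training = {1: 10.0, 2: 20.0, 4: 40.0}

seen = lookup(training, 2)          # stored case
unseen = lookup(training, 3)        # new case: the table has no answer
estimate = interpolate(training, 3)  # the model generalizes to the new case

print(seen, unseen, estimate)
```

The interpolator refuses to extrapolate outside the training range, which is a deliberate design choice in this sketch: generalization is best trusted between observed points.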

Page 34: Data Mining Methods
Explanation and Description

• Learn a generalized hypothesis (model) from selected data
• Description/Interpretation of model provides new human knowledge
• Methods:
  • Inductive decision tree and rule systems
  • Association rule systems
  • Link Analysis

[Figure: decision tree with a Root, test nodes A?, B?, C?, D?, branches such as Yes, and Leaf nodes]
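Association rule systems rank candidate rules of the form "if X then Y" by how often they apply and how often they hold. A minimal sketch of the two standard measures, support and confidence, follows; the shopping baskets are invented, and a full system would also search the space of candidate rules (for example with the Apriori algorithm), which is not shown.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

support, confidence = rule_stats(baskets, {"bread"}, {"milk"})
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

Because the rule and its two numbers can be read directly, this kind of model serves explanation as well as prediction.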

Page 35: Modeling & Data Mining

DEMO

WEKA – A Data Mining Environment

Page 36: The KDD Process

[Figure: the KDD process stages. Data Consolidation and Warehousing, Warehouse, Selection and Preprocessing, Data Mining (p(x)=0.02), Interpretation and Evaluation, Knowledge]

Page 37: Interpretation and Evaluation

• Evaluation
  • Statistical validation and significance testing
  • Qualitative review by experts in the field
  • Pilot surveys to evaluate model accuracy
• Interpretation
  • Inductive tree and rule models can be read directly
  • Clustering results can be graphed and tabled
  • Code can be automatically generated by some systems (ANNs, IDTs, Regression models)
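One widely used form of statistical validation is k-fold cross-validation, where the model is repeatedly trained on part of the data and scored on the held-out remainder. The slide does not name a specific technique, so this is offered only as one possible sketch; the threshold "model" and the data are hypothetical stand-ins for a real mining result.

```python
def k_fold_accuracy(examples, k, train, predict):
    """Average held-out accuracy over k folds (fold i takes every k-th example)."""
    folds = [examples[i::k] for i in range(k)]
    scores = []
    for i, test_fold in enumerate(folds):
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train(training)
        correct = sum(1 for x, label in test_fold if predict(model, x) == label)
        scores.append(correct / len(test_fold))
    return sum(scores) / k


def train_threshold(training):
    """Hypothetical one-attribute model: midpoint between the two class means."""
    positives = [x for x, label in training if label]
    negatives = [x for x, label in training if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict_threshold(threshold, x):
    """Predict the positive class when x exceeds the learned threshold."""
    return x > threshold


# Hypothetical labelled examples: (attribute value, responded?).
examples = [
    (1, False), (2, False), (3, False), (4, False),
    (7, True), (8, True), (9, True), (10, True),
]
accuracy = k_fold_accuracy(examples, 4, train_threshold, predict_threshold)
print(accuracy)
```

Scoring only on examples the model never saw during training is what makes the accuracy figure an estimate of performance on new cases.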

Page 38: Interpretation and Evaluation

• Visualization tools can be very helpful:
  • sensitivity analysis (I/O relationship)
  • histograms of value distributions
  • time-series plots and animation
  • requires training and practice

[Figure: Response plotted against Velocity and Temp]
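Sensitivity analysis probes the input/output relationship of a model by varying one input while holding the others at a baseline. The sketch below does this for a made-up `response` function whose inputs borrow the figure's axis names; the formula is purely an assumption standing in for a trained model.

```python
def sensitivity(model, baseline, name, values):
    """Vary one input over `values`, keeping the other inputs at `baseline`."""
    results = []
    for value in values:
        inputs = dict(baseline, **{name: value})
        results.append((value, model(**inputs)))
    return results


def response(velocity, temp):
    """Hypothetical stand-in for a trained model (not taken from the slides)."""
    return 2.0 * velocity + 0.5 * temp


baseline = {"velocity": 10.0, "temp": 20.0}
temp_curve = sensitivity(response, baseline, "temp", [0.0, 20.0, 40.0])
velocity_curve = sensitivity(response, baseline, "velocity", [0.0, 10.0, 20.0])

print(temp_curve)
print(velocity_curve)
```

Plotting each curve shows which input the output reacts to most strongly; here the response changes four times faster with velocity than with temperature.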

Page 39: Benefits, Costs, Status and Trends

Page 40: Benefits of Data Analytics (KDD)

• Maximum utility from corporate data
  • discovery of new knowledge
  • generation of predictive models
• Important feedback to data warehousing effort
  • identification and justification of essential data
• Reduction of application dev’t backlog
  • model development vs. software development
• Effect on bottom line of organization
  • cost reduction, increased productivity, risk avoidance … competitive advantage

Page 41: Requirements and Costs of KDD

• Hardware - computationally intensive
• Software - micro < $20k, integrated suites $100k+
• Data - internal collection, surveys, external sources
• Human resources
  • DB/DP/DC expertise to consolidate and preprocess data
  • Machine learning and stats competence
  • Application knowledge & project mgmt
• 70% of the effort is expended on the data consolidation and preprocessing activities

Page 42: Current Status and Trends

• Standards and methodologies are maturing
• Many products:
  • Open source (WEKA, RapidMiner)
  • Micro DM packages (IBM Cognos)
  • Macro integrated suites (IBM SPSS Modeler, SAS Enterprise Miner)
• Software costs have stabilized
• Major players have been determined
• Internet - “the” sink and source of data
• Legal and ethical issues on the horizon

Page 43: Current Status and Trends

• Methods used:
  • http://www.kdnuggets.com/polls/2013/analytics-big-data-mining-data-science-software.html
• Application areas:
  • http://www.kdnuggets.com/polls/2012/where-applied-analytics-data-mining.html
• Other polls:
  • http://www.kdnuggets.com/polls/index.html

Page 44: The Current Status and Trends
What has prevented the use of Data Mining?

• Products:
  • General in nature, not tailored for business
  • Missing standard interfaces to organizational data
  • Emphasis on sales and not training/consulting
• Customers:
  • Frightened by technical skill set required
  • Uncertain of mining results and ROI
  • Convinced warehouse must be completed first
  • Lacking knowledge of external data sources

Page 45: Key Technologies for KDD

• Data warehousing and distributed database
• Parallel computing
• AI and expert systems
• Machine learning and statistical inference
• Visualization (including Virtual Reality)
• Internet - future sink and source of data
  • adaptive filters, knowledge extractors
  • smart web services

Page 46: Current Management Issues

• Ownership of data and knowledge
• Security of customer data
• Responsibility for accuracy of information
• Ethical practices - fair use of data

Page 47: A List of Major Vendors
Lots of Players

Approaching market from hardware, database, statistical, machine learning, education, financial/marketing, and management consulting:

IBM, SAS, SPSS, SGI, Thinking Machines, Cognos, ZDM Scientific, Neuralware, Information Discovery, American Heuristics, Data Distilleries, SuperInduction

Page 48: THE END

[email protected]@acadiau.ca