3. DATA MINING Business Intelligence

Business Intelligence 3. DATA MINING - KOCWcontents.kocw.net/KOCW/document/2014/koreasejong/... · 2016-09-09 · Data Mining Definition “Computational process of extracting interesting

May 22, 2020


dariahiddleston

3. DATA MINING

Business Intelligence


Data Mining

Definition

“Computational process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data using methods from artificial intelligence, machine learning, statistics, and database systems.”

Alternative names

◦ Knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, business intelligence, big data, etc.


Background

Explosive Growth of Data

◦ Transactions, web, e-commerce, mobile computing, sensors (weather sensors, closed-circuit cameras), etc.

We are drowning in data, but starving for knowledge!

To mine knowledge from abundant data, we need automated analysis

One of the technologies for Big Data


A Typical Data Mining System

[Figure: system architecture. Data sources (Database, Data Warehouse, World Wide Web, Other Info Repositories) → data cleaning, integration, and selection → Database or Data Warehouse Server → Data Mining Engine → Pattern Evaluation → Visualization]


Data Mining is a Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Pattern Recognition, Algorithm, Visualization, and Other Disciplines]


Data Mining Classifications

Pattern Mining and Trend Analysis

◦ Association, regression

E.g., Diaper → Beer [support 2%, confidence 75%]

◦ Trend: common temporal behaviors

Classification (symbolic) and prediction (numeric)

◦ E.g., classify a given country based on its ‘climate’ (into tropical climate, temperate climate, polar climate country)

◦ E.g., predicting temperatures and plants in a country.

Cluster analysis

◦ Group data into clusters that maximize intra-class similarity and inter-class dissimilarity

E.g., cluster animals into various classes (mammal, fish, etc.)

Outlier analysis

◦ Find data objects that do not comply with the general characteristics of the data

E.g., credit card fraud, call fraud, money laundering, etc.
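
The two percentages attached to the Diaper/Beer association example above are conventionally read as support (the share of all transactions that contain both items) and confidence (the share of transactions containing the first item that also contain the second). A minimal sketch of how these two numbers could be computed follows; the function name and the five baskets are hypothetical and invented purely for illustration, so the resulting figures do not reproduce the [2%, 75%] on the slide.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    if not transactions:
        raise ValueError("need at least one transaction")
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    # Count transactions containing the antecedent, and those containing both sides.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical market baskets (illustrative data, not from the lecture).
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"beer", "chips"},
]
support, confidence = support_and_confidence(baskets, {"diaper"}, {"beer"})
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here two of the five baskets contain both items and three contain diapers, so the rule has support 2/5 and confidence 2/3.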


Business Applications of DM

Where does the business data come from?

◦ Credit card transactions, loyalty cards, discount coupons, customer complaint calls, (public) lifestyle studies, mobile devices, sensors, etc.

Where is DM applied?

◦ Customer Clustering

◦ Customer Profiling

◦ Determine customer purchasing patterns

◦ Cross-market analysis

◦ Predict what factors will attract new customers

◦ Fraud detection: auto insurance, medical insurance, phone-call frauds, or money laundering detection

◦ Anti-terrorism


PREDICTION

Business Intelligence 3. Data Mining


Classification and Prediction

Classification (symbolic)

◦ Construct a classification model from a training set (attribute values plus class labels), then use the model to classify new instances into categories

◦ Techniques: if/then rules, decision trees, artificial neural networks

Prediction (numeric)

◦ Predicts unknown or missing numeric values

◦ Construct a model of a continuous-valued function, then use it for prediction

◦ Techniques: regression

Typical applications of classification and prediction

◦ Credit approval, Target marketing, Medical diagnosis, Fraud detection, etc.
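
The classification workflow above (build a model from labeled training data, then apply it to new instances) can be sketched with the simplest possible decision tree: a single threshold on one numeric attribute, which is equivalent to one if/then rule. This is only an illustrative sketch. The function names, the use of mean annual temperature as the attribute, and the training values are all assumptions, and the example is restricted to two classes for brevity.

```python
def train_stump(samples):
    """Learn a rule 'if x <= t then low_label else high_label'.

    samples: list of (numeric_value, class_label) pairs with exactly two labels.
    Returns (t, low_label, high_label) with the fewest training errors.
    """
    values = sorted({x for x, _ in samples})
    labels = sorted({label for _, label in samples})
    if len(labels) != 2 or len(values) < 2:
        raise ValueError("need two classes and at least two distinct values")
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for low, high in (labels, labels[::-1]):
            errors = sum(
                1 for x, label in samples if (low if x <= t else high) != label
            )
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    _, t, low, high = best
    return t, low, high


def classify(model, x):
    """Apply the learned rule to a new instance."""
    t, low, high = model
    return low if x <= t else high


# Hypothetical training set: (mean annual temperature in deg C, climate class).
training = [
    (-15.0, "polar"), (-8.0, "polar"), (-2.0, "polar"),
    (24.0, "tropical"), (26.5, "tropical"), (28.0, "tropical"),
]
model = train_stump(training)       # step 1: construct the model
print(model)
print(classify(model, 5.0))         # step 2: classify new instances
print(classify(model, 20.0))
```

A real decision-tree learner would split recursively on several attributes and handle more than two classes; the stump only shows the train-then-classify pattern.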


Prediction

Predict a numeric value for a target variable, given the values of the condition (input) variables.

Two Step Process

1. Construct a model

2. Use model to predict a value for a given input

Major method for prediction: regression

◦ Model the relationship between one or more independent or predictor variables and a dependent or response variable

Regression analysis

◦ Linear regression

◦ Multiple linear regression

◦ Non-linear regression

◦ Etc.


Linear Regression

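Linear regression: involves a response variable y and a single predictor variable x

$$y = w_0 + w_1 x$$

where w0 (y-intercept) and w1 (slope) are regression coefficients

Method of least squares: estimates the best-fitting straight line

$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}$$

$$w_0 = \bar{y} - w_1 \bar{x}$$

where |D| is the number of training pairs (x_i, y_i), and \bar{x} and \bar{y} are the means of the x_i and y_i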
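
The least-squares estimates for w1 and w0 translate almost directly into code. The sketch below also follows the two-step process described earlier: first construct the model, then use it to predict a value for a new input. The function name and the five data points are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (w0, w1) for y = w0 + w1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Denominator: sum of squared deviations of x from its mean.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Numerator: sum of products of the x and y deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1


# Hypothetical training data (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

w0, w1 = fit_line(xs, ys)           # step 1: construct the model
prediction = w0 + w1 * 6.0          # step 2: predict y for a new input x = 6
print(f"w0={w0:.2f}, w1={w1:.2f}, prediction={prediction:.2f}")
```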


Multiple Linear Regression

Multiple linear regression: involves more than one predictor variable

◦ Ex. For 2-D data, the training data is in the form of (x1, x2, y)

◦ Solvable by an extension of the least-squares method for y = w0 + w1 x1 + w2 x2
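
One common way to extend least squares to two predictors is to solve the normal equations (X^T X) w = X^T y, where each row of X is (1, x1, x2). The slide does not say which extension is intended, so the sketch below is an assumption: it builds the 3x3 system in pure Python and solves it by Gaussian elimination. The function names are hypothetical, and the training rows were generated exactly from y = 1 + 2 x1 + 3 x2 so the recovered coefficients are easy to check.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_multiple(rows):
    """rows: list of (x1, x2, y). Returns [w0, w1, w2] for y = w0 + w1*x1 + w2*x2."""
    design = [(1.0, x1, x2) for x1, x2, _ in rows]
    targets = [y for _, _, y in rows]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical training data in the form (x1, x2, y), with y = 1 + 2*x1 + 3*x2.
rows = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]
weights = fit_multiple(rows)
print([round(w, 6) for w in weights])
```

In practice a numerical library would normally be used instead of hand-written elimination; the pure-Python version is only meant to make the computation visible.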


Nonlinear Regression

Some nonlinear models can be modeled by a polynomial function

A polynomial regression model can be transformed into a multiple linear regression model.

For example, $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$

can be converted to linear with new variables: $x_2 = x^2$, $x_3 = x^3$

$y = w_0 + w_1 x + w_2 x_2 + w_3 x_3$

Other functions, such as a power function, can also be transformed into a linear model
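
For the power-function case, one standard transformation (assumed here, since the slide does not spell it out) is to take logarithms: y = a x^b becomes ln y = ln a + b ln x, which is a straight line in the new variables ln x and ln y. The sketch below fits that line with the simple least-squares formulas and converts the intercept back to a. The function name and data are hypothetical; the points were generated exactly from y = 3 x^2.

```python
import math


def fit_power(xs, ys):
    """Fit y = a * x**b via linear least squares on (ln x, ln y).

    Requires strictly positive x and y values, because of the logarithms.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("power-law fit needs x > 0 and y > 0")
    u = [math.log(x) for x in xs]   # new predictor variable: ln x
    v = [math.log(y) for y in ys]   # new response variable: ln y
    n = len(u)
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    suu = sum((ui - u_mean) ** 2 for ui in u)
    if suu == 0:
        raise ValueError("all x values are identical; exponent is undefined")
    b = sum((ui - u_mean) * (vi - v_mean) for ui, vi in zip(u, v)) / suu
    ln_a = v_mean - b * u_mean
    return math.exp(ln_a), b


# Hypothetical data generated from y = 3 * x**2.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0, 12.0, 48.0, 192.0]
a, b = fit_power(xs, ys)
print(f"a={a:.4f}, b={b:.4f}")
```

Note that fitting in log space minimizes squared error in ln y rather than in y, so on noisy data the result can differ from a direct nonlinear fit.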


Accuracy Measures

Fact \ Prediction | C | ~C | Total (fact)
C | True positive | False negative | Positive
~C | False positive | True negative | Negative
Total (prediction) | Pos-predictions | Neg-predictions | Total

Accuracy measures

◦ accuracy = (t-pos + t-neg) / (pos + neg)

◦ error-rate = 1 - accuracy

◦ sensitivity = t-pos / pos

(e.g., identifying cows with Mad Cow Disease: find all positive cases)

◦ specificity = t-neg / neg

(e.g., identifying professional patients: find all negative cases)


Accuracy Measures – An Exercise

Compute accuracy, error_rate, sensitivity, and specificity for the following example

classes | buy_computer = yes | buy_computer = no | total | recognition (%)
buy_computer = yes | 6954 | 46 | 7000 | 99.34
buy_computer = no | 412 | 2588 | 3000 | 86.27
total | 7366 | 2634 | 10000 | 95.42
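
The four measures can be checked with a few lines of code. The sketch below applies the accuracy-measure formulas to the counts in the exercise table, reading rows as the actual class and columns as the predicted class, and treating buy_computer = yes as the positive class (an assumption about the intended setup). The function name is hypothetical.

```python
def accuracy_measures(t_pos, f_neg, f_pos, t_neg):
    """Compute the four measures from the cells of a 2x2 confusion matrix."""
    pos = t_pos + f_neg   # all actual positives
    neg = f_pos + t_neg   # all actual negatives
    accuracy = (t_pos + t_neg) / (pos + neg)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": t_pos / pos,
        "specificity": t_neg / neg,
    }


# Counts taken from the exercise table (positive class: buy_computer = yes).
measures = accuracy_measures(t_pos=6954, f_neg=46, f_pos=412, t_neg=2588)
for name, value in measures.items():
    print(f"{name}: {value:.2%}")
```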
