Artificial Neural Networks and Data Mining. Uwe Lämmel, Wismar Business School, www.wi.hs-wismar.de/~laemmel, Uwe.Laemmel@hs-wismar.de

Dec 23, 2015

Neural Networks and Data Mining, Slide 1

Artificial Neural Networks and Data Mining

Uwe Lämmel
Wismar Business School
www.wi.hs-wismar.de/~laemmel
Uwe.Laemmel@hs-wismar.de

Slide 2

Content

Data Mining
Classification: approach
Data Mining Cup
– 2004: Who will cancel?
– 2007: Who will get a rebate coupon?
– 2008: How long will someone participate in a lottery?
– 2009: Forecast of book sales figures
– 2010 ?
Clustering: approach
– Behaviour of bank customers

Slide 3

Data Mining

Data Mining is a
– systematic and automated discovery and extraction
– of previously unknown knowledge
– out of huge amounts of data.

"KDD – Knowledge Discovery in Databases" is used as a synonym.

The notion is misleading: gold mining digs for gold, whereas data mining does not dig for data but for knowledge.

Slide 4

Data Mining – Applications

classification

clustering

association

prediction

text mining

web mining

clustering:
partitioning a data set into subsets (clusters), so that the data in each subset (ideally) share some common features
– similarity or proximity for some defined distance measure
clustering is building classes

classification:
items are placed in subsets (classes)
classes have known properties
– customer is bad, average, good
– pattern recognition
– …
a set of training items is used to train the classification algorithm

Slide 5

Data Mining Process

CRISP-DM model

Slide 6

Content

Data Mining
Classification: approach using NN
Data Mining Cup
Clustering: approach

Slide 7

Classification using NN

prerequisite:
set of training patterns (many patterns)

approach:
code the values
divide the set of training patterns into:
– training set
– test set
build a network
train the network using the training set
check the network quality using the test set

[Figure: real data, training patterns, coded patterns, training set and test set]
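
The coding and splitting steps above can be sketched in Python. This is only an illustration: the one-hot coding, the 80/20 split ratio, the sample class values and the helper names `one_hot` and `split_patterns` are assumptions and do not come from the slides.

```python
import random

def one_hot(value, categories):
    """Code a nominal value as a list of 0/1 inputs, one per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def split_patterns(patterns, train_fraction=0.8, seed=42):
    """Shuffle the coded patterns and divide them into training and test set."""
    shuffled = patterns[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical class attribute with three values (assumption for illustration).
classes = ["bad", "average", "good"]
raw_values = ["good", "bad", "average", "good", "bad"] * 20
coded = [one_hot(v, classes) for v in raw_values]

training_set, test_set = split_patterns(coded)
```

With a one-hot coding like this, a class attribute with three values needs three output neurons, which is why the number of neurons depends on the coding.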

Slide 8

Development of an NN-application

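[Flowchart: development cycle of an NN application]
1. build a network architecture
2. input of training pattern
3. calculate network output
4. compare to teaching output
– error is too high: modify weights and repeat
– quality is good enough: continue with the test set
5. use test set data
6. evaluate output
7. compare to teaching output
– error is too high: change parameters and train again
– quality is good enough: finished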
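
The development cycle can be sketched as two nested loops. This is a simplified illustration under stated assumptions: a single threshold neuron trained with the perceptron rule stands in for a multi-layer network, the patterns are a toy data set (logical OR), and the test set simply reuses the training patterns. None of this is taken from the slides.

```python
def output(weights, bias, pattern):
    """Threshold neuron: fires (1) if the weighted input sum is positive."""
    s = bias + sum(w * x for w, x in zip(weights, pattern))
    return 1 if s > 0 else 0

def count_errors(weights, bias, data):
    """Compare network output to teaching output and count the mismatches."""
    return sum(output(weights, bias, x) != t for x, t in data)

def train(data, learning_rate, max_epochs=100):
    """Present the training patterns and modify weights while the error is too high."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(max_epochs):
        if count_errors(weights, bias, data) == 0:
            break  # quality is good enough
        for x, t in data:
            delta = t - output(weights, bias, x)
            weights = [w + learning_rate * delta * xi for w, xi in zip(weights, x)]
            bias += learning_rate * delta
    return weights, bias

# Toy patterns (logical OR); a real project would use separate training and test sets.
training_set = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
test_set = training_set

for learning_rate in (0.1, 0.2, 0.5, 0.8):   # outer loop: "change parameters"
    weights, bias = train(training_set, learning_rate)
    test_errors = count_errors(weights, bias, test_set)
    if test_errors == 0:
        break  # quality on the test set is good enough
```

The inner loop corresponds to "modify weights", the outer loop to "change parameters".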

Slide 9

Build an Artificial Neural Network

Number of Input Neurons?
– depends on the number of attributes
– depends on the coding

Number of Output Neurons?
– depends on the coding of the class attribute

Number of Hidden Neurons?
– experiments necessary
– generally: not more than input neurons
– quarter … half of number of input neurons may work
– see capacity of a neural network
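
The rule of thumb for hidden neurons can be turned into a list of candidate sizes to try in experiments. This is a minimal sketch; the function name `hidden_neuron_candidates` is hypothetical and the rounding (integer division) is an assumption.

```python
def hidden_neuron_candidates(n_inputs):
    """Hidden-layer sizes between a quarter and half of the number of input neurons."""
    low = max(1, n_inputs // 4)
    high = max(low, n_inputs // 2)
    return list(range(low, high + 1))

# Example: candidate sizes for a network with 32 input neurons.
candidates = hidden_neuron_candidates(32)
```

These are only starting points. The slide stresses that experiments are necessary, so each candidate still has to be trained and checked on the test set.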

Slide 10

Experiments using the JavaNNS

Build a network
Load training pattern
open the Error Graph
open the Control Panel
Initialize the network
try different learning parameters: 0.1, 0.2, 0.5, 0.8
Start Learning

Slide 11

Getting Results

evaluate the error
Finally:
– make the test pattern set the current one
– Save Data …
– include output files
– save as a .res-file
Evaluate the .res-file

Slide 12

Experiments

How can we improve the results?
– Data pre-processing?
– Architecture of ANN?
– Learning Parameters?
– Evaluation of the results: post-processing?

record your work!

Slide 13

Content

Data Mining
Classification: approach
Data Mining Cup
– 2004: Who will cancel?
– 2007: Who will get a rebate coupon?
– 2008: How long will someone participate in a lottery?
– 2009: Forecast of book sales figures
– 2010 ?
Clustering: approach
– Behaviour of bank customers

Slide 14

Data Mining Cup
www.data-mining-cup.de

annual competition for students
runs April – May/June
real world problem:
– problem
– set of training data
– set of data for classification
– to be developed: classification
supported by many companies (data/software)
~ 200 – 300 participants
workshop (user day)

Slide 15

DMC2004: A Mailing Action

mailing action of a company:
– special offer
– estimated annual income per customer: see table below
given:
– 10,000 sets of customer data containing 1,000 cancellers (training)
problem:
– test set contains 10,000 customer data
– Who will cancel?
– Whom to send an offer?

                 customer will cancel   customer will not cancel
gets an offer          43.80 €                  66.30 €
gets no offer           0.00 €                  72.00 €

Slide 16

Mailing Action – Aim?

no mailing action:
– 9,000 x 72.00 = 648,000
everybody gets an offer:
– 1,000 x 43.80 + 9,000 x 66.30 = 640,500
maximum (100% correct classification):
– 1,000 x 43.80 + 9,000 x 72.00 = 691,800

                 customer will cancel   customer will not cancel
gets an offer          43.80 €                  66.30 €
gets no offer           0.00 €                  72.00 €
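
The three scenarios can be recomputed directly from the income table. The sketch below works in euro cents to avoid floating-point rounding; the dictionary layout and the function name `total_income` are illustrative choices, while the amounts and customer counts are those given on the slides.

```python
# Income per customer in euro cents, taken from the table on this slide.
INCOME_CENTS = {
    ("offer", "cancel"): 4380,
    ("offer", "loyal"): 6630,
    ("no offer", "cancel"): 0,
    ("no offer", "loyal"): 7200,
}
CANCELLERS, LOYAL = 1_000, 9_000

def total_income(action_for_cancellers, action_for_loyal):
    """Total income in euros if all cancellers and all loyal customers get the given action."""
    cents = (CANCELLERS * INCOME_CENTS[(action_for_cancellers, "cancel")]
             + LOYAL * INCOME_CENTS[(action_for_loyal, "loyal")])
    return cents / 100

no_mailing = total_income("no offer", "no offer")   # nobody gets an offer
everybody = total_income("offer", "offer")          # everybody gets an offer
perfect = total_income("offer", "no offer")         # only the cancellers get an offer
```

Sending the offer to everybody yields less than sending nothing, so the mailing only pays off if the cancellers are identified well enough.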

Slide 17

Goal Function: Lift

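basis: no mailing action: 9,000 · 72.00
goal = extra income:
lift_M = 43.80 · c_M + 66.30 · nk_M – 72.00 · nk_M
with c_M = number of cancellers who get an offer, nk_M = number of non-cancellers who get an offer

                 customer will cancel   customer will not cancel
gets an offer          43.80 €                  66.30 €
gets no offer           0.00 €                  72.00 €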
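
The lift formula can be written as a small function. This sketch assumes that the two counts in the formula are the cancellers and the non-cancellers who receive the offer; the parameter names are illustrative, and amounts are handled in cents to keep the arithmetic exact.

```python
def lift(cancellers_mailed, non_cancellers_mailed):
    """Extra income in euros compared with no mailing action."""
    gain_per_canceller = 4380            # 43.80 instead of 0.00
    loss_per_non_canceller = 7200 - 6630  # 66.30 instead of 72.00
    cents = (gain_per_canceller * cancellers_mailed
             - loss_per_non_canceller * non_cancellers_mailed)
    return cents / 100

best_case = lift(1_000, 0)        # only the 1,000 cancellers get an offer
everybody = lift(1_000, 9_000)    # everybody gets an offer
```

Each mailed canceller adds 43.80 and each mailed non-canceller costs 5.70, so one correctly identified canceller offsets up to seven wrongly mailed loyal customers.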

Slide 18

Data

[Figure: data table with 32 input data columns; annotations mark the results, the important attributes and the missing values]

Slide 19

Feed Forward Network – What to do?

train the net with the training set (10,000)
test the net using the test set (another 10,000)
– classify all 10,000 customers into canceller or loyal
– evaluate the additional income

Slide 20

Results

data mining cup 2002

neural network project 2004

gain:
– additional income by the mailing action if the target group was chosen according to the analysis

Slide 21

DMC 2007: Rebate System

Check-out couponing allows individual coupon generation at the check-out.
The coupon is printed at the end of the sales slip depending on the current customer.

Questions:
– How can the retailer identify whether a customer is a potential couponing customer?
– To which coupons will the customer respond?

Slide 22

Couponing

Print:
– coupon A
– coupon B
– No coupon
50,000 customer cards for training
Classify another 50,000 customers!
Cost function:
– coupon not redeemed (false assignment to A or B): –1
– coupon A redeemed (correct assignment to A): +3
– coupon B redeemed (correct assignment to B): +6
Maximize the value!
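
One way to read the cost function is as a decision rule on redemption probabilities. The sketch below is an assumption-laden illustration, not the method used in the competition: it supposes a model that outputs, for each customer, the probability `p_a` of redeeming coupon A and `p_b` of redeeming coupon B, and it treats printing no coupon as value 0.

```python
def expected_values(p_a, p_b):
    """Expected value of each print decision under the cost function above."""
    return {
        "A": 3 * p_a - 1 * (1 - p_a),   # +3 if redeemed, -1 otherwise
        "B": 6 * p_b - 1 * (1 - p_b),   # +6 if redeemed, -1 otherwise
        "N": 0.0,                        # no coupon: neither gain nor loss
    }

def choose_coupon(p_a, p_b):
    """Pick the decision with the highest expected value."""
    values = expected_values(p_a, p_b)
    return max(values, key=values.get)
```

Under these assumptions coupon A is worth printing only if `p_a` exceeds 1/4 (from 4·p_a − 1 > 0) and coupon B only if `p_b` exceeds 1/7 (from 7·p_b − 1 > 0), so the asymmetric rewards shift the decision thresholds well below 0.5.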

Slide 23

Data Understanding

What is the meaning of the attributes?
Type and range of values?

Slide 24

20–20–2 Network

Profit = 3 · AA + 6 · BB – (NA + NB + BA + AB)

results:
winner 2007    7,890
my version     6,714
our students   6,468 (73/230)
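
The profit formula can be evaluated from a list of classification results. The sketch assumes that in each two-letter term the first letter is the customer's actual class and the second the assigned coupon (so NA is a customer without coupon response who was assigned coupon A); the function name and the sample data are hypothetical.

```python
from collections import Counter

def profit(pairs):
    """pairs: iterable of (actual, assigned) labels, each 'A', 'B' or 'N'."""
    c = Counter(actual + assigned for actual, assigned in pairs)
    return 3 * c["AA"] + 6 * c["BB"] - (c["NA"] + c["NB"] + c["BA"] + c["AB"])

# Hypothetical results for six customers.
sample = [("A", "A"), ("B", "B"), ("N", "A"), ("A", "B"), ("N", "N"), ("B", "N")]
sample_profit = profit(sample)
```

Assignments to "no coupon" never appear in the formula, which is why missing a coupon customer costs nothing directly but also earns nothing.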

Slide 25

DMC2008: Participation in a Lottery

Predicting, at the beginning of the lottery, how long participants will participate:
0 – The first ticket has not been paid for
1 – Only the ticket for the first class has been paid for
2 – Only the first two classes were played
3 – The lottery was played until the end but no ticket purchased for the following lottery
4 – At least first ticket for the following lottery purchased

[Figure: cost matrix]

Slide 26

Data

113,476 patterns!
69 attributes
– new customer (yes/no)
– age
– bank
– car
– …

Slide 27

100–40–20–5 Network

results:
1,030,240   RWTH Aachen (1)
…
1,024,535   RWTH Aachen (8)
865,565   Bauhaus Univ. Weimar (100)
Univ. Wismar: 878,550 – 835,035 – 1,494,315 (212)

Slide 28

DMC 2009 – online bookshop "Libri"

Sales figures training:
– more than 1,800 books
– 2,418 shops
Sales figures forecast:
– 8 books
– 2,394 shops

Slide 29

DMC 2009 – online bookshop "Libri"

Slide 30

DMC 2009 – 83-25-9-3 network

Slide 31

DMC 2010: Revenue maximisation by intelligent couponing

Many customers only make an order in an online shop once.
Decision: whether to send a voucher worth € 5.00 to those who would not have decided to re-order by themselves.

32,427 data sets for training
32,428 data sets for prediction
37 attributes per set + target attribute in training set

Slide 32

DMC 2010

out of 67 teams!

Slide 33

Content

Data Mining
Classification: approach
Data Mining Cup
Clustering: approach
– Behaviour of bank customers

Slide 34

Clustering Transaction Data

Co-operation
Hochschule Wismar
HypoVereinsbank
Medienhaus Rostock

Issue
What information can be extracted from turnover time series?

Strategy
1. Clustering time series data
2. Assign customers/accounts to clusters
3. Examine clusters

Slide 35

Transaction Data & Time Series

Original financial data not suitable:
Order of values is important
Time displacements are problematic

Corporate clients
223 branches
Cumulated transactions per
– Month
– Account
– Type of transaction
... for a total of 6 years

Slide 36

Fourier versus Original Data

No displacement
Similarity detected on both: transaction curve and frequency spectrum

Data is displaced
frequency spectrum shows similarity
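
Why the frequency spectrum still shows similarity for displaced data can be illustrated with a small discrete Fourier transform. The sketch uses only the standard library; the two turnover series are invented, and the displacement is modelled as a circular shift, for which the magnitude spectrum is unchanged. Real time series are not shifted circularly, so there the spectra are only approximately equal.

```python
import cmath

def magnitude_spectrum(series):
    """Magnitudes of the discrete Fourier transform of a real-valued series."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(series)))
            for k in range(n)]

def distance(a, b):
    """Euclidean distance between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical monthly turnover; account_b is account_a displaced by two months.
account_a = [10, 40, 90, 40, 10, 5, 5, 5, 5, 5, 5, 5]
account_b = account_a[-2:] + account_a[:-2]

raw_distance = distance(account_a, account_b)
spectrum_distance = distance(magnitude_spectrum(account_a),
                             magnitude_spectrum(account_b))
```

On the original curves the two accounts look far apart, while their magnitude spectra coincide up to rounding, so a clustering on the spectra places them together.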

Slide 37

Using a classification model

[Figure: using a classification model]
1. Building the model: Sequence A (customer turnover ..., t0 to tm), preprocessing, clustering, initial cluster, classification model
2. Applying the model: Sequence B (t0+n to tm+n), preprocessing, classification model, new cluster
3. Comparing cluster assignments: initial cluster versus new cluster, identical or different?
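
Step 3, comparing cluster assignments, amounts to counting the accounts whose cluster changed between the two sequences. A minimal sketch follows, assuming the assignments are available as dictionaries from account id to cluster number; the account ids and cluster numbers are invented.

```python
def share_of_changes(initial, new):
    """Fraction of accounts whose cluster differs between the two periods."""
    changed = sum(1 for account in initial if new.get(account) != initial[account])
    return changed / len(initial)

# Hypothetical assignments for five accounts.
initial_cluster = {"acc1": 3, "acc2": 7, "acc3": 3, "acc4": 12, "acc5": 1}
new_cluster = {"acc1": 3, "acc2": 8, "acc3": 3, "acc4": 12, "acc5": 1}

variability = share_of_changes(initial_cluster, new_cluster)
```

Grouping this fraction by business sector would give a per-sector variability figure.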

Slide 38

Clustering & Prediction Results

140,000 records
1 record = 1 account
6x5 SOM = max. 30 clusters
average changes of cluster assignments: approx. 19%

Variability per Business Sector
22.3%   Taxi                   239/1070
22.3%   Ship Broker Offices    64/471
20.9%   Churches               228/1091
20.2%   Trucking               1010/5008
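
How a 6x5 self-organizing map yields at most 30 clusters can be seen from the assignment step: every record is mapped to the one map unit whose weight vector is closest. The sketch below skips the SOM training entirely and uses randomly initialised weights; the record dimension of 12 and all names are assumptions for illustration.

```python
import random

ROWS, COLS = 6, 5   # 6x5 map, so at most 30 clusters

def best_matching_unit(units, record):
    """Position (row, col) of the unit whose weight vector is closest to the record."""
    def sq_dist(weights):
        return sum((w - x) ** 2 for w, x in zip(weights, record))
    return min(units, key=lambda pos: sq_dist(units[pos]))

rng = random.Random(0)
# Untrained units with 12 random weights each (hypothetical record dimension).
units = {(r, c): [rng.random() for _ in range(12)]
         for r in range(ROWS) for c in range(COLS)}

record = [rng.random() for _ in range(12)]
cluster = best_matching_unit(units, record)
```

Units that never win a record stay empty, which is why 30 is only the maximum number of clusters.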

Slide 39

End