Comparing between machine learning methods for a remote monitoring system
Ronit Zrahia, Final Project, Tel-Aviv University

Page 1

Comparing between machine learning methods for a remote monitoring system.

Ronit Zrahia
Final Project
Tel-Aviv University

Page 2

Overview

The remote monitoring system
The project database
Machine learning methods:
  Discovery of Association Rules
  Inductive Logic Programming
  Decision Tree
Applying the methods to the project database and comparing the results

Page 3

Remote Monitoring System - Description

The Support Center has ongoing information on the customer's equipment

The Support Center can, in some situations, know that the customer is going to be in trouble

The Support Center initiates a call to the customer

A specialist connects to the site remotely and tries to eliminate the problem before it has an impact

Page 4

Remote Monitoring System - Description

[Diagram: the customer site contains Products and a Gateway (AIX/NT), connected over TCP/IP (FTP); the Gateway connects through modems over TCP/IP (Mail/FTP) to the Support Server (AIX/NT/95).]

Page 5

Remote Monitoring System - Technique

One of the machines on site, the Gateway, is able to initiate a PPP connection to the support server or to an ISP

All the Products on site have a TCP/IP connection to the Gateway

Background tasks on each Product collect relevant information

The data collected from all Products is transferred to the Gateway via FTP

The Gateway automatically dials the support server or ISP, and sends the data to the subsidiary

The received data is then imported into a database

Page 6

Project Database

12 columns, 300 records

Each record includes failure information of one product at a specific customer site

The columns are: record no., date, IP address, operating system, customer ID, product, release, product ID, category of application, application, severity, type of service contract

Page 7

Project Goals

Discover valuable information from the database

Improve the company's product marketing and customer support

Learn different learning methods, and apply them to the project database

Compare the different methods, based on the results

Page 8

The Learning Methods

Discovery of Association Rules
Inductive Logic Programming
Decision Tree

Page 9

Discovery of Association Rules - Goals

Finding relations between products which are bought by the customers (impacts product marketing)

Finding relations between failures in a specific product (impacts customer support: failures can be predicted and handled before they have an impact)

Page 10

Discovery of Association Rules - Definition

A technique developed specifically for data mining

Given:
  A dataset of customer transactions
  A transaction is a collection of items

Find:
  Correlations between items, expressed as rules

Example: supermarket baskets

Page 11

Determining Interesting Association Rules

Rules have confidence and support

IF x and y THEN z with confidence c: if x and y are in the basket, then so is z in c% of cases

IF x and y THEN z with support s: the rule holds in s% of all transactions

Page 12

Discovery of Association Rules - Example

Input parameters: confidence = 50%; support = 50%

Transaction    Items
12345          A B C
12346          A C
12347          A D
12348          B E F

If A then C: c = 66.6%, s = 50%
If C then A: c = 100%, s = 50%
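A minimal Python sketch (not part of the original slides) that checks these support and confidence values for the four transactions above:

# A sketch (not from the slides): checking support and confidence for the example.
transactions = {
    12345: {"A", "B", "C"},
    12346: {"A", "C"},
    12347: {"A", "D"},
    12348: {"B", "E", "F"},
}

def support(itemset, transactions):
    # fraction of transactions containing every item in `itemset`
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    # s(antecedent union consequent) / s(antecedent)
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"A", "C"}, transactions))       # 0.5   -> s = 50%
print(confidence({"A"}, {"C"}, transactions))  # 0.666 -> c = 66.6%
print(confidence({"C"}, {"A"}, transactions))  # 1.0   -> c = 100%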

Page 13

Itemsets are Basis of Algorithm

Rule A => C: s = s(A, C) = 50%, c = s(A, C) / s(A) = 66.6%

Transaction    Items
12345          A B C
12346          A C
12347          A D
12348          B E F

Itemset    Support
A          75%
B          50%
C          50%
A, C       50%

Page 14

Algorithm Outline

Find all large itemsets: sets of items with at least minimum support (Apriori algorithm)

Generate rules from large itemsets: for ABCD and AB in the large itemsets, the rule AB => CD holds if the ratio s(ABCD) / s(AB) is large enough

This ratio is the confidence of the rule

Page 15

Pseudo Algorithm

(1) L1 = { frequent 1-item-sets }
(2) for ( k = 2; Lk-1 is not empty; k++ ) do begin
(3)     Ck = apriori_gen(Lk-1)
(4)     for all transactions t in T do
(5)         Ct = subset(Ck, t); increment c.count for each candidate c in Ct
(6)     Lk = { c in Ck | c.count >= minsup }
(7) end
(8) Answer = union over k of Lk
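A compact Python sketch of this Apriori loop (an illustration, not the slides' code; candidate generation is simplified and omits apriori_gen's pruning step), run on the earlier example transactions:

# A sketch of the Apriori loop above (illustration only).
def apriori(transactions, minsup):
    # return all itemsets whose support is at least `minsup` (a fraction)
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    # (1) L1 = frequent 1-item-sets
    current = [frozenset([item]) for item in items
               if sum(1 for t in transactions if item in t) / n >= minsup]
    large = list(current)
    k = 2
    # (2) loop while the previous level produced frequent itemsets
    while current:
        # (3) candidates: joins of level-(k-1) itemsets (no pruning step)
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        counts = {c: 0 for c in candidates}
        # (4)-(5) count candidate occurrences in the transactions
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # (6) keep the candidates with at least minimum support
        current = [c for c, count in counts.items() if count / n >= minsup]
        large.extend(current)
        k += 1
    # (8) the answer is the union of all the frequent itemsets
    return large

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
print(apriori(transactions, minsup=0.5))   # itemsets {A}, {B}, {C} and {A, C}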

Page 16

Relations Between Products

Item Set (L)    Association Rule    Confidence (CF)
1 - 3           1 => 3              18 / 24 = 0.75
                3 => 1              18 / 24 = 0.75
1 - 9           1 => 9              21 / 24 = 0.875
                9 => 1              21 / 21 = 1
2 - 3           2 => 3              19 / 19 = 1
                3 => 2              19 / 24 = 0.79
2 - 6           2 => 6              17 / 19 = 0.89
                6 => 2              17 / 20 = 0.85
3 - 6           3 => 6              20 / 24 = 0.83
                6 => 3              20 / 20 = 1
2 - 3 - 6       2 => 3, 6           17 / 19 = 0.89
                3, 6 => 2           17 / 20 = 0.85
                3 => 2, 6           17 / 24 = 0.71
                2, 6 => 3           17 / 17 = 1
                6 => 2, 3           17 / 20 = 0.85
                2, 3 => 6           17 / 19 = 0.89

Page 17

Relations Between Failures

Item Set (L)    Association Rule    Confidence (CF)
4 - 6           4 => 6              14 / 16 = 0.875
                6 => 4              14 / 15 = 0.93
5 - 10          5 => 10             15 / 18 = 0.83
                10 => 5             15 / 15 = 1

Page 18

Inductive Logic Programming - Goals

Finding the preferred customers, based on:
  The number of products bought by the customer
  The failure types (i.e., severity levels) that occurred in the products

Page 19

Inductive Logic Programming - Definition

Inductive construction of first-order clausal theories from examples and background knowledge

The aim is to discover, from a given set of pre-classified examples, a set of classification rules with high predictive power

Example: IF Outlook=Sunny AND Humidity=High THEN PlayTennis=No

Page 20

Horn clause induction

Given:
  P: ground facts to be entailed (positive examples)
  N: ground facts not to be entailed (negative examples)
  B: a set of predicate definitions (background theory)
  L: the hypothesis language

Find a predicate definition (hypothesis) H in L such that:
  1. for every p in P: B and H entail p (completeness)
  2. for every n in N: B and H do not entail n (consistency)

Page 21

Inductive Logic Programming - Example

Learning about the relationships between people in a family circle

B = { mother(jane, alice), mother(jane, john), father(henry, jane),
      grandfather(X, Y) <- father(X, Z), parent(Z, Y) }

E+ = { grandfather(henry, alice), grandfather(henry, john) }

E- = { grandfather(alice, john), grandfather(john, henry) }

H: parent(X, Y) <- mother(X, Y)
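As an informal check (a sketch, not part of the original slides), naive forward chaining over B together with H shows that both positive examples are entailed and neither negative example is:

# A sketch (not from the slides): naive forward chaining over B and H.
facts = {
    ("mother", "jane", "alice"),
    ("mother", "jane", "john"),
    ("father", "henry", "jane"),
}

def closure(facts):
    # repeatedly apply H and the background rule until nothing new is derived
    derived = set(facts)
    while True:
        new = set()
        for (pred, x, y) in derived:
            if pred == "mother":          # H: parent(X, Y) <- mother(X, Y)
                new.add(("parent", x, y))
            if pred == "father":          # B: grandfather(X, Y) <- father(X, Z), parent(Z, Y)
                for (pred2, z, w) in derived:
                    if pred2 == "parent" and z == y:
                        new.add(("grandfather", x, w))
        if new <= derived:
            return derived
        derived |= new

entailed = closure(facts)
positives = {("grandfather", "henry", "alice"), ("grandfather", "henry", "john")}
negatives = {("grandfather", "alice", "john"), ("grandfather", "john", "henry")}
print(positives <= entailed)            # True  -> completeness
print(negatives.isdisjoint(entailed))   # True  -> consistency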

Page 22

Algorithm Outline

A space of candidate solutions and an acceptance criterion characterize the solutions to an ILP problem

The search space is typically structured by means of the dual notions of generalization (induction) and specialization (deduction):
  A deductive inference rule maps a conjunction of clauses G onto a conjunction of clauses S such that G is more general than S
  An inductive inference rule maps a conjunction of clauses S onto a conjunction of clauses G such that G is more general than S

Pruning principle:
  When B and H do not entail a positive example, specializations of H can be pruned from the search
  When B and H entail a negative example, generalizations of H can be pruned from the search

Page 23

Pseudo Algorithm

Initialize: QH := { initial hypotheses H }
repeat
  Delete H from QH
  Choose the inference rules r1, ..., rk in R to be applied to H
  Apply the rules r1, ..., rk to H to yield H1, H2, ..., Hn
  Add H1, ..., Hn to QH
  Prune QH
until stop-criterion(QH) satisfied
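A minimal Python sketch of this generic search loop (an illustration; the refine, acceptable, and prune callbacks are placeholders, not the slides' inference rules):

# A sketch of the generic search loop above (illustration only).
from collections import deque

def ilp_search(initial_hypotheses, refine, acceptable, prune, max_steps=1000):
    QH = deque(initial_hypotheses)            # Initialize QH
    for _ in range(max_steps):
        if not QH:
            return None                       # search space exhausted
        H = QH.popleft()                      # Delete H from QH
        if acceptable(H):                     # stop criterion satisfied
            return H
        for H_new in refine(H):               # apply the chosen rules r1..rk to H
            QH.append(H_new)                  # add H1..Hn to QH
        QH = deque(prune(QH))                 # prune QH
    return None

# Trivial usage with placeholder callbacks (hypotheses here are just strings):
result = ilp_search(["h"], refine=lambda h: [h + "'"],
                    acceptable=lambda h: len(h) > 3, prune=list)
print(result)   # "h'''"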

Page 24

The preferred customers

[Pie chart: Preferred Customers 17%, Others 83%]

If ( Total_Products_Types( Customer ) > 5 ) and ( All_Severity( Customer ) < 3 ) then Preferred_Customer
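Read directly as code, the learned rule might look like the following Python sketch (an illustration; the record layout with product and severity fields is an assumption):

# A sketch (not from the slides): the learned rule applied to a customer's records,
# where each record is assumed to be a dict with "product" and "severity" fields.
def is_preferred_customer(records):
    total_product_types = len({r["product"] for r in records})      # Total_Products_Types(Customer)
    all_severity_below_3 = all(r["severity"] < 3 for r in records)  # All_Severity(Customer) < 3
    return total_product_types > 5 and all_severity_below_3

example = [{"product": p, "severity": 1} for p in range(1, 7)]
print(is_preferred_customer(example))   # True: 6 product types, all severities below 3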

Page 25

Decision Trees - Goals

Finding the preferred customers
Finding relations between products which are bought by the customers
Finding relations between failures in a specific product
Compare the Decision Tree results to the previous algorithms' results

Page 26

Decision Trees - Definition

Decision tree representation:
  Each internal node tests an attribute
  Each branch corresponds to an attribute value
  Each leaf node assigns a classification

Occam's razor: prefer the shortest hypothesis that fits the data

Examples:
  Equipment or medical diagnosis
  Credit risk analysis

Page 27

Algorithm outline

A <- the "best" decision attribute for the next node
Assign A as the decision attribute for the node
For each value of A, create a new descendant of the node
Sort training examples to leaf nodes
If training examples are perfectly classified, then STOP; else iterate over the new leaf nodes

Page 28

Pseudo algorithm

ID3(Examples, Target_attribute, Attributes)
  Create a Root node for the tree
  If all Examples are in the same class C, return the single-node tree Root with label = C
  If Attributes is empty, return the single-node tree Root with label = most common value of Target_attribute in Examples
  Otherwise begin
    A <- the attribute from Attributes that best classifies Examples
         (i.e. the attribute with the highest information gain)
    The decision attribute for Root <- A
    For each possible value vi of A:
      Add a new tree branch below Root, corresponding to the test A = vi
      Let Examples_vi be the subset of Examples that have value vi for A
      If Examples_vi is empty:
        Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
      Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
  End
  Return Root
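A minimal Python sketch following this pseudocode (an illustration, not the slides' code; the nested-dict tree representation is an assumption, and branches are created only for attribute values present in the data):

# A sketch of ID3 following the pseudocode above (illustration only).
import math
from collections import Counter

def entropy(examples, target):
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(examples, attribute, target):
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    classes = {e[target] for e in examples}
    if len(classes) == 1:                               # all examples in the same class
        return classes.pop()
    if not attributes:                                  # no attributes left: majority class
        return Counter(e[target] for e in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    tree = {best: {}}                                   # the decision attribute for this node
    for value in {e[best] for e in examples}:           # one branch per observed value of best
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree

examples = [{"Outlook": "sunny", "Play": "No"}, {"Outlook": "overcast", "Play": "Yes"}]
print(id3(examples, "Play", ["Outlook"]))   # e.g. {'Outlook': {'sunny': 'No', 'overcast': 'Yes'}}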

Page 29

Information Measure

Entropy measures the impurity of the sample of training examples S:

Entropy(S) = - sum over i = 1..c of p_i log2(p_i)

where p_i is the probability of making a particular decision and there are c possible decisions.

The entropy is the amount of information needed to identify the class of an object in S:
  Maximized when all p_i are equal
  Minimized (0) when all but one p_i are 0 (the remaining one is 1)

Page 30

Information Measure

Estimate the gain in information from a particular partitioning of the dataset

Gain(S, A) = expected reduction in entropy due to sorting on A

The information that is gained by partitioning S is then:

Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v| / |S|) * Entropy(S_v)

The gain criterion can then be used to select the partition which maximizes the information gain
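A short Python check of these formulas (an illustration, not from the slides), reproducing from class counts alone the Humidity and Wind gains (0.151 and 0.048) shown on a later slide:

# A sketch (not from the slides): entropy and gain from positive/negative class counts.
import math

def entropy(pos, neg):
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

def gain(parent, splits):
    # `parent` and each split are (pos, neg) class counts
    total = sum(p + n for p, n in splits)
    return entropy(*parent) - sum((p + n) / total * entropy(p, n) for p, n in splits)

# S: [9+, 5-]; Humidity: [3+, 4-] and [6+, 1-]; Wind: [6+, 2-] and [3+, 3-]
print(round(gain((9, 5), [(3, 4), (6, 1)]), 3))   # 0.152 (the slide's .151 uses rounded intermediates) - Humidity
print(round(gain((9, 5), [(6, 2), (3, 3)]), 3))   # 0.048 - Wind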

Page 31

Decision Tree - Example

Day Outlook Temperature Humidity Wind PlayTennis

D1 sunny hot high weak No

D2 sunny hot high strong No

D3 overcast hot high weak Yes

D4 rain mild high weak Yes

D5 rain cool normal weak Yes

D6 rain cool normal strong No

D7 overcast cool normal strong Yes

D8 sunny mild high weak No

D9 sunny cool normal weak Yes

D10 rain mild normal weak Yes

D11 sunny mild normal strong Yes

D12 overcast mild high strong Yes

D13 overcast hot normal weak Yes

D14 rain mild high strong No

Page 32

Decision Tree - Example (Continued)

Which attribute is the best classifier?

S: [9+, 5-], E = 0.940

Humidity: high -> [3+, 4-], E = 0.985; normal -> [6+, 1-], E = 0.592
Gain(S, Humidity) = .940 - (7/14).985 - (7/14).592 = .151

Wind: weak -> [6+, 2-], E = 0.811; strong -> [3+, 3-], E = 1.00
Gain(S, Wind) = .940 - (8/14).811 - (6/14)1.0 = .048

Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029

Page 33

Decision Tree Example - (Continued)

Split on Outlook: {D1, D2, …, D14} [9+, 5-]
  sunny: {D1, D2, D8, D9, D11} [2+, 3-]  (further split needed)
  overcast: {D3, D7, D12, D13} [4+, 0-]  Yes
  rain: {D4, D5, D6, D10, D14} [3+, 2-]  (further split needed)

Ssunny = {D1, D2, D8, D9, D11}
Gain(Ssunny, Humidity) = .970 - (3/5)0.0 - (2/5)0.0 = .970
Gain(Ssunny, Temperature) = .970 - (2/5)0.0 - (2/5)1.0 - (1/5)0.0 = .570
Gain(Ssunny, Wind) = .970 - (2/5)1.0 - (3/5).918 = .019

Page 34

Decision Tree Example - (Continued)

Final tree:
  Outlook = sunny: test Humidity (high: No; normal: Yes)
  Outlook = overcast: Yes
  Outlook = rain: test Wind (strong: No; weak: Yes)

Page 35

Overfitting

The tree may not be generally applicable; this is called overfitting

How can we avoid overfitting?
  Stop growing when the data split is not statistically significant
  Grow the full tree, then post-prune

The post-pruning approach is more common

How to select the "best" tree:
  Measure performance over the training data
  Measure performance over a separate validation data set

Page 36

Reduced-Error Pruning

Split data into training and validation sets

Do until further pruning is harmful:
  1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
  2. Greedily remove the one that most improves validation set accuracy

Produces the smallest version of the most accurate subtree
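A rough Python sketch of this procedure (an illustration; the node format with a stored majority label is an assumption rather than anything from the slides):

# A sketch of reduced-error pruning (illustration only). Tree nodes are dicts:
# a leaf is {"label": c}; an internal node is {"attr": a, "branches": {value: child},
# "majority": c}, where "majority" is the most common training class at the node.
def classify(node, example):
    while "label" not in node:
        node = node["branches"].get(example[node["attr"]], {"label": node["majority"]})
    return node["label"]

def accuracy(root, validation, target):
    return sum(classify(root, e) == e[target] for e in validation) / len(validation)

def internal_nodes(node):
    if "label" in node:
        return []
    found = [node]
    for child in node["branches"].values():
        found.extend(internal_nodes(child))
    return found

def reduced_error_prune(root, validation, target):
    # greedily turn internal nodes into leaves while validation accuracy does not drop
    while True:
        best_gain, best_node = 0.0, None
        for node in internal_nodes(root):
            before = accuracy(root, validation, target)
            saved = dict(node)
            node.clear()
            node["label"] = saved["majority"]          # tentatively prune this node
            after = accuracy(root, validation, target)
            node.clear()
            node.update(saved)                         # undo the tentative pruning
            if after - before >= best_gain:
                best_gain, best_node = after - before, node
        if best_node is None:
            return root                                # further pruning would be harmful
        majority = best_node["majority"]
        best_node.clear()
        best_node["label"] = majority                  # keep the best pruning step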

Page 37

The Preferred Customer

Target attribute is TypeOfServiceContract

[Decision tree: the root splits on NoOfProducts (< 2.5 / >= 2.5), with a further split on MaxSev (< 4.5 / >= 4.5); leaf class counts are NO: 7 / YES: 0, NO: 0 / YES: 3, and NO: 3 / YES: 8]

Page 38

Relations Between Products

Target attribute is Product3

[Decision tree: splits on Product6, Product2, and Product9 (values 0 / 1); leaf class counts are NO: 0 / YES: 15, NO: 0 / YES: 1, NO: 4 / YES: 0, and NO: 0 / YES: 1]

Page 39

Relations Between Failures

Target attribute is Application5

[Decision tree: splits on Application10, Application8, and Application2 (values 0 / 1); leaf class counts are NO: 0 / YES: 11, NO: 2 / YES: 2, NO: 5 / YES: 1, and NO: 1 / YES: 0]