Information Retrieval from Data Bases for Decisions Dr. Gábor SZŰCS, Ph.D. Assistant professor BUTE, Department Information and Knowledge Management

Dec 25, 2015


Arleen Stone
Page 1

Information Retrieval from Data Bases for Decisions

Dr. Gábor SZŰCS, Ph.D.

Assistant professor

BUTE, Department Information and Knowledge Management

Page 2

Contents

Aims

General steps in the procedure

Market basket analysis

Frequent itemsets

Conclusion

Page 3

Aims

search for hidden relationships in existing databases (DB)

help to make well-grounded decisions

Data mining techniques are able to find such relationships:

they provide the ability to optimize decision-making

they are among the most powerful tools for retrieving important information

Page 4

Steps of data mining

1. Sampling from a large amount of data, and declaration of the key and predictor variables to be analysed.

2. Modification of variables: examine whether some variables should be integrated, handle errors (some mistakes always occur in large DBs), and execute the necessary transformations.

Page 5

Additional steps of data mining

3. Modelling with data mining techniques: neural networks, decision trees, regression procedures, cluster analysis, factor analysis, discriminant analysis, etc.

4. Comparison of the data mining models built on the same DB, so that the best model can be selected. The procedure can be repeated cyclically. At the end of the whole procedure, the hidden relationships between different aspects of the data can be shown.
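The four steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy records are generated (a hypothetical key variable y that depends linearly on one predictor x, with every 50th record marked as erroneous), and only two simple candidate models are compared, a constant (mean) predictor and a least-squares line, standing in for the techniques listed in step 3. The "best" model is simply the one with the lower mean squared error on held-out records.

```python
import random
from statistics import mean

# Hypothetical toy data base: (predictor x, key variable y); None marks an erroneous value.
def make_records(n, seed=0):
    rng = random.Random(seed)
    records = []
    for i in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 1)
        records.append((x, None if i % 50 == 0 else y))
    return records

# Step 1: sampling from a large amount of data.
def sample(records, k, seed=1):
    return random.Random(seed).sample(records, k)

# Step 2: modification, here only dropping records with erroneous (missing) values.
def clean(records):
    return [(x, y) for x, y in records if y is not None]

# Step 3: modelling; each fit function returns a prediction function.
def fit_mean(train):
    m = mean(y for _, y in train)
    return lambda x: m

def fit_linear(train):
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def mse(model, data):
    return mean((model(x) - y) ** 2 for x, y in data)

# Step 4: compare the models built on the same data and select the best one.
def select_best(records, fitters):
    data = clean(sample(records, 400))
    split = len(data) * 3 // 4
    train, test = data[:split], data[split:]
    scores = {name: mse(fit(train), test) for name, fit in fitters.items()}
    return min(scores, key=scores.get), scores

best, scores = select_best(make_records(2000), {"mean": fit_mean, "linear": fit_linear})
print(best, scores)
```

Repeating the procedure cyclically would correspond to calling `select_best` again after changing the sample, the modifications or the set of candidate models.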

Page 6

Market Basket Analysis

Market basket analysis is used for finding groups of items that tend to occur together.

The models give the likelihood of different products being purchased together.

Market basket analysis is useful for finding:

1. items that occur together

2. items that occur in a particular sequence

Page 7

Table of Co-Occurrence of Products

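              Product 1  Product 2  Product 3  Product 4  Product 5
Product 1           234         12          0        125         54
Product 2            12        175         65         23         75
Product 3             0         65        229         67         62
Product 4           125         23         67        315         55
Product 5            54         75         62         55        292

Each cell gives the number of transactions that contain both the row product and the column product; the diagonal gives the number of transactions that contain the given product. The table is therefore symmetric.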
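A co-occurrence table like the one above can be counted directly from the transactions. A minimal Python sketch, assuming each transaction is given as a set of product names; the baskets and the names "P1" to "P5" are hypothetical toy data, not the data behind the table.

```python
from itertools import combinations

def co_occurrence(transactions, products):
    """Square table: [i][i] = baskets containing product i, [i][j] = baskets containing both."""
    index = {p: i for i, p in enumerate(products)}
    n = len(products)
    table = [[0] * n for _ in range(n)]
    for basket in transactions:
        items = sorted({index[p] for p in basket if p in index})
        for i in items:
            table[i][i] += 1           # diagonal: baskets containing the product
        for i, j in combinations(items, 2):
            table[i][j] += 1           # off-diagonal: baskets containing both products
            table[j][i] += 1           # keep the table symmetric
    return table

PRODUCTS = ["P1", "P2", "P3", "P4", "P5"]
baskets = [{"P1", "P4"}, {"P1", "P2", "P5"}, {"P3"}, {"P4", "P1"}]
for row in co_occurrence(baskets, PRODUCTS):
    print(row)
```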

Page 8

Procedure of the market basket analysis

1. Choose the right level of the product hierarchy for the items.

2. Calculate the probabilities and joint probabilities of the items.

3. Determine the association rules.
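Step 1 can be sketched as mapping each article to the chosen level of the product hierarchy before anything is counted. The hierarchy and the article names below are hypothetical, invented only to fit the bicycle example that follows; articles missing from the mapping are kept unchanged, which is a choice made for this sketch.

```python
# Hypothetical product hierarchy: article -> category one level up.
HIERARCHY = {
    "city bicycle": "bicycle",
    "mountain bicycle": "bicycle",
    "tyre lever": "hand tool",
    "spoke wrench": "hand tool",
    "wall rack": "tool rack",
}

def roll_up(basket, hierarchy):
    """Replace each article with its category; unknown articles are kept as they are."""
    return {hierarchy.get(item, item) for item in basket}

baskets = [{"city bicycle", "tyre lever"}, {"mountain bicycle", "spoke wrench", "wall rack"}]
print([sorted(roll_up(b, HIERARCHY)) for b in baskets])
```

Steps 2 and 3 then work on the rolled-up baskets.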

Page 9

Example

Number of transactions containing the items (out of 1000 transactions in total):

Bicycle (A) 140

Hand tools for bicycle (B) 100

Tool rack (C) 61

Bicycle and hand tool (A & B) 50

Bicycle and tool rack (A & C) 7

Hand tool and tool rack (B & C) 45

Bicycle and hand tool and tool rack (A & B & C) 5

Page 10

Table of probabilities and joint probabilities of items

A 14 %

B 10 %

C 6.1 %

A & B 5 %

A & C 0.7 %

B & C 4,5 %

A & B & C 0,5 %
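The percentages are the counts of the previous example divided by the total number of transactions. A small Python sketch; the total of 1000 transactions is inferred from the two tables (140 transactions correspond to 14 %), it is not stated explicitly, and writing an itemset as a string of letters ("AB" for A & B) is a notation chosen only for this sketch.

```python
# Transaction counts from the bicycle example (A: bicycle, B: hand tool, C: tool rack).
TOTAL = 1000  # inferred: 140 transactions correspond to 14 %
COUNTS = {
    frozenset("A"): 140,
    frozenset("B"): 100,
    frozenset("C"): 61,
    frozenset("AB"): 50,
    frozenset("AC"): 7,
    frozenset("BC"): 45,
    frozenset("ABC"): 5,
}

def p(items):
    """Probability (or joint probability) of a set of items, e.g. p("AB") = p(A & B)."""
    return COUNTS[frozenset(items)] / TOTAL

for items in ["A", "B", "C", "AB", "AC", "BC", "ABC"]:
    print(" & ".join(items), f"{p(items):.1%}")
```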

Page 11

Association rules

An association rule (A → B) consists of two parts:

1. condition (here A), and

2. consequence, or result (here B).

A confidence can be defined for each rule:

$$\text{confidence} = \frac{p(\text{condition} \,\&\, \text{result})}{p(\text{condition})}$$

Page 12

Example

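p(A → B) = p(A & B) / p(A) = 0.05 / 0.14 = 0.357

p((A & B) → C) = p(A & B & C) / p(A & B) = 0.005 / 0.05 = 0.1

p((A & C) → B) = p(A & B & C) / p(A & C) = 0.005 / 0.007 = 0.714

p((B & C) → A) = p(A & B & C) / p(B & C) = 0.005 / 0.045 = 0.111

Can the last rule, (B & C) → A, help us? If we offer product A to everybody, then 14 % of the people will purchase it. If we offer A only to those who bought both B and C, then only 11 % of them will purchase it.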
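The figures of the example can be checked with a small Python sketch of the confidence formula. The counts are those of the bicycle example; the total of 1000 transactions is inferred from the percentage table, and itemsets are written as strings of letters ("AB" for A & B), a notation chosen only for this sketch.

```python
TOTAL = 1000  # inferred from the percentage table
COUNTS = {
    frozenset("A"): 140, frozenset("B"): 100, frozenset("C"): 61,
    frozenset("AB"): 50, frozenset("AC"): 7, frozenset("BC"): 45,
    frozenset("ABC"): 5,
}

def p(items):
    return COUNTS[frozenset(items)] / TOTAL

def confidence(condition, result):
    """confidence(condition -> result) = p(condition & result) / p(condition)"""
    return p(set(condition) | set(result)) / p(condition)

RULES = [("A", "B"), ("AB", "C"), ("AC", "B"), ("BC", "A")]
for condition, result in RULES:
    print(f"{' & '.join(condition)} -> {result}: {confidence(condition, result):.3f}")

# Offering A to everybody vs. only to the buyers of both B and C
print(f"everybody: {p('A'):.0%}, only B & C buyers: {confidence('BC', 'A'):.0%}")
```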

Page 13

Improvement

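Improvement helps us decide whether an association rule is useful or not:

$$\text{improvement}(X \rightarrow Y) = \frac{p(X \rightarrow Y)}{p(Y)}$$

where p(X → Y) is the confidence of the rule X → Y.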

Page 14

In our example

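improvement((B & C) → A) = 0.111 / 0.14 = 0.794

improvement((A & B) → C) = 0.1 / 0.061 = 1.639

The value of improvement shows the usefulness of the rule:

a) improvement > 1: the rule predicts the result better than chance, so it is useful, e.g. (A & B) → C;

b) improvement < 1: the rule predicts the result worse than chance, e.g. (B & C) → A; in this case negating the result gives a better rule.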
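A minimal Python sketch of the improvement calculation for the two rules above. As in the confidence example, the counts come from the bicycle example, the total of 1000 transactions is inferred from the percentage table, and itemsets are written as strings of letters only for this sketch.

```python
TOTAL = 1000  # inferred from the percentage table
COUNTS = {
    frozenset("A"): 140, frozenset("B"): 100, frozenset("C"): 61,
    frozenset("AB"): 50, frozenset("AC"): 7, frozenset("BC"): 45,
    frozenset("ABC"): 5,
}

def p(items):
    return COUNTS[frozenset(items)] / TOTAL

def confidence(condition, result):
    return p(set(condition) | set(result)) / p(condition)

def improvement(condition, result):
    """improvement(X -> Y) = confidence(X -> Y) / p(Y)"""
    return confidence(condition, result) / p(result)

for condition, result in [("BC", "A"), ("AB", "C")]:
    value = improvement(condition, result)
    verdict = "better than chance" if value > 1 else "not better than chance"
    print(f"{' & '.join(condition)} -> {result}: {value:.3f} ({verdict})")
```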

Page 15

Dissociation rules

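Dissociation rules are similar to association rules, but they may also contain inverse items ("not B").

To find them, introduce the inverse of each original item and modify each transaction: a transaction includes an inverse item if, and only if, it does not contain the original item.

Example: (A & ¬B) → C, i.e. if A and not B, then C.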
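The transaction modification can be sketched in Python as follows. The inverse items are represented by the hypothetical labels "not A", "not B", ..., and the five baskets are toy data; the sketch computes the confidence of the dissociation rule (A & not B) → C on the modified transactions. Returning 0 when no transaction satisfies the condition is a convention chosen for this sketch.

```python
def add_inverse_items(transactions, items):
    """Add "not X" to a transaction if, and only if, it does not contain item X."""
    return [set(t) | {"not " + x for x in items if x not in t} for t in transactions]

def confidence(transactions, condition, result):
    """Share of the transactions containing `condition` that also contain `result`."""
    having_condition = [t for t in transactions if condition <= t]
    if not having_condition:
        return 0.0
    return sum(1 for t in having_condition if result <= t) / len(having_condition)

baskets = [{"A", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A"}]
extended = add_inverse_items(baskets, ["A", "B", "C"])
print(confidence(extended, {"A", "not B"}, {"C"}))
```

Because the inverse items are ordinary items after the modification, the same confidence and improvement calculations as for association rules can be reused.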

Page 16

Time series

For time series (items that occur in a particular sequence), the transactions must have two additional features:

time information (e.g. time sequence or time stamp)

identifying information (e.g. customer id, account number in a bank)
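A minimal Python sketch of how the two additional features are used: the identifying information groups the transactions per customer, and the time information orders them into a sequence. The customer ids, time stamps and items are hypothetical toy data, and checking "bicycle first, tool rack later" is just one illustrative sequential question.

```python
from collections import defaultdict

# Hypothetical transactions: (customer id, time stamp, item).
TRANSACTIONS = [
    ("c1", 3, "tool rack"),
    ("c1", 1, "bicycle"),
    ("c2", 2, "bicycle"),
    ("c3", 5, "tool rack"),
    ("c3", 7, "bicycle"),
]

def sequences_by_customer(transactions):
    """Group the items by customer id and order them by time stamp."""
    by_customer = defaultdict(list)
    for customer, time, item in transactions:
        by_customer[customer].append((time, item))
    return {c: [item for _, item in sorted(events)] for c, events in by_customer.items()}

def follows(sequence, first, then):
    """True if `then` occurs somewhere after the first occurrence of `first`."""
    if first not in sequence:
        return False
    return then in sequence[sequence.index(first) + 1:]

seqs = sequences_by_customer(TRANSACTIONS)
count = sum(follows(s, "bicycle", "tool rack") for s in seqs.values())
print(f"{count} of {len(seqs)} customers bought a tool rack after a bicycle")
```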

Page 17

Frequent itemsets

Problem: find the sets of items (itemsets) that appear in at least a fixed ratio of the transactions.

A-priori trick: If a set of items S is frequent, then every subset of S is also frequent.

The procedure is built from the lower level to the upper level (frequent items, frequent pairs, etc.).

Page 18

A-Priori Algorithm

1. Define a threshold for relative frequency. All single items are examined; the set of frequent items is L1.

2. Pairs of items in L1 become the candidate pairs (C2). Their relative frequencies are compared with the threshold; L2 contains the frequent pairs.

Page 19

A-Priori Algorithm (cont.)

3. The candidate triples (C3) are those sets {A,B,C} all of whose two-element subsets are in L2. L3 will contain the frequent triples.

4. In general, Li is the set of frequent itemsets of size i, and Ci+1 is the set of candidate itemsets of size i+1 (all of whose size-i subsets are in Li). The steps are repeated until the sets become empty.
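The steps above can be sketched in Python as follows. This is a minimal, unoptimized sketch: the candidate generation simply joins every pair of frequent i-sets and keeps the unions of size i+1 whose size-i subsets are all frequent, and the five transactions are toy data chosen for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets whose relative frequency is at least min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def frequent(candidates):
        counts = {c: 0 for c in candidates}
        for b in baskets:                      # one pass over the data per level
            for c in candidates:
                if c <= b:
                    counts[c] += 1
        return {c for c, k in counts.items() if k / n >= min_support}

    level = frequent({frozenset([i]) for b in baskets for i in b})   # L1
    result = set(level)
    size = 1
    while level:
        size += 1
        # C_size: unions of two frequent sets of the previous level ...
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # ... whose every subset of size (size - 1) is frequent (a-priori trick)
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, size - 1))}
        level = frequent(candidates)           # L_size
        result |= level
    return result

TRANSACTIONS = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
print(sorted(sorted(s) for s in apriori(TRANSACTIONS, 0.5)))
```

With a threshold of 0.5 every single item and every pair is frequent here, while {A, B, C} (2 of 5 transactions) is not, so L3 is empty and the loop stops.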

Page 20

Criticism of A-Priori Algorithm

good if we only want to know the frequent pairs

when searching for maximal frequent itemsets, too many steps (passes over the data) may be needed

limited by the physical capacity (memory) of computers

Page 21

Market Basket Mining with High Correlation Analysis

The data are organised in a matrix whose cells contain Boolean values (1: yes, 0: no), e.g. whether a basket (row) contains an item (column).

This matrix is very sparse. We want to find the highly correlated pairs of columns.
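A minimal Python sketch of the idea. The source does not say which correlation measure is meant; this sketch swaps in the Jaccard similarity of two columns (rows with 1 in both columns divided by rows with 1 in either column) as one possible choice. Because the matrix is sparse, only the cells containing 1 are stored: each row is the set of its columns with value 1. The baskets and the threshold 0.8 are hypothetical.

```python
from itertools import combinations

def column_sets(rows):
    """Sparse representation: for each column, the set of rows that contain a 1."""
    cols = {}
    for r, row in enumerate(rows):
        for c in row:
            cols.setdefault(c, set()).add(r)
    return cols

def jaccard(a, b):
    """Jaccard similarity of two columns given as sets of row numbers."""
    return len(a & b) / len(a | b)

def correlated_pairs(rows, threshold):
    cols = column_sets(rows)
    return sorted((x, y) for x, y in combinations(sorted(cols), 2)
                  if jaccard(cols[x], cols[y]) >= threshold)

# Hypothetical baskets (rows); each set lists the items (columns) with value 1.
ROWS = [
    {"bicycle", "hand tool"},
    {"bicycle", "hand tool", "milk"},
    {"bread", "milk"},
    {"bicycle", "hand tool"},
    {"bread"},
]
print(correlated_pairs(ROWS, 0.8))
```

Comparing all column pairs is only feasible for a moderate number of columns; the sketch does not address scaling.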

Page 22

Applications of High Correlation Mining

1. Rows are documents, columns are words. The highly correlated pairs of columns give the words that almost always appear together.

2. Rows and columns are Web pages. A cell contains 1 if the row's page links to the column's page. Result: pages about the same topic (they are linked from the same pages).

3. As in 2, but a cell contains 1 if the column's page links to the row's page. Result: mirror pages (they link to the same pages).

Page 23

Conclusion

Planning store layout

Bundling products

Offering coupons

Page 24

Future

Further development:

hierarchical association rules

maintenance of association rules

sequential pattern mining

functional dependency mining

Page 25

Thank you!

The floor is open for discussion.

E-mail: [email protected]