Page 1: Market Basket Analysis

Table of Contents

Market basket analysis in a multiple store environment

• Introduction
• Need for a new Algorithm
• Problem Definition
• Algorithm Description

Graph Based Structure of Market Basket Analysis

• Market Basket
• Defining the problem
• Apriori Algorithm
• Limitation
• Similis Algorithm
• Problem Transformation
• Searching for the Maximum-weighted Clique
• Comparison
• Example
• Conclusion

Page 2: Market Basket Analysis

Introduction

• Market basket analysis is a method of discovering customer purchasing patterns

• Discovering such purchasing patterns can help managers in designing store layout, web sites, product mix and bundling, and other marketing strategies

• For a company with multiple stores, discovering purchasing patterns that may vary over time and exist in all, or in subsets of, stores can be useful in forming marketing, sales, service, and operation strategies at the company, local, and store levels

Page 3: Market Basket Analysis

Need for a new Algorithm

Two main problems arise when using the existing methods in a multi-store environment:

1. Temporal Association Rules:
   1. Static association rules either find patterns at a point in time or implicitly assume that the patterns stay the same over time and across stores
   2. In temporal association rules, the selling periods of the items are considered in computing the support value
2. Spatial Association Rules:
   1. Some products may not be sold in some stores, for example, because of geographical, environmental, or political reasons
   2. The problem is to find common association patterns in subsets of stores, by location

Page 4: Market Basket Analysis

Cont..

• Solution: an Apriori-like algorithm
   – Covers rules that are applicable to the entire chain without time restriction, or to a subset of stores in specific time intervals
   – The format of the rules is similar to that of the traditional rules
   – The rules also contain information on store (location) and time

Page 5: Market Basket Analysis

Cont..

• Examples:
   – In the second week of August, customers purchase computers, printers, Internet and wireless phone services jointly in electronics stores near campus
   – In January, customers purchase cold medicine, humidifiers, coffee, and sunglasses together in supermarkets near skiing resorts

Page 6: Market Basket Analysis

Problem Definition

Page 7: Market Basket Analysis

Cont..

– Let {T1, T2, . . ., Tm} be a set of mutually disjoint time intervals (periods) that form a complete partition of T

– Let P = {P1, P2, . . ., Pq} be the set of stores, where Pj (1 ≤ j ≤ q) denotes the jth store in the store chain

– Each transaction s in D is attached with a timestamp t and a store identifier p, which indicate the time and the store at which the transaction occurs

– Let Sk ⊆ P and Rk ⊆ T be the sets of stores and times, respectively, at which item Ik is sold
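The notation above can be made concrete with a small sketch. The transaction layout, the item and store names, and the helper `item_contexts` are all hypothetical, and the sketch assumes that the stores and periods in which an item is sold can be read off the transactions themselves, which the slides do not state.

```python
from collections import defaultdict

# Hypothetical transactions: (period t, store p, set of items bought).
transactions = [
    ("T1", "P1", {"I1", "I2"}),
    ("T1", "P2", {"I2"}),
    ("T2", "P1", {"I1", "I3"}),
]

def item_contexts(transactions):
    """Return (S, R): S[item] is the set of stores and R[item] the set of
    periods in which the item appears in at least one transaction."""
    stores, periods = defaultdict(set), defaultdict(set)
    for t, p, items in transactions:
        for item in items:
            stores[item].add(p)
            periods[item].add(t)
    return dict(stores), dict(periods)

S, R = item_contexts(transactions)
print(sorted(S["I1"]), sorted(R["I1"]))
```

In a real chain the sets Sk and Rk would more likely come from assortment data than from observed sales, so this is only an approximation of the definition.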

Page 12: Market Basket Analysis

“Apriori-like” Algorithm

Page 14: Market Basket Analysis

Cont..

• Some essential points:
   – RFk denotes the set of all relative-frequent k-itemsets; Fk the set of all frequent k-itemsets; Ck the set of candidate k-itemsets
   – Candidate k-itemsets are generated by combining frequent (k-1)-itemsets, following the anti-monotone property
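Candidate generation under the anti-monotone property can be sketched as follows. This is the classic Apriori join-and-prune step, not necessarily the exact procedure on the slides; the function name `apriori_gen` and the item labels are illustrative.

```python
from itertools import combinations

def apriori_gen(frequent_prev, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    A candidate is kept only if every (k-1)-subset is itself frequent,
    because a superset of an infrequent itemset cannot be frequent.
    """
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

F2 = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")}
C3 = apriori_gen(F2, 3)
print(C3)
```

With these four frequent pairs only {A, B, C} survives: {A, B, D} and {B, C, D} are pruned because {A, D} and {C, D} are not frequent.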

Page 15: Market Basket Analysis

Cont..

• Algorithm in brief:
   – The first step of the algorithm is to build the PT table for each item in I
   – Different phases of the algorithm:
      • In the first phase, we scan the database for the first time, build a two-dimensional table called the TS table, and find the frequent 1-itemsets
      • In the kth phase of the algorithm, Ck is derived, and Fk is generated by evaluating the supports of the candidates
      • Since an RFk itemset must be a frequent itemset, we generate RFk from Fk by evaluating the relative supports of the itemsets X in Fk

Page 16: Market Basket Analysis

Cont..

• PT table: associates context (stores, time intervals) with each item in I
   – The PT tables for individual items can be used to determine the PT table for a given itemset X
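One plausible way to combine item-level PT tables is sketched below. It assumes each PT table is a 0/1 matrix with one row per store and one column per period, and that an itemset is available in a (store, period) cell only when every one of its items is; both the layout and the helper `pt_for_itemset` are assumptions, since the slide with the exact method did not survive.

```python
def pt_for_itemset(pt, itemset):
    """Cell-wise AND of the PT tables of all items in the itemset."""
    tables = [pt[item] for item in itemset]
    return [
        [int(all(cells)) for cells in zip(*rows)]  # one cell per period
        for rows in zip(*tables)                   # one row per store
    ]

# Hypothetical PT tables: 2 stores (rows) x 2 periods (columns).
pt = {
    "A": [[1, 1],
          [0, 1]],
    "B": [[1, 0],
          [1, 1]],
}
pt_ab = pt_for_itemset(pt, ["A", "B"])
print(pt_ab)
```

Here {A, B} is available in store 1 during period 1 and in store 2 during period 2 only.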

Page 18: Market Basket Analysis

Cont..

• PT table: The method to compute the jth row of PT table for itemset X

Page 20: Market Basket Analysis

Cont..

• Candidate itemsets: we generate the candidate itemsets from the frequent itemsets of the previous phase
• Relative-frequent itemsets:
   – Because an RF itemset must be a frequent itemset, we can generate RFk from Fk by computing the relative supports of those itemsets X in Fk
   – |DVX| can be obtained from the TS and PT tables of X
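The relative-support computation can be sketched under two assumptions that the surviving slides do not confirm: the TS table holds the number of transactions per (store, period) cell, and |DVX| is the total number of transactions in the cells where the PT table of X is 1. The helper names and figures are hypothetical.

```python
def context_size(ts, pt_x):
    """Sum the TS counts over the (store, period) cells where X is sold."""
    return sum(
        count
        for ts_row, pt_row in zip(ts, pt_x)
        for count, sold in zip(ts_row, pt_row)
        if sold
    )

def relative_support(count_x, ts, pt_x):
    """Transactions containing X divided by transactions in X's context."""
    size = context_size(ts, pt_x)
    return count_x / size if size else 0.0

ts = [[10, 20],      # hypothetical transaction counts: stores x periods
      [30, 40]]
pt_x = [[1, 0],      # X is sold in store 1/period 1 and store 2/period 2
        [0, 1]]
print(context_size(ts, pt_x), relative_support(10, ts, pt_x))
```

An itemset bought 10 times is measured against the 50 transactions where it could have been bought, not against all 100, which is what distinguishes relative support from plain support.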

Page 22: Market Basket Analysis

Conclusion

• Store-chain association rules are proposed specifically for a multi-store environment, where stores may have different product-mix strategies that can be adjusted over time.

• These rules have a distinct advantage over the traditional ones because they contain store (location) and time information, so they can be used not only for general or local marketing strategies (depending on the results), but also for product procurement, inventory, and distribution strategies for the entire store chain

Page 24: Market Basket Analysis

Graph Based Structure of Market Basket Analysis

Page 25: Market Basket Analysis

Market Basket

• Market basket analysis is a powerful tool for implementing cross-selling strategies.

• Problem Definition:
   – The file with multiple transactions can be represented as a relational database table T(customer, item).
   – The customers are {1, 2, 3, ..., n} and the items are {a, b, c, ..., z}.
   – The table T(customer, item) can be seen as a set of all customer transactions Trans = {t1, t2, ..., tk}, where each transaction contains a subset of items tk = {ia, ib, ..., iz}
   – The relational table thus formed can be seen as the relationship between item and customer, called item–clientele.
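The item–clientele view described above can be sketched as a simple grouping of the rows of T(customer, item). The rows and the helper `item_clientele` are hypothetical illustrations.

```python
from collections import defaultdict

def item_clientele(rows):
    """Group (customer, item) rows into item -> set of customers."""
    clientele = defaultdict(set)
    for customer, item in rows:
        clientele[item].add(customer)
    return dict(clientele)

# Hypothetical rows of T(customer, item).
rows = [(1, "a"), (1, "b"), (2, "a"), (3, "b"), (3, "c")]
clientele = item_clientele(rows)
print(clientele)
```

Each item is now described by the set of customers who bought it, which is the form the similarity measures later operate on.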

Page 26: Market Basket Analysis

Apriori Algorithm

• The Apriori algorithm has two important characteristics:
   – It is a level-wise algorithm, i.e. it traverses the itemset lattice one level at a time, from the frequent 1-itemsets to the maximum size of frequent itemsets.
   – It uses a generate-and-test strategy for finding the frequent itemsets: the support of each candidate is counted and tested against the minsup threshold.

• Limitation
   – The Apriori algorithm has exponential time complexity, and several passes over the input table are needed. To overcome these handicaps, the Similis algorithm is proposed.
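The level-wise, generate-and-test behaviour and the repeated passes can be seen in a minimal sketch. It uses a simplified candidate step (unions of surviving itemsets, without subset pruning) and toy baskets that are not taken from the slides.

```python
def apriori(transactions, minsup):
    """Return {itemset: support count} for all itemsets with count >= minsup."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while level:
        # Test: one full pass over the data for every level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(survivors)
        # Generate: candidates one item larger, built from the survivors.
        k += 1
        level = list({a | b for a in survivors for b in survivors
                      if len(a | b) == k})
    return frequent

baskets = [set("abc"), set("ab"), set("ac"), set("bc"), set("abc")]
result = apriori(baskets, minsup=3)
print(sorted("".join(sorted(s)) for s in result))
```

Each loop iteration rescans every basket, and the number of candidates can grow exponentially with the number of items, which are the two handicaps named in the limitation above.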

Page 27: Market Basket Analysis

Similis Algorithm

• Similis is a Latin word which means "similar".
• Similis consists of two steps:
   – Problem Transformation: generation of the graph structure
   – Search: finding the maximum-weight clique
• Algorithm Description:

STEP 1 - Data Transformation
   input: table T(customer, item)
   Generate graph G(V,E) using the similarities between items
   output: weighted graph G(V,E)

STEP 2 - Finding the maximum-weighted cliques
   input: weighted graph G(V,E) and size k
   Find in G(V,E) the clique S with k vertices with the maximum weight, using the Primal-Tabu meta-heuristic
   output: weighted clique S of size k that corresponds to the most frequent market basket with k items

Page 28: Market Basket Analysis

Problem Transformation

• As stated earlier, the first step transforms the table T(customer, item) into condensed data by using a graph structure.

• A graph is a pair G = (V,E), where V is the set of vertices and E is the set of edges of the graph.

• In the market basket case each vertex corresponds to an item and each edge has a weight which represents the distance between the adjacent vertices.

• The distance between two items is given by the frequency with which the two items are bought together.

Page 29: Market Basket Analysis

Cont…

• To find the values for the weighted graph G(V,E), similarity measures can be used.

• The similarity value of two items will be high if they frequently appear in the same transactions.

• This means that if two items are frequently bought in the same transactions, then they belong to a frequent market basket.

• To create sets of items, an association measure must be chosen; either similarity or distance measures can be used.

Page 30: Market Basket Analysis

Cont…

• For each pair of items (A,B) a similarity measure SIM(A,B) can be computed. If the items are bought together many times they have a strong similarity; if they are not usually bought together they have a weak similarity.

• For all items, an item similarity matrix is generated, which can be represented by the adjacency matrix of the weighted graph G(V,E).

Page 31: Market Basket Analysis

Similarity Measures

• The authors describe the following similarity measures.
• These measures use binary matrices and return normalized values between 0 and 1.
• The Dice (sim1), Jaccard (sim2) and Cosine (sim3) coefficients are widely used given their simplicity.
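The three coefficients can be sketched in their standard set-based forms, where each item is represented by the set of customers who bought it (equivalent to a binary column). The slide's own formulas are not available, so the notation there may differ; the function names are illustrative and non-empty sets are assumed.

```python
from math import sqrt

def dice(a, b):
    """sim1: twice the shared customers over the sum of both clienteles."""
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """sim2: shared customers over customers who bought either item."""
    return len(a & b) / len(a | b)

def cosine(a, b):
    """sim3: shared customers over the geometric mean of both clienteles."""
    return len(a & b) / sqrt(len(a) * len(b))

# Clienteles of Bread and Milk as listed on the example slide.
bread, milk = {1, 2, 4, 5}, {1, 3, 4, 5}
print(dice(bread, milk), jaccard(bread, milk), cosine(bread, milk))
```

All three return 1 for identical clienteles and 0 for disjoint ones; Jaccard is the strictest of the three because it divides by the union.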

Page 32: Market Basket Analysis

Weight Calculation

• A multiplicative model will be used to express the weight of an edge (A,B). The weight of the edge (A,B) takes into account the similarity and frequency of the items:

weight(A,B) = sim(A,B) · frequency(A,B)

• The similarity value of two items will be high if they are both included in the same transactions.
• The frequency of the items must be considered to guarantee a correspondence between high-weighted edges and items that appear in many transactions.
• There are several ways to define item frequency. In this work the authors opt for the average of the relative frequencies of the two items, given by:

frequency(A,B) = (f(A) + f(B)) / 2, where f(X) is the fraction of transactions that contain item X
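The multiplicative model can be tried on the clienteles from the example slide. The sketch assumes the Jaccard coefficient as sim(A,B) and the average relative frequency as frequency(A,B); the slides do not say which similarity measure produced the published matrix, so that choice is an assumption.

```python
# Item -> customers, as listed on the example slide (5 customers in total).
clientele = {
    "Bread": {1, 2, 4, 5},
    "Milk": {1, 3, 4, 5},
    "Diaper": {2, 3, 4, 5},
    "Beer": {2, 3, 4},
    "Eggs": {2},
    "Coke": {3, 5},
}
N_CUSTOMERS = 5

def weight(a, b):
    """weight(A,B) = sim(A,B) * frequency(A,B), with Jaccard as sim (assumed)."""
    ca, cb = clientele[a], clientele[b]
    sim = len(ca & cb) / len(ca | cb)
    frequency = (len(ca) / N_CUSTOMERS + len(cb) / N_CUSTOMERS) / 2
    return sim * frequency

for pair in [("Bread", "Milk"), ("Diaper", "Beer"), ("Milk", "Eggs")]:
    print(pair, round(weight(*pair), 3))
```

Under these assumptions Bread–Milk evaluates to 0.48 and Diaper–Beer to 0.525, in line with the adjacency matrix of the example. Bread–Diaper evaluates to 0.48, whereas the matrix lists 0.60.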

Page 33: Market Basket Analysis

Example

The table T gives the customer–item relationship. There are 5 customers and 6 items, and the item–clientele relationship is:

Bread = (1,2,4,5)
Milk = (1,3,4,5)
Diaper = (2,3,4,5)
Beer = (2,3,4)
Eggs = (2)
Coke = (3,5)

Page 34: Market Basket Analysis

Adjacency matrix of the weighted graph G = (V,E)

G(V,E)   Bread   Milk    Diaper  Beer    Eggs    Coke
Bread    -       0.48    0.60    0.28    0.125   0.12
Milk             -       0.48    0.28    0       0.30
Diaper                   -       0.525   0.125   0.30
Beer                             -       0.133   0.125
Eggs                                     -       0
Coke                                             -

Page 35: Market Basket Analysis

Searching for the Maximum-weighted Clique

• A clique can represent a common interest group.
• Given a graph representing the communication among a group of individuals in an organization, each vertex represents an individual, while edge (i, j) shows that individual i regularly communicates with individual j.
• Our aim is to find the maximum-weighted clique in the graph.
• If a graph with weights on the edges is used, the most weighted clique corresponds to the common-interest group whose elements communicate the most among themselves. This structure allows the representation of sets of strongly connected elements.

Page 36: Market Basket Analysis

Maximum Clique Problem

• The Maximum Clique Problem is an important problem in combinatorial optimization.
• In market basket analysis it is used to find interesting combinations of items that are bought together.
• Here the Primal-Tabu algorithm is used to find the maximum-weighted clique.
• Conceptually, Primal-Tabu works with three related neighbourhood structures, N+, N-, and N0, for the addition, removal and swap of a vertex of the graph.
• At each step a new solution S' is chosen from the neighbourhood N(S) of the current solution S.
• At each iteration the best solution found, S*, is updated whenever the clique value increases.
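The search loop can be illustrated with a much simplified tabu sketch. It is not the Primal-Tabu algorithm itself: it keeps the set at a fixed size k, uses only swap moves (the N0 neighbourhood), and scores a missing edge as weight 0 instead of rejecting non-cliques. The edge weights are the entries above 0.40 from the example matrix; the function names, tenure and iteration count are assumptions.

```python
from itertools import combinations

def clique_weight(vertices, w):
    """Sum of edge weights inside the set; absent edges count as 0."""
    return sum(w.get(frozenset(p), 0.0) for p in combinations(sorted(vertices), 2))

def tabu_swap_search(vertices, w, k, iterations=20, tenure=2):
    current = set(vertices[-k:])                 # arbitrary starting set S
    best, best_w = set(current), clique_weight(current, w)
    tabu = {}                                    # vertex -> last iteration it is barred
    for it in range(iterations):
        moves = []
        for u in sorted(current):                # vertex leaving S
            for v in vertices:                   # vertex entering S
                if v in current or tabu.get(v, -1) >= it:
                    continue
                candidate = (current - {u}) | {v}
                moves.append((clique_weight(candidate, w), u, v))
        if not moves:
            continue
        move_w, u, v = max(moves, key=lambda m: m[0])  # best neighbour, even if worse
        current = (current - {u}) | {v}
        tabu[u] = it + tenure                    # u may not re-enter for a while
        if move_w > best_w:                      # update S* only on improvement
            best, best_w = set(current), move_w
    return best, best_w

items = ["Bread", "Milk", "Diaper", "Beer", "Eggs", "Coke"]
edges = {
    frozenset({"Bread", "Milk"}): 0.48,
    frozenset({"Bread", "Diaper"}): 0.60,
    frozenset({"Milk", "Diaper"}): 0.48,
    frozenset({"Diaper", "Beer"}): 0.525,
}
basket, total = tabu_swap_search(items, edges, k=3)
print(sorted(basket), round(total, 2))
```

Starting from {Beer, Eggs, Coke}, the search reaches {Bread, Milk, Diaper} after three swaps. The tabu list stops a vertex that has just been removed from being swapped straight back in, which is what lets the search accept non-improving moves without cycling.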

Page 37: Market Basket Analysis

Comparison

• Apriori with support 3 gives the frequent itemset (Bread, Milk, Diaper).
• Here I check whether the graph gives the same result using the maximum-weight clique method.
• Before doing so, since the support is 3 while our values lie between 0 and 1 (they are computed from binary matrices), the support of 3 must first be normalized to the range 0 to 1.

Page 38: Market Basket Analysis

Result

[Figure: weighted graph with the vertices Milk, Bread, Diaper, Beer, Coke and Eggs; the edges drawn carry the weights 0.48, 0.48, 0.525 and 0.60.]

Only those edges are considered whose weights are more than 0.40.

Page 39: Market Basket Analysis

Conclusion

• The main disadvantage of the Apriori algorithm is its exponential time complexity, together with the many passes it performs over the data.
• With few items or sparse data the algorithm is efficient, while with correlated data its performance degrades significantly.
• The Similis algorithm has lower computational complexity, thus allowing the resolution of a greater number of real problems.
• In this innovative approach, the condensed data is obtained by transforming the market basket problem into a maximum-weighted clique problem.
