MARKET BASKET ANALYSIS USING R TOOL Gaurav Mittal DOMS-NITT
Page 1: Market Basket Analysis

MARKET BASKET ANALYSIS USING R TOOL

Gaurav Mittal
DOMS-NITT

Page 2: Market Basket Analysis

What is Market Basket Analysis?

Understanding the behavior of shoppers:
    What items are bought together?
    What’s in each shopping cart/basket?

Basket data consist of a transaction date and the items bought in that transaction (the itemset).

Retail organizations are interested in making informed decisions and strategy based on analysis of transaction data:
    what to put on sale,
    how to place merchandise on shelves to maximize profit,
    customer segmentation based on buying patterns.

Page 3: Market Basket Analysis

Market Basket Analysis

MBA uses this information to:
    Identify who customers are (not by name)
    Understand why they make certain purchases
    Gain insight about its merchandise (products):
        Fast and slow movers
        Products which are purchased together
        Products which might benefit from promotion
    Take action:
        Store layouts
        Which products to put on specials, promote, coupons…

Combining all of this with a customer loyalty card makes it even more valuable.

Page 4: Market Basket Analysis

Examples

Rule form: LHS => RHS

IF a customer buys diapers, THEN they also buy beer:
    diapers => beer

“Transactions that purchase bread and butter also purchase milk”:
    bread, butter => milk

Customers who purchase maintenance agreements are very likely to purchase large appliances.

When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaner.

Page 5: Market Basket Analysis

Def: Market Basket Analysis (Association Analysis) is a mathematical modeling technique based upon the theory that if you buy a certain group of items, you are likely to buy another group of items.

It is used to analyze customer purchasing behavior, and it helps increase sales and maintain inventory by focusing on point-of-sale transaction data.

Page 6: Market Basket Analysis

Definitions and Terminology

Transaction: a set of items (itemset).

Confidence: the measure of certainty or trustworthiness associated with each discovered pattern.

Support: the measure of how often the collection of items in an association occurs together, as a percentage of all transactions.

Frequent itemset: an itemset that satisfies the minimum support threshold.

Strong association rules: rules that satisfy both a minimum support threshold and a minimum confidence threshold.

In association rule mining, we first find all frequent itemsets and then generate strong association rules from the frequent itemsets.
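The two-step procedure can be sketched in base R. This is a brute-force illustration under stated assumptions, not the Apriori algorithm itself: the five baskets are the toy transactions from the General Concept slides, the thresholds `min_support` and `min_confidence` are arbitrary illustrative values, and the helper `support()` is a name introduced here.

```r
# Toy baskets (the five transactions from the General Concept slides).
baskets <- list(
  c("Frozen Pizza", "Cola", "Milk"),
  c("Milk", "Potato Chips"),
  c("Cola", "Frozen Pizza"),
  c("Milk", "Pretzels"),
  c("Cola", "Pretzels")
)
min_support    <- 0.3  # illustrative threshold, not from the slides
min_confidence <- 0.6  # illustrative threshold, not from the slides

# Share of baskets that contain every item of `itemset`.
support <- function(itemset) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

items <- sort(unique(unlist(baskets)))

# Step 1: frequent itemsets of size 1 and 2 (brute force, fine for toy data).
candidates <- c(as.list(items), combn(items, 2, simplify = FALSE))
frequent   <- Filter(function(s) support(s) >= min_support, candidates)

# Step 2: build both rules A => B and B => A from each frequent pair,
# then keep the rules that also reach the minimum confidence.
pairs <- Filter(function(s) length(s) == 2, frequent)
rules <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(
    lhs = p,
    rhs = rev(p),
    support = support(p),
    confidence = support(p) / vapply(p, support, numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}))
strong <- rules[rules$confidence >= min_confidence, ]
print(strong)
```

With these thresholds the only frequent pair is Cola with Frozen Pizza, and both directions of the rule pass the confidence check.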

Page 7: Market Basket Analysis

Market Basket Analysis General Concept: Methods

Method:

Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels

              Frozen Pizza   Milk   Cola   Potato Chips   Pretzels
Frozen Pizza       2          1      2          0            0
Milk               1          3      1          1            1
Cola               2          1      3          0            1
Potato Chips       0          1      0          1            0
Pretzels           0          1      1          0            2

Results:

From these counts we could derive the association rules:
    If a customer purchases Frozen Pizza, then they will probably purchase Cola.
    If a customer purchases Cola, then they will probably purchase Frozen Pizza.
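The co-occurrence table can be reproduced with a few lines of base R. This is a minimal sketch; the variable names (`baskets`, `incidence`, `cooccurrence`) are chosen here for illustration.

```r
# The five transactions listed on the slide.
baskets <- list(
  c("Frozen Pizza", "Cola", "Milk"),
  c("Milk", "Potato Chips"),
  c("Cola", "Frozen Pizza"),
  c("Milk", "Pretzels"),
  c("Cola", "Pretzels")
)
items <- c("Frozen Pizza", "Milk", "Cola", "Potato Chips", "Pretzels")

# One row per transaction, one column per item; 1 if the item is in the basket.
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items

# crossprod(X) is t(X) %*% X: entry [i, j] counts the baskets holding both items,
# and the diagonal counts the baskets holding each single item.
cooccurrence <- crossprod(incidence)
print(cooccurrence)
```

The largest off-diagonal count is the Frozen Pizza and Cola cell (2), which is why those two items give the candidate rules.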

Page 8: Market Basket Analysis

Market Basket Analysis General Concept: Measures

Support: a measure of how often the collection of items in an association occurs together, as a percentage of all transactions.

    support = (number of transactions containing the item combination) / (total number of transactions)

Let the rule be "If a customer purchases Cola, then they will purchase Frozen Pizza". The support for this rule is

    2 (number of transactions that include both Cola and Frozen Pizza) / 5 (total transactions) = 40%.

Confidence: the confidence of the rule “B given A” is a measure of how likely it is that B occurs when A has occurred; 100% means that B always occurs if A has occurred.

    confidence of a rule = (support for the combination) / (support for the condition)

For the rule "If a customer purchases Milk, then they will purchase Potato Chips":

    confidence = support for the combination (Potato Chips + Milk), 20% / support for the condition (Milk), 60% = 33%.
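Both measures are easy to check in base R on the five toy transactions. The helper names `support()` and `confidence()` below are introduced for this sketch only.

```r
# The five toy transactions from the General Concept slide.
baskets <- list(
  c("Frozen Pizza", "Cola", "Milk"),
  c("Milk", "Potato Chips"),
  c("Cola", "Frozen Pizza"),
  c("Milk", "Pretzels"),
  c("Cola", "Pretzels")
)

# Support: fraction of transactions that contain every item in `itemset`.
support <- function(itemset) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

# Confidence of the rule lhs => rhs:
# support of the combination divided by support of the condition.
confidence <- function(lhs, rhs) {
  support(c(lhs, rhs)) / support(lhs)
}

print(support(c("Cola", "Frozen Pizza")))   # 2 of 5 transactions
print(confidence("Milk", "Potato Chips"))   # 0.2 / 0.6
```

The two printed values correspond to the 40% support and 33% confidence worked out above.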

Page 9: Market Basket Analysis

Association Rules Apply Elsewhere

Retail – supermarkets, etc.
Purchases made using credit/debit cards.
Optional telco service purchases.
Banking services.
Unusual combinations of insurance claims can be a warning of fraud.
Medical patient histories.
Restaurants and fast-food centres.

Page 10: Market Basket Analysis

Preparing Data for MBA

Determining the scope of the dataset (one or many stores, what period, etc.)
Converting transaction data to itemsets
Generalizing items to an appropriate level
    Depends on the objective of the model
Rolling up rare items to get adequate support
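Rolling up rare items can be sketched as follows in base R. Everything here is hypothetical and for illustration only: the baskets, the item-to-category lookup `category`, the threshold `min_support`, and the helper `roll_up()`.

```r
# Hypothetical baskets and item-to-category hierarchy, for illustration only.
baskets <- list(
  c("Cola", "Frozen Pizza"),
  c("Diet Cola", "Potato Chips"),
  c("Cola", "Pretzels"),
  c("Lemonade", "Frozen Pizza")
)
category <- c(
  "Cola" = "Soft Drink", "Diet Cola" = "Soft Drink", "Lemonade" = "Soft Drink",
  "Potato Chips" = "Snack", "Pretzels" = "Snack",
  "Frozen Pizza" = "Frozen Food"
)
min_support <- 0.5  # illustrative threshold

# Support of each single item: share of baskets that contain it.
item_support <- table(unlist(lapply(baskets, unique))) / length(baskets)
rare <- names(item_support)[item_support < min_support]

# Replace each rare item by its category; keep frequent items as they are.
roll_up <- function(basket) {
  unique(ifelse(basket %in% rare, unname(category[basket]), basket))
}
rolled <- lapply(baskets, roll_up)
print(rolled)
```

After the roll-up, Potato Chips and Pretzels are both counted as Snack, so the category reaches a support that neither item had on its own.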

Page 11: Market Basket Analysis

INTRODUCTION TO R

R is a programming language and software environment for statistical computing and graphics.

R is part of the GNU project. Its source code is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems.

R uses a command line interface, though several graphical user interfaces are available.

The Comprehensive R Archive Network (CRAN) makes it easy to benefit from others’ work, to share your own work, and to get feedback on potential improvements.

Page 12: Market Basket Analysis

For computationally-intensive tasks, C, C++, and Fortran code can be linked and called at run time.

R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others) and graphical techniques.

Another of R's strengths is its graphical facilities, which produce publication-quality graphs which can include mathematical symbols.

Although R is mostly used by statisticians and other practitioners requiring an environment for statistical computation and software development, it can also be used as a general matrix calculation toolbox, with benchmark results comparable to GNU Octave and its proprietary counterpart, MATLAB.

Page 13: Market Basket Analysis

THE R ENVIRONMENT

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes:
    an effective data handling and storage facility,
    a suite of operators for calculations on arrays, in particular matrices,
    a large, coherent, integrated collection of intermediate tools for data analysis, and
    graphical facilities for data analysis and display, either on-screen or on hardcopy.

Packages

The capabilities of R are extended through user-submitted packages, which add specialized statistical techniques, graphical devices, and import/export capabilities for many external data formats.

A statistical package is a suite of computer programs that are specialised for statistical analysis. It enables people to obtain the results of standard statistical procedures and statistical significance tests, without requiring low-level numerical programming.

Page 14: Market Basket Analysis

Process Methodology

The data is obtained from the Excel sheet provided by the customer.

Page 15: Market Basket Analysis

Each row contains:
    BUS_DT – Business Date
    REST_NO – Restaurant Number
    RTL_TRAN_NO – Transaction Number
    MENU_ITEM_KEY – Product Key Number
    MENU_ITEM_PLU – Menu Product Number
    MENU_ITEM_NAME – Product Name
    RCPT_DT_TMSTP – Date of Transaction
    HALF_HOUR_KEY – The half hour in which the transaction occurred
    COMBO_IND – Whether the product is offered with something else
    SERVICE_MODE_CODE – Eat-in / Take-away
    CGY – Category
    RGLR_PRC – Regular Price
    DRV_PRC – Derived Price
    ITEM_QTY – Number of Products Ordered

Page 16: Market Basket Analysis

Products offered at the store

WHOPPER
TENDERCRISP Chicken Sandwich
Crown-shaped CHICKEN TENDERS
French Fries
Hamburger
Cheeseburger
DOUBLE CROISSAN'WICH
BK BURGER SHOTS
KRAFT Macaroni and Cheese
Drinks

Page 17: Market Basket Analysis

The given data is changed into a new format that contains all items purchased in a single transaction.

This is done using the VLOOKUP function in Excel.

The data obtained is restructured to remove the multiple lines belonging to the same transaction, using the if…then method in Excel.

The data is then ready to be fed to the statistical application.
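The same restructuring can also be done directly in R rather than in Excel. The sketch below is an alternative under stated assumptions, not the method used in this project: the column names RTL_TRAN_NO and MENU_ITEM_NAME come from the field list, while the rows in `sales` are made-up values.

```r
# Hypothetical rows in the layout described earlier: one line per item sold.
sales <- data.frame(
  RTL_TRAN_NO = c(101, 101, 101, 102, 102, 103),
  MENU_ITEM_NAME = c("WHOPPER", "French Fries", "Drinks",
                     "Cheeseburger", "French Fries", "WHOPPER"),
  stringsAsFactors = FALSE
)

# Group item names by transaction number; unique() drops repeated lines
# for the same product within one transaction.
baskets <- lapply(split(sales$MENU_ITEM_NAME, sales$RTL_TRAN_NO), unique)
print(baskets)

# One row per transaction with its items joined into a single field,
# similar to the restructured sheet.
wide <- data.frame(
  RTL_TRAN_NO = names(baskets),
  ITEMS = vapply(baskets, paste, character(1), collapse = ", "),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(wide)
```

A list such as `baskets` can be coerced for the arules package with `as(baskets, "transactions")`.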

Page 18: Market Basket Analysis

Working in R

Download Rcmdr (R Commander), which is a GUI, and the arules package, which provides the Apriori algorithm for association rules, from CRAN.

The Rcmdr GUI can be run to load the data into the software, or the data can be loaded directly using command functions:

Dataset <- read.table("C:/Users/mittal/Documents/mittal.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)

Page 19: Market Basket Analysis

Loading the arules package:

library("arules")

To inspect the transactions (inspect() operates on arules objects such as transactions, itemsets and rules):

inspect(Dataset)

Page 20: Market Basket Analysis

Next, we call the function apriori() to find all rules (the default association type for apriori()) with a minimum support of 1% and a confidence of 0.6. (Adult is the sample transactions dataset shipped with arules; the item labels "income=small" and "income=large" belong to it.)

> data("Adult")
> rules <- apriori(Adult, parameter = list(support = 0.01,
+     confidence = 0.6))

Asking for the rules:

> rules

Selecting rules by their right-hand side and lift:

> rules_whopper <- subset(rules, subset = rhs %in% "income=small" &
+     lift > 1.2)
> rules_hamburger <- subset(rules, subset = rhs %in% "income=large" &
+     lift > 1.2)

Getting the summary of the rules:

> summary(rules_whopper)
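The lift filter used in the subset() calls can be illustrated in base R on the five toy transactions from the General Concept slides. Lift of a rule A => B is its confidence divided by the support of B; values above 1 mean A and B occur together more often than their individual frequencies would suggest. This sketch enumerates only single-item rules by brute force and is not how arules computes them; the helper `support()` and the choice of "Cola" as right-hand side are assumptions made for illustration.

```r
# The five toy transactions from the General Concept slides.
baskets <- list(
  c("Frozen Pizza", "Cola", "Milk"),
  c("Milk", "Potato Chips"),
  c("Cola", "Frozen Pizza"),
  c("Milk", "Pretzels"),
  c("Cola", "Pretzels")
)

# Share of baskets that contain every item of `itemset`.
support <- function(itemset) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

# Enumerate every ordered pair of distinct items as a candidate rule lhs => rhs.
items <- sort(unique(unlist(baskets)))
rules <- expand.grid(lhs = items, rhs = items, stringsAsFactors = FALSE)
rules <- rules[rules$lhs != rules$rhs, ]

rules$support <- mapply(function(a, b) support(c(a, b)),
                        rules$lhs, rules$rhs, USE.NAMES = FALSE)
rules$confidence <- rules$support /
  vapply(rules$lhs, support, numeric(1), USE.NAMES = FALSE)
# Lift: confidence of the rule relative to the baseline frequency of the rhs.
rules$lift <- rules$confidence /
  vapply(rules$rhs, support, numeric(1), USE.NAMES = FALSE)

# Same kind of filter as the subset() calls: fix the rhs, require lift > 1.2.
selected <- rules[rules$rhs == "Cola" & rules$lift > 1.2, ]
print(selected)
```

On this data only Frozen Pizza => Cola survives the filter, since every basket with Frozen Pizza also contains Cola.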

Page 21: Market Basket Analysis

The recommendations

Whopper can be bundled with Coke, Minute Maid orange juice, French toast sticks.
Cheeseburger can be bundled with French fries, onion rings.
French fries with HERSHEY®'S Fat Free Milk.
Dutch Apple Pie with Bacon, Egg & Cheese Biscuit Sandwich.

Page 22: Market Basket Analysis

Challenges…!!!

Data of more than 799 rows could not be loaded.
R was usable for learning purposes but proved difficult for industrial purposes, where large amounts of data have to be analyzed.
Limited knowledge is available for guiding analysis development in R.
New code has to be developed for extending the database.

Page 23: Market Basket Analysis

Thank You