
Eindhoven University of Technology

MASTER

Improving the promotion forecasting accuracy at Unilever Netherlands

van der Poel, M.J.

Award date:2010

Link to publication

Disclaimer

This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain

https://research.tue.nl/en/studentTheses/b03ab6b5-5127-492f-87b1-83dabfed1701

Improving the promotion forecasting accuracy at Unilever Netherlands

by

M.J. (Thijs) van der Poel

BSc Industrial Engineering and Management Science

Student identity number 0550934

in partial fulfilment of the requirements for the degree of

Master of Science

in Operations Management and Logistics

Supervisors TU/e:

dr. K.H. van Donselaar

dr. J.J.L. Schepers

Supervisor Unilever:

dr. P.D.J. van Balkom

Eindhoven, August 2010


Page II

TUE. School of Industrial Engineering.

Series Master Theses Operations Management and Logistics

Subject headings: sales forecasting, promotions, retail trade, consumer goods


Page III

Abstract

This master thesis describes how the forecasting accuracy of promotions can be improved at Unilever Netherlands. Currently, employees within the organization forecast promotions in a highly judgemental way. This research develops the forecasting process further by introducing a more mathematical forecasting model. Consumer demand and retailer orders are forecasted with multiple linear regression, and the difference between forecasting consumer demand and forecasting retailer orders is analyzed. The effect sizes of the 21 independent variables on the promotional demand are discussed, and the most important variables are used to formulate a reduced model. It is concluded that consumer demand can be forecasted quite accurately; however, the forecasting accuracy drops substantially for retailer orders. Multiple disturbing factors apparently increase the variability of the retailer orders relative to consumer demand. Therefore, this research advises Unilever to cooperate more extensively with its retailers to investigate the disturbing factors and to develop an integrated forecasting approach.


Page IV

Management summary

Problem introduction

This research was performed at Unilever Netherlands in Rotterdam and addresses the forecasting process for promotions. Over the last two decades, promotional pressure has increased in the Fast Moving Consumer Goods market in which Unilever operates. This holds especially in the Netherlands, where competition is fierce and multiple price wars have lowered the price level. With promotional pressure currently around 40%, Unilever has identified the forecasting process for these promotions as an area for improvement. An earlier internal project indicated that the forecast accuracy on promotion or range level is quite good, but that it drops dramatically on product level. Since the Unilever plants produce product-specific items and stock levels are product specific, the goal is to increase the forecasting accuracy on SKU level.

Problem definition

The main research question is: What are the causes for the low forecasting accuracy of the

promotion forecasting process and how can the forecasting accuracy be improved?

A first analysis of the problem resulted in five problem areas. This research mainly focussed on the problem area "Poor database usage": within Unilever, different data sources have to be consulted manually for each promotion. This is a time-consuming, user-unfriendly process, which does not encourage the use of data and therefore does not help the forecast accuracy. Furthermore, no model is provided to calculate the sales of a new promotion, so an employee has to search for and analyze all the information himself or herself.

The research covers four retailers in the Netherlands (Albert Heijn, C1000, Kruidvat and Plus) and 86 different products. The promotions of these products are analyzed for the period January 2009 up to March 2010. In addition, some practical requirements for making a forecasting model work in practice are defined: the forecasting model should be easy to use for Unilever employees, it should work with data that is available within the organization, and, to enhance its usability, it should forecast the consumer demand and use this as the basis for a retailer order forecast.

Research design

The research design specifies which method is used and which variables are included in the model. Multiple linear regression is chosen as the most suitable method for a forecasting model. In this method one dependent variable is predicted from multiple independent variables. The dependent variable to be forecasted is the Lift Factor of a promotion: the promotional sales divided by the weekly base line sales of a product. The independent variables are divided among the groups promotion, retailer and brand, as depicted in the figure below. The research tests the effect of the different independent variables on the dependent variable, reduces the number of variables and corrects for data availability.
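As a small illustration of the Lift Factor definition, the following Python sketch computes it for one promotion and converts a lift factor back into units. The function names and the example numbers are hypothetical and are not taken from the Unilever data.

```python
def lift_factor(promotional_sales: float, weekly_baseline_sales: float) -> float:
    """Lift factor = promotional sales / weekly base line sales."""
    if weekly_baseline_sales <= 0:
        raise ValueError("weekly base line sales must be positive")
    return promotional_sales / weekly_baseline_sales


def promotional_sales_from_lift(lift: float, weekly_baseline_sales: float) -> float:
    """Invert the definition: turn a (forecasted) lift factor into units."""
    return lift * weekly_baseline_sales


# Hypothetical example: 1,200 consumer units sold in promotion,
# against a base line of 300 consumer units per week.
example_lift = lift_factor(1200, 300)
print(example_lift)  # 4.0
```

Forecasting the lift factor rather than the absolute sales makes promotions of large and small products comparable; the absolute forecast follows by multiplying the forecasted lift factor with the base line.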

Results

The first step in developing a forecasting model for Unilever is to test the performance of the full model, in which all variables in the figure are included, on the consumer demand. The consumer demand is the actual number of products scanned at the registers of the retailer stores during a promotion. The effect size and direction of the independent variables are shown in the table below, where two plus or minus signs indicate a strong effect of the variable on the promotional sales. The adjusted R-square of the model is quite high, with a value of 0.700. This indicates a good model fit, in which 70% of the variance of the promotional demand is explained by the model. Furthermore, the model results are robust when the model is applied to promotions other than those with which it was calibrated.
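The adjusted R-square corrects the ordinary R-square for the number of independent variables in the model. A minimal sketch of the standard textbook formulas is given below; the function names and the example numbers (101 observations, 20 predictors) are illustrative assumptions and do not represent the sample of this research.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


def adjusted_r_squared(r2, n_observations, n_predictors):
    """Penalise R-square for the number of predictors in the model."""
    degrees_of_freedom = n_observations - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_observations - 1) / degrees_of_freedom


# Hypothetical example: R-square of 0.75 with 101 promotions and 20 predictors.
print(adjusted_r_squared(0.75, 101, 20))  # 0.6875
```

The penalty grows with the number of predictors, which is why the adjusted value is the more suitable one for comparing a full model with a reduced model.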

[Figure: Overview of the independent variables, divided into promotion variables, brand variables and retailer variables, which together explain the promotional sales. Variables shown: Advertising (Folder, TV), Display, Holiday, Length promo, Price decrease (Absolute, Percentual), Promo mechanism, # of products in promotions, Susceptibility to stockpiling (Preservability, Size of product, Frequency of purchase), Retailer, Market penetration, Repeat buyers, # of selling points, Lift factor former promotions, Product category, Promotion pressure, Weather (Summer products, Winter products).]

Besides the fact that variables with a large effect size are the most important ones to include in a forecasting model, the effect size of a variable can also be used to drive marketing decisions. The first marketing implication is that a display (second placement) of a promotion in a retailer store is far more important than folder advertisement and TV advertisement. Hence, when the marketing budget is allocated, investments in display should have priority over investments in folder advertisement, and both should have priority over investments in TV advertisement. The second implication is that the promotion mechanism in which a consumer has to buy four or more products to get the promotional discount results in the highest promotional demand. Surprisingly, a Single Price Off (SPO), where a consumer only has to buy one product, leads to a higher promotional demand than a promotion where a consumer has to buy two or three products. A promotion where the consumer gets a free product or premiaat (a promotional gift) has the lowest promotional demand, although the success of such a promotion really depends on the type of free product or premiaat. The last important implication is that marketing can increase the promotional sales by making sure that the promotion is sold in all stores of a retailer. This variable is especially important if the product is not sold in (almost) all stores in base line sales; for these products there are many extra promotional sales to gain. One way of boosting the number of stores is to advertise the promotion in the folder, since all stores are expected to have the folder promotions available. So when a product is not sold in all stores, it is more interesting for Unilever to invest in folder advertisement.

Variable                          Effect size    Variable                            Effect size
Display                           ++             log_growth_number_selling_points    ++
Folder                            +              Percentage_repeat_buyers            n.e.
TV_support                        n.e. / +       Promotion_pressure                  n.e.
Holiday_products                  n.e.           ln_LF_former_promotions_EAN         ++
Promo_length                      ++             Market_penetration                  n.e.
Percentual_discount               ++             Preservability                      +
SPO (a)                           -              log_size_of_product                 n.e.
Two_for (a)                       -              Frequency_of_purchase               -
Three_for (a)                     -              Personalcare (c)                    n.e.
Free_product (a)                  n.e.           Ice_and_beverages (c)               -
Premiaat (a)                      n.e.           SCC_and_vitality_shots (c)          +
Number_of_products_in_promotion   -              Savoury_and_dressings (c)           n.e.
C1000 (b)                         +              winter_products_temp                n.e.
Plus (b)                          -              summer_products_temp                n.e.
Kruidvat (b)                      - -

n.e. = no effect on the promotional sales

(a) The baseline group for the different promotion mechanisms is the mechanism “Four_or_five_for”

(b) The baseline group for the different retailers is the retailer “Albert Heijn”

(c) The baseline group for the different product groups is the product group “Homecare”
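The footnotes refer to baseline groups of categorical variables. A common way to enter such variables into a regression is dummy coding, in which the baseline category gets no dummy of its own and every effect is read relative to it. The sketch below assumes this standard encoding; the helper name `dummy_code` is hypothetical, and only the retailer names and the Albert Heijn baseline come from the table.

```python
RETAILERS = ["Albert Heijn", "C1000", "Plus", "Kruidvat"]


def dummy_code(value, categories, baseline):
    """Return 0/1 dummies for every category except the baseline."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != baseline}


# A promotion at Plus: only the Plus dummy is 1.
print(dummy_code("Plus", RETAILERS, baseline="Albert Heijn"))
# A promotion at the baseline retailer: all dummies are 0.
print(dummy_code("Albert Heijn", RETAILERS, baseline="Albert Heijn"))
```

Under this encoding, the "+" for C1000 in the table reads as a higher promotional demand than at Albert Heijn when all other variables are equal.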

To increase the usability of the model in practice, a reduced model is constructed with the most important variables from the table above. The fit of this model with a limited number of variables is still surprisingly high and almost equal to the fit of the full model. However, data are not available at Unilever for all variables, since Unilever as a manufacturer depends on the retailer for information on upcoming promotions. For two variables in the adapted model Unilever has no data: the percentage of shops with a second placement and the extra number of shops where the product is sold in promotion. To analyze the effect of this lack of data on the forecast accuracy, a new model without these variables is tested. The model fit decreases to an adjusted R-square of around 0.500, indicating that the exclusion of the two variables substantially worsens the performance of the forecasting model. Thus it is important for Unilever to gain access to data on these variables.

Unilever not only wants to know how much a promotion sells on the shopping floor, but also how much of a product a retailer orders. Therefore, the model results for the consumer demand are adapted to retailer orders. The retailers included in the research order on average between 39% and 85% more than is sold during the promotion, and the forecasts for the consumer demand are raised by this difference. The model performance decreases substantially because of the extra variance in the retailer orders. The adjusted R-square for the Non-Food data set decreases to 0.103, meaning that the model has almost no predictive power. For the Food data set the R-square is 0.392. So, the variability in the retailer orders is a lot higher for Non-Food products than for Food products. Forecasting retailer orders for Non-Food products seems to have little to no benefit, whereas forecasting retailer orders for Food products has more practical value.
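The step from consumer demand to retailer orders described above can be sketched as a simple multiplication. The function name and the example demand are hypothetical; only the range of 39% to 85% comes from the text, and which retailer belongs to which percentage is not assumed here.

```python
def retailer_order_forecast(consumer_demand_forecast: float,
                            average_over_order_fraction: float) -> float:
    """Raise a consumer demand forecast by a retailer's average over-ordering.

    average_over_order_fraction is the average fraction that a retailer
    orders on top of what is sold in promotion (0.39 means 39% more).
    """
    if average_over_order_fraction < 0:
        raise ValueError("over-order fraction must not be negative")
    return consumer_demand_forecast * (1.0 + average_over_order_fraction)


# Hypothetical consumer demand forecast of 1,000 consumer units,
# evaluated at both ends of the range reported in the text.
low = retailer_order_forecast(1000, 0.39)   # roughly 1,390 units
high = retailer_order_forecast(1000, 0.85)  # roughly 1,850 units
print(round(low), round(high))
```

A fixed average uplift shifts the level of the forecast but cannot explain the variation around that average, which is consistent with the drop in model fit reported for retailer orders.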

Implementation & conclusions

The different model adaptations in this research show that, if the right information is available, Unilever is well able to predict the consumer demand accurately. Unilever has an advantage over the retailer because of its larger data pool of promotions across all retailers, which can be used to forecast upcoming promotions. Hence, with this capability Unilever is able to take the lead in establishing a collaboration with retailers and increasing the forecast accuracy.

However, two aspects decrease the forecast accuracy of a manufacturer. First, a manufacturer has

less data availability than a retailer and thus important variables cannot be used to forecast the

promotional demand. Second, forecasting retailer orders has turned out to be far more difficult than

forecasting consumer demand, especially for Non-Food products. The bullwhip effect leads to a substantial

deviation between retailer orders and consumer demand. As a result, Unilever should first increase

its data availability on promotions by closer collaboration with retailers and better database

management. Thereafter, in order to be able to accurately forecast retailer orders, the disturbing

factors behind the bullwhip effect should be analyzed. Close collaboration with a retailer is needed to

successfully analyze these disturbances. When the disturbing factors are successfully analyzed, a

promotion forecasting model which forecasts the consumer demand and corrects for the disturbing

factors should be formulated and employed together with the retailer.

In conclusion, close collaboration and information sharing are needed, so that in the end Unilever and

the retailer together use one forecasting approach. Concepts like Vendor Managed Inventory (VMI),

Continuous Replenishment Program (CRP) and Collaborative Planning, Forecasting and

Replenishment (CPFR) can be used to increase the collaboration between Unilever and a retailer,

where VMI is the most basic concept and CPFR is the most advanced concept.


Page VIII

Preface

This master thesis is the result of the final part of my study Industrial Engineering and Management

at Eindhoven University of Technology. The master thesis project was executed at Unilever Netherlands

in Rotterdam from the beginning of 2010 up to the end of the summer.

When I started my master thesis I had just come back from an international semester in Hong Kong. Life over there had been eye-opening and really interesting, but also relaxing and a lot of fun. Therefore, starting my master thesis in Rotterdam really pushed me back into normal hard-working life. I still feel lucky that an opportunity for my master thesis presented itself at Unilever, since the working atmosphere at the Unilever headquarters in Rotterdam is really good. Luckily the master thesis did not feel like a burden at all, so I can look confidently into the future, where a real job is waiting for me.

I would like to grab the opportunity to express my gratitude towards a few people. First of all, I

would like to thank Patrick van Balkom, my supervisor at Unilever. His guidance and comments

provided very useful insights and shed light on my path at the moments I needed it. I really enjoyed

working with him.

Second, I would like to thank my first supervisor at the TU/e, Karel van Donselaar. His thorough

knowledge of the subject led to some very good discussions. And without his efforts to find an

internship I would not have had the opportunity at Unilever. Third, I would like to thank my second

supervisor at the TU/e, Jeroen Schepers. The feedback he gave on my work provided new insights

and improved the quality of my work.

Lastly, I would like to thank my girlfriend for supporting me during the project.

Thijs van der Poel

Rotterdam, August 2010


Page IX

Index

Abstract .................................................................................................................................... III

Management summary............................................................................................................... IV

Preface ................................................................................................................................... VIII

Index ........................................................................................................................................ IX

Part 1: Project definition .................................................................................................... 1

1 Introduction of research ....................................................................................................... 1

1.1 Structure of report ........................................................................................................ 1

1.2 Company description ..................................................................................................... 1

1.3 Problem introduction ..................................................................................................... 3

1.4 Overview literature........................................................................................................ 5

1.5 Gaps in literature .......................................................................................................... 6

2 Problem definition................................................................................................................ 6

2.1 Problem formulation ...................................................................................................... 6

2.2 Problem decomposition ................................................................................................. 7

2.3 Research questions ....................................................................................................... 8

2.4 Practical requirements research ..................................................................................... 9

3 Scope of research .............................................................................................................. 10

3.1 Region ....................................................................................................................... 10

3.2 Retailers ..................................................................................................................... 10

3.3 Time horizon .............................................................................................................. 11

3.4 Products (SKU’s) ......................................................................................................... 11

Part 2: Research design .................................................................................................... 15

4 Method of research ............................................................................................................ 15

5 Dependent and independent variables ................................................................................ 16

5.1 Dependent variable ..................................................................................................... 16

5.1.1 Lift factor as dependent variable ........................................................................... 16

5.2 Independent variables ................................................................................................. 17

5.3 Transformations of (in)dependent variables .................................................................. 20

5.4 Assign baseline dummy variables ................................................................................. 20

6 Different data sets & hypotheses ........................................................................................ 21

6.1 Sample size ................................................................................................................ 21

6.2 Data set split and reduction ......................................................................................... 22

6.3 Hypotheses effect size variables and data sets .............................................................. 23

6.4 Measurement indicators hypotheses ............................................................................. 25


Page X

Part 3: Results full model.................................................................................................. 27

7 Regression analyses full model ........................................................................................... 27

7.1 Overview most important dependent and independent variables .................................... 27

7.2 Checking the assumptions underlying multiple linear regression ..................................... 28

7.3 Results full model ....................................................................................................... 28

7.4 Validation full model .................................................................................................... 33

8 Generalizability of model results ......................................................................................... 34

8.1 Generalizability of sample size ..................................................................................... 34

8.2 Comparison with other research in the field .................................................................. 37

Part 4: Model adaptation .................................................................................................. 41

9 Adaptations to increase the usability and check for data availability ...................................... 42

9.1 Adaptation 1: Increase the usability by reducing the number of variables ....................... 42

9.2 Adaptation 2: Increase the usability by checking for data availability .............................. 44

9.3 Comparison of the different adaptations with the full model .......................................... 45

10 Model adaptation 3: From consumer demand to retailer orders ............................................ 46

10.1 Calculation retailer orders ............................................................................................ 46

10.2 Model fit on retailer orders .......................................................................................... 46

10.3 Difference between retailer orders and consumer demand ............................................ 47

Part 5: Implementation and conclusions .......................................................................... 51

11 Implementation ................................................................................................................. 51

11.1 Final model for implementation .................................................................................... 51

11.1.2 Results retailer orders (model adaption 1 as basis) ................................................ 52

11.1.3 Results retailer orders (model adaption 2 as basis) ................................................ 53

11.1.4 Conclusion results retailer orders based on model adaptation 1 & 2 ........................ 53

11.1.5 Actions needed to overcome current problems ...................................................... 54

11.2 Implementation plan ................................................................................................... 54

12 Conclusions ....................................................................................................................... 58

12.1 Ideal model ................................................................................................................ 58

12.2 Adaptations needed on ideal model .............................................................................. 60

12.3 Future steps to increase the forecast accuracy .............................................................. 61

12.4 Contribution to literature ............................................................................................. 62

References ................................................................................................................................ 65

Appendices ............................................................................................................................... 67


Page 1

Part 1: Project definition

1 Introduction of research

1.1 Structure of report

The report is structured in five parts, based on the regulative cycle of Van Strien (1979). The first part discusses the motivation for this research, resulting in the Needs of Unilever (Figure 1-1). It describes the exact problem Unilever is experiencing and the possibilities for dealing with it, which gives the starting point for the rest of the research. The second part translates the company needs into a research design, which describes how the needs can be investigated and which methods are used to research the problem. The third part discusses the model results, which improve the understanding of the problem. These results need to be adapted to be applicable in practice; this is done in the model adaptation in part four. The last part, the implementation & conclusions, discusses how this research can be implemented in order to fulfil the needs identified in part one. Throughout the different parts, the research contributes to the existing literature as described in paragraph 1.5. After reading the first part it should be clear what problem Unilever is encountering and what the scope of this research is.

Figure 1-1: The different project parts of the research [Project definition (Needs), Research design (Methods), Model results, Model adaptation, Implementation & conclusions]

1.2 Company description

Unilever is a global manufacturing company operating in the Fast Moving Consumer Goods (FMCG) industry. The company specializes in Food, Home and Personal care products and employs around 163,000 people worldwide. The Unilever portfolio of 400 brands, of which Dove, Knorr, Lipton and Omo are some of the largest, is sold in over 100 countries. These brands contributed to a worldwide turnover of 39.8 billion euros in 2009 (www.unilever.com). Most of Unilever’s products are manufactured in its 264 self-owned plants. This research focuses on Unilever Benelux, of which the main office is located in Rotterdam. Moreover, the emphasis is placed on the Dutch market and thus on the Dutch part of the Unilever Benelux organization. This decision will be clarified in paragraph 3.1.

In the Netherlands Unilever is split up into five product categories and four customer teams. The different product categories are:

• Home care (HC)

• Personal Care (PC)

• Savoury & Dressings (S&D)

• Vitality shots & Spreads/Cooking Category (SCC)

• Icecream & Beverages (I&B)

The different customer teams are:

• Albert Heijn (including Etos)

• Bijeen (C1000 & Jumbo), Super de Boer and Makro

• Drugteam (various drugstores, e.g. Kruidvat, DA)

• Superunie (16 smaller retailers, e.g. Sligro, Plus)

Unilever is organized in a matrix organization around the above product categories and customer

teams (see Figure 1-2). Alongside the customer teams, interdisciplinary Customer Development

Teams meet once a month to discuss the more tactical issues. The Customer Development Teams

are responsible for the planning horizon of 0-6 months. The product categories are

overseen by Category Brand Teams, which have a longer planning horizon of 3-24 months. Hence,

the customer teams have a more operational focus than the product category teams.

To conclude, this paragraph has explained the organization structure in a simplified way. The product categories and some of the named retailers will be used in the remainder of the research, so the reader is able to position the research within the Unilever organization.


Page 3

Figure 1-2: Organization matrix Unilever Netherlands [Rows: the product categories Savoury & Dressings, Home Care, Personal Care, Icecream & Beverages, and SCC & Vitality, each managed by a Category Brand Team (CBT; planning horizon of 3-24 months; Marketing, Sales, Planning, Finance). Columns: the customer teams Albert Heijn; Bijeen, Super de Boer, Makro; Superunie; and Drugstores, each with a Customer Development Team (CDT; planning horizon of 0-6 months; Sales, Customer Development, Customer Service).]

1.3 Problem introduction

Within the Dutch FMCG (Fast Moving Consumer Goods) market, as well as in foreign markets, the promotional share of the total volume has increased over the last decades. In the Netherlands the promotion pressure has increased in the last couple of years due to multiple price wars, to around 40% of the total volume. Because of that, Unilever noticed that its promotion forecasting process became more and more important over the years and needed improvement. Promotion forecasting has received increased attention at Unilever Benelux since mid-2008. The forecasting accuracy at that moment left room for improvement, with a case fill (a service level measure) of at best 95%, against a current case fill target of 98.5%. Besides the low case fill, Unilever overforecasted its promotions by 30% on average, which resulted in large amounts of obsolete stock. Furthermore, the employees who produced the promotion forecasts were not able to spend much time on a forecast, even though they stressed the importance of accurate forecasting in interviews at the time.

Hence, the objective formulated was to reduce the overforecast while increasing the case fill. A program within the company was directed at the reporting and evaluation possibilities of promotion forecasting and at the training of the employees involved. The trainings further increased the awareness of the importance of an accurate promotion forecast and improved the process of creating one. Currently the forecasting accuracy is analyzed on SKU level (Stock Keeping Unit, defined as a unique product), range level (different variants of one product together, for example different DOVE spray deodorants) and promotion level (all products within one promotion, for example all products in the DOVE line). Before 2008 the promotion accuracy was not yet analyzed on SKU level. The forecast accuracy on range and promotion level seemed to be quite good; however, on SKU level it turned out to be dramatically worse. The accuracy on the higher levels looked good because the forecast errors of the different SKUs levelled each other out: when one SKU was overforecasted and another SKU in the promotion was underforecasted, the forecast errors of the two SKUs cancelled each other out.
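The cancelling-out effect can be made concrete with a two-SKU example. The sketch below uses the absolute percentage error as accuracy measure, which is an assumption made for illustration (the measure used by Unilever is not specified here), and all SKU names and quantities are hypothetical.

```python
def absolute_percentage_error(forecast: float, actual: float) -> float:
    """Absolute forecast error as a fraction of the actual sales."""
    return abs(forecast - actual) / actual


# Hypothetical promotion with two SKUs: one overforecasted, one underforecasted.
forecast = {"SKU_A": 150, "SKU_B": 50}
actual = {"SKU_A": 100, "SKU_B": 100}

# Error per SKU: both forecasts are 50% off.
sku_errors = {
    sku: absolute_percentage_error(forecast[sku], actual[sku]) for sku in forecast
}

# Error on promotion level: the totals match exactly, so the error is 0%.
promotion_error = absolute_percentage_error(
    sum(forecast.values()), sum(actual.values())
)

print(sku_errors)       # {'SKU_A': 0.5, 'SKU_B': 0.5}
print(promotion_error)  # 0.0
```

Because production and stock are planned per SKU, the 50% errors are the ones that lead to obsolete stock and missed case fill, even though the promotion-level figure looks perfect.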

Figure 1-3 depicts a simplified overview of the supply chain in which Unilever operates. In this figure Unilever is the manufacturer, Albert Heijn for example the retailer, and the shoppers in the retailer's stores are the consumers. In this supply chain there are two different demand origins: the demand from the consumers at the retailers and the demand from the retailers at the manufacturer. Both demand origins can be forecasted with a model, and the remainder of this paragraph explains which of the two will be forecasted. The consumer demand is a more direct and accurate representation of the promotional sales, whereas the retailer orders are more indirect and contain more variation. This increased variation is caused mainly by forward buying by the retailer, the stock available at the retailer before a promotion takes place, the inaccuracy of the retailer's own promotional forecast and the pipeline fill. These factors cause variation which is difficult to explain, especially since the stock levels of a retailer are largely unknown at Unilever. Besides that, the promotion mechanism underlying the promotional sales operates on the shopping floor. Hence, modelling consumer demand enables a model to capture the factors behind promotional sales accurately. Furthermore, the On Shelf Availability in the retailer's shops is regarded as a more important measure than the case fill at the retailer, since the products are sold in the shops and not in the warehouse of a retailer. Lastly, when discussing the height of the expected promotional sales with the retailer, the consumer demand is the foundation for this discussion. Hence, it is preferred to base the forecast on the consumer demand. This forecast will be corrected for the disturbing factors between consumer demand and retailer orders. So, first a model will be developed which forecasts the consumer demand. The consumer demand forecast generated by the model then has to be adapted to retailer orders. Because consumer demand is used as the basis, the research will measure the promotional demand in consumer units. This measure can be converted to the case pack measure that is more widely used within Unilever (a case pack contains a certain number of consumer units).
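The conversion from consumer units to case packs mentioned above could look as follows. This is a minimal sketch: the function name, the case pack size of 12 and the choice to round up to whole case packs are assumptions for illustration only.

```python
import math

def consumer_units_to_case_packs(consumer_units, units_per_case):
    """Convert a forecast in consumer units to whole case packs, rounding up."""
    if units_per_case <= 0:
        raise ValueError("units_per_case must be positive")
    return math.ceil(consumer_units / units_per_case)

# Hypothetical forecast of 1250 consumer units with 12 units per case pack.
case_packs = consumer_units_to_case_packs(1250, 12)
print(case_packs)  # 105
```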

Figure 1-3: Overview of the demand in the FMCG supply chain

[Figure 1-3: Manufacturer, Retailer and Consumer linked by goods flows; the two demand streams are labelled "Demand (consumer demand)" and "Demand (retailer orders)", connected by "Adaptation".]

This paragraph gave an introduction to the promotion forecasting problem within Unilever. It described the background of the problem and specified the different demand origins in the supply chain in which Unilever operates. In the next chapter the problem definition will be discussed in more detail. First, an overview of the relevant literature on promotion forecasting is given.

1.4 Overview literature

In this chapter the relevant literature for the research field of this master thesis will be summarized

and linked with the situation of the company. An extensive literature study about promotion

forecasting can be found in Van der Poel (2010a) on which this summary is based.

The importance of promotions within the FMCG sector has grown substantially over the last 20 years

(Blattberg, 1995). The market share of promotions has increased likewise in the Dutch FMCG

market. With the increase of promotion pressure, simultaneously the instability of the demand has

increased. Promotions are responsible for large volumes, typically between 4 and 8 times the base line

sales (Buckers, 2010, Van den Heuvel, 2009). Hence, logically the importance of accurate promotion

forecasting has increased as well. On the one hand, a low case fill, because of underforecasting,

results in Out Of Stocks in retailer stores and is harmful for the sales and retailer relationship. On the

other hand, overforecasting results in extra stock (costs) and potential obsoletes.

There are different types of promotions. At the moment, almost all promotions in the Dutch FMCG

sector are price promotions, where the consumer gets a reduced price in one form or another. Likewise, the price promotion is the largest single category in the marketing budget of American FMCG companies (Silva-Riso, 1999). Besides price promotions, occasionally a coupon promotion or a promotion with a premium or free product (e.g. a gadget, a discount on a theme park ticket or a free (new) product) is offered. The success of the different types of promotions is influenced by numerous

variables. In Van der Poel (2010a) 53 variables with a possible influence are listed. Which variables

are perceived as important will be discussed in chapter 5.2. These variables have to be fitted in a

model. The most widely used method found in literature is a multiple linear regression analysis (Van

Loo, 2006, Van den Heuvel, 2009, De Schrijver, 2009, Cooper et al, 1999, Wittink et al, 1988). In

such an analysis multiple independent variables predict one dependent variable. Interaction effects

between independent variables can be incorporated when the form of the interacting variables is

continuous. Furthermore, the (in)dependent variables can be included in their linear and logarithmic

form as long as their form is metric.
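To make the structure of such a regression concrete, the sketch below fits a multiple linear regression with one variable in linear form, one in logarithmic form and one interaction term. Everything in it is an assumption for illustration: the two variables (price discount and promotion duration), the synthetic data, the coefficient values and the small `ols` helper are hypothetical and do not come from the literature cited above.

```python
import math

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical promotions: price discount (fraction) and duration (weeks).
discount = [0.10, 0.25, 0.50, 0.25, 0.33, 0.50, 0.10, 0.33]
duration = [1, 1, 1, 2, 2, 2, 2, 1]

# Design matrix: intercept, linear term, logarithmic term, interaction term.
rows = [[1.0, d, math.log(w), d * w] for d, w in zip(discount, duration)]

# Synthetic dependent variable generated from known coefficients,
# so that the fit can be checked; these are not real sales data.
true_beta = [0.5, 3.0, 0.2, 0.4]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]

beta_hat = ols(rows, y)
print([round(b, 3) for b in beta_hat])
```

Because the synthetic data contain no noise, the estimated coefficients should reproduce the assumed ones; with real promotion data the fit would of course be imperfect.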

The literature described will be useful in the development of a promotion forecasting model on

manufacturer level. The same variables have an impact on the promotional volume for retailers and

manufacturers. However, the data availability will differ and manufacturers are dependent on retailers for the data of an upcoming promotion. No research is available on the effect of unknown stock levels. As mentioned, this is only important if a manufacturer wants to predict retailer orders.

1.5 Gaps in literature

In the literature study of Van der Poel (2010a) multiple gaps in the literature on promotion

forecasting were discussed. This paragraph indicates the gaps this research will address:

1. The dependent variable. This can be the lift factor (hereafter abbreviated as LF) over the

base line sales or the absolute promotional sales. Furthermore the LF as dependent variable

can be transformed in multiple ways. There is no conclusive research on the performance of

the different forms of the dependent variable.

2. Manufacturer based model. All relevant promotion forecasting models found in literature are

retailer based. It is unclear on what aspects a retailer model and manufacturer model differ

and how this could have an impact on the performance. This gap also relates to the

following gap, which discusses whether it is an advantage or disadvantage to be a retailer.

3. Advantage or disadvantage of being a retailer. It is interesting to investigate if a

manufacturer based model has a different performance than a retailer based model. A factor

which might cause this difference is the dependency on the retailer for information. A

second factor is that the products of a manufacturer might be more homogeneous than the

products of a retailer. Moreover, a manufacturer has more promotions of the same product

than a retailer, because the product is sold at multiple retailers. Literature on promotion

forecasting does not specify if there is a difference in performance between a manufacturer

and retailer based model and which factors cause this difference.

2 Problem definition

This chapter specifies the aim of the research. First, the overall problem formulation is depicted.

Second, an initial analysis of the problem context is depicted. Third, the corresponding research

questions are formulated. Lastly, the practical requirements of the research for Unilever are stated.

2.1 Problem formulation

Although some steps have been made in the last 2 years (as mentioned in paragraph 1.3), the

promotion forecasting process is still open for quite some improvement. In this paragraph the

problems related to promotion forecasting are used to come to the problem formulation and

accompanying research question with sub questions. The purpose of this research is to analyze the

inaccuracy of the promotion forecasts. Hence the following problem formulation is depicted.

Problem formulation: The forecasting accuracy of the current promotion forecasting

process is too low



2.2 Problem decomposition

A first analysis of the overall problem context is shown in the Fishbone diagram in Figure 2-1. The

goal of this analysis is to investigate which general problem areas have an impact on the forecast

accuracy and to choose the scope this research will focus on. The forecast inaccuracy of a

promotion forecast is regarded as the main problem. A high forecast inaccuracy results in more

obsoletes, higher stock costs and a lower case fill. The causes of Forecast inaccuracy can be divided

into five general problem areas, which are:

• Phasing of promotions: For the measurement of the forecast accuracy it is important that

the deliveries of promotional volumes are planned in the correct weeks. This is mainly a

measurement issue, but can cause problems if volumes have to be delivered earlier than

planned. Promotions are typically delivered one or two weeks before the promotion takes

place.

• Sales oriented organization: Unilever is a sales oriented organization where logistic issues typically have less priority. The sales department wants to be able to deliver the products to the retailer at all costs, i.e. it has less of an eye for logistic costs and operations. This mentality can result in a tendency to overforecast.

• Customer team deviation: There are four customer teams which work quite independently

from each other. Information sharing and learning from other customer teams is not

common practice. Furthermore, ways of working differ substantially between the customers

within one customer team.

• Retailer dependency: Retailers have the power to change promotions and can decide not to

share all relevant information with Unilever. It is common practice that retailers are not

willing to share information, mainly because of data sensitivity. Furthermore, similar

promotions at other retailers can result in last minute changes when the discount of a

promotion at another retailer is higher.

• Poor database usage: The different data sources have to be consulted manually for each promotion. This is a time-consuming, user-unfriendly process, which does not encourage the use of data and thus does not improve the forecast accuracy. Furthermore, no model is provided to calculate the sales of a new promotion. Therefore, an employee has to search and analyze all the information him or herself.



Figure 2-1: Problem areas and scope of the research

The blue zone indicates the main scope of the research, which is mainly to enhance the database

usage within Unilever. The grey zones are areas which will benefit from this research as well,

because of more standardization and more clarity on the information needed from a retailer.

Regarding the main scope, employees currently need to research the different databases by

themselves in order to create an accurate forecast. Furthermore, no tool is provided to calculate a

promotion forecast. Hence, an employee has to make his own assumptions and calculations with

limited information available. Concluding, this process is complex, time consuming and not

standardized and a forecasting tool which is used throughout the Unilever organization will improve

these aspects.

2.3 Research questions

Figure 2-1 shows that a low forecasting accuracy results in more obsoletes, a lower service level and

higher stock costs. The forecasting accuracy should be increased to reduce this effect. Consequently,

the main research question is: What are the causes for the low forecasting accuracy of the

promotion forecasting process and how can the forecasting accuracy be improved?

[Figure 2-1, captioned above, is a fishbone diagram with "Forecast inaccuracy" as its head. Labels recoverable from the figure: Poor database usage; Sales oriented organization; Customer team deviation; Retailer dependency; Phasing of promotions; Different information available; Different method and timeline for process per retailer; Risk averse; Usability databases; # of databases to extract data from; Time consuming operation; Last minute changes; No stock data; Limited information sharing; Promotion forecasts on history single retailer; Multiple SS increases; Low power Logistics Assistant (LA); Forward buying; Divided responsibilities; Measurement error; Measurement mechanism; Lack of tool; Lack of knowledge LA. Effects shown: Low Service Level (case fill); Obsoletes; Stock costs.]

Most of the issues can be improved with a suitable demand forecasting model for promotions. Such a forecasting model can reduce the poor database usage, standardize the processes among different customer teams, serve as an argument towards retailers to legitimize the request for data, and serve as a tool to strengthen the logistic voice in the sales oriented organization of Unilever. Only the phasing of the orders of a promotion is not expected to improve directly. Hence, the research will be focussed on the development and implementation of a promotion forecasting model. The following sub questions can be used to answer the main research question:

1. What are the functional requirements to make a forecasting model work within Unilever?

2. Which products, retailers, time horizon and region should be included in the analysis?

3. Which prediction method is most suitable for promotion forecasting?

4. Which independent and dependent variables should be included in a forecasting model?

5. How should a model generate a forecast for retailer orders?

6. What is the impact of being a manufacturer on the performance of the forecasting model?

2.4 Practical requirements research

Besides the scientific nature of this research, the practical goals should be defined as well. The

overall goal is to improve the promotion forecast accuracy of Unilever. A couple of sub goals should

be formulated to reach this overall goal. The sub goals formulated are practical requirements that a

potential solution should meet in order to work in practice. These are:

1. Ease of use

2. Data availability

3. Consumer demand as basis

(1) Ease of use: A forecasting model should be simple to use. Hence, it should not cost a Unilever employee much effort to use the model, the interface should be very simple and the output should be understandable. Regarding the number of variables in the model, interviews within Unilever indicated that a practically useful model should contain at most 10 variables and preferably fewer. Furthermore, the model itself should not be seen as a black box, since this would decrease the acceptance of the forecast of the model.

(2) Data availability: The model should work with data which is readily available for the Unilever

employees who have to work with the model.

(3) Consumer demand as basis: As reasoned in paragraph 1.3 the consumer demand will be the

starting point for developing a forecasting model. Later on this forecast will be adapted to retailer

orders.

Summarizing, to meet the practical requirements the model needs to be understandable, have a

high usability, work with available data and focus on consumer demand. These requirements will be

taken into account in the model building process.



3 Scope of research

In this chapter the scope of the research will be determined. The region, retailers, time horizon and

products which will form the sample size are discussed successively.

3.1 Region

Unilever Benelux consists of The Netherlands and Belgium. An analysis has been done to judge the comparability of the Dutch and Belgian markets. If the markets and promotion processes had been comparable, the research would have focussed on both markets. However, the Belgian market differs too much from the Dutch market on a couple of aspects. A vast part of the promotions on the Belgian market are coupon promotions, while these promotions are rare in the Dutch market. Moreover, most promotions are presented on special cardboard displays and multiple items of a SKU are bundled together in a repack. Often different SKU’s are even bundled together in one repack. Lastly, the retailer Colruyt in Belgium has a lowest price guarantee for all its products. Therefore, it matches the promotions of all other retailers on the Belgian market on products it offers in store. As a result, other retailers try to come up with promotions that Colruyt does not need to match, which brings a high variety of promotions to the Belgian market.

Concluding, the above factors are very likely to cause a different promotion mechanism on the Belgian and Dutch markets. Because the markets are very different, the model would not be able to benefit from the larger pool of data. Therefore, one of the two markets needs to be chosen for the research. Since the research is performed from the office in Rotterdam, data collection will be easier for the Dutch market; therefore, the Dutch market is chosen as scope for this research.

3.2 Retailers

Next, the research needs to be focussed on certain retailers in the market, since inclusion of all

retailers will lead to extensive data gathering and will decrease the quality of the analysis. The

following criteria are used to select four retailers in the market.

• Size of the retailer: How large is the retailer compared to other retailers? This indicates the importance of a retailer. A large retailer is more important than a small retailer, because promotions of a large retailer have a higher impact on the safety stock and lead more quickly to out of stocks. Therefore, large retailers are preferred.

• Promotion pressure: How much of the total volume of a retailer originates from promotional volume?

• Possibility of collaboration with a retailer: Whether the retailer is likely to cooperate, or already cooperates, with Unilever to enhance the accuracy of the promotion forecasting process.


• Data availability: In paragraph 5.2 the variables that will be included in the forecasting model are discussed. However, to include a variable in the model, data is needed. The data availability differs for each retailer.

• Duration promotion: The duration of a promotion in the retailer sector is normally one week; however, some retailers have a different promotion period (e.g. Kruidvat, Jumbo, Makro, Sligro). It is preferred to perform the analysis on retailers with a promotion period of 1 week, to enhance the comparability between promotions.

Based on these criteria the retailers AH, C1000, Kruidvat and Plus are included in the research. AH and C1000 are the largest retailers and Kruidvat is the largest drugstore in the Netherlands. Therefore, including them is very logical although Kruidvat has promotions with a duration of 1 and 2 weeks; hence, the duration has to be included as independent variable in the research. Plus is a smaller retailer; however, they are included because they are open for collaboration and the data availability on promotions of the Plus is good.

3.3 Time horizon

A time horizon of at least 1 year is desirable, to overcome potential seasonal effects. Therefore, a time horizon from the beginning of 2009 up to week 13 of 2010 is chosen for this research. To be able to cross validate the model the sample is split. The promotions in 2009 are used to calibrate the model and the promotions in the first quarter of 2010 are used to validate the model results (see Figure 3-1). By this approach the results of the model can be tested on their robustness (Miles et al., 2001).

Figure 3-1: Time horizon and data split research

3.4 Products (SKU’s)

Unilever Netherlands has around 2500 SKU’s which are sold at a random point in time. Quite some of these products are offered only once and/or have a very low volume. The SKU’s are divided among five product categories and these five product categories are divided in subcategories. The 5 product categories are named in chapter 1; the subcategories are depicted below, in Figure 3-2, in the bottom layer of the figure.



Figure 3-2: The product categories of Unilever

Since data gathering is a time-consuming process, a sample of the total population will be taken. The sample needs to be representative of the whole set of products. Hence, the selection of SKU’s for the research sample is based on the following criteria:

1. At least 10% and at most 90% of the total volume of a SKU originates from promotional

volume. Otherwise, a product has almost no promotions or almost no base line

sales.

2. The focus will be on the more important high volume SKU’s (A and B SKU’s), although

some low volume SKU’s will be included in the sample as well (C SKU’s).

3. The SKU is sold in promotion in at least 2 of the 4 retailers included in the analysis.

4. The number of the total promotions of a SKU at the 4 retailers in the analysis during the

time horizon of the research should be at least 4.

5. Each SKU is part of a broader range of Unilever products (e.g. the product “Kip Siam

wereldgerecht” is part of the “Knorr Wereldgerechten” range). At most three variants of a range will be included to assure the diversity of the sample.

6. The product should have sales from January 2009 up to March 2010, since this is the

time horizon for the research.
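The measurable criteria in the list above can be read as a filter over candidate SKU's. The sketch below is only an illustration under stated assumptions: the `SkuStats` record, its field names and the four example SKU's are hypothetical. Criteria 2 and 5 (the mix of A, B and C SKU's and the maximum of three variants per range) concern the composition of the sample as a whole and are left out.

```python
from dataclasses import dataclass

@dataclass
class SkuStats:
    name: str
    promo_share: float         # promotional volume / total volume
    retailers_with_promo: int  # of the 4 retailers in the analysis
    promo_count: int           # promotions at those retailers in the time horizon
    full_history: bool         # sales from January 2009 up to March 2010

def meets_criteria(sku: SkuStats) -> bool:
    """Check the per-SKU selection criteria 1, 3, 4 and 6."""
    return (0.10 <= sku.promo_share <= 0.90   # criterion 1
            and sku.retailers_with_promo >= 2  # criterion 3
            and sku.promo_count >= 4           # criterion 4
            and sku.full_history)              # criterion 6

# Hypothetical candidates, not actual Unilever products.
candidates = [
    SkuStats("SKU A", 0.45, 3, 7, True),
    SkuStats("SKU B", 0.05, 4, 9, True),   # too little promotional volume
    SkuStats("SKU C", 0.60, 1, 5, True),   # promoted at only one retailer
    SkuStats("SKU D", 0.30, 2, 4, False),  # no sales over the full time horizon
]
selected = [s.name for s in candidates if meets_criteria(s)]
print(selected)  # ['SKU A']
```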

[Figure 3-2 hierarchy: All products are split into Food and Non-Food. Product categories: Ice & Beverage; Vitality shots, Spreads & Cooking; Savoury & Dressings; Home-care; Personal-care. Subcategories (bottom layer): Ice cream; Tea & fruit beverages; Vitality shots; Spreads & cooking; Other bakery; Savoury; Dressings; Other Foods; Household care; Laundry; Hair care; Deo & grooming; Skin.]

Because there are 13 subcategories in total and the total sample should contain around 100 products, the aim is to select between 5 and 15 SKU’s for each subcategory, depending on the category size. The resulting sample of 86 products is depicted in Appendix 1. For the category Savoury 19 SKU’s are selected because of the large number of SKU’s in this category (Table 3-1). For the categories Dressings, Other Foods and Tea, Soy & Fruit Beverages respectively 3, 4 and 3 products are selected. This is less than the goal of 5, because not enough products fulfilled the criteria. In the Dressings category and in the Tea, Soy & Fruit Beverages category a lot of products have been innovated or relaunched in the time horizon of the research. In the Other Foods category, the number of products sold in promotion by at least two retailers is very limited. The category Vitality shots has hardly any SKU’s left with respectable sales. The sample is responsible for 1238 promotions in the time horizon of the research. The total number of promotions in this time horizon is 15283, meaning that the sample contains 8.1% of the total promotions. This percentage of the promotions combined with the selection criteria should provide a representative sample. This will be checked in chapter 8.

Category          Number of SKU’s    Category                         Number of SKU’s
Deo & Grooming    9                  Other bakery                     0
Dressings         3                  Savoury                          19
Hair care         8                  Skin                             10
Household care    6                  Spreads and cooking products     8
Ice cream         9                  Tea and soy & fruit beverages    3
Laundry           6                  Vitality shots                   0
Other foods       4                  Total                            86

Table 3-1: Number of SKU's per category

Conclusion Part 1: This part resulted in a clear problem formulation, scope of the research and

requirements which the research should fulfill in practice (sub research question 1). The research

will be focused on improving the forecast accuracy of Unilever by taking the consumer demand as a starting point and later on adjusting this consumer demand to retailer orders. Chapter 3 depicted the retailers, SKU’s, time horizon and region which form the sample of the research (sub research

question 2).




Part 2: Research design

The first part defined the needs within the Unilever organization, stated the problem formulation and limited the scope of the research. In order to be able to produce an accurate forecasting model, this part will discuss which methods are most suitable, which variables should be included and what effects are expected in the results of the model (hypotheses). Basically, this part forms the framework for the research.

4 Method of research

In this paragraph the method to be used for the research will be discussed. According to Makridakis et al (1988), three families of forecasting models can be distinguished, namely Judgmental methods, Time series analysis and Explanatory methods. Judgmental forecasting is currently used within Unilever. The aim is to come to a more sophisticated, quantitative model. Time series analysis requires lengthy time series for the prediction of the upcoming period(s). However, promotions are events which occur on an infrequent basis. Therefore, time series cannot be used to analyze the promotional volume. Lastly, Explanatory methods aim to forecast the promotional volume as a dependent variable by independent variable(s). Each independent variable needs to have an explanatory relationship with the dependent variable. Concluding, of the two quantitative approaches, only explanatory models are suitable for forecasting promotional volumes. Lastly, judgmental analysis will always co-exist, since the forecast a model provides needs to be verified with common sense.

Next, the different forecasting methods within the explanatory family will be analyzed and a choice is made for one method. Van Loo (2006) analyzed the four most important forecasting techniques on criteria applicable to the situation at Unilever (see Table 4-1), where a 4 indicates that the method performs the best relative to the other methods and a 1 indicates that a method performs the worst relative to the other methods. Van Loo concluded it is not even certain that simpler models are outperformed by more complex models. Therefore, the scoring on the accuracy criterion is questionable. Still, single equation models are perceived as most suitable, especially since Unilever demands a model which is relatively flexible and easy to use and interpret. The single equation models can be further divided into simple and multiple linear regression models. Since simple linear regression models can only include one independent variable and the promotional volume is dependent on more than one independent variable, multiple regression is chosen as the most appropriate method. This is consistent with the analysis of Van der Poel (2010b) which concludes that multiple linear regression is the most widely used method in literature.

Criteria                  Single-equation (single and     Multiple-    Econometric    Artificial Neural
                          multiple linear regression)     equation     models         Networks (ANN)
Accuracy                  1                               2            3              4
Costs                     4                               3            2              1
Complexity                4                               3            2              1
Data need                 4                               3            2              1
Ease of interpretation    4                               3            2              1
Ease of Use               3                               2            1              4
Total                     20                              16           12             12

Table 4-1: Performance forecasting techniques (Van Loo, 2006)

This paragraph concluded that multiple linear regression is the most suitable method for a promotion

forecasting model. Consequently, this research will make use of multiple linear regression to analyze

the promotions of Unilever.
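Once a regression model is chosen, its robustness can be checked with the calibration and validation split described in paragraph 3.3. The sketch below only illustrates that split and an out-of-sample error measure. The promotions are hypothetical, the mean absolute percentage error is an assumed accuracy measure, and a deliberately naive stand-in (the mean lift factor of the calibration set) takes the place of the regression model.

```python
from statistics import mean

# Hypothetical promotions: (year, week, observed lift factor).
promotions = [
    (2009, 5, 4.0), (2009, 18, 6.0), (2009, 33, 5.0), (2009, 47, 5.0),
    (2010, 3, 4.0), (2010, 9, 5.0), (2010, 12, 8.0),
]

# Chronological split: 2009 calibrates, weeks 1-13 of 2010 validate.
calibration = [p for p in promotions if p[0] == 2009]
validation = [p for p in promotions if p[0] == 2010 and p[1] <= 13]

# Stand-in "model": the mean lift factor of the calibration set.
predicted_lf = mean(lf for _, _, lf in calibration)

# Mean absolute percentage error on promotions the model has not seen.
mape = mean(abs(predicted_lf - lf) / lf for _, _, lf in validation)
print(predicted_lf, round(mape, 3))
```

A large gap between the error on the calibration set and the error on the validation set would indicate that the model does not generalize to new promotions.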

5 Dependent and independent variables

In this chapter the dependent and independent variables that will be included in the model are

discussed. The choice of the variables and the form of the variables have an important effect on the

model results later on. A poor choice of variables results in a low model fit and thus in an inaccurate

forecasting model for Unilever. Therefore, an adequate analysis will be made in this chapter to select

the variables.

5.1 Dependent variable

The dependent variable of the model is the sales height of a promotion. As concluded in paragraph

2.4, the consumer demand will be forecasted as dependent variable. However, this variable can be

predicted in numerous forms. Below, the LF as a form of the promotional consumer demand is

discussed.

5.1.1 Lift factor as dependent variable

In paragraph 2.4 consumer demand was chosen as dependent variable. The promotional consumer

demand as dependent variable can still be forecasted in multiple ways. A distinction is made

between the absolute sales of a promotion and the LF of a promotion (Cooper et al, 1999, Wittink et

al, 1988). The LF is defined as follows:

\[ \text{Lift Factor} = \frac{\text{Promotional sales}}{\text{Base line sales}} \]

Formula 5-1

The advantage of working with a LF as dependent variable is that the promotional sales volume is

standardized against the base volume. As a result, the influence of the absolute sales height of a

promotion has been removed from the model equation. The promotional sales are a given fact, but the way the base line sales are calculated is less straightforward. In this research the base line sales are calculated by averaging the sales of the 5 weeks before a promotion (consistent with Van den Heuvel, 2009). A time period of 5 weeks has been chosen to reduce the effect of irregularities in the base line sales. Furthermore, when another promotion occurs in these 5 weeks, which happens only occasionally, its promotional sales are not included in the base line sales. Instead, a substitute base line is calculated for the weeks in which that promotion takes place. No other corrections are made on the base line sales. Because the above approach works with a base line, the seasonality and trend effects are included, since the base line sales are already subject to these effects. Therefore, seasonality and trend effects will not have to be included as independent variables in the model.1
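The base line and LF calculation described above can be sketched as follows. The function names and the weekly sales figures are hypothetical. One simplification is an assumption of this sketch: weeks in which another promotion ran are simply skipped, whereas the text states that a substitute base line is calculated for such weeks without specifying how.

```python
def base_line_sales(weekly_sales, promo_week, promo_weeks, window=5):
    """Average sales over the `window` weeks before `promo_week`.

    Weeks in which another promotion ran are skipped (an assumption of
    this sketch, standing in for the substitute base line).
    """
    weeks = range(promo_week - window, promo_week)
    regular = [weekly_sales[w] for w in weeks if w not in promo_weeks]
    if not regular:
        raise ValueError("no regular weeks available in the window")
    return sum(regular) / len(regular)

def lift_factor(weekly_sales, promo_week, promo_weeks, window=5):
    """Promotional sales divided by base line sales (Formula 5-1)."""
    base = base_line_sales(weekly_sales, promo_week, promo_weeks, window)
    return weekly_sales[promo_week] / base

# Hypothetical consumer sales per week; weeks 3 and 6 are promotion weeks.
weekly_sales = {1: 100, 2: 110, 3: 600, 4: 90, 5: 100, 6: 500}
promo_weeks = {3, 6}

print(base_line_sales(weekly_sales, 6, promo_weeks))  # 100.0
print(lift_factor(weekly_sales, 6, promo_weeks))      # 5.0
```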

Summarizing, the LF of a promotion is preferred over the absolute sales of a promotion, since the

absolute promotional sales does not provide a forecasting model with a clear reference (i.e. the

absolute sales differs constantly because different products and retailers have different sales

levels).
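The lift factor calculation described above can be sketched as follows. This is a minimal illustration, not the implementation used in the research: the function name, the input format and the sales figures are hypothetical, and promotional weeks inside the 5-week window are simply skipped, whereas the thesis calculates a substitute base line that is not specified here.

```python
from statistics import mean


def lift_factor(promo_sales, prior_weeks, n_weeks=5):
    """Return promotional sales divided by the average base line sales.

    prior_weeks: list of (weekly_sales, is_promo) tuples for the weeks
    before the promotion, oldest first (hypothetical input format).
    """
    window = prior_weeks[-n_weeks:]
    # Assumption: weeks that were themselves on promotion are skipped.
    base_weeks = [sales for sales, is_promo in window if not is_promo]
    if not base_weeks:
        raise ValueError("no non-promotional weeks available for the base line")
    return promo_sales / mean(base_weeks)


# Illustrative (made-up) sales history: one of the five weeks was a promotion.
history = [(100, False), (110, False), (90, False), (500, True), (100, False)]
print(lift_factor(600, history))  # base line = mean(100, 110, 90, 100) = 100
```

Because the promotional sales are divided by the product's own base line, products and retailers with very different sales levels end up on a comparable scale.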

5.2 Independent variables

Van der Poel (2010a) published a list of 53 variables with a possible influence on promotional

demand. Of these 53 variables a selection of 21 variables is made which are taken into account in

this research. The other variables are excluded because of lack of data, complexity issues,

irrelevance because of the supply chain perspective taken or a limited expected influence. The most

important variables omitted are (1) the percentage of products which is on promotion within the

category, (2) the percentage of products which was on promotion within the category last week, (3)

promotions of competitors and (4) price discount of last promotion. The first three variables are

excluded because of a lack of data and the last variable is excluded because of complexity issues.

Figure 5-1 categorizes the 21 variables taken into account among the clusters Promotion, Retailer

and Brand. The Promotion cluster is perceived as the most important, followed by the Brand cluster and the

Retailer cluster. Figure 5-1 forms the backbone of this research. The split between the clusters is

made to create a better overview and to gain insight in the effect sizes for promotion related

variables, retailer related variables and brand related variables.

1 The model will use the last 5 weeks before a promotion to calculate the average base line sales. In practice a promotion has

to be forecasted between 13 and 4 weeks in advance, when the base line sales is not available for the 5 weeks before a

promotion. Then the base line sales will have to be calculated for the upcoming weeks with a simple trend and seasonal

model.


[Figure 5-1: diagram linking the clusters Promotion variables, Retailer variables and Brand variables to Promotional sales. Each variable is labelled with its data source: n = data available in Nielsen, p = data available in promoplanner, m = data available at marketing, s = SAP data, g = general available data.]

Figure 5-1: Variables with a likely influence on the promotional sales

The variables in Figure 5-1 are discussed in more detail in Table 5-1. Under type of promo, multiple promotion mechanisms are discussed in further detail. The minimum and maximum measurement values are shown in the third column and the scale of a variable is depicted in the fourth column of the table. The variables as described in the underlying table will be tested on their effect on promotional sales. Which effect is expected to occur for each variable is depicted in paragraph 6.3.

| Variable | Description | Measurement | Scale |
|---|---|---|---|
| • Promotion variables | | | |
| Display | The percentage of the selling stores in which the promotion is placed on a display (kopstelling). The variable is not available for Kruidvat and will be replaced with the average over the other observations (Cooper et al, 2003). | (0, 100) % | Scale |
| Folder | Depicts if the promotion is shown in the folder of the retailer. | (0, 1) | Nominal |
| TV support | This variable states if the promotion is shown on television. Because of low data availability this variable is only available for the retailer Albert Heijn. | (0, 1) | Nominal |
| Holiday products | The interaction effect between the holiday weeks (New year, Easter, Whitsunday, Christmas) and products which have higher sales during holiday weeks (luxury ice-cream). | (0, 1) | Nominal |
| Promo-length | Length promotion in weeks (1 or 2 weeks). | (1, 2) | Scale |
| Absolute discount | The absolute price decrease of a promotion measured per product. | (0, 3.53) € | Scale |
| Percentual discount | The percentual price decrease of a promotion. | (0, 100) % | Scale |
| Promo mechanism | The different promo mechanisms will be programmed with dummy variables: | | |
| - SPO | Single price off, the consumer receives discount when he buys at least one promotion product. | (0, 1) | Nominal |
| - Two for X | The consumer receives discount when he buys at least two promotion products. | (0, 1) | Nominal |
| - Three for X | The consumer receives discount when he buys at least three promotion products. | (0, 1) | Nominal |
| - Four or five for X | The consumer receives discount when he buys at least four or five promotion products. | (0, 1) | Nominal |
| - Premiaat | The consumer receives a free non Unilever item with the promotion product(s). | (0, 1) | Nominal |
| - Free product | The consumer receives a free Unilever product with the promotion. The free product is mostly a new Unilever product and cannot be compared with a for example 2+1 promotion, since the consumer is not able to choose the product he gets for free. | (0, 1) | Nominal |
| Number of products in promotion | The number of SKU’s which are sold in the same promotion. | (1, 366) | Scale |
| • Retailer variables | | | |
| Retailer | The retailer (Albert Heijn, Kruidvat, C1000, Plus) where the promotion is sold. | (0, 1) | Nominal |
| Growth # of selling points | The number of selling points where the promotion is sold divided by the average number of selling points in the 5 weeks before the promotion period. | (-47, 162) % | Scale |
| • Brand variables | | | |
| Repeat buyers | The percentage of repeat buyers of the product in a quarter of a year. | (0, 100) % | Scale |
| Promo pressure | The percentage of products of the total sales which is sold in promotions. | (0, 100) % | Scale |
| LF former promotions SKU | The natural logarithm of the average LF of historical promotions of the product. | (0.32, 2.97) | Scale |
| Market penetration | The percentage of consumers who buy the product. | (0, 100) % | Scale |
| Preservability | The preservability of a product in days with a maximum of 730 days. | (84, 730) | Scale |
| Size of product | The size of a product in cubical centimetres. | (194, 3444) | Scale |
| Frequency of purchase | The number of times a product is bought by consumers on average in a quarter of a year. | (1.5, 7.6) | Scale |
| Product category | The category to which the product belongs (Ice & beverages, Savoury & dressings, Spreads & Cooking, Home Care, Personal Care). | (0, 1) | Nominal |
| Winter products temperature | The interaction effect between the average weekly temperature and products which report higher sales during cold weather. | (-2.91, 18.33) | Scale |
| Summer products temperature | The interaction effect between the average weekly temperature and products which report higher sales during warm weather. | (0, -20.80) | Scale |

Table 5-1: Overview of the independent variables with a likely influence on promotional sales



5.3 Transformations of (in)dependent variables

In this paragraph the variables which should be considered for transformation are discussed.

Variables included in a linear regression analysis should meet the assumptions of parametric data

(Field, 2005). For the dependent variable it is most important that these assumptions are met. The

assumptions for parametric data are:

1. Normally distributed data

2. Interval data

3. Independent of other variables in or outside the model

4. Homogeneity of variance

In order to judge normality in large sample sizes, with more than 200 cases, one should look at the

histogram of a variable and the value of the skewness and kurtosis instead of calculating their

significance (Field, 2005). In appendix 2 the histogram of the variable “Lift Factor” shows obvious

signs of non-normality. Therefore, one of the advised transformations by Field (2005) is applied on

this variable. The natural logarithm of the LF seems to meet the normality requirements.

Furthermore, the variable fulfils the interval data assumption. The assumption of homogeneity of

variance and independence will be tested for the regression model as a whole in paragraph 7.2. The

research of Van Loo (2006) uses a different transformation of the LF, which will be discussed in

paragraph 8.2.

Next, the independent variables will be tested on the above assumptions. Most of the independent

variables are coded with dummy variables and thus do not qualify for transformation. On the

variable LF former promotions SKU the same transformation is applied as above (see appendix 2),

with similar results. Of the other independent variables which have interval data characteristics,

“growth number of shops”, “absolute discount” and “size of product” are positively skewed and

“percentage of repeat buyers” is negatively skewed. Therefore, a log10 transformation is applied

to these variables (Field, 2005). 2 According to the statistics in appendix 2, the normality of only three

variables (growth number of shops, absolute discount and size of product) has improved. Therefore,

only these three variables are included in the analysis in their logarithmic form.
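The effect of a logarithmic transformation on a positively skewed variable can be illustrated with a short sketch. The lift factor values below are made up for illustration and are not taken from the data set; the skewness statistic is the simple population version, which may differ slightly from the statistic reported by the software used in appendix 2.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed standardized deviations."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Hypothetical lift factors with a long right tail.
lift_factors = [1.2, 2.0, 2.5, 3.1, 4.0, 6.5, 12.0, 49.8]
ln_lift_factors = [math.log(v) for v in lift_factors]

print("skewness raw:", round(skewness(lift_factors), 2))
print("skewness ln: ", round(skewness(ln_lift_factors), 2))
```

In this example the skewness of the transformed values is clearly closer to zero than that of the raw values, which is the kind of improvement that is judged from the histograms and statistics in appendix 2.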

5.4 Assign baseline dummy variables

According to Field (2005) a baseline group should be chosen, when a characteristic is coded with

dummy variables. The effect of the other dummy groups will be measured against the baseline

group. There are three characteristics which are coded with dummy variables and need a baseline

group. These are Retailer, Product category and promo mechanism. Albert Heijn is chosen as the

2 For the dependent variable the natural logarithm performed slightly better than the Log 10 values. Therefore, the ln values

are used. For the independent variables the Log 10 transformation is used.



baseline variable for Retailer, Homecare as the baseline variable for “product category” and “4 or 5

for X” as the baseline variable for promo mechanism. Regarding the variable promo mechanism,

4.6% of the promotions have a double promo mechanism (e.g. three for 5 euro plus a free gadget).

Because the percentage of double promo mechanisms is fairly small, the impact on the

choice of a baseline variable will be absent or minor.
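Dummy coding against a baseline group can be sketched as below. The helper function is hypothetical and only illustrates the principle: the baseline level gets no dummy of its own, so a case in the baseline group scores zero on all dummies and the coefficients of the other dummies are read relative to it.

```python
RETAILERS = ["Albert Heijn", "Kruidvat", "C1000", "Plus"]


def dummy_code(value, levels, baseline):
    """Return one 0/1 dummy per level, leaving out the baseline level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value}")
    return {level: int(value == level) for level in levels if level != baseline}


# Albert Heijn is the baseline retailer, so it is coded as all zeros.
print(dummy_code("Albert Heijn", RETAILERS, baseline="Albert Heijn"))
print(dummy_code("Plus", RETAILERS, baseline="Albert Heijn"))
```

The same coding applies to Product category (baseline Homecare) and promo mechanism (baseline 4 or 5 for X).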

Concluding, the dependent and independent variables have been chosen and the form in which they

should be included in the model has been discussed as well. In chapter 4, multiple linear regression

was already decided to be the most suitable method. Together, these chapters provided the

fundamentals for the construction of a forecasting model. The next chapter will discuss the

hypotheses and the different datasets that will be tested with the model.

6 Different data sets & hypotheses

In this chapter the different datasets that will be tested with the research model are discussed and

the hypotheses are formulated for the direction of the variables and the performance of the data

sets. The purpose of the chapter is to construct theoretical expectations (hypotheses) which will be

tested in the model result part. First the data sets will be determined, which differ from each other

depending on the product categories that are included in the data set. The data set of promotions

cannot be broken down into subsets indefinitely, because a minimum sample size is required.

First this minimum sample size is discussed, second the different data sets are discussed, third the

hypotheses are discussed and fourth the measurement indicators for the model performance are

discussed.

6.1 Sample size

According to Green (1991), the minimum acceptable sample size of a data set if one wants to test

the overall fit of a model can be determined with the formula 50 + 8k, where k is the number of

predictors. Table 5-1 discussed 33 predictor variables (see footnote 3) which will be included in the model. This

results in a minimum sample size of 314 cases. This rule of thumb is very useful but oversimplifies

the issue. As a final check the difference between the R2 and the adjusted R2 should be analyzed.

When the difference between those two measures is minor, the variance explained by the regression

model is more likely to be generalizable to other datasets. The R2 and adjusted R2 are compared in

the model results.
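The rule of thumb of Green (1991) is easy to verify with a one-line calculation; the function name below is only chosen for illustration.

```python
def minimum_sample_size(n_predictors):
    """Green's (1991) rule of thumb for testing the overall fit of a model."""
    return 50 + 8 * n_predictors


print(minimum_sample_size(33))  # 50 + 8 * 33 = 314
```

Any data set split therefore has to leave at least 314 cases per subset when all 33 predictors are entered.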

3 This is more than the 21 variables named earlier in this research, because some variables of those 21 need to be coded

with multiple dummy variables (e.g. Retailer)



6.2 Data set split and reduction

Hence, the dataset should not be split in such a way that the minimum acceptable sample size is

violated. Besides the data set split, an analysis of the outliers is discussed as well in this paragraph.

The most obvious split in the dataset is between Food and Non-food (Homecare and Personal care

products, hereafter named HPC) SKU’s. Food SKU’s follow a different sales pattern than HPC SKU’s:

HPC SKU’s are slower-moving products and thus have much lower sales. Other potential

data set splits are on a retailer, product category and promo mechanism level. However, these splits

result in sample sizes which are not large enough for the different data sets and/or the demand

patterns of the possible splits do not clearly differ, whereas the products on Food and HPC level do

clearly differ in demand height and pattern. Hence the sample is split on this level, which results in

three models: All categories, Food categories and HPC categories.

Next, when exploring the outliers of a regression where all cases are included, it seems that most of

the outliers originate from the Magnum products (see Appendix 3). More specifically, 15 out of the 24

outliers originate from the Magnum products. On closer inspection, the Magnum products have

very high LF’s. The first two bars in Figure 6-1 are the LF’s of two out of three Magnum products

in the sample and are considerably higher than the LF’s of the other products in the

sample. Since the Magnum SKU’s are responsible for over half of the outliers and have a very high LF,

the different data sets will be tested with and without the three Magnum SKU’s. The high Lift Factor

of the Magnum products is probably the result of the combination of the facts that Magnum is a

very strong brand, that Magnum is an expensive brand with high absolute discounts in promotion

and that Magnum is not often in promotion. However, the high average LF of Magnum is still

remarkable compared with the average LF of the other Unilever brands. Besides the Magnum products,

three other promotions have been deleted because their standardized residual exceeded 3.5 (Appendix

3).


[Figure 6-1: bar chart titled "Average Lift factor sample SKU's", showing the average lift factor (vertical axis, 0.00 to 60.00) per SKU (horizontal axis, SKU 1 to 85).]

Figure 6-1: Lift Factors of the SKU’s in the sample size

As a result, the 5 data sets in Table 6-1 will be tested, where a first split is made between Food and

HPC categories and a second split is made by the inclusion or exclusion of Magnum products.

| Categories | Data set number | Number of total cases | Cases calibration period | Cases validation period |
|---|---|---|---|---|
| All | Data set 1 | 1235 | 989 | 246 |
| Food | Data set 2 | 482 | 388 | 94 |
| HPC | Data set 3 | 753 | 601 | 152 |
| All w/o Magnum | Data set 4 | 1211 | 968 | 243 |
| Food w/o Magnum | Data set 5 | 458 | 367 | 91 |

Table 6-1: Data set to be checked to analyze best performing data set

6.3 Hypotheses effect size variables and data sets

Hereunder, in Table 6-2, the hypotheses are formulated for the different independent variables and

the performance of the data sets relative to each other. These hypotheses will be checked using the

results of the model on the different data sets. The number of plus or minus signs indicates the

expected size of the effect on the promotional sales. All variables originate from literature and in the

research of Van der Poel (2010a) the source of each variable is depicted.


| Hypothesis | Variable | Effect | Explanation |
|---|---|---|---|
| • Hypotheses Promotion variables | | | |
| H1 | Display | ++ | A promotion placed on a display will have higher promotional sales. |
| H2 | Folder | ++ | A promotion depicted in the folder will have higher promotional sales. |
| H3 | TV support | ++ | A promotion showed on TV will have higher promotional sales. |
| H4 | Holiday products | + | Products which are expected to sell better in holiday weeks are expected to have higher promotional sales in a holiday week. |
| H5 | Promo-length | ++ | The longer a promotion the higher the promotional sales. |
| H6 | Absolute discount | ++ | A higher absolute discount will result in higher promotional sales. |
| H7 | Percentual discount | ++ | More percentual discount will result in higher promotional sales. |
| H8 | Promo mechanism | | Ranked from the expected most positive to the most negative effect on the promotional sales: Four or five for X, three for X, two for X, SPO, free product, premiaat. |
| H9 | Number of products in promotion | - | A promotion with more products in the same promotion will result in lower promotional sales per SKU. |
| • Hypotheses Retailer variables | | | |
| H10 | Retailer | | Unknown which retailer will have a positive or negative effect. |
| H11 | Growth # of selling points | ++ | More extra selling points will result in higher promotional sales. |
| • Hypotheses Brand variables | | | |
| H12 | Repeat buyers | - | A higher percentage of repeat buyers indicates a larger group of loyal consumers and likely a lower LF. |
| H13 | Promo pressure | + | A high promo pressure means relative low base sales. Hence, promotional pressure increases the LF, since this measure is dependent on the base. |
| H14 | LF former promotions SKU | ++ | Higher historical LF’s of a SKU indicate higher promotional sales. |
| H15 | Market penetration | - | When a higher amount of consumers already buys the product there will be fewer consumers who switch to this product in promotion. |
| H16 | Preservability | + | Products with a longer preservability will have higher promotional sales. |
| H17 | Size of product | - | The more space a product consumes the lower the susceptibility to stockpiling is, which is likely to result in a lower LF. |
| H18 | Frequency of purchase | - | The higher the frequency of purchase the lower the susceptibility to stockpiling is, which is likely to result in a lower LF. |
| H19 | Product category | | Unknown which product category will have a positive or negative effect. |
| H20 | Winter products temp. | - | Promotions in weeks with a low temperature will have a higher LF for “winter” products. |
| H21 | Summer products temp. | + | Promotions in weeks with a high temperature will have a higher LF for “summer” products. |
| • Hypotheses different datasets | | | |
| H22 | Dataset 1 & 2 vs. dataset 4 & 5 | | The exclusion of the Magnum products will increase the model fit for dataset 4 & 5. |
| H23 | Dataset 3 & 5 vs. dataset 4 | | Breaking down the data set in Food and HPC categories makes the data sets more specific and will result in a higher model fit for dataset 3 & 5. |

Table 6-2: Hypotheses of the effects sizes of the variables and the performance of the different data sets



6.4 Measurement indicators hypotheses

Before the results are discussed in the next chapter, generally accepted measurement indicators to

test the hypotheses will be specified in this paragraph. The measurement indicators can be

distinguished on model performance and variable performance. The model performance is tested

with two measurement indicators, the (adjusted) R-square and the MAPE. The R-square and

adjusted R-square are calculated according to formulas 6-1 and 6-2. The (adjusted) R-square is a

widely used measurement for the goodness of fit of a linear regression model and is used in other

research on promotion forecasting as well (Van Loo (2006), Van den Heuvel (2009), Wittink et al.

(1988)). For the validation period of the models in this research only the R-square is depicted,

because the adjusted R-square has no meaning when a model from a calibration period is fitted on a

validation period.

\[ R^2 = \frac{SS_{\text{model}}}{SS_{\text{regression}}} \]

Formula 6-1

\[ \text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

Formula 6-2

where n is the number of cases and p the number of predictors included in the model.
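As a quick check of Formula 6-2, the adjusted R-square can be computed from the R-square, the number of cases and the number of predictors. The function below is a minimal sketch; the input values are the ones reported for data set 1 in Table 7-2 (R-square 0.635, 989 cases, 21 predictors).

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R-square according to Formula 6-2."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Data set 1 (All): R-square 0.635, n = 989, p = 21.
print(round(adjusted_r_squared(0.635, 989, 21), 3))  # 0.627
```

The result matches the adjusted R-square of 0.627 reported for data set 1. The correction grows with the number of predictors relative to the number of cases, which is why a small gap between the two measures is read as a sign of generalizability.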

The other measure on model performance is the MAPE (mean absolute percentage error). The MAPE

measure states the absolute error of a forecast for every single promotion and aggregates this for

the total data set. Hence, the MAPE calculates the average absolute error of multiple forecasts and

thus the inaccuracy of these forecasts. The most widely used MAPE measure uses the actual sales as

the basis (formula 6-3). However, within Unilever the MAPE is based on the forecast and the

maximum MAPE value is limited to 100% for each individual promotion (formula 6-4). Formula 6-5

transforms the MAPE (i.e. the forecast inaccuracy) to the forecast accuracy, where a lower MAPE

percentage relates to a higher forecast accuracy.

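\[ \text{MAPE}_{\text{actual}} = \frac{1}{n} \sum_{1}^{n} \left( \frac{|\text{actual sales} - \text{forecast}|}{\text{actual sales}} \right) \cdot 100 \]

Formula 6-3

\[ \text{MAPE}_{\text{Unilever}} = \frac{1}{n} \sum_{1}^{n} \min\left( \frac{|\text{forecast} - \text{actual sales}|}{\text{forecast}},\ 100\% \right) \cdot 100 \]

Formula 6-4

\[ \text{Forecast accuracy} = 100\% - \text{MAPE} \]

Formula 6-5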
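The two MAPE variants and the forecast accuracy can be sketched as follows. The function names and the sales figures are hypothetical; the only point of the example is to show that the Unilever variant divides by the forecast and caps the error of each individual promotion at 100%.

```python
from statistics import mean


def mape_actual(actuals, forecasts):
    """MAPE with the actual sales as basis (Formula 6-3), in percent."""
    return 100 * mean(abs(a - f) / a for a, f in zip(actuals, forecasts))


def mape_unilever(actuals, forecasts):
    """MAPE with the forecast as basis, capped at 100% per promotion (Formula 6-4)."""
    return 100 * mean(min(abs(f - a) / f, 1.0) for a, f in zip(actuals, forecasts))


def forecast_accuracy(mape):
    """Forecast accuracy in percent (Formula 6-5)."""
    return 100 - mape


# Made-up promotions; the third forecast is far too low.
actual_sales = [100, 200, 50]
forecast_sales = [80, 250, 10]

print(round(mape_actual(actual_sales, forecast_sales), 1))
print(round(mape_unilever(actual_sales, forecast_sales), 1))
print(round(forecast_accuracy(mape_unilever(actual_sales, forecast_sales)), 1))
```

In the third promotion the error relative to the forecast is 400%, but it enters the Unilever MAPE as 100%, so a single extreme miss cannot dominate the average.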

To test the variable performance, one would like to know the effect size and direction a single

variable has in the total model. Linear regression models indicate the effect size and direction of a

single variable with the Beta coefficients (β), which multiply each variable in the

equation. However, the Beta coefficient is an unstandardized measurement, since the scale for each

variable differs (e.g. the dummy variable for folder versus the LF former promotions SKU).

Therefore, the effect size and direction will be judged on the standardized Beta coefficients, which

are corrected for the different scale of each variable.
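The relation between an unstandardized and a standardized coefficient can be illustrated with the usual conversion, in which the coefficient is multiplied by the standard deviation of the predictor and divided by the standard deviation of the dependent variable. The sketch below uses rounded values for Display that are reported later in this document (B for data set 4 in Table 7-3, standard deviations in Table 7-1), so it is an approximation rather than a reproduction of the model output.

```python
def standardized_beta(b, sd_predictor, sd_dependent):
    """Convert an unstandardized coefficient B into a standardized Beta."""
    return b * sd_predictor / sd_dependent


# Display: B = 0.008, Std. Dev. Display = 26.4, Std. Dev. ln_LF_promotions = 0.68.
print(round(standardized_beta(0.008, 26.4, 0.68), 3))  # 0.311
```

The result of roughly 0.31 is close to the reported standardized coefficient of 0.317; a small gap is to be expected because B is rounded to three decimals and the descriptive statistics cover a somewhat different set of cases than the calibration sample.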

Conclusion Part 2: This part depicted how to model the promotions of Unilever. In total 21

independent variables will be used to forecast the natural logarithmic value of the LF of the

promotional demand (sub research question 3). Multiple linear regression is chosen as the best

method to forecast promotional demand (sub research question 4).


Part 3: Results full model

After defining the goals of the research and specifying the design of the research, this part will test the results of the model design. It will analyze which variables are most important for a promotion forecasting model and what the performance of the model is. This part serves as the starting point for the creation of a model which can be used within the Unilever organization. The number of variables included in the full model in this chapter is quite large, but it indicates the variables that should be included in the forecasting model for Unilever.

7 Regression analyses full model

The following chapter discusses the results of the full model for the five different data sets. The model is calibrated with the promotions of 2009 and validated with the promotions occurring in quarter 1 of 2010.

7.1 Overview most important dependent and independent variables

In Table 7-1 the descriptive statistics of the (un)transformed dependent variable and the most important continuous variables of the research are depicted. In the second column, the number of observations is shown (N), which has been reduced from 1238 promotions to 1211, because 3 outliers and 24 Magnum promotions have been excluded for data set 3, 4 and 5 (paragraph 6.2). Data set 1 and 2 still include the Magnum promotions. In the third column the minimum is shown (see footnote 4). In the last three columns the mean, standard deviation and variance of the variables are shown. The mean over all the LF’s is 6.07, meaning that on average a promotion sells 6.07 times the base line sales within the taken sample.

4 The LF of three promotions in the dataset is lower than one. A LF lower than one is very uncommon in promotions, since in that situation a promotion would sell less than the normal base line sales. Therefore, for the three cases where the LF is lower than one, the LF has been changed to one.

| Variable | N | Min | Max | Mean | Std. Dev. | Var. |
|---|---|---|---|---|---|---|
| LF_promotions | 1211 | 1.00 | 49.8 | 6.07 | 5.16 | 26.62 |
| ln_LF_promotions | 1211 | 0.00 | 3.91 | 1.56 | 0.68 | 0.46 |
| ln_LF_former_promotions_EAN | 1211 | 0.32 | 2.97 | 1.35 | 0.38 | 0.14 |
| Display | 1211 | 0.00 | 100.0 | 52.3 | 26.4 | 696 |
| Percentual_discount | 1211 | 0.00 | 57.0 | 24.2 | 16.0 | 255.7 |
| Absolute_discount | 1211 | 0.00 | 3.53 | 0.66 | 0.64 | 0.41 |

Table 7-1: Descriptive statistics most important variables in the model

7.2 Checking the assumptions underlying multiple linear regression

Next, the most important assumptions for linear regression are verified (Field, 2005):

• Normality of dependent variable

• Multicollinearity

• Normality of the error distribution

• Homoscedasticity (constant variance) of the errors

• Linearity of the relationship between dependent and independent variables

• Independence of the errors

A verification of the assumptions can be found in Appendix 4. The assumption analysis is performed

on Data set 1 which includes all promotions of 2009. It is assumed that when the assumptions are

met for this data set, they will be met for the other four data sets as well, since the other data sets

are large subsets of data set 1. The analysis in Appendix 4 shows that all of the assumptions

regarding a linear regression analysis are met.

7.3 Results full model

In this paragraph first the model performance for the different data sets is discussed and second the

effect size and direction of individual variables are discussed. In Table 7-2 the model summary of

the different data sets is depicted. The number of predictors varies between 14 and 21. The

adjusted R-square values range between 0.575 and 0.704, indicating that there is a difference

between the model fit of the five data sets. Data sets 1 & 2, with Magnum products included, have a

lower model fit than data sets 4 & 5, as the adjusted R-square values and MAPE values in Table 7-2

show (H22 confirmed). This indicates that the variability of Magnum products worsens

the model fit. Furthermore, the adjusted R-square of data sets 3 & 5 is slightly higher than that of data set 4

(0.704 and 0.698 against 0.691). However, the difference is too small to confirm that splitting the

cases into a data set for Food and HPC results in a higher model fit (H23 unconfirmed).


| Calibration period (2009) | All (1) | Food (2) | HPC (3) | All w/o Magnum (4) | Food w/o Magnum (5) |
|---|---|---|---|---|---|
| sample size | 989 | 388 | 601 | 968 | 367 |
| number of predictors | 21 | 16 | 14 | 19 | 18 |
| R-square | 0.635 | 0.593 | 0.711 | 0.697 | 0.713 |
| adjusted R-square | 0.627 | 0.575 | 0.704 | 0.691 | 0.698 |
| MAPE (actuals) | 31.7% | 36.3% | 27.2% | 27.9% | 26.3% |
| MAPE (Unilever) | 29.7% | 32.5% | 26.8% | 27.0% | 24.9% |

Table 7-2: Model summary of the full model

Concluding, the model fit of the models without Magnum products is better than that of the models where

the Magnum products are included. However, splitting up the total data set in a Food and HPC data

set does not result in an obviously better performance.

Besides the performance of the overall model the cause of the model fit is very interesting as well,

i.e. which independent variables in the model are responsible for the model fit. The coefficients of

the independent variables (B) accompanied by their significance level and standardized coefficients

(Beta) are depicted in Table 7-3.

Data sets: (1) All, (2) Food, (3) HPC, (4) All w/o Magnum, (5) Food w/o Magnum.

| Variable | B (1) | Beta (1) | B (2) | Beta (2) | B (3) | Beta (3) | B (4) | Beta (4) | B (5) | Beta (5) |
|---|---|---|---|---|---|---|---|---|---|---|
| (Constant) | -3.697 | | -3.923 | | -2.799 | | -3.401 | | -4.621 | |
| Display | 0.007 | 0.253* | 0.004 | 0.165* | 0.009 | 0.301* | 0.008 | 0.317* | 0.007 | 0.328* |
| Folder | 0.497 | 0.179* | 0.726 | 0.299* | 0.234 | 0.071* | 0.467 | 0.180* | 0.681 | 0.343* |
| TV_support | a | a | a | a | 0.311 | 0.07* | a | a | a | a |
| Holiday_products | a | a | a | a | a | a | a | a | a | a |
| Promo_length | 0.572 | 0.353* | 0.761 | 0.087** | 0.535 | 0.405* | 0.593 | 0.398* | 1.015 | 0.146* |
| log_absolute_discount | -0.676 | -0.146* | a | a | -0.473 | -0.134** | -0.562 | -0.132* | a | a |
| Percentual_discount | 0.021 | 0.464* | 0.014 | 0.163* | 0.023 | 0.651* | 0.022 | 0.543* | 0.022 | 0.313* |
| SPO b | -0.153 | -0.074** | a | a | -0.185 | -0.073** | -0.207 | -0.107* | -0.203 | -0.135** |
| Two_for b | -0.292 | -0.202* | -0.199 | -0.129* | -0.248 | -0.169* | -0.300 | -0.224* | -0.332 | -0.264* |
| Three_for b | -0.258 | -0.152* | -0.152 | -0.075 | -0.232 | -0.154* | -0.237 | -0.152* | -0.262 | -0.160** |
| Free_product b | a | a | a | a | a | a | a | a | a | a |
| Premiaat b | a | a | a | a | a | a | a | a | a | a |
| Number_of_products_in_promotion | -0.001 | -0.132* | -0.004 | -0.156* | a | a | -0.001 | -0.105* | -0.003 | -0.162* |
| C1000 c | 0.090 | 0.048** | a | a | 0.122 | 0.058** | 0.168 | 0.096* | 0.190 | 0.134* |
| Plus c | -0.179 | -0.102* | -0.336 | -0.192* | a | a | -0.136 | -0.083* | -0.235 | -0.166* |
| Kruidvat c | -0.503 | -0.327* | a | a | -0.507 | -0.391* | -0.509 | -0.36* | a | a |
| log_growth_number_selling_points | 3.815 | 0.306* | 4.254 | 0.184* | 3.714 | 0.356* | 3.802 | 0.332* | 4.373 | 0.231* |
| Percentage_repeat_buyers | a | a | a | a | a | a | a | a | a | a |
| Promotion_pressure | 0.003 | 0.053 | 0.007 | 0.126** | a | a | a | a | a | a |
| ln_LF_former_promotions_EAN | 0.798 | 0.414* | 0.856 | 0.453* | 0.663 | 0.337* | 0.600 | 0.336* | 0.491 | 0.323* |
| Market_penetration | a | a | a | a | a | a | a | a | a | a |
| Preservability | 0.001 | 0.291* | 0.001 | 0.296* | a | a | 0.001 | 0.168* | 0.001 | 0.247* |
| log_size_of_product | a | a | a | a | a | a | 0.112 | 0.057 | 0.362 | 0.180* |
| Frequency_of_purchase | -0.043 | -0.059 | -0.098 | -0.132* | a | a | -0.053 | -0.079* | -0.115 | -0.193* |
| Personalcare d | a | a | a | a | 0.086 | 0.055 | a | a | a | a |
| Ice_and_beverages d | -0.310 | -0.133* | -0.776 | -0.440* | a | a | -0.164 | -0.069* | -0.361 | -0.236* |
| SCC_and_vitality_shots d | 0.594 | 0.173* | 0.454 | 0.185* | a | a | 0.318 | 0.101* | 0.396 | 0.204* |
| Savoury_and_dressings d | 0.146 | 0.090** | a | a | a | a | a | a | a | a |
| winter_products_temp | a | a | a | a | a | a | a | a | a | a |
| summer_products_temp | 0.042 | 0.306* | 0.052 | 0.519* | a | a | a | a | a | a |

* = significant at a 0.01 significance level
** = significant at a 0.05 significance level
a The variable is not significant for this data set.
b The baseline group for the different promo mechanisms is the mechanism “Four_or_five_for”
c The baseline group for the different retailers is the retailer “Albert Heijn”
d The baseline group for the different product groups is the product group “Homecare”

Table 7-3: Unstandardized and standardized Beta coefficients with significance level for all 5 data sets

Based on the Beta coefficients in Table 7-3 the hypotheses drawn in paragraph 6.3 will be discussed.

The promotional variables, Display, Folder and TV_support were expected to have a very positive

effect on the promotional sales. Display indeed has a very positive effect, Folder has a positive

effect; however, less than display. And TV_support only has an effect in the HPC dataset (H1 and

H2 confirmed and H3 rejected). Because of the unexpected result for the variable TV support an

extra analysis is performed. Since there is only data availability for promotions at the Albert Heijn for

the variable TV_support, it is worth checking whether the variable is significant when estimated on all promotions

in the sample at Albert Heijn. Appendix 5 depicts the results for this single linear regression model.

The standardized Beta coefficient is significant at a 0.003 level with a value of 0.141, which is still

not very high. Since in a full model collinearity with other variables is likely to decrease this effect

size, it is concluded that the effect size is medium to small. Therefore, in this research TV_support is

not considered as an important variable for the model. However, when full information is available

for all retailers, a new analysis is needed to test this conclusion.

The promotion variables Holiday_products and Promo_length were suspected to be positively

correlated with the promotional sales. No significant effect is found at all for Holiday products.



Products like luxury ice cream do sell more in holiday periods; however, apparently this effect is lost

or hard to find for promotions in holiday periods. Regarding the promo_length, this variable indeed

has a high standardized Beta coefficient, especially for the HPC models and data set where all

promotions are included. The effect in the Food data set is minor, since almost all promotions have a

duration of 1 week in this data set. (H4 rejected and H5 confirmed).

Regarding the discount on a promotion, both log_absolute_discount and percentual_discount are

expected to have a highly positive influence on promotional sales. However, only

percentual_discount confirms this hypothesis and log_absolute_discount has no influence or even a

significant negative influence in three of the data sets. The significant negative influence of the

variable log_absolute_discount in the data sets 1, 3 and 4 on the LF is contradictory to the

hypothesis. Since a higher absolute discount is very likely to result in higher promotional demand,

this result requires further investigation. When this variable is the only independent variable in the

model the impact becomes positive with a Beta of 0.480 and a significance level of 0.000 (Appendix

6). Hence, correlation effects with other independent variables are responsible for the negative effect

in the final model. The highest correlation in the correlation matrix (Appendix 7) of 0.759 between

the variable percentual_discount and log_absolute_discount is likely to be responsible for the

negative effect in the full model5. Therefore, the variables percentual discount and

log_absolute_discount should not be included in the same model. These results are consistent with

the results of Van Loo (2004) and Van den Heuvel (2006), where the absolute discount had no

impact or a negative impact on the promotional sales. However, so far the absolute discount for a

promotion is calculated per product. Another option is to calculate the absolute discount per offer,

since a consumer is probably sensitive to the total discount received on an offer. Appendix 6 depicts

individual linear regression analyses in which the percentual discount, absolute discount per offer and

absolute discount per product6 are compared. Remarkably, both absolute discount measures result

in a higher model fit and have a higher standardized Beta value. However, when the absolute

discount per offer is included in the full model the effect reverses and becomes small (Appendix 6).

Furthermore, a threshold effect could occur at the absolute discount per offer, i.e. consumers are

only willing to especially go to a retailer for a promotion if the total discount per offer received is

high enough. Appendix 6 tests this effect as well, both for SPO promotions and for all promotion

mechanisms together. However, no threshold effect is discovered. Altogether, the percentual

discount might be a better predictor because both absolute discount variables correlate too much

with other variables. This is reflected by the last table in Appendix 6, where the full models are fitted with the

three different discount variables. The model with percentual discount clearly has a higher model fit

than the other two models (0.689 against 0.654 and 0.651). Concluding, in a full model the

percentual discount is a better predictor than the absolute discount per offer or per product and

both predictors do not function together in a model because the Beta coefficient of the absolute

discount variable turns negative, which decreases the understandability of the model (H6 rejected

and H7 confirmed).

5 All the other correlation heights in the correlation matrix in appendix 5 are below 0.8 as well (according to Field (2005) a

correlation of 0.8 or higher indicates a multicollinearity problem).

6 No transformation is applied on the absolute discount per offer and the absolute discount per product to simplify the

comparison, because the log transformation only improves the results slightly.
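The sign reversal of a discount variable can be reproduced with a minimal sketch. It uses synthetic data that are constructed, by assumption, so that two predictors correlate at about 0.76 (the value reported for percentual_discount and log_absolute_discount). The variable names and the coefficients 1.0 and -0.2 are hypothetical and only serve to show that a predictor with a positive coefficient on its own can obtain a negative coefficient once a correlated predictor is added.

```python
import numpy as np

def fit_slopes(X, y):
    """Return OLS slope coefficients (an intercept is fitted but not returned)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

# Synthetic data (assumption): abs_disc correlates about 0.76 with pct_disc.
rng = np.random.default_rng(1)
n = 5000
pct_disc = rng.normal(size=n)
abs_disc = 0.76 * pct_disc + np.sqrt(1 - 0.76**2) * rng.normal(size=n)
ln_lf = 1.0 * pct_disc - 0.2 * abs_disc + rng.normal(scale=0.5, size=n)

# Model with the absolute discount as the only predictor.
slope_alone = fit_slopes(abs_disc.reshape(-1, 1), ln_lf)[0]

# Model with both discount variables included.
slope_pct, slope_abs = fit_slopes(np.column_stack([pct_disc, abs_disc]), ln_lf)

correlation = np.corrcoef(pct_disc, abs_disc)[0, 1]
```

In this sketch the single-predictor slope is positive because the absolute discount partly stands in for the percentual discount, while the slope in the joint model is negative. This supports the choice to keep only one of the two discount variables in a model.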

For the next hypothesis, the different promotion mechanisms, the result is less conclusive. The

promotion mechanism Four_or_five_for_X functioned as the baseline variable and has the most

positive impact on the promotional sales. SPO has a result very similar to that of Four_or_five_for_X and

the mechanisms Two_for_X and Three_for_X have the most negative result. For the variables

Premiaat and Free_product no effect is found, which could indicate that their effect size is similar to

the baseline group (four_or_five_for_X) or that their effect is insignificant. One would expect that

promotions with a price off should sell better than promotions which offer a premiaat or (unrelated)

Free_product. It could be that this effect is already captured by the variable percentual_discount

since both free_product and premiaat have no percentual discount. This can be tested by running a

regression analysis on all promotions of 2009 where the promo mechanism variables are the only

included independent variables. The results for this analysis are depicted in Appendix 8, where the

variable four_or_five_for_X is maintained as the baseline variable. And indeed this confirms the

hypothesis that the promo mechanisms Free_product and Premiaat have the most negative/least

positive impact on promotional sales. However, an SPO promotion still sells better than a two_for_X or

three_for_X promotion, which was not expected in advance (H8 partly confirmed). The result of

the variable number_of_products_in_promotions corresponds with the hypothesis that more

products in the same promotions negatively affect the promotional sales (H9 confirmed).

Regarding the retailer variables, similar promotions sell better at the C1000, average at Albert Heijn

and lower at Plus and Kruidvat. However, the retailer dummy variables are not significant for all data

sets. C1000 and Plus have a significant effect in four of the five data sets; Kruidvat has a strong

significant effect in all Non-Food data sets. No clear hypothesis was drawn beforehand for these

variables (H10 not tested). Another variable which relates directly to the retailer is the number

of selling points at which a promotion is sold. If the number of selling points in a promotion is

higher than the usual number of selling points, then the promotional sales are higher. The effect of the

extra number of selling points is very strong (H11 confirmed).

Next, the effect of the brand variables will be discussed. Both the percentage_of_repeat_buyers and

the promotion_pressure of a SKU have almost no impact on the promotional sales. An explanation

might be that the measures are not directly related to a promotion; therefore, clear effects decrease.



(H12 and H13 rejected). The LF_of_former_promotions does have a very strong positive effect

on the promotional sales, meaning that when a SKU had high promotional sales in the past, it is

more likely to have high promotional sales in the future (H14 confirmed). For the variable

market_penetration no effect is found, meaning that the promotional sales is not affected by the

penetration a product has in the market (H15 rejected).

The variables preservability, size_of_product and frequency_of_purchase are included in the

research to describe the susceptibility to stockpiling. Products with a longer preservability, a smaller

size and a lower frequency of purchase are thought to be more susceptible to stockpiling and to

have higher promotional sales. Indeed a longer preservability and a lower frequency of purchase

result in higher promotional sales, especially in the food categories. This might be caused by a more

frequent shopping pattern for food categories, which makes the variable frequency of purchase

more important. Also, the fact that the preservability is of less importance in the HPC categories

explains the lack of effect in the HPC data set (H16 and H18 confirmed). The size of a product

positively affects the promotional sales. This is not in line with the hypothesis and could be the result

of the higher value of large products, which coincides with the absolute discount of a product (H17

rejected).

For the different categories (Homecare, Personalcare, Savoury & Dressings, Ice & Beverages and

Spreads & Cooking) no conclusive results are found over the different data sets for the direction and

magnitude of the categories. Only for the categories Ice_and_beverages and SCC_and_vitality_shots

a medium effect size is found, where Ice_and_beverages has a negative impact on promotional sales

and SCC_and_vitality_shots has a positive impact on promotional sales. Again as with the retailer it

was unclear in advance which effects should be expected (H19 not tested).

For the variables winter_products_temp and summer_products_temp the effect of the temperature

is tested on temperature sensitive products. Temperature is expected to have a positive effect on

summer products since sales is expected to be higher at higher temperatures and a negative effect

on winter products, since sales is expected to be higher at lower temperatures. However, no effects

are found for either variable. A reason could be that seasonality effects are already taken into

account and the extra temperature differences per week are not significant enough to be found. A

more detailed research on temperatures for temperature sensitive products would most likely find an

effect. However, because of the inclusion of all products, temperature does not produce a better

forecast (H20 and H21 rejected).

7.4 Validation full model

The results in the previous paragraph sketched the performance of the full model fitted on the

promotional datasets of 2009. To test the robustness of the model, the promotions of 2010 are



forecasted with the same variables and coefficients as in the model of 2009. Continuously, the

sample size and number of predictors in the validation period are equal to the calibration period. The

results of this robustness check are depicted in Table 7-4. The R-square7 of the data sets is slightly

higher than the R-square in the calibration period. Hence, it can be concluded that the model fitted

on the calibration data sets is robust and generalizable for other data periods. Furthermore, the data

sets without the Magnum products have a higher R-square and a lower MAPE (in line with H22).

And the more specific HPC and Food data sets do not generate better results in both data sets (H23 not

confirmed).

validation period (Q1 2010)  All (1)  Food (2)  HPC (3)  All w/o Magnum (4)  Food w/o Magnum (5)
sample size                  246      94        152      243                 91
number of predictors         21       16        14       19                  18
R-square                     0.676    0.574     0.742    0.713               0.711
MAPE (actuals)               33.9%    34.9%     31.8%    31.4%               28.8%
MAPE (Unilever)              34.6%    42.3%     29.1%    31.6%               32.9%

Table 7-4: Model summary validation period for the full model
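The robustness check summarized in Table 7-4 can be sketched as follows: coefficients are estimated on a calibration period and then applied unchanged to a validation period, after which the R-square is computed on the validation data. The sketch is a minimal illustration under stated assumptions. The data are synthetic, the three predictors and their coefficients are hypothetical, and only the sample sizes (968 and 243) are borrowed from data set 4.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate intercept and slopes on the calibration data."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply previously estimated coefficients without refitting."""
    return coef[0] + X @ coef[1:]

def r_square(y_true, y_pred):
    """R-square as 1 - SSE / SST, computed on the data passed in."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst

rng = np.random.default_rng(2)

def make_period(n):
    """Generate a synthetic period with three hypothetical predictors of ln(LF)."""
    X = rng.normal(size=(n, 3))
    ln_lf = 1.5 + X @ np.array([0.6, 0.3, -0.2]) + rng.normal(scale=0.4, size=n)
    return X, ln_lf

X_cal, y_cal = make_period(968)   # calibration period
X_val, y_val = make_period(243)   # validation period

coef = fit_ols(X_cal, y_cal)
r2_cal = r_square(y_cal, predict(coef, X_cal))
r2_val = r_square(y_val, predict(coef, X_val))
```

When the validation R-square stays close to the calibration R-square, as in this sketch, the fitted coefficients carry over to new promotions, which is the argument made for the robustness of the model.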

8 Generalizability of model results

In this chapter the generalizability of the model results will be discussed. First, the generalizability of

the sample size taken within Unilever will be discussed. Second, the research is compared with other

research in the field. The goal of this chapter is to check if the results are generalizable within

Unilever and if the results are consistent with other research in the field. If not, further investigation

will be done.

8.1 Generalizability of sample size

The sample size within Unilever is defined on the dimensions retailer, time, region and products. Of

these the choice for region and time are assumed not to disturb the sample size, since the region is

the whole of the Netherlands and the time horizon is longer than 1 year. The retailers in the sample

size are all among the larger retailers in the Netherlands. Furthermore, the retailers included in the

sample are the most important retailers for Unilever in terms of volume and thus also for the

Unilever wide forecast accuracy. Because of their high sales, the impact on safety stock levels of

7 No adjusted R-square is reported for the validation period, because the adjusted R-square only makes sense in the

calibration period.



Unilever is higher than that of smaller retailers. Hence, it is concluded that the retailers in this

research form a solid representative base for the sample size. Lastly, the selection of SKU’s included

in the sample size is analyzed. In paragraph 3.4 the criteria for including SKU’s are stated. The

sample size selection has been taken over all different product categories of Unilever. However, it

would still be possible that the sample size selection is not representative for all products of

Unilever. Especially the third criterion in paragraph 3.4, that more high volume SKU’s should be

included, could cause an unrepresentative sample size. One way of checking the effect of this

assumption is to analyze the sample size on ABC classification. The ABC classification is a method

used within Unilever to rank SKU’s on their importance. In this classification A SKU’s are high

volume, high turnover and high gross profit SKU’s, and C SKU’s are low volume, low turnover and

low gross profit SKU’s8. And the ABC classification is not made over all products of Unilever at once,

but over the five different categories named in paragraph 1.2. Figure 8-1 shows the normal

partition within Unilever and the partition in the sample size. Within the sample size the A SKU’s

are overrepresented, the B SKU’s and the C SKU’s are underrepresented.

Figure 8-1: ABC partition for all products of Unilever and for the sample size (based on volume)

[Figure 8-1 values. ABC partition Unilever (based on volume): A - products 20%, B - products 60%, C - products 20%. ABC partition sample size (based on volume): A - products 51%, B - products 42%, C - products 7%.]

8 The ABC classification is based on these three criteria. However, the sales departments have the final call over the ABC classification.

The next step is to analyze what the effect of this deviation from the normal situation is on the

performance. Figure 8-2 depicts the full model MAPE values of data set 4 (all promotions without

Magnum products) for the A, B and C SKU’s. The C SKU’s perform the worst, the A SKU’s are in the

middle and the B SKU’s perform best. One would expect that the MAPE values decrease for A SKU’s

because of the higher sales volumes of these SKU’s. Normally, higher volumes should result in a

decrease of variance. Table 8-1 shows the average and the variance of the LF’s for the SKU

classification. Interestingly, the variance for A SKU’s is higher than the variance of the other SKU’s,

which explains the difference in MAPE values. The difference in variance is very large, meaning that

A SKU’s contain a lot more variance than B or C SKU’s. A closer look at the data suggests that the

very large LF’s of a SKU have a large contribution to the total variance of that SKU. Table 8-1 indeed

depicts that A SKU’s contain more very high LF’s (20 or higher) than B and C SKU’s. This could be

caused by the fact that A SKU’s are more often severely promoted and the forecasting model might

not be able to adequately forecast such heavy promotions. Another remarkable issue in Figure 8-2 is

that the MAPE (actuals) value is higher for A SKU’s than the MAPE (Unilever) value. At the C SKU’s

this is the other way around. This arises from the fact that in the calculation of the MAPE (actuals)

overforecasting is punished more heavily and in the calculation of the MAPE (Unilever) underforecasting is

punished more severely. This means that A SKU’s tend to be overforecasted and C SKU’s tend to be

underforecasted in the model.
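The asymmetry between the two MAPE measures can be made concrete with a small sketch. The exact formulas are not given in this chapter, so the sketch rests on an assumption that is consistent with the statement above: MAPE (actuals) divides the absolute error by the actual sales and MAPE (Unilever) divides it by the forecast. The function name and the numbers are hypothetical.

```python
def mape(actuals, forecasts, denominator="actual"):
    """Mean absolute percentage error relative to the actual or the forecast.

    denominator="actual"   -> assumed form of MAPE (actuals)
    denominator="forecast" -> assumed form of MAPE (Unilever)
    """
    if denominator not in ("actual", "forecast"):
        raise ValueError("denominator must be 'actual' or 'forecast'")
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("actuals and forecasts must be non-empty and equally long")
    total = 0.0
    for actual, forecast in zip(actuals, forecasts):
        base = actual if denominator == "actual" else forecast
        total += abs(forecast - actual) / base
    return total / len(actuals)

# Overforecast: actual sales 100, forecast 200.
over_actual = mape([100.0], [200.0], "actual")
over_forecast = mape([100.0], [200.0], "forecast")

# Underforecast: actual sales 200, forecast 100.
under_actual = mape([200.0], [100.0], "actual")
under_forecast = mape([200.0], [100.0], "forecast")
```

Under this assumption the same absolute error of 100 units counts as 100% in the actual-based measure when it is an overforecast, and as 100% in the forecast-based measure when it is an underforecast. A data set with a higher MAPE (actuals) than MAPE (Unilever) therefore points towards overforecasting.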

Figure 8-2: Comparison of MAPE values data set 4 over the ABC classification

SKU type  Average  Variance  Average number of promotions per SKU  Number of promotions with a LF higher than 20 per SKU
A         7.51     34.95     19.31                                 0.938
B         6.37     12.87     13.81                                 0.269
C         6.15     16.82     12.80                                 0.200

Table 8-1: The average, variance, number of promotions and number of high LF’s on the ABC classification

Concluding, the sample size does deviate from the total Unilever product portfolio on the ABC

classification. However, this has no clear implication for the performance and generalizability of the

model. Furthermore, it has been reasoned that the choice of time horizon, region and retailer is

made in such a way that the sample size is generalizable. One other aspect which could disturb the

sample size is the exclusion of SKU’s which are sold less than a year. Newer products tend to be

more difficult to forecast, because of the lack of stable base line sales and the lack of historical


comparable promotions. This will hold for the model as well as the current promotion forecasting

process of Unilever. It is difficult to determine what the impact is on new products. This research is

focussed on more stable products, since the effect size of independent variables is more easily determined

for these promotions.

8.2 Comparison with other research in the field

In van der Poel (2010a) the available research on promotion forecasting was split in two parts. The

first part covered theoretical research papers and the second part covered more practical

master theses. To judge if the results of this research are comparable, on what aspects the research

differs and what implications these differences have for the result of the forecasting model, a

comparison will be made in this paragraph. Both the theoretical research papers and the practical

master theses will be included in this comparison. The advantage of the research papers is that the

approach is more scientific and the advantage of the master theses is that the model and model

performance have been described more extensively. The following research will be included in the

comparison:

• Cooper et al: PromoCast ™: A New Forecasting Method for Promotion Planning.

• Wittink et al: SCAN*PRO: the estimation, validation and use of promotional effects based on

scanner data (internal paper).

• Van Loo: Out-of-Stock reductie van actieartikelen, Model voor vraagvoorspelling en

logistieke aansturing van actieartikelen bij Schuitema/C1000.

• Van den Heuvel: Action products at Jan Linders Supermarkets.

Table 8-2 makes a comparison between the different research papers on promotion forecasting. All

four papers have been performed from a retailer point of view. Furthermore, the SCAN*PRO and

Promocast models are directed at the store level of a retailer instead of the supply chain level. All

methods use linear regression. Regarding the performance of the models, the paper of the

Promocast model does not contain any comparable performance measures, since the authors

measure the number of case packs missed. For the other models the performance measures differ

considerably. The adjusted R-square of the models of Van Loo and Van den Heuvel is similar, while

the adjusted R-square of this research is substantially higher. Regarding the MAPE, the models of Van

der Poel and Van Loo perform similarly. However, the MAPE calculation of Van Loo is not based on the

absolute sales number but on a transformation of the LF. Since this transformation brings the values

of the dependent variable closer together, this measure understates the real MAPE values (based on

absolute sales).



                                     Van der Poel       SCAN*PRO-model     Promocast          Van Loo            Van den Heuvel
Year                                 2010               1988               1999               2006               2009
Point of view                        Manufacturer       Retailer           Retailer           Retailer           Retailer
Commercial use                       No                 Yes                Yes                No                 No
Aggregation level                    Supply chain       Store level        Store level        Supply chain       Supply chain
Method                               Linear regression  Linear regression  Linear regression  Linear regression  Linear regression
Ln LF as dependent var.              Yes                Yes                Yes                No                 No
Sample size                          1238               20801              n.a.               1556               n.a.
Average LF                           6.08               n.a.               n.a.               9.04               4.59
Variance                             26.95              n.a.               n.a.               29.93              7.84
Standard deviation                   5.19               n.a.               n.a.               5.47               2.80
Minimum                              1.00               n.a.               n.a.               1.13               1.00
Maximum                              49.38              n.a.               n.a.               34.00              14.28
Adjusted R-square                    0.691 a            0.507 b            n.a.               0.45               0.44
MAPE validation period (full model)  31.3%              37.1%              n.a.               31.1% c            n.a.

a : The adjusted R-square of the full model of data set 4 is taken here.

b : MAPE value of SCAN*PRO model in research Van Loo (2006).

c : The MAPE calculation in the research of Van Loo seems to be based on the ln of the LF. This calculation understates the MAPE based on absolute promotional demand.

Table 8-2: Comparison research on promotions forecasting

All models in Table 8-2 differ substantially in performance9. To investigate where this difference in

performance originates from, Table 8-3 shows an overview of the most important variables included

in the research. The current research is taken as the frame of reference. The SCAN*PRO-model is a

concise model, where only a few important variables are taken into account. The Promocast model is

by far the most elaborate model with 67 independent variables. This model makes extensive use of

LF’s of former promotions and since the model is directed at the store level, the promotion database

is a lot larger. The model of Van Loo does not include the important variables display and folder,

which are included in all other models and are among the most important variables. The model of

Van den Heuvel includes the most important variables and contains some interesting research on the

effect of other actions in the same product category and the effect of Out of Stocks.

9 The performance of the Promocast model is not available for the R-square and MAPE measures. The paper on that model

only states the performance in case pack size difference on retailer store level.



The adjusted R-square model performance is known for four of the five models. The performance of

the model built in this research compared to the other models is considerably higher. Here, the

underlying factors for this difference will be discussed. The adjusted R-square of the model in this

research might be higher, because Van Loo did not include the critical variables display and folder.

Furthermore, Van Loo, the SCAN*PRO-model and Van den Heuvel did not include all of the following

variables: promo mechanism, the average LF of former promotions, the number of products in

promotion, the growth of the number of selling points, the size of a product, preservability and TV-

support. Finally, Van Loo did not transform the LF as dependent variable and thus the dependent

variable is not normally distributed. This has a very negative impact on the performance of the

model. Altogether, the model of Promocast is the most sophisticated model regarding the included

variables. However, the results of this model cannot be compared and the model is directed at the

store level of a retailer.

Van der Poel   SCAN*PRO model   Promocast   Van Loo   Van den Heuvel

Retailer x n.a. n.a. n.a. n.a.

Product category x x x

LF former promotions SKU x x

Display x x x x

Folder x x x x

Promo-length x x All 1 week All 1 week

Promo mechanism x x x

tv-support x x

Number of products in promotion x

Growth # of selling points x n.a. n.a.

Percentual discount x x x x x

Size of product x

Preservability x x

number of actions in same product group No data x

More specific data on display location No data x

More specific data on size and place folder advertisement No data x

LF former promotions SKU with matching advertisement and display Not enough data x

n.a.: Not applicable in this model because the model is built at a single retailer or the model is built on store level

Table 8-3: Comparison of the variables included in the different promotion forecasting research

Lastly, Van Loo (2006) used a different dependent variable than the other research. In his research

Van Loo fitted a log normal distribution on the LF’s and then used the cumulative lognormal

distribution of each LF as dependent variable (P(LF)). In the research Van Loo concluded that this



measure gave superior results against the LF of a promotion; however, no comparison was made

with another widely used dependent variable in literature, the ln transformation of a LF. Appendix 9

shows the results if the P(LF) is used in the model of this research instead of the ln(LF). The new

dependent variable is tested on the full model on data set 4. The results indicate that indeed the

P(LF) gives superior results compared with the LF. However, the P(LF) has a lower model fit than the ln(LF). An

explanation might be that the ln(LF) meets the requirements of a normal distribution better than the

P(LF) as shown in appendix 9.
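The two candidate dependent variables can be compared side by side in a short sketch. It assumes that the lognormal distribution is fitted by taking the mean and standard deviation of ln(LF), which is one common way to fit it; the fitting procedure used by Van Loo is not described here. The LF values and function names are hypothetical.

```python
import math
import statistics

def fit_lognormal(lfs):
    """Estimate lognormal parameters as mean and standard deviation of ln(LF)."""
    logs = [math.log(value) for value in lfs]
    return statistics.fmean(logs), statistics.stdev(logs)

def lognormal_cdf(x, mu, sigma):
    """Cumulative lognormal distribution, used here as P(LF)."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Hypothetical LF values of past promotions (not thesis data).
lfs = [1.2, 2.5, 3.1, 4.8, 6.0, 7.4, 9.9, 14.0, 22.5]

mu, sigma = fit_lognormal(lfs)
p_lf = [lognormal_cdf(value, mu, sigma) for value in lfs]   # bounded between 0 and 1
ln_lf = [math.log(value) for value in lfs]                  # unbounded log transformation
```

Both transformations compress the long right tail of the LF’s. The P(LF) is bounded between 0 and 1, so it cannot be normally distributed over its whole range, which is in line with the explanation that the ln(LF) meets the normality requirement better.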

Concluding, this chapter provided insight in the generalizability of the model results by checking the

assumptions underlying linear regression, analyzing the generalizability of the sample size and

comparing the research with other relevant research in the field. The sample size is regarded to be

generalizable over the other Unilever SKU’s. Only the introduction of a new SKU will cause deviation

from the current sample size and quite likely lower the performance. But, forecasting new SKU’s has

always been difficult. Regarding the comparison against other research, the performance of the model

constructed in this paper is quite high. The inclusion of important variables in this research, which

were not included in the comparable research, is very likely to be responsible for the good model fit.

Conclusion part 3: In this part the results for the full model were depicted. Both the model fit and

forecast accuracy values are quite high for the full model. However, the full model on consumer

demand level only provides the first part of the total picture. Because of the functional requirements

stated in paragraph 2.4, the full model needs to be adapted to the retailer demand level. This will be

done in the next part, so the model becomes useful in practice.


Part 4: Model adaptation

The results of the full model provided detailed insights in the effect size

of the different variables and the performance of

the different data sets. In order to translate

these results into a model that can be used in

practice, this part decreases the number of

variables in the models. Then the models will be

evaluated based on their model results. This gives

insight in the performance of the reduced model

against the full model and thus in the practical usefulness of the forecasting model.

There are three main reasons to adapt the full model of the previous part:

1. To increase the usability of the model in practice.

2. To correct for data availability in practice.

3. Adapt the model of consumer demand to retailer orders.

The first adaptation will result in a model with 5 to 10 variables, since this number of variables is still

useful in practice (interviews Unilever). The number of variables has to be limited because an

employee of Unilever should be able to quickly work with the model. The variables will be selected

on their effect size and direction. The second adaptation will inspect the variables included in the

first adaptation on data availability. The variables which are normally not known within Unilever will

be deleted from the set of variables. Hence, adaptation two includes the same variables of

adaptation one without the variables with low data availability. In the third adaptation, the consumer

demand will be adjusted to the retailer orders. The consumer demand serves as the basis for the

discussion with the retailer and for the On Shelf Availability of a product, but in the end the retailer

orders form the real demand that should be met within Unilever.

The adapted models will be tested on data sets 3, 4 and 5, because the disturbing effect of the

Magnum products was too large to include these products in the further analysis.


9 Adaptations to increase the usability and check for data availability

9.1 Adaptation 1: Increase the usability by reducing the number of variables

To reduce the number of variables in the full model some criteria are needed. The goal is to reduce

the number of variables to fewer than ten variables and analyze the impact of the reduction of

variables on the performance of the model. The criteria to select the variables are:

1. A strong effect in the three data sets for the full model, i.e. an average standardized Beta of

0.150 or higher over the three data sets.

2. A persistent effect in the three data sets for the full model, i.e. the standardized Beta does

not have an opposing direction in the three data sets.
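The two selection criteria can be written down as a small sketch. It uses the standardized Beta values of data sets 3, 4 and 5 for a few rows of Table 7-3. Two points are assumptions, since the criteria do not spell them out: the average is taken over absolute values, and a non-significant entry (marked a in the table) is counted as zero. The function and variable names are hypothetical.

```python
# Standardized Beta values for data sets 3, 4 and 5, taken from Table 7-3.
# None stands for an entry marked "a" (not significant for that data set).
BETAS = {
    "ln_LF_former_promotions_EAN": (0.337, 0.336, 0.323),
    "log_growth_number_selling_points": (0.356, 0.332, 0.231),
    "Kruidvat": (-0.391, -0.36, None),
    "Preservability": (None, 0.168, 0.247),
    "Frequency_of_purchase": (None, -0.079, -0.193),
    "C1000": (0.058, 0.096, 0.134),
}

def meets_criteria(betas, threshold=0.150):
    """Check criterion 1 (strong effect) and criterion 2 (persistent direction)."""
    values = [0.0 if beta is None else beta for beta in betas]  # assumption: "a" counts as 0
    average_size = sum(abs(value) for value in values) / len(values)
    directions = {value > 0 for value in values if value != 0.0}
    return average_size >= threshold and len(directions) <= 1

selected = [name for name, betas in BETAS.items() if meets_criteria(betas)]
```

For the rows included here, the sketch selects ln_LF_former_promotions_EAN, log_growth_number_selling_points and Kruidvat, while Preservability, Frequency_of_purchase and C1000 stay below the threshold of 0.150.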

Analysis of the standardized Beta coefficients of the full model in Table 7-3 leaves nine variables

which meet these criteria: Display, Folder, Promo_length, Percentual_discount, Two_for_X,

Three_for_X, log_growth_number_selling_points, ln_LF_former_promotions_EAN and Kruidvat.

Since the SPO is significant as well and falls under the same variable (promo mechanism) as

Two_for_X and Three_for_X this variable is included as well. This argument also holds for the

retailers C1000 and Plus, which fall under the same variable as Kruidvat, namely retailer. Table 9-1

shows the model results of the calibration and validation period. The number of variables is larger

than the functional requirement of 10 variables; however, the variables SPO, Two_for_X and Three_for_X

as well as the variables Kruidvat, C1000 and Plus are dummy variables for the variable promo

mechanism and retailer and can be regarded as one variable in practice, since an employee only

needs to complete one data field. Hence, the number of variables comes to 8.
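The two selection criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Beta values and the variable names Weather and Price_gap are hypothetical and are not taken from Table 7-3, the "strong effect" criterion is assumed to apply to the absolute value of the standardized Beta, and a variable that is not estimated in a data set (None) is assumed to be skipped for that data set.

```python
from statistics import mean

# Hypothetical standardized Beta values per data set (HPC, All, Food).
# Illustrative numbers only; they are NOT the values of Table 7-3.
betas = {
    "Display":   [0.31, 0.31, 0.35],
    "Folder":    [0.09, 0.16, 0.27],
    "Weather":   [0.05, -0.02, 0.04],   # hypothetical: weak effect, sign flips
    "Price_gap": [0.20, -0.18, 0.22],   # hypothetical: strong effect, sign flips
    "Kruidvat":  [-0.43, -0.33, None],  # None = not estimated in this data set
}

THRESHOLD = 0.150  # criterion 1: average standardized Beta of 0.150 or higher


def meets_criteria(values, threshold=THRESHOLD):
    """Return True if a variable has a strong and persistent effect."""
    present = [v for v in values if v is not None]
    # Criterion 1 (assumption: judged on the absolute value of Beta).
    strong = mean(abs(v) for v in present) >= threshold
    # Criterion 2: no opposing directions across the data sets.
    persistent = all(v > 0 for v in present) or all(v < 0 for v in present)
    return strong and persistent


selected = [name for name, values in betas.items() if meets_criteria(values)]
print(selected)
```

With these illustrative numbers, Price_gap is rejected despite its size because its direction is not persistent, which is exactly the case the second criterion is meant to catch.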

                      Calibration period                        Validation period
                      HPC (3)       All w/o       Food w/o      HPC (3)       All w/o       Food w/o
                                    Magnum (4)    Magnum (5)                  Magnum (4)    Magnum (5)
sample size           601           968           367           152           243           91
# of predictors       11 / (8)      12 / (8)      11 / (8)      11 / (8)      12 / (8)      11 / (8)
R-square              0.704         0.676         0.646         0.723         0.703         0.663
adjusted R-square     0.698         0.671         0.635
MAPE (actuals)        27.6%         28.8%         29.7%         32.8%         31.3%         30.6%
MAPE (Unilever)       27.2%         27.8%         27.8%         30.2%         31.1%         34.6%

Table 9-1: Summary of the reduced model with a limited number of variables (adaptation 1)

Logically, compared with the full model the adjusted R-square values decrease, since fewer variables are used to fit the data. Furthermore, data set 3 performs slightly better than data sets 4 and 5; however, this difference is minor, with MAPE values ranging from 27.2% to 29.7% in the calibration period. In the full model the performance of the HPC data set was slightly worse than the performance of the Food data set. This is probably caused by the fact that most promotions of Home and Personal Care (HPC) products occur at Kruidvat, where no information is available for the variable Display. Since this information was already unavailable in the full model, the decrease of the model fit in data set 3 is smaller than the decrease in data sets 4 and 5. The B-coefficients and standardized Beta coefficients of the data sets are depicted in Table 9-2. The variables Display, Promo_length, Percentual_discount, Kruidvat, ln_LF_former_promotions_EAN and log_growth_number_selling_points have the largest influence in the model (absolute standardized Beta values of 0.3 or higher in data set 4). Furthermore, almost all variables are significant in all three data sets (see Table 9-2); Kruidvat of course has no effect in the Food data set.

                                    HPC (3)               All w/o Magnum (4)    Food w/o Magnum (5)
                                    B         Beta        B         Beta        B         Beta
(Constant)                          -2.685                -3.162                -3.804
Display_1                           0.009     0.309*      0.008     0.311*      0.007     0.354*
Folder                              0.299     0.09*       0.414     0.16*       0.542     0.273*
Promo_length                        0.534     0.404*      0.594     0.399*      1.079     0.155*
Percentual_discount                 0.018     0.524*      0.020     0.483*      0.021     0.302*
SPO                                 -0.201    -0.079*     -0.243    -0.125*     -0.188    -0.125
Two_for                             -0.260    -0.178*     -0.287    -0.214*     -0.256    -0.204*
Three_for                           -0.229    -0.152*     -0.209    -0.134*     -0.190    -0.116
C1000                               a         a           0.210     0.12*       0.300     0.211*
Plus                                -0.099    -0.055**    -0.102    -0.063*     -0.126    -0.089**
Kruidvat                            -0.556    -0.428*     -0.468    -0.331*     a         a
log_growth_number_selling_points    3.714     0.356*      4.145     0.362*      4.062     0.215*
ln_LF_former_promotions_EAN         0.617     0.314*      0.615     0.345*      0.602     0.397*

* = significant with a 0.01 significance level
** = significant with a 0.05 significance level
a = The variable is not significant for this data set

Table 9-2: B- and standardized Beta coefficients for the reduced model with limited variables

In conclusion, in the first adaptation the number of variables has decreased from 19 to 12 (based on data set 4), whilst the model fit has hardly decreased. This is promising news for the implementation phase.

9.2 Adaptation 2: Increase the usability by checking for data availability

In the second adaptation of the full model, the variables are checked for data availability. In order to use the model in practice, the variables in the model should be readily available to employees of Unilever. If not, using the model will be too time-consuming, unclear or even impossible. As a starting point the variables included in adaptation 1 are used. For the variables Display and Log_growth_number_of_selling_points Unilever has no or limited information. Regarding the variable Display, Unilever often does not know whether, and in how many stores, a specific product has a second placement. Regarding the growth of the number of selling points, Unilever receives very limited information from a retailer about the number of selling points. These two variables are therefore removed. The variables that remain in the model are: Folder, Promo_length, Percentual_discount, SPO, Two_for, Three_for, C1000, Plus, Kruidvat, and ln_LF_former_promotions_EAN. Table 9-3 depicts the model results for the calibration and validation period. Again the dummy variables coding the variables retailer and promo mechanism can each be regarded as one variable in practice. Hence, 6 variables are included in each data set (see footnote 10).

                      Calibration period                        Validation period
                      HPC (3)       All w/o       Food w/o      HPC (3)       All w/o       Food w/o
                                    Magnum (4)    Magnum (5)                  Magnum (4)    Magnum (5)
sample size           601           968           367           152           243           91
# of predictors       10 / (6)      9 / (6)       7 / (6)       10 / (6)      9 / (6)       7 / (6)
R-square              0.548         0.505         0.517         0.505         0.496         0.560
MAPE (actuals)        35.8%         36.8%         35.2%         49.0%         46.6%         34.4%
MAPE (Unilever)       33.2%         33.0%         31.1%         36.1%         38.5%         39.2%

Table 9-3: Summary of the reduced model corrected for data availability (adaptation 2)

Again the adjusted R-square decreases, since fewer variables are used to fit the data. As a result, the MAPE values in the calibration period deteriorate (increase) as well. The MAPE values in the validation period show a similar pattern. The different data sets have a very similar model fit in the calibration period, with the Food model performing slightly better. However, in the validation period the Food model performs considerably better than the HPC model on the MAPE (actuals), but not on the MAPE (Unilever). When the MAPE (actuals) value is higher than the MAPE (Unilever) value, this indicates underforecasting. When the MAPE (actuals) value is lower than the MAPE (Unilever) value, this indicates overforecasting. In this case the promotions for the HPC data set are slightly underforecasted and the promotions for the Food data set are slightly overforecasted. Overall, the performance of the model decreased considerably because of the exclusion of the variables with limited data availability (Display and Log_growth_number_of_selling_points), and the impact is more severe on the HPC data set. The B-coefficients and standardized Beta coefficients of the data sets are depicted in Table 9-4. The variables Folder, Promo_length, Percentual_discount, Kruidvat and ln_LF_former_promotions_EAN have the largest influence in the model (absolute standardized Beta of 0.150 or higher in data set 4).

10 The variables excluded in some of the data sets are dummy variables which fall under retailer or promo mechanism. Therefore, the number of variables can be 6 for each data set.

                                    HPC (3)               All w/o Magnum (4)    Food w/o Magnum (5)
                                    B         Beta        B         Beta        B         Beta
(Constant)                          -0.828                -1.316                -1.845
Folder                              0.436     0.132*      0.652     0.251*      0.789     0.398*
Promo_length                        0.572     0.433*      0.647     0.434*      1.036     0.149*
Percentual_discount                 0.022     0.632*      0.023     0.553*      0.019     0.266*
SPO                                 -0.202    -0.08**     -0.189    -0.097*     -0.126    -0.084
Two_for                             -0.223    -0.152*     -0.187    -0.139*     -0.169    -0.135*
Three_for                           -0.154    -0.103      a         a           a         a
C1000                               0.170     0.08**      0.244     0.139*      0.326     0.229*
Plus                                0.261     0.146*      0.111     0.068**     a         a
Kruidvat                            -0.377    -0.291*     -0.268    -0.189*     a         a
ln_LF_former_promotions_EAN         0.740     0.376*      0.746     0.419*      0.739     0.487*

* = significant with a 0.01 significance level
** = significant with a 0.05 significance level
a = The variable is not significant for this data set

Table 9-4: Beta coefficients for the reduced model with limited information (adaptation 2)

9.3 Comparison of the different adaptations with the full model

This paragraph discusses how the different models in this chapter perform relative to each other and to the full model. To compare the models, their results are depicted in Table 9-5. The first conclusion is that the full model performs best on all measurements in the calibration period, followed by adaptation 1. Adaptation 2 performs the worst of all. In the validation period the full model and the model of adaptation 1 perform very similarly, and again the model of adaptation 2 performs far worse. Overall, the exclusion of the less important variables has very limited or no effect on the model performance. However, the exclusion of two important variables (because of data availability) does have a substantial effect. Hence, Unilever should focus on obtaining data on all variables in adaptation 1.

                                        Full model    Adaptation 1:       Adaptation 2:
                                                      decrease number     adjust for data
                                                      of variables        availability
Calibration period   adj. R-square      0.691         0.671               0.501
                     MAPE (actuals)     27.9%         28.8%               36.8%
                     MAPE (Unilever)    27.0%         27.8%               33.0%
Validation period    MAPE (actuals)     31.4%         31.3%               46.6%
                     MAPE (Unilever)    31.6%         31.1%               38.5%

Table 9-5: Comparison results data set 4 for the full and reduced models predicting consumer demand

10 Model adaptation 3: From consumer demand to retailer orders

The goal of this chapter is to check whether the variables used to forecast consumer demand also predict the retailer orders in a satisfactory manner. To do so, the variables of the full model are fitted on the retailer orders to gain insight into the difference between consumer demand and retailer orders. The first paragraph explains the calculation of the retailer orders for a single promotion. Thereafter, the full model of chapter 7 is fitted on the retailer orders to analyze how accurately the variables forecast the retailer orders.

10.1 Calculation retailer orders

The retailer orders connected to certain promotions are delivered in multiple weeks to the

distribution centre (DC) of a retailer. Furthermore, a retailer still orders products for its base demand

in the weeks prior to the promotion. This increases the complexity of connecting the retailer orders

to a certain promotion on the shopping floor. However, a promo indicator shows which retailer

orders can be connected to the promotional sales. Furthermore, almost all promotional orders are

delivered two weeks in advance of the promotion up to the promotion week itself. Therefore, retailer

orders for a promotion are defined as orders with a promo indicator in week X-2, X-1 and X, with X

as the promotion week. For the promotions of Kruidvat the retailer orders are not available, because

no distinction is made between promotion orders and base orders for Kruidvat. For the other

retailers the promotion orders are available. Hence, the following analysis will only be done for

Albert Heijn, C1000 and Plus and not for Kruidvat.
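The definition of retailer orders for a promotion (orders with a promo indicator in weeks X-2, X-1 and X) can be sketched as follows. The data layout, the function name promo_retailer_orders and all quantities are hypothetical; the sketch also assumes plain week numbers within one year, so it ignores the wrap-around at the turn of the year.

```python
# Hypothetical order lines: (delivery_week, quantity, has_promo_indicator).
order_lines = [
    (10, 500, False),   # base order, no promo indicator
    (11, 1200, True),   # week X-2
    (12, 800, True),    # week X-1
    (13, 300, True),    # week X (promotion week)
    (13, 450, False),   # base order delivered in the promotion week
    (14, 200, True),    # promo indicator, but outside the X-2..X window
]


def promo_retailer_orders(lines, promo_week):
    """Sum the promo-flagged orders delivered in weeks X-2, X-1 and X."""
    window = range(promo_week - 2, promo_week + 1)
    return sum(qty for week, qty, promo in lines if promo and week in window)


total = promo_retailer_orders(order_lines, promo_week=13)
print(total)
```

Both conditions matter: base orders inside the window and promo-flagged orders outside the window are excluded from the total.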

10.2 Model fit on retailer orders

The retailer orders calculated in this way are used to determine the LF of a promotion. The LF is calculated according to formula 10-1, where the base line sales are still based on the consumer demand. However, the numerator of the fraction has changed from consumer demand to retailer orders.

\[ \text{Lift Factor} = \frac{\text{Retailer orders}}{\text{Base line sales}} \qquad \text{(Formula 10-1)} \]
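As a minimal sketch, formula 10-1 translates into a single division. The function name and the numbers are hypothetical, and the sketch assumes that retailer orders and base line sales are expressed in the same unit and cover comparable periods.

```python
def lift_factor(retailer_orders, base_line_sales):
    """Formula 10-1: retailer orders divided by base line sales."""
    if base_line_sales <= 0:
        raise ValueError("base line sales must be positive")
    return retailer_orders / base_line_sales


# Hypothetical promotion: 2300 units ordered against a base line of 1000 units.
lf = lift_factor(2300, 1000)
print(lf)
```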

Table 10-1 depicts the results for the calibration and validation period. Contrary to the models which predicted the consumer demand, the model which predicts the retailer orders is not robust. This is especially true for the HPC data set, where the R-square value decreases from 0.358 in the calibration period to 0.000 in the validation period. This means that taking the average of the dependent variable predicts the promotional sales just as badly as the model does. For the Food data set the model fit is a lot better, with a MAPE (actuals) in the validation period of 33.5%. Hence, the added variability in the Food categories is a lot lower than the added variability in the HPC categories. The causes of this increased variability are discussed in the next paragraph. In conclusion, other methods have to be analyzed for the forecast of retailer orders, since the direct forecast of the retailer orders is too inaccurate, is not robust and does not provide a basis to discuss the expected demand of a promotion with a retailer.

                      Calibration period                        Validation period
                      HPC (3)       All w/o       Food w/o      HPC (3)       All w/o       Food w/o
                                    Magnum (4)    Magnum (5)                  Magnum (4)    Magnum (5)
sample size           289           641           352           82            172           90
# of predictors       15            15            14            15            15            14
R-square              0.358         0.403         0.538         0.000         0.150         0.577
MAPE (actuals)        77.8%         70.1%         48.1%         285.2%        126.7%        33.5%
MAPE (Unilever)       42.2%         36.9%         30.5%         48.9%         40.0%         34.9%

Table 10-1: Model results with retailer orders as dependent variable

10.3 Difference between retailer orders and consumer demand

In the last paragraph the independent variables were fitted on the retailer orders instead of the consumer demand. It turned out that the model was less capable of predicting the retailer orders than the consumer demand. To gain insight into the difference between retailer orders and consumer demand, this paragraph calculates the percentage difference between the two. The aim is to find an alternative way to adapt the consumer demand to retailer orders. The consumer demand is known and the retailer orders were calculated in the first paragraph of this chapter. Table 10-2 displays the absolute difference and the non-absolute difference, where the non-absolute difference would be zero on average if the retailer orders and consumer demand were similar (see formula 10-2 and 10-3). However, as expected the retailer orders are larger than the consumer demand. Moreover, the difference between the retailers is quite large, from 39.9% up to 86.8%. Albert Heijn has the lowest difference and Plus the highest.

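\[ \text{Absolute difference} = \frac{\lvert \text{retailer orders} - \text{consumer demand} \rvert}{\text{consumer demand}} \qquad \text{(Formula 10-2)} \]

\[ \text{Difference} = \frac{\text{retailer orders} - \text{consumer demand}}{\text{consumer demand}} \qquad \text{(Formula 10-3)} \]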
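Formulas 10-2 and 10-3 can be illustrated with a short sketch. The promotion figures are hypothetical, and averaging the per-promotion ratios is an assumption; the text does not state how the values per retailer are aggregated.

```python
def absolute_difference(retailer_orders, consumer_demand):
    """Formula 10-2: size of the deviation, regardless of direction."""
    return abs(retailer_orders - consumer_demand) / consumer_demand


def difference(retailer_orders, consumer_demand):
    """Formula 10-3: signed deviation; positive when orders exceed demand."""
    return (retailer_orders - consumer_demand) / consumer_demand


# Hypothetical promotions: (retailer_orders, consumer_demand).
promotions = [(1500, 1000), (800, 1000), (2400, 2000)]

# Assumption: the ratios are averaged per promotion.
mean_abs = sum(absolute_difference(o, d) for o, d in promotions) / len(promotions)
mean_signed = sum(difference(o, d) for o, d in promotions) / len(promotions)
print(f"{mean_abs:.1%} {mean_signed:.1%}")
```

Because over-ordering and under-ordering cancel out in formula 10-3 but not in formula 10-2, the absolute difference is never smaller than the signed difference.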

                Absolute difference       Difference
                consumer demand &         consumer demand &
                retailer orders           retailer orders
All             67.7%                     53.9%
Albert Heijn    58.0%                     39.9%
Plus            97.5%                     86.8%
C1000           53.6%                     46.0%
Kruidvat        n.a.                      n.a.

Table 10-2: Difference between consumer demand and retailer orders for each retailer

But why do the retailer orders differ from the consumer demand? Figure 10-1 depicts the most plausible disturbing factors. First, forward buying could result in higher retailer orders. Retailers invest in forward buying because of the lower purchase price they pay for a product while it is on promotion. However, most of the retailers included in this research receive their discount on the purchase price on the basis of scanning data (consumer demand). This is the case for Albert Heijn and C1000. Plus still received the full discount on all products ordered in promotion up to January 2010. This is a reasonable explanation for the large difference between the consumer demand and the retailer orders for a promotion at Plus. Second, the DC stock levels and store stock levels influence the retailer orders. When a lot of stock is available in the stores and/or the DC of a retailer, the retailer will order fewer products for an upcoming promotion. Especially when the promotion intensity is high, stock levels can be high as a result of earlier promotions. Third, consumer sales vary across the stores of a retailer. Since a retailer does not want to be out of stock in any of its stores, a safety margin is needed in each store to deal with the variance in sales among the stores. This results in extra retailer orders of approximately 10% to 20% of the consumer demand (interviews Unilever). Fourth, an inaccurate retailer forecast results in a deviation between consumer demand and retailer orders. Retailers will always be inclined to build in extra safety stock, since they are punished more heavily and more directly for out-of-stocks than for stock costs. Lastly, promotional displays are a lot larger than normal displays and need to stay full until the end of the promotion period. Therefore, more stock is needed on the shelf than normal, and this stock has to be ordered on top of the expected consumer demand.

The influence of the disturbing factors on the promotional orders is described in literature as the bullwhip effect. Lee et al. (1997) state in their paper on the bullwhip effect that the information transferred in the form of orders tends to be distorted and can misguide upstream members in their inventory and production decisions. In particular, the variance of orders may be larger than that of sales, and the distortion tends to increase as one moves upstream in the supply chain.
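The statement that the variance of orders may be larger than that of sales can be made concrete with a variance ratio, a common way to express the bullwhip effect. The sketch below is not a calculation from this research: both weekly series are hypothetical and the function name bullwhip_ratio is chosen for illustration only.

```python
from statistics import pvariance

# Hypothetical weekly series for one product at one retailer.
weekly_sales = [100, 110, 95, 105, 90, 100]    # consumer sales
weekly_orders = [60, 180, 40, 150, 30, 140]    # retailer orders


def bullwhip_ratio(orders, sales):
    """Variance of orders divided by variance of sales.

    A ratio above 1 means the orders fluctuate more than the sales,
    i.e. the demand signal is amplified upstream.
    """
    return pvariance(orders) / pvariance(sales)


ratio = bullwhip_ratio(weekly_orders, weekly_sales)
print(round(ratio, 1))
```

In this example both series have the same average volume, so the ratio isolates the extra variability that the ordering process adds.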

Figure 10-1: Disturbing factors which cause a difference between consumer demand and retailer orders

The disturbing factors seem to have a larger influence on the HPC promotions than on the Food promotions, since a model to forecast the retailer orders performs a lot worse for the HPC data set (see Figure 10-2). So, the connection between consumer demand and retailer orders is a lot weaker for HPC than for Food promotions. Furthermore, Figure 10-2 depicts the MAPE values for the ABC classification for all promotions of 2009/2010 without Magnum products. The A SKU’s clearly perform better than the B SKU’s and C SKU’s, and the B SKU’s also perform better than the C SKU’s. This contrasts with the MAPE values of the consumer demand forecast, where the A, B and C SKU’s performed very similarly. A likely explanation for the better performance of Food and A SKU’s is the law of large numbers. The sales of Food SKU’s and A SKU’s are far larger than the sales of HPC SKU’s, B SKU’s and C SKU’s (see Figure 10-3). Hence, the relative variation in the retailer orders is smaller; in other words, a retailer is more likely to place promotional orders which deviate from the forecast when the sales volume is lower.

[Figure 10-1 content: customer demand is transformed into retailer orders under the influence of the following disturbing factors: forward buy; DC stock levels; store stock levels; allocation of products over retailer stores; inaccurate retailer forecast; larger promotional displays]

Figure 10-2: MAPE values retailer orders for data set 4 for Food, HPC, A, B, and C SKU’s

Figure 10-3: Average consumer demand per promotion for Food, HPC, A, B and C SKU's

This chapter showed that a model which predicts retailer orders directly does not lead to satisfactory results. Especially for HPC SKU’s and C SKU’s the model performance declines substantially when looking at the MAPE (actuals) values. Hence, directly forecasting retailer orders is concluded not to be an appropriate approach. Therefore, another approach will be taken, in which the consumer demand is raised by the average difference between consumer demand and retailer orders. This way the consumer demand is still used as a starting point.

Conclusion part 4: This part adapted the full model in three ways. In adaptation 1, a limited number of variables is included in the model. This adaptation has a model fit which is almost as good as that of the full model. The second adaptation, in which 2 important variables without data availability are removed from the model of the first adaptation, has a substantially worse model fit. In the last adaptation the full model is fitted on retailer orders instead of consumer demand. This resulted in a remarkably lower model fit, especially for HPC SKU’s. In conclusion, data availability for the more significant variables of the model is highly important for the performance of the model, and directly forecasting the retailer orders does not give a satisfactory result (sub research question 5). Therefore, the next part will adapt the consumer demand to retailer orders in another way.

[Figure 10-2 content: bar chart with the series MAPE (actuals) and MAPE (Unilever) for Food, HPC, A, B and C; vertical axis from 0.0% to 180.0%]

[Figure 10-3 content: bar chart of average promotional sales for Food, HPC, A, B and C; vertical axis from 0 to 60000]

[Figure: overview of the research parts: Project definition (Needs); Model adaptation; Model results; Research Design (Methods); Implementation & Conclusions]

Part 5: Implementation and conclusions

The previous parts laid out the outline for the research, tested which variables have a large effect on promotional demand and tested the effect of the adaptations that are necessary to use the model within the Unilever organization. In this part the implementation and conclusions are discussed. First, it is discussed what kind of model should be implemented within the organization and which implementation steps are needed to forecast the promotional demand more accurately. Second, the findings of this research, the managerial implications and the contribution of the research to science are discussed.

11 Implementation

This chapter connects the different parts of this research, so that a promotion forecasting model is created which fulfils the practical requirements of Unilever (paragraph 2.4). The first paragraph uses the forecast for consumer demand to arrive at a forecast for retailer orders. Because this forecast approach does not result in a satisfactory forecasting accuracy, alternative steps need to be taken. Hence, the second paragraph discusses the implementation steps which are needed to reach a higher forecasting accuracy.

11.1 Final model for implementation

The final model is based on the consumer demand and corrected to retailer orders. In this way the good forecast results on the consumer demand level are used as the basis to forecast the retailer orders. The smaller the deviation between the two, the less variation is added by the retailer order process and the better the forecast accuracy for retailer orders will be. In the next two paragraphs both the results for the consumer demand model with limited variables (adaptation 1) and the results for the consumer demand model corrected for data availability (adaptation 2) are used as input to forecast the retailer orders. Directly forecasting the retailer orders did not lead to satisfactory results (paragraph 10.2). Therefore, the consumer demand forecast is taken as the starting point here and adapted to retailer orders. The correction is made by multiplying the consumer demand forecast by a factor that expresses the average difference between consumer demand and retailer orders, as shown in Figure 11-1. The difference is smallest for Albert Heijn and largest for Plus.
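The correction step can be sketched as a simple lookup and multiplication. The factors below are read from Figure 11-1; assigning the first value per retailer to Food and the second to HPC is an assumption based on the layout of the figure. The function name and the forecast of 1000 units are hypothetical.

```python
# Multiplication factors per (retailer, category), read from Figure 11-1.
# Assumption: the first factor per retailer belongs to Food, the second to HPC.
FACTORS = {
    ("Albert Heijn", "Food"): 1.25,
    ("Albert Heijn", "HPC"): 1.37,
    ("C1000", "Food"): 1.49,
    ("C1000", "HPC"): 1.40,
    ("Plus", "Food"): 2.06,
    ("Plus", "HPC"): 1.38,
}


def forecast_retailer_orders(consumer_demand_forecast, retailer, category):
    """Scale a consumer demand forecast to a retailer order forecast."""
    # Raises KeyError for combinations without a factor (e.g. Kruidvat,
    # for which no promotional retailer orders are available).
    return consumer_demand_forecast * FACTORS[(retailer, category)]


# Hypothetical consumer demand forecast of 1000 units for a Food promotion.
orders_plus = forecast_retailer_orders(1000, "Plus", "Food")
orders_ah = forecast_retailer_orders(1000, "Albert Heijn", "Food")
print(round(orders_plus), round(orders_ah))
```

Note that a single average factor per retailer and category shifts the level of the forecast but cannot capture promotion-to-promotion variation in the retailer order process.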



Figure 11-1: Generation of forecast retailer orders based on the consumer demand model

11.1.2 Results retailer orders (model adaptation 1 as basis)

The results for the retailer orders based on adaptation 1 are shown in Table 11-1. The model results clearly show that the retailer orders for HPC promotions are a lot more difficult to forecast than the retailer orders for Food promotions. The predictive power expressed by the R-square for HPC promotions is almost zero (0.103) in the calibration period and is zero in the validation period. This means that taking the average of all LF’s would give a similar result. The predictive power for Food promotions is substantially higher in the calibration and validation period (0.411 and 0.681 respectively). In line with the R-square results, the MAPE values are very high for the HPC promotions, whilst the MAPE values for the Food promotions are still quite good. The table also depicts the Case fill without any safety stocks and the average left stock in weeks. A higher average left stock in weeks indicates over-forecasting by the model and thus improves the Case fill. The average left stock numbers again differ strongly between HPC promotions and Food promotions: the average left stock level of HPC promotions is a lot higher because of the variability in the retailer orders. When a retailer orders less than expected, the left stock in weeks increases, which occurs more often with HPC promotions than with Food promotions. In summary, the forecast accuracy for Food promotions is very acceptable, whilst the forecast accuracy for non-food (HPC) promotions is not.


                              Calibration period                        Validation period
                              HPC (3)       All w/o       Food w/o      HPC (3)       All w/o       Food w/o
                                            Magnum (4)    Magnum (5)                  Magnum (4)    Magnum (5)
R-square                      0.138         0.314         0.411         0.000         0.105         0.681
MAPE (actuals)                124.7%        87.1%         62.3%         293.5%        143.7%        28.3%
MAPE (Unilever)               39.5%         36.2%         33.0%         47.2%         38.0%         30.1%
Case fill                     83.6%         82.5%         81.4%         84.0%         82.3%         80.7%
Average left stock in weeks   1.97          1.42          0.99          3.62          2.17          0.94

Table 11-1: Results for retailer order forecast based on consumer demand model adaptation 1

[Figure 11-1 content: Forecast retailer orders = Forecast customer demand x factor, with the following factors]

                Food    HPC
Plus            2.06    1.38
C1000           1.49    1.40
Albert Heijn    1.25    1.37

11.1.3 Results retailer orders (model adaptation 2 as basis)

As mentioned before, limited or no information is available at Unilever on the variables Display and number_of_selling_points. Therefore, the forecast of the retailer orders is also analyzed for the model in which these variables are excluded (model adaptation 2). The results are very similar to those in the last paragraph: the forecast for the HPC promotions performs far worse than the forecast for the Food promotions (Table 11-2). Furthermore, the exclusion of two important independent variables results in a worse model fit on all measurements. Hence, the lack of data on these two variables is an important aspect to focus on. This is consistent with the conclusion on consumer demand level, where the impact of the lack of data on the two important predictors was even larger.


                              Calibration period                        Validation period
                              HPC (3)       All w/o       Food w/o      HPC (3)       All w/o       Food w/o
                                            Magnum (4)    Magnum (5)                  Magnum (4)    Magnum (5)
R-square                      0.077         0.214         0.333         0.000         0.056         0.623
MAPE (actuals)                127.2%        93.0%         67.8%         298.3%        145.6%        31.6%
MAPE (Unilever)               44.1%         40.3%         34.9%         49.8%         40.7%         33.2%
Case fill                     79.0%         80.9%         78.2%         82.6%         80.5%         74.2%
Average left stock in weeks   1.94          1.46          0.95          3.46          2.23          0.96

Table 11-2: Results forecast retailer orders based on consumer demand model adaptation 2

11.1.4 Conclusion results retailer orders based on model adaptation 1 & 2

Overall, the forecast accuracy on retailer order level is disappointing after the good model results on the consumer demand level. The variance caused by the retailer order process has such a disturbing influence that the model results on the consumer demand level are of limited use on the retailer order level. This holds especially for the HPC and C SKU’s and to a lesser extent for the Food, A and B SKU’s (see Figure 11-2).

Figure 11-2: MAPE values retailer orders for data set 4 with the consumer demand model adaptation 1 and 2 as basis

[Figure 11-2 content: bar chart with the series MAPE (actuals) - adaptation 1, MAPE (actuals) - adaptation 2, MAPE (Unilever) - adaptation 1 and MAPE (Unilever) - adaptation 2 for Food, HPC, A, B and C; axis from 0.0% to 180.0%]

The model adaptation from consumer demand to retailer orders depicted how a model should ideally work in practice. However, the good results on consumer demand level are not reproduced on retailer order level. The effect of the disturbing factors between consumer demand and retailer orders is too large to neglect (sub research question 6). Hence, the performance on retailer order level is not good enough to directly implement the above retailer order models.

11.1.5 Actions needed to overcome current problems

The performance of the models proposed in this chapter to forecast retailer orders indicates that further actions are needed to improve the forecast accuracy that can be reached. First, the data management of the important promotion variables used to arrive at a promotion forecast should be improved. These variables should be made easily accessible and usable for analyzing the demand of upcoming promotions. Second, the performance difference between model adaptation 1 and 2 shows that the variables with low data availability, which are not included in model adaptation 2, have a high impact on the forecast accuracy. Hence, data should be obtained on these variables for upcoming promotions at a retailer. Third, the transformation from consumer demand to retailer orders causes a lot of extra variation. The models in this chapter showed that Unilever is currently not able to bridge the gap between consumer demand and retailer orders. Therefore, the factors causing the difference between consumer demand and retailer orders should be analyzed and included in a forecasting model. These actions are addressed in the implementation plan in the next paragraph.

11.2 Implementation plan

This paragraph describes which steps should be taken in the future to improve the promotion forecasting process. The steps are based on the results found in this research. The first block in Figure 11-3 depicts the current situation, in which no forecasting model is used for promotions; instead, Unilever employees use their own judgemental forecast. The first and second step have been covered in this research, whilst the third and fourth step are the future direction this research indicates to improve the promotion forecasting accuracy in the longer run. Every next step increases the alignment with a retailer on trust, strategy and co-management.

Figure 11-3: Implementation steps to increase the promotion forecast accuracy

The second block in the above figure depicts the first implementation step, which states that the data available within Unilever should be recorded and used better. Currently, promotions are recorded by the logistics employees in a program called Promoplanner. However, during the data gathering phase of this research, the promotional data available in this program turned out to be limited and sometimes incorrect: limited because important variables are not saved in the program, and incorrect because last-minute changes to a promotion are not always updated in Promoplanner. Another point is that the important variables Display and LF of former promotions on SKU level, which are used in this research, stem from the Nielsen marketing database. These variables should be linked to the promotions in Promoplanner. Accurate historical data is the basis for a forecasting model and is thus the first step. After this step Unilever is able to forecast the consumer demand according to the results of model adaptation 2, assuming data availability on all variables for upcoming promotions except for the two variables with low data availability (Display and growth of the number of selling points). The data which needs to be recorded accurately or linked from the Nielsen marketing database is:

1. Promotion mechanism

2. Percentual discount

3. Promotion length

4. Type of folder advertisement (location in folder and size of ad)

5. LF former promotions SKU

6. Display (second placement)

[Figure 11-3 content: implementation steps plotted over time against the alignment with the retailer (trust, strategic, co-management). (0) Current situation; (1) Ease of use: record important data in Promoplanner and use important data from Nielsen; (2) Data availability: start to collaborate with the retailer to enhance data availability and build trust; (3) Further collaborate to understand the disturbing factors on retailer orders; (4) Generate a supply chain forecast which adjusts for the disturbing factors on retailer orders. Steps 1 and 2 are covered in this research; steps 3 and 4 are the future direction given by this research. The figure also marks preconditions that must be met before the later steps.]

The second step is to ensure that the important information on upcoming promotions is provided by the retailers in advance. The important promotion data (e.g. the height of the discount, the folder, the promotion mechanism and the week in which the promotion is held) should be agreed on with the retailer multiple weeks in advance. Currently, promotions often change or are cancelled altogether, which causes large deviations between the forecasted and the actual retailer orders. Furthermore, two important variables to forecast a promotion are not available at all at Unilever (1st and 2nd variable below). Both variables have a large impact on the promotional demand and thus on the forecast accuracy of the forecasting model, as shown in chapter 7. Retailers are mostly unwilling to share this information with Unilever because of data sensitivity reasons. Such a lack of important data increases the difficulty of accurate forecasting. After implementing this step Unilever is able to forecast the consumer demand according to model adaptation 1 in this research, where data availability of all important variables is assured. Moreover, this step has to function as the beginning of a good collaboration with the retailer. Trust between the two parties needs to be created, so that the next implementation steps can be taken. This trust can be created by incentive alignment and clear terms of collaboration (Anderson, 2002).

1. Display: the percentage of shops where a promotional product has a second placement

2. Selling points: the percentage of extra selling points a promotion is sold at

3. The percentage of products which is on promotion within the category (see footnote)11

Footnote 11: In paragraph 5.2 three important variables were excluded from the analysis because of a lack of data: (1) the percentage of products which is on promotion within the category, (2) the percentage of products which was on promotion within the category last week and (3) promotions of competitors. Van den Heuvel (2006) stated that the first variable indeed has an important contribution and that the second variable is not significant. Concerning the third variable, it would be interesting to include more specific data on promotions of competitors. However, promotion forecasting models in the literature have not been able to include this data, because of complexity issues. In summary, data on the first variable should be obtained.

To go from the second step to the third step some preconditions need to be met. Trust and clear communication should be established between Unilever and the retailer. Also, the value of the project needs to be clear for both Unilever and the retailer. Clear goals, a good project formulation, clear potential gains for all involved parties and honest communication are ways to meet the preconditions. The third step is about bridging the gap between retailer orders and consumer demand. As shown throughout the research, the forecast accuracy of a model on consumer demand level is quite high. However, this forecast accuracy drops substantially when the consumer demand has to be transformed to retailer orders. The model fit drops dramatically for HPC products in particular; for Food products the model fit for retailer orders is considerably better. Figure 10-1 depicted the most likely factors that cause the difference between retailer orders and consumer demand. Clearly the bullwhip effect, which states that orders to the supplier tend to have a larger variance than sales to the buyer (Lee et al, 1997), has its effects in the FMCG market in which Unilever operates. As the most important activities to minimize the bullwhip effect, Lee et al (1997) name information sharing of Point Of Sales data and inventory status data, simplification of the promotional activities of a retailer, and making one member of the supply chain responsible for the forecasting process (e.g. VMI). Disney (2003) confirms that a VMI supply chain performs better than a traditional supply chain. Hence, the third step is to investigate together with the retailer which factors have a disturbing influence on the retailer orders, causing a larger variation in retailer orders than in consumer demand. The goal of this step is to gain insight into these factors, so that the variance in the retailer orders is no mystery but can be explained by the retailer and Unilever. These insights can be used to minimize the negative effect of the variance in the retailer orders on the forecast accuracy.
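The bullwhip effect referred to above can be made measurable as the ratio between the variance of retailer orders and the variance of consumer sales. The sketch below is only an illustration of that ratio: the weekly figures and the function name `bullwhip_ratio` are hypothetical and are not taken from the Unilever data set.

```python
from statistics import pvariance


def bullwhip_ratio(orders, sales):
    """Variance of orders divided by variance of sales.

    A value above 1 means the orders fluctuate more than the sales,
    which is the pattern the bullwhip effect describes.
    """
    var_sales = pvariance(sales)
    if var_sales == 0:
        raise ValueError("sales have zero variance; the ratio is undefined")
    return pvariance(orders) / var_sales


# Hypothetical weekly figures around one promotion (consumer units).
weekly_sales = [100, 110, 300, 120, 90]
weekly_orders = [80, 400, 150, 40, 100]

ratio = bullwhip_ratio(weekly_orders, weekly_sales)
print(f"bullwhip ratio: {ratio:.2f}")
```

Computing such a ratio per retailer and per product group would be one possible way to see where the extra variance in the orders is largest.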

For step 4 similar preconditions need to be met, and the trust between Unilever and the retailer needs to be even higher. Therefore, the third step should be successfully finished and both parties should be willing to collaborate further. The fourth step has to bring the insights of the third step into action and take these insights one step further. As a starting point the forecast for the retailer orders should be used. This forecast has to be adjusted for the disturbing factors analyzed in step 3. When, for example, the forecast for a promotion on the regular jar of Calvé peanut butter at a retailer is 100,000 consumer units, this number should be adjusted for stock left at the retailer, units needed to fill the pipeline, units needed to fill the promotion displays, a safety margin to cover the variance over the different retailer shops and potential other disturbing factors. Since the stock levels at a retailer change continuously, such a model should be updated each week. This results in an accurate promotion forecast on retailer order level. This forecast should be generated in collaboration between the retailer and Unilever. Hence, the two parties should not each produce their own forecast separately, which currently leads to the disturbance in the demand of the supply chain. This can be regarded as the final stage of collaboration between the retailer and Unilever, because processes of both parties need to be integrated. To do so, the confidence between the retailer and Unilever needs to be high, incentives for both parties should be clear and the responsibility for producing a forecast should be assigned to either the retailer or Unilever.
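The adjustment described for step 4 can be sketched as simple arithmetic. This is a minimal illustration under stated assumptions, not the model proposed in this research: the signs of the corrections (stock at the retailer is subtracted, the other factors are added), the function name `expected_retailer_order` and all numbers except the 100,000 consumer units are hypothetical.

```python
def expected_retailer_order(consumer_forecast, stock_at_retailer,
                            pipeline_fill, display_fill, safety_units):
    """Adjust a consumer demand forecast to an expected retailer order.

    Assumption: stock already at the retailer reduces the order, while
    pipeline fill, display fill and the safety margin increase it.
    """
    order = (consumer_forecast - stock_at_retailer
             + pipeline_fill + display_fill + safety_units)
    return max(order, 0)  # an order cannot be negative


order = expected_retailer_order(
    consumer_forecast=100_000,  # example figure from the text
    stock_at_retailer=8_000,    # hypothetical
    pipeline_fill=5_000,        # hypothetical
    display_fill=3_000,         # hypothetical
    safety_units=4_000,         # hypothetical
)
print(order)
```

Because the stock at the retailer changes continuously, the inputs would have to be refreshed every week, as the text notes.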

To support the collaboration between the manufacturer and the retailer, there are already several initiatives in the FMCG industry, like Vendor Managed Inventory (VMI), Continuous Replenishment Program (CRP), Collaborative Planning, Forecasting and Replenishment (CPFR) and RFID-enabled collaborative processes. The order in which these collaboration concepts are listed indicates the innovativeness of the concept (Pramatari, 2007). VMI is most likely the first trust-based business link between suppliers and customers (Barratt et al, 2001), where the manufacturer has the responsibility of managing the customer's inventory policy. CRP moves one step ahead of VMI and reveals demand from the retailer stores to the supplier. CPFR can be seen as an evolution of VMI and CRP, where joint demand forecasting and promotion planning are also addressed in the approach (Holmstrom et al, 2002). CPFR is based on extensive information sharing between retailer and manufacturers, including Point-Of-Sales data, forecasts and promotion plans. RFID-enabled collaboration can be applied when each product is tagged with an RFID chip and thus can be tracked through the whole supply chain.

In conclusion, the first part of the chapter showed that the retailer orders cannot be forecasted accurately enough. The second part discussed the future steps that need to be taken to overcome the current problems at Unilever and reach a higher forecast accuracy. At each step the process integration with the retailer becomes higher and more trust is needed between the parties. The end result of the implementation steps is a higher forecast accuracy for retailer orders, a closer collaboration with the retailer and thus more insight into the order process of the retailer. Currently, the different retailers that Unilever delivers to are in different stages of the implementation process. For each retailer value can be added by analyzing their current status and making the next implementation step(s). The successes at Albert Heijn, which is currently the only retailer where VMI is employed, can serve as an example for other retailers.

12 Conclusions

This research analyzed the ability to forecast promotional demand at a manufacturer level. The goal of the research was to increase the forecast accuracy of promotions at Unilever. After an analysis of the problem situation the research focused on the development of a more mathematical forecasting approach, which could support the judgemental forecasting process of Unilever's logistics employees. The main research question formulated at the beginning of this research was: what are the causes of the low forecasting accuracy and how can this forecasting accuracy be improved?

The research started with an ideal situation, which focused on forecasting the consumer demand without any hindrances. The ideal situation, where all variables are included to forecast the consumer demand, is described in paragraph 12.1. However, in practice some limitations obstruct the use of an ideal model. Alternative models to overcome these limitations are described in paragraph 12.2. Finally, paragraph 12.3 describes the steps which should be taken to overcome the limitations of the current situation in order to forecast more accurately.

12.1 Ideal model

The ideal (full) model is a forecasting model which includes all variables and forecasts the consumer demand, which is easier than forecasting the retailer orders. Around half of the variables in the full model are significant, depending on the type of products (data set) on which the model is fitted. The different data sets are HPC products (Non-Food), Food products and a data set with all products.


Page 59

The adjusted R-square values of all three data sets are around 0.700 in the calibration period of the

model. This indicates a good model fit where 70% of the variance of the promotional demand is

explained by the model. In the validation period, where the model of the calibration period is

checked on a different data set, the model fit is even slightly higher than 0.700. This indicates that

the model results are robust when used on other promotions than the original data set.
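The calibration and validation check described above can be illustrated with a small sketch: a model is fitted on one set of promotions and its adjusted R-square is then computed on a different set. Everything below is hypothetical (a single predictor, made-up discount percentages and ln LF values, and the helper names `fit_line` and `adjusted_r2`), so the resulting values are not comparable to the 0.700 reported in this research.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjusted_r2(actual, predicted, n_predictors):
    """Adjusted R-square: R-square penalised for the number of predictors."""
    n = len(actual)
    mean_a = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical calibration promotions: discount (%) and ln of the LF.
cal_discount = [10, 15, 20, 25, 30, 35]
cal_ln_lf = [0.9, 1.2, 1.4, 1.9, 2.0, 2.4]
# Hypothetical validation promotions, not used for fitting.
val_discount = [12, 18, 22, 28, 33]
val_ln_lf = [1.1, 1.3, 1.7, 1.9, 2.3]

intercept, slope = fit_line(cal_discount, cal_ln_lf)
cal_pred = [intercept + slope * d for d in cal_discount]
val_pred = [intercept + slope * d for d in val_discount]

adj_cal = adjusted_r2(cal_ln_lf, cal_pred, n_predictors=1)
adj_val = adjusted_r2(val_ln_lf, val_pred, n_predictors=1)
print(f"calibration: {adj_cal:.3f}, validation: {adj_val:.3f}")
```

A validation value close to the calibration value is what the text interprets as a robust model.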

The most significant variables in the model are the variables with a double plus or minus sign in Table 12-1. Besides the fact that these variables are the most important to include in a forecasting model, the effect size of a variable could also be used to drive marketing decisions. The first marketing implication is that a display (second placement) of a promotion in a retailer store is more important than folder advertisement and TV advertisement. Hence, when the marketing budget is allocated, investments in display should have priority over investments in folder advertisement, and both should have priority over investments in TV advertisement. The second implication is that, of all promo mechanisms, the mechanism where a consumer has to buy four or more products to get the promotional discount results in the highest promotional demand. Surprisingly, a Single Price Off (SPO), where a consumer only has to buy one product to receive the promotional discount, leads to a higher promotional demand than a promotion where a consumer has to buy two or three products. A promotion with a free product or premiaat (premium gift) has the lowest promotional demand, although the success of such a promotion really depends on the type of free product or premiaat. Lastly, marketing can increase the promotional sales by making sure that the promotion is sold in all stores of a retailer. This variable is especially important if the product is not sold in almost all stores in baseline sales. For these products there is a lot of extra promotional sales to gain. One way of boosting the number of stores is to advertise the promotion in the folder, since all stores are expected to have the folder promotions available. So, for products which are not sold in all stores it is more interesting for Unilever to invest in folder advertisement.


Page 60

Variable                          Effect size   Variable                           Effect size
Display                           ++            log_growth_number_selling_points   ++
Folder                            +             Percentage_repeat_buyers           n.e.
TV_support                        n.e. / +      Promotion_pressure                 n.e.
Holiday_products                  n.e.          ln_LF_former_promotions_EAN        ++
Promo_length                      ++            Market_penetration                 n.e.
Percentual_discount               ++            Preservability                     +
SPO (a)                           -             log_size_of_product                n.e.
Two_for (a)                       -             Frequency_of_purchase              -
Three_for (a)                     -             Personalcare (c)                   n.e.
Free_product (a)                  n.e.          Ice_and_beverages (c)              -
Premiaat (a)                      n.e.          SCC_and_vitality_shots (c)         +
Number_of_products_in_promotion   -             Savoury_and_dressings (c)          n.e.
C1000 (b)                         +             winter_products_temp               n.e.
Plus (b)                          -             summer_products_temp               n.e.
Kruidvat (b)                      --

n.e. = no effect on the promotional sales
(a) The baseline group for the different product categories is the product group “Four_or_five_for”
(b) The baseline group for the different retailers is the retailer “Albert Heijn”
(c) The baseline group for the different product groups is the product group “Homecare”

Table 12-1: Overview of the effect size and direction of the variables on the promotional sales
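The baseline groups in the notes to Table 12-1 imply that the categorical variables enter the model as dummy variables, so each sign is read relative to the baseline group. The sketch below shows one common way to build such dummies; it is an illustration only, and the helper name `dummy_code` is hypothetical rather than the procedure used in this research.

```python
def dummy_code(value, levels, baseline):
    """Return one 0/1 indicator per non-baseline level.

    The baseline level gets no indicator of its own, so an observation
    in the baseline group is coded as all zeros.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    if baseline not in levels:
        raise ValueError(f"unknown baseline: {baseline!r}")
    return {level: int(value == level) for level in levels if level != baseline}


# Promo mechanisms as named in Table 12-1, with their baseline group.
mechanisms = ["SPO", "Two_for", "Three_for", "Four_or_five_for",
              "Free_product", "Premiaat"]

two_for = dummy_code("Two_for", mechanisms, baseline="Four_or_five_for")
baseline_row = dummy_code("Four_or_five_for", mechanisms,
                          baseline="Four_or_five_for")
print(two_for)
print(baseline_row)
```

Under this coding a minus sign for Two_for means a lower promotional demand than the baseline mechanism, not a lower demand in absolute terms.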

The ideal model shows that Unilever has the ability to forecast consumer demand. With the right information Unilever is able to forecast the consumer demand at least as well as a retailer. Hence, with this capability Unilever is able to take the lead in establishing a collaboration with retailers and increasing the forecast accuracy. However, because of the practical requirements of a forecasting model, the ideal model formulated cannot be used in practice within Unilever. First, the model should have a high ease of use; second, the variables used should have data availability; and third, the retailer orders need to be forecasted. Hence, some adaptations are needed on the full model, which are discussed hereafter.

12.2 Adaptations needed on ideal model

To increase the usability of the forecasting model, only the most important variables are included in an adapted model. The model fit of this model with a limited number of variables is still surprisingly high and almost equal to the model fit of the full model. However, not all variables have data availability at Unilever, since Unilever as a manufacturer is dependent on the retailers for information on upcoming promotions. For two variables in the limited model Unilever has no data availability: the percentage of shops with a second placement and the extra number of shops where the product is sold in promotion. To analyze the effect of this lack of data on the forecast accuracy of the model, a new model without these variables is tested. The model fit decreases to an adjusted R-square of around 0.500, indicating that the exclusion of the two variables substantially worsens the performance of the forecasting model.

Moreover, Unilever needs to forecast retailer orders instead of consumer demand. Therefore, the model results for the consumer demand are adapted to retailer orders. The retailers included in the research order on average between 39% and 85% more than is sold during promotion. The forecasts for the consumer demand are raised by this difference. The model performance decreases substantially because of the extra variance in the retailer orders. The R-square for the HPC data set decreases to 0.138 in the calibration period, meaning that the predictive power of the model is very low. For the Food data set the R-square is 0.411 in the calibration period. So, the variability in the retailer orders is a lot higher for HPC products than for Food products. Forecasting retailer orders for HPC products seems to have little to no benefit, whereas Food orders can be forecasted with a higher accuracy. The difference is partly caused by the sales volume of a promotion. Because Food promotions sell 4 to 5 times more than HPC promotions, the variability in the retailer orders decreases. This reasoning also holds for the A, B and C categorisation, where the A SKUs are the more important high-volume products. And indeed A SKUs have a substantially higher forecasting accuracy than C SKUs.
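The adaptation from consumer demand to retailer orders described above amounts to raising the consumer forecast by a retailer-specific percentage. The sketch below shows that arithmetic for the two ends of the reported range; the consumer forecast of 10,000 units and the function name `retailer_order_forecast` are hypothetical.

```python
def retailer_order_forecast(consumer_forecast, order_uplift):
    """Raise a consumer demand forecast by the average over-ordering share.

    order_uplift is a fraction, e.g. 0.39 for a retailer that on average
    orders 39% more than is sold during the promotion.
    """
    if order_uplift < 0:
        raise ValueError("order_uplift is expected to be non-negative")
    return consumer_forecast * (1.0 + order_uplift)


consumer_forecast = 10_000  # hypothetical consumer units
low = retailer_order_forecast(consumer_forecast, 0.39)
high = retailer_order_forecast(consumer_forecast, 0.85)
print(f"retailer order forecast between {low:.0f} and {high:.0f} units")
```

Such a fixed uplift only shifts the level of the forecast; it does not explain the extra variance in the orders, which is consistent with the low R-square values reported for retailer orders.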

The adaptations indicated that the reduction in the number of variables in the model does not lead

to a lower model performance. But when two of the most important variables are excluded, because

of a lack of data at Unilever, the model performance decreases substantially. Furthermore, the

transition from consumer demand to retailer orders leads to a high loss of predictive power. To

overcome these problems further steps need to be taken.

12.3 Future steps to increase the forecast accuracy

Since a direct forecast of the retailer orders turned out to be inaccurate and not all variables had data availability, future steps need to be taken to deal with the problems which diminish the forecast accuracy (implementation plan in paragraph 11.2). The first implementation step focuses on the enhancement of the data usage within Unilever. Quite some promotion data is available somewhere in the organization; however, the available data on historical and upcoming promotions should be recorded more centrally and made more accessible. Then the data can actually be used by the logistics employee to forecast promotions. The second step is to ensure that the important information on upcoming promotions is provided by the retailers in advance. Retailers are afraid to do so because of the sensitivity of the data. Unilever should win their trust to get hold of the important promotion data. The third step should bring insight into the factors causing the gap between retailer orders and consumer demand. Because of the bullwhip effect a lot of extra variance is added to the retailer orders, especially for HPC products. Unilever should focus on understanding the source of the extra variance together with the retailer. The fourth step has to bring the insights of the third step into action and take these insights one step further. During this process the alignment with the retailer becomes more important as the collaboration becomes more intensive. In the end this will result in a supply chain forecasting model which both the retailer and Unilever make use of, and which new technologies like RFID can help to evolve.

The first two implementation steps will solve the poor database usage within Unilever, the main

scope of this research paper as stated in paragraph 2.2. The problem areas customer (retailer) team

deviation and retailer dependency will be influenced by the implementation steps as well. Because

a standard way of working is proposed for all retailers, the promotion forecasting process will

become more alike for the different retailers. Furthermore, retailers who are not as far as others in

the implementation steps can learn from the forecasting process of Unilever for the more developed

retailers. Regarding the retailer dependency, it has become clearer which variables are needed from

a retailer to accurately forecast a promotion. And the implementation steps will convert the

dependency on a retailer to collaboration with a retailer.

Altogether, this research showed that if the right information is available Unilever is very well

capable of accurately predicting the consumer demand. Unilever has an advantage over the retailers

because of their larger data pool of promotions over all retailers which can be used to forecast

upcoming promotions. However, forecasting retailer orders has turned out to be far more difficult

than consumer demand, especially for HPC products. The bullwhip effect leads to a substantial

deviation between retailer orders and consumer demand. As a result, in order to be able to

accurately forecast retailer orders, the disturbing factors behind the bullwhip effect should be

analyzed. In order to successfully analyze these factors close collaboration with the retailer is

needed. When the disturbing factors are successfully analyzed, a promotion forecasting model which

forecasts the consumer demand and corrects for the disturbing factors should be formulated and

employed together with the retailer. Close collaboration and information sharing are needed, where in

the end Unilever and the retailer together use one forecasting approach and the retailer orders can

be predicted accurately.

12.4 Contribution to literature

In paragraph 1.5 three gaps in the literature were discussed. The gaps are (1) the choice of the

dependent variable to predict the promotional sales, (2) the development of a forecasting model for

a manufacturer and (3) if it is an advantage or disadvantage to be a manufacturer.

The first gap exists because there is no clarity in the promotion forecasting literature about which measure should be used as dependent variable. Different studies use different dependent variables, namely the LF of the promotional sales, the ln of the LF and the cumulative lognormal distribution of the LF (P(LF)). This research concluded that the LF of the promotional sales is not an adequate measure because of clear signs of non-normality. Both other measures correct for this non-normality, although the cumulative lognormal distribution does so to a lesser extent. The performance of a promotion forecasting model substantially improves for both the P(LF) and the ln LF measure, where the model fit of the ln LF was slightly higher. In conclusion, the natural logarithm of the LF matches the normal distribution best and has the highest model fit.
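The three candidate dependent variables can be compared on a small example. The sketch below is illustrative only: the LF values are made up, and P(LF) is taken here to mean the lognormal cumulative distribution evaluated at the LF, i.e. the standard normal CDF of the standardised ln LF, which is an assumption about the exact definition used in the literature.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: mean of the cubed standardised values."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical LF values with a long right tail.
lf = [1.2, 1.8, 2.5, 3.1, 4.0, 5.5, 7.9, 12.0, 30.0]

# Measure 2: natural logarithm of the LF.
ln_lf = [math.log(v) for v in lf]

# Measure 3 (assumed definition): lognormal CDF of the LF.
mu, sigma = mean(ln_lf), pstdev(ln_lf)
p_lf = [normal_cdf((x - mu) / sigma) for x in ln_lf]

print(f"skewness LF:    {skewness(lf):.2f}")
print(f"skewness ln LF: {skewness(ln_lf):.2f}")
```

In this toy sample the log transformation removes most of the right skew, which is the same direction of change as the skewness values reported for the real data in Appendix 2.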

Regarding the second gap, the main difference between a retailer and a manufacturer is that a retailer needs to forecast the demand of his shoppers (consumer demand) and a manufacturer has to forecast the orders placed by his customers (retailer orders). This research developed both a model which directly predicts the retailer orders and a model which predicts the consumer demand, after which this prediction is adapted to a forecast for the retailer orders. Retailer orders do in fact differ remarkably from consumer demand, by between 39% and 86% for the retailers in this research. Therefore, a model which predicts consumer demand cannot be used at a manufacturer without an adaptation.

Third, it is not clear whether being a manufacturer is an advantage or a disadvantage in producing an accurate promotion forecast. This research built a model which has a high promotion forecast accuracy on consumer demand level. The research made use of promotional data of SKUs for multiple retailers. The fact that the model performance based on this data is quite good indicates that the larger promotional database can act as an advantage for a manufacturer. However, not all variables for the consumer demand forecasting model are available for upcoming promotions at Unilever, because retailers are not willing to share some of the important promotion characteristics with Unilever. This is a major disadvantage: this research indicated that the model performance clearly drops without these variables. The second disadvantage of being a manufacturer is that a manufacturer has to deliver retailer orders instead of consumer demand. This research showed that the variability of retailer orders is higher than that of consumer demand and that the model performance decreases substantially when forecasting retailer orders. Therefore, overall it is concluded that a manufacturer has a disadvantage compared to a retailer.


Page 64


Page 65

References

Anderson, E., & Coughlan, A. T. (2002). Channel Management: Structure, Governance and

Relationship Management. In B. A. Weitz & R. Wensley (Eds.), Handbook of Marketing (pg.

223-247). London: Sage.

Barratt, M. and Oliveira, A. (2001). Exploring the experience of collaborative planning initiatives.

International Journal of Physical Distribution & Logistics Management, Vol. 31, No. 4, pg. 266-89.

Blattberg, R.C., Briesch, R., Fox, E.J. (1995). How Promotions Work. Marketing Science, Vol. 14, No.

3, pg. 122-132.

Buckers, J. (2010). The ordering process of dry food groceries under promotion: a study of order

commitment timing and ordering methods. Master Thesis. Eindhoven University of Technology,

Eindhoven.

Cooper, L.G., Baron, P., Levy, W., Swisher, M., Gogos, P. (1999). PromoCast ™: A New Forecasting

Method for Promotion Planning. Marketing Science, Vol. 18, No. 3, pg. 301-316.

Cooper, D. R., Schindler, P. S. (2003), Business Research Methods, eighth edition, New York,

McGraw-Hill/Irwin

Disney, S.M., Towill, D.R. (2003). The effect of vendor managed inventory (VMI) dynamics on the

Bullwhip Effect in supply chains. International Journal of Production Economics, Vol. 85, No. 2, pg.

199-215.

Field, A. (2005). Discovering statistics using SPSS. Third edition. SAGE Publications. London.

Green, S.B. (1991). How Many Subjects Does It Take To Do A Regression Analysis? Multivariate

Behavioral Research, Vol. 26, No. 3, pg. 499 – 510.

Heuvel, F.P. van den (2009). Action products at Jan Linders Supermarkets. Master Thesis, Eindhoven

University of Technology, Eindhoven.

Holmstrom, J., Framling, K., Kaipia, R. and Saranen, J. (2002). Collaborative planning forecasting

and replenishment: new solutions needed for mass collaboration. Supply Chain Management: An

International Journal, Vol. 7, No. 3, pg. 136-45.

Lee, H.L., Padmanabhan, V., Whang, S. (1997). Information distortion in a supply chain: The

bullwhip effect. Management Science, Vol. 43, No. 4, pg. 546.


Page 66

Lee, H.L., Padmanabhan V., Whang S. (2004). Comments on "Information Distortion in a Supply

Chain: The Bullwhip Effect". Management Science, Vol. 50, No. 12, pg. 1887-1893.

Loo, M. van (2006). Out-of-Stock reductie van actieartikelen, Model voor vraagvoorspelling en

logistieke aansturing van actieartikelen bij Schuitema/C1000. Master Thesis, Eindhoven University of

Technology, Eindhoven.

Makridakis, S. (1988). Metaforecasting: Ways of Improving Forecasting Accuracy and Usefulness.

International Journal of Forecasting, Vol. 4, No. 3, pg. 467-491.

Miles, J., Shevlin, M. (2001). Applying regression and correlation: a guide for students and

researchers. SAGE publication. London.

Poel, M.J. van (2010a). A Literature study at promotion forecasting in the Fast Moving Consumer

Good sector, literature study performed on promotion forecasting. Literature study performed for

this master thesis project.

Poel, M.J. van (2010b). A research proposal for promotion forecasting: how to develop a

manufacturer based model? Research proposal for this master thesis project

Pramatari, K., Papakiriakopoulos, D., Poulymenakou, A. and Doukidis, G.I.U. (2002). New forms of

CPFR. The ECR Journal – International Business Review, Vol. 2 No. 2, pg. 38-43.

Pramatari, K. (2007). Collaborative supply chain practices and evolving technological approaches.

Supply Chain Management: An International Journal, Vol. 12, No. 3, pg. 210–220.

Silva-Risso, J.M., Bucklin, R.E., Morrison, D.G. (1999). A Decision Support System for Planning

Manufacturers' Sales Promotion Calendars. Marketing Science, Vol. 18, No. 3, pg. 274-300.

Silver, E. A., Pyke, D. F., & Peterson, R. (1998), Inventory management and production planning and

scheduling. Third edition. John Wiley & Sons. New York.

Strien, P.J. van (1997). Towards a methodology of psychological practice. Theory and Psychology,

Vol. 7, No. 5, pg. 683-700.

Wittink, D.R., Addona, M.J., Hawkes, W.J., Porter, J.C. (1988). SCAN*PRO: The estimation,

validation and use of promotional effects based on scanner data. Internal paper, Cornell University.


Page 67

Appendices

Appendix 1: Sample size (86 products)

EAN CE Material description Category

8717644013045 Axe Deospray Africa 150ML DEO & GROOMING

8717644042359 Axe Deospray Vice 150ML DEO & GROOMING

50097265 Axe roll on Africa 50ML DEO & GROOMING

50096190 Dove Deo Roll On Original 50ML DEO & GROOMING

8717163965030 Dove Deospray Original 150ML DEO & GROOMING

8717163997345 Dove Deospray Original 250ML DEO & GROOMING

8717163964972 Dove Deospray Sensitive 150ML DEO & GROOMING

8717163593318 Rexona Deospray Clear Aqua 150ML DEO & GROOMING

50099214 Rexona Roll On Nutritive 50ML DEO & GROOMING

8593838930653 Calve Dressing Naturel 450ML DRESSINGS

8593838930523 Calve Dressing Slasaus Halfvol 450ML DRESSINGS

8593838930509 Calve Slasaus Yogomix 450ML DRESSINGS

8717644278666 Andrelon Condit. Bruin haar 300ML HAIR CARE

8717163361252 Andrelon Conditioner Perf. Krul 300ML HAIR CARE

8717644341803 Andrelon Hairspr Fix & Shine 250ML HAIR CARE

8717644341582 Andrelon Mousse Volume 200ML HAIR CARE

8717644393956 Andrelon Shamp Hair&Body Men 300ML HAIR CARE

8717163009741 Andrelon Shampoo Glans 300ML HAIR CARE

8717163010068 Andrelon Shampoo Perf. Krul 300ML HAIR CARE

8717644337615 Andrelon Shaper 125ML HAIR CARE

8717163089828 Cif Spray Badkamer 750ML NEK HOUSEHOLD CARE

8717163089897 Cif Spray Keuken 750ML NEK HOUSEHOLD CARE

8717644961001 Glorix Bleek Original 750ML HOUSEHOLD CARE

8717163416976 Glorix Hyg Doekje Normaal. 60ST HOUSEHOLD CARE

8717644465394 Glorix WC Powergel A-kalk Lime 750ML HOUSEHOLD CARE

8717163055946 Sun Machinereiniger 40G 3ST HOUSEHOLD CARE

76840600021 B&J Cookie Dough 500ML ICE CREAM

8000920580806 DO 360ML Magnum Snacksize CAW ICE CREAM

8000920553800 DO 480ML Ola Magnum Classic 3+1 ICE CREAM

8000920555705 DO 480ML Ola Magnum White 3+1 ICE CREAM

8710447120187 Hertog IJsspecialiteit 3 Chocolades ICE CREAM


Page 68

8722700109198 Hertog IJsspecialiteit Stroopwafel ICE CREAM

8710447032756 Ola Festini Peer 600ML 12MP ICE CREAM

5410148322905 Ola Raket 440ML 8MP ICE CREAM

8722700210740 Vien Caramel Crisp 650ml ICE CREAM

8717644379264 Robijn Black Velvet 1,5L 30sc LAUNDRY

8717644391938 Robijn K&K Vloeibaar Color 730ml 20sc LAUNDRY

8717644391907 Robijn K&K Vloeibaar Wit 730ml 20sc LAUNDRY

8717644629536 Robijn Pak Color 1008G 18sc LAUNDRY

8717163404355 Robijn Vloeib Fleur&Fijn 1L 16sc LAUNDRY

8717644374320 Robijn WVZ Zonnig Geel 750ML LAUNDRY

8722700227632 Unox Knaks 200G OTHER FOODS

8722700227618 Unox Knaks Runder 200G OTHER FOODS

8722700335481 Unox Ragout Kalf 400G OTHER FOODS

8722700189510 Unox Rookworst Standaard 275G OTHER FOODS

8722700233053 Bertolli Pastasaus Basilicum 400G SAVOURY

8722700129554 Bertolli Pastasaus Knoflook 450G SAVOURY

8722700093602 Bertolli Pastasaus Kruidig 450G SAVOURY

8714100050262 Conimex Boemboe Sajoer Boontjes 100G SAVOURY

8722700208914 Knorr Chicken Tonight Hawai 490ML SAVOURY

8722700206712 Knorr Maaltijdmx Boerenomelet13G SAVOURY

8722700206354 Knorr Saus Kerrie 28 SAVOURY

8722700206361 Knorr Saus Room 46G SAVOURY

8722700206507 Knorr Saus Wit 22G SAVOURY

8711100069973 Knorr Wereld Burritos 229G SAVOURY

8711100069331 Knorr Wereld Kip Tandoori 292G SAVOURY

8722700222576 Knorr Wereld Mex Enchillada 343G SAVOURY

8722700139355 Unox CAS Speciaal Romige Mosterd SAVOURY

8711200189205 Unox Good Noodles Kip 70G SAVOURY

8722700214090 Unox SIZ Soep Bospaddestoelen 570ML SAVOURY

8722700214076 Unox Soep Champignon 300ML Doy SAVOURY

8722700214137 Unox Soep Romige Tomaat 570ML Doy SAVOURY

8722700419051 ZK 60G CNX Kroepoek Bali SAVOURY

8722700418818 ZK 60G CNX Kroepoek Klein Nat. SAVOURY

42153184 Axe SG Dark Temptation 250ml SKIN

8717644006481 Dove Body Cream Oil Pro Age 250ml SKIN


Page 69

8717163476789 Dove Body Voedende Creme 150ML SKIN

4000388177000 Dove Cream Wash Liq. soap 250ML SKIN

8717163611548 Dove Face Dagcreme 50ML SKIN

8717644046630 Dove Pro Age Shower 250ml SKIN

8717644027462 Dove Shower Cream Shower 500ML SKIN

8000700000012 Dove Wastablet Regular 100gr SKIN

8717163063606 Vaseline Body Lotion Aloe Fresh 400ML SKIN

8717163066003 Vaseline Lotion Hand&Nail Tube 75ML SKIN

8711200189403 Becel Bak en Braad 500ML SPREADS AND COOKING PRODUCTS

8722700250494 BECEL LIGHT LQM 500ML SPREADS AND COOKING PRODUCTS

8722700092971 Becel Olijfolie 500ML SPREADS AND COOKING PRODUCTS

8722700259886 Becel PA Bloeddruk 250G KP SPREADS AND COOKING PRODUCTS

8711200134502 Becel Vlees en Jus 400ML SPREADS AND COOKING PRODUCTS

8722700191377 Blue Band Margarine Idee Calc. 500G SPREADS AND COOKING PRODUCTS

8722700462958 CALVE PIKA REGULAR IKB 350G JAR SPREADS AND COOKING PRODUCTS

8711200134403 CROMA B&B LQM SPREADS AND COOKING PRODUCTS

8722700359326 DO 1,5L Lipton Ice T Lemon CAR TEA AND SOY & FRUIT BEVERAGES

8722700243809 Lipton Ice Tea Green 1.5L TEA AND SOY & FRUIT BEVERAGES

8722700056522 Lipton Ice Tea Sparkling Light 1.5L TEA AND SOY & FRUIT BEVERAGES


Page 70

Appendix 2: Transformation of variables

First the normality and possible transformations of the dependent variable are analyzed. The left-hand histogram below shows the distribution of the untransformed LF. Since it does not follow a normal distribution, a logarithmic transformation (ln transformation) is applied. The right-hand histogram and the descriptive statistics table illustrate the large improvement in normality. Consequently, the ln of the LF is used as the dependent variable in the further research.

Descriptive Statistics dependent variables

                          LF_5_weeks_before   ln_LF_5_weeks_before
Mean                      6.6303              1.5832
Std. Error of Mean        .22112              .02048
Std. Deviation            7.78001             .72050
Variance                  60.529              .519
Skewness                  5.983               .603
Std. Error of Skewness    .070                .070
Kurtosis                  54.596              .901
Std. Error of Kurtosis    .139                .139
Range                     109.88              5.09
Minimum                   .69                 -.38
Maximum                   110.57              4.71
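The standard errors of skewness and kurtosis in the table above depend only on the sample size, so they can be checked independently. The sketch below uses the usual formulas for these standard errors and expresses each reported statistic as a z-score. The sample size n = 1235 is an assumption, taken from the number of valid cases reported for P(LF) in Appendix 9; the function names are illustrative only.

```python
import math


def skewness_standard_error(n: int) -> float:
    """Standard error of the sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def kurtosis_standard_error(n: int) -> float:
    """Standard error of the sample excess kurtosis for a sample of size n."""
    ses = skewness_standard_error(n)
    return 2.0 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


# Assumption: n = 1235 valid cases (the count reported for P(LF) in Appendix 9).
n = 1235
ses = skewness_standard_error(n)
sek = kurtosis_standard_error(n)

# Skewness and kurtosis as reported in the table, before and after the ln transformation.
reported = {
    "LF_5_weeks_before": {"skewness": 5.983, "kurtosis": 54.596},
    "ln_LF_5_weeks_before": {"skewness": 0.603, "kurtosis": 0.901},
}

# A z-score is the statistic divided by its standard error.
z_scores = {
    name: {"z_skewness": s["skewness"] / ses, "z_kurtosis": s["kurtosis"] / sek}
    for name, s in reported.items()
}

for name, z in z_scores.items():
    print(f"{name}: z_skewness={z['z_skewness']:.1f}, z_kurtosis={z['z_kurtosis']:.1f}")
```

Under this assumption the formulas reproduce the reported standard errors of .070 and .139 when rounded to three decimals, and both z-scores are much smaller after the transformation.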

The next descriptive statistics table and histograms depict the independent variables that are considered for transformation. A logarithmic transformation (Field, 2005) is performed to enhance the normality of the variables. The normality of the variables LF_former_promotions_EAN, Growth_number_of_shops, Absolute_discount, Percentage_repeat_buyers and Size_products improved substantially. Therefore, these variables are used in their transformed form.



Descriptive Statistics independent variables

                     LF_former_      ln_LF_former_   Growth_number_  log_growth_     Absolute_       log_absolute_   Size_           log_size_of_
                     promotions_EAN  promotions_EAN  of_shops_       number_         discount        discount        products        product
                                                     selling_points  selling_points
Mean                 4.18            1.36            1.18            .522            .661            .192            1010.1          2.88
Std. Error of Mean   .051            .011            .0069           .0016           .0182           .0043           21.28           .0094
Std. Deviation       1.82            .375            .244            .057            .638            .152            747.9           .329
Variance             3.32            .140            .059            .003            .407            .023            559302          .108
Skewness             2.30            .306            2.11            1.48            1.257           .510            1.077           .035
St. Error Skewness   .070            .070            .070            .070            .070            .070            .070            .070
Kurtosis             18.57           .909            5.99            3.37            1.516           -.501           .470            -1.070
St. Error Kurtosis   .14             .14             .14             .139            .139            .139            .139            .139
Range                18.20           2.65            2.09            .48             3.53            .66             3246            1.25
Minimum              1.38            .32             .53             .31             .00             .00             194             2.29
Maximum              19.58           2.97            2.62            .80             3.53            .66             3440            3.54



Appendix 3: Outlier analysis on all cases

Casewise Diagnostics(b)

Case Number   Status   Std. Residual   ln_LF_5_weeks_before   Predicted Value   Residual
82            X(a)     4.803           3.46                   1.6928            1.76717
83            X(a)     5.237           3.39                   1.4631            1.92688
126           X(a)     6.600           3.93                   1.5013            2.42866
127           X(a)     8.742           4.22                   1.0034            3.21665
183           X(a)     6.836           3.97                   1.4546            2.51545
184           X(a)     8.487           4.05                   .9269             3.12307
186                    3.411           2.37                   1.1149            1.25507
208           X(a)     6.970           4.71                   2.1455            2.56453
209           X(a)     7.205           4.57                   1.9189            2.65113
213                    4.319           3.29                   1.7009            1.58912
215                    3.081           2.40                   1.2663            1.13369
219                    -3.437          .25                    1.5147            -1.26466
412           X(a)     3.488           3.14                   1.8564            1.28357
436                    3.287           2.57                   1.3603            1.20968
437                    3.096           2.10                   .9606             1.13935
498           X(a)     6.894           4.22                   1.6833            2.53670
535           X(a)     7.159           3.93                   1.2956            2.63438
579                    -6.006          .00                    2.2098            -2.20983
643                    -4.058          .66                    2.1530            -1.49305
1043          X(a)     3.346           1.79                   .5588             1.23119
1044          X(a)     4.193           2.32                   .7772             1.54281
1114          X(a)     6.238           3.74                   1.4445            2.29551
1115          X(a)     7.391           4.22                   1.5004            2.71957
1127                   3.083           3.30                   2.1656            1.13440

a. X: Magnum cases

Besides the exclusion of all Magnum promotions, due to the large number of outliers originating from the Magnum ice creams, the cases with an absolute standardized residual above 3.5 are also excluded from the analyses, because the disturbing effect of these cases on the models is too large. These are cases 213, 579 and 643. The reason for the high standardized residuals is that cases 213 and 643 have very low base sales and case 579 has a very low LF (below one).



Appendix 4: Assumptions linear regression

Normality of dependent variable

Paragraph 5.5 discussed this property of the dependent variable. After transforming the dependent variable to its ln value, the normality assumption is met.

Multicollinearity

This can be assessed with the VIF statistics of the different variables. If the largest VIF statistic is greater than 10, or the average VIF statistic is substantially greater than 1, then there is cause for concern (Bowerman & O’Connel, 1990). The VIF values range from 1.2 to 6.4 and the average VIF value is 2.6, which gives no cause for concern.
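As an illustration of how VIF statistics arise, the sketch below computes them from scratch in Python. It relies on the fact that the VIF of predictor j, 1 / (1 - R_j^2), equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The data are synthetic and only serve the example (they are not the thesis data), and all function names are hypothetical.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long data columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF per predictor: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def largest_vif_is_concern(vifs, limit=10.0):
    """Rule of thumb from the text: a largest VIF above 10 is a cause for concern."""
    return max(vifs) > limit


# Synthetic data (assumption): x3 is almost a copy of x1, x2 is unrelated.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(500)]
x2 = [random.gauss(0, 1) for _ in range(500)]
x3 = [a + 0.1 * random.gauss(0, 1) for a in x1]

vifs = variance_inflation_factors([x1, x2, x3])
print("VIF per predictor:", [round(v, 2) for v in vifs])
print("Largest VIF is a concern:", largest_vif_is_concern(vifs))

# The range reported in the text (1.2 up to 6.4) stays below the limit of 10.
print("Reported range is a concern:", largest_vif_is_concern([1.2, 6.4]))
```

In the synthetic example the two nearly identical predictors receive very large VIF values while the unrelated predictor stays close to 1, which is the pattern the rule of thumb is meant to detect.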

Normality of error distribution

The following histogram and normal P-P plot of the standardized residuals indicate that the assumption of normally distributed errors is met.

Homoscedasticity

To check this assumption, the scatterplots of the regression standardized residuals and the regression studentized residuals are analysed. Neither scatterplot gives cause for concern about heteroscedasticity, so the assumption is met.



Linearity

Below, the scatterplots of the 12 most important independent variables are depicted. Most of the graphs show a clear linear relation between the dependent and the independent variable. For some of the scatterplots (Two_for_X, Three_for_X, Preservability, Number_of_products_on_promotion) the linear relationship is unclear, but none of them suggests a non-linear relationship. The only concern is a weak or absent relation. Hence, this assumption is accepted.



Independence of the errors

The independence of errors assumption means that for any two observations the residual terms should be uncorrelated. The assumption can be tested with the Durbin-Watson test, which checks for serial correlation between errors. The test statistic can vary between 0 and 4, with a value of two meaning that the errors are uncorrelated. As a general rule, values lower than one or greater than three are a cause for concern. The Durbin-Watson statistic of model 1 is equal to 1.386, which lies between one and three. Therefore, the assumption is accepted.
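For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal Python sketch of the statistic and of the rule of thumb used above is given below; the residual series are made-up toy values, and the function names are illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def within_rule_of_thumb(d, lower=1.0, upper=3.0):
    """Rule of thumb from the text: values below one or above three are a cause for concern."""
    return lower <= d <= upper


# Toy residual series (assumption, not thesis data).
alternating = [1.0, -1.0, 1.0, -1.0]   # strong negative serial correlation
constant = [1.0, 1.0, 1.0, 1.0]        # strong positive serial correlation

print(durbin_watson(alternating))
print(durbin_watson(constant))

# The value reported for model 1.
print(within_rule_of_thumb(1.386))
```

Positively correlated residuals push the statistic towards 0 and negatively correlated residuals push it towards 4, which is why values near two indicate uncorrelated errors.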



Appendix 5: Results linear regression model TV_support

Model Summary

Model   R (Albert_Heijn = 1, Selected)   R Square   Adjusted R Square   Std. Error of the Estimate
1       .141(a)                          .020       .017                .73869

a. Predictors: (Constant), TV_support

ANOVA(b,c)

Model            Sum of Squares   df    Mean Square   F       Sig.
1   Regression   4.718            1     4.718         8.647   .003(a)
    Residual     234.087          429   .546
    Total        238.805          430

a. Predictors: (Constant), TV_support

b. Dependent Variable: ln_LF_promotions

c. Selecting only cases for which Albert_Heijn = 1

Coefficients(a,b)

                     Unstandardized Coefficients   Standardized Coefficients
Model                B         Std. Error          Beta                        t        Sig.
1   (Constant)       1.604     .038                                            42.316   .000
    TV_support       .324      .110                .141                        2.941    .003

a. Dependent Variable: ln_LF_promotions

b. Selecting only cases for which Albert_Heijn = 1
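The reported figures of this single-predictor model can be cross-checked against each other, and the coefficient can be translated back to the lift factor scale. The Python sketch below does both, using only numbers from the tables above. The back-transformation assumes that TV_support is a 0/1 dummy variable; because the dependent variable is the ln of the lift factor, exp(B) then gives the multiplicative effect on the lift factor.

```python
import math

# Values copied from the ANOVA and coefficient tables of this appendix.
ss_regression, ss_residual, ss_total = 4.718, 234.087, 238.805
df_regression, df_residual = 1, 429
b_tv_support, t_tv_support = 0.324, 2.941

# R Square is the share of the total sum of squares explained by the regression.
r_square = ss_regression / ss_total

# F is the ratio of the regression mean square to the residual mean square.
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

# With a single predictor, F equals the squared t-value of that predictor
# (up to rounding of the reported figures).
t_squared = t_tv_support ** 2

# Assumption: TV_support is a 0/1 dummy, so exp(B) is the factor by which the
# lift factor is multiplied when a promotion has TV support.
lift_multiplier = math.exp(b_tv_support)

print(f"R Square          : {r_square:.3f}")
print(f"F statistic       : {f_statistic:.3f}")
print(f"t squared         : {t_squared:.3f}")
print(f"Lift multiplier   : {lift_multiplier:.2f}")
```

The recomputed values agree with the reported R Square and F up to rounding, and the low R Square is a reminder that TV support alone explains only a small part of the variation in the lift factor.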



Appendix 6: Further investigation hypothesis on log_absolute_discount

In the model summary below, both absolute discount measures surprisingly show a higher model fit than the percentage discount measure. Below that, the coefficients of the different models are depicted. Again, the results are stronger for both absolute discount measures, with standardized Beta coefficients of 0.497 and 0.473 against 0.434 for the percentage discount measure.

Model Summary different single linear regression models

Model                            R (All_2009_w_o_Magnum = 1, Selected)   R Square   Adjusted R Square   Std. Error of the Estimate
Absolute_discount_per_offer      .497(a)                                 .247       .246                .56426402
Absolute_discount_per_product    .473(a)                                 .224       .223                .57267972
Percentual_discount              .434(a)                                 .188       .188                .58567740

a. Predictors: (Constant), Absolute_discount_per_offer

Coefficients(a,b)

                                 Unstandardized Coefficients   Standardized Coefficients
Model                            B         Std. Error          Beta                        t        Sig.
(Constant)                       1.244     .023                                            53.532   .000
Absolute_discount_per_offer      .180      .010                .497                        17.783   .000

(Constant)                       1.200     .026                                            46.487   .000
Absolute_discount_per_product    .480      .029                .473                        16.698   .000

(Constant)                       1.095     .033                                            33.087   .000
Percentual_discount              .018      .001                .434                        14.973   .000

b. Selecting only cases for which All_2009_w_o_Magnum = 1

However, when the absolute discount per offer and the percentage discount are included in the same full model, the absolute discount is no longer significant (see the coefficient table below). This is probably due to collinearity between the two discount measures.



Coefficients full model(a,b)

                                     Unstandardized Coefficients   Standardized Coefficients
Model                                B         Std. Error          Beta                        t         Sig.
1   (Constant)                       -3.424    .326                                            -10.519   .000
    Procentual_discount              .018      .002                .427                        10.545    .000
    Absolute_discount_per_offer      .005      .013                .014                        .392      .695
    C1000                            .178      .040                .101                        4.405     .000
    Plus                             -.130     .040                -.079                       -3.281    .001
    Kruidvat                         -.106     .068                -.075                       -1.571    .116
    Personalcare                     .113      .060                .087                        1.891     .059
    Ice_and_beverages                -.140     .092                -.059                       -1.528    .127
    SCC_and_vitality_shots           .503      .131                .160                        3.847     .000
    Savoury_and_dressings            .119      .069                .080                        1.729     .084
    ln_LF_former_promotions_EAN      .566      .053                .318                        10.699    .000
    Display                          .008      .001                .448                        15.019    .000
    Folder                           .441      .055                .170                        8.084     .000
    Promo_length                     .589      .060                .396                        9.903     .000
    SPO                              -.258     .062                -.133                       -4.141    .000
    Two_for                          -.330     .057                -.246                       -5.814    .000
    Three_for                        -.277     .060                -.177                       -4.648    .000
    Free_product                     -.026     .074                -.015                       -.356     .722
    Premiaat                         -.045     .078                -.021                       -.580     .562
    TV_support                       .015      .070                .004                        .219      .826
    Number_of_products_in_promotion  -.001     .000                -.095                       -2.114    .035
    Promotion_pressure               -.001     .001                -.024                       -.856     .392
    log_growth_number_selling_points 3.697     .250                .323                        14.804    .000
    Market_penetration               .002      .002                .047                        1.142     .254
    Frequency_of_purchase            -.094     .035                -.141                       -2.665    .008
    Percentage_repeat_buyers         .002      .002                .028                        .716      .474
    log_size_of_product              .141      .072                .072                        1.970     .049
    Preservability                   .001      .000                .175                        4.400     .000
    Holiday_products                 .222      .139                .035                        1.597     .111
    winter_products_temp             -.001     .004                -.006                       -.273     .785
    summer_products_temp             .004      .005                .025                        .702      .483





To investigate whether the absolute discount per offer has a threshold effect, this variable is plotted against the average lift factor, both for all types of promotions and for SPO promotions only. For all types of promotions combined, a clear linear effect is found. For the SPO promotions hardly any linear effect is found. In both graphs no clear threshold effect can be found, i.e. no obviously higher LF is found beyond a certain absolute discount.

Model Summary

Model                            R (All_2009_w_o_Magnum = 1, Selected)   R Square   Adjusted R Square   Std. Error of the Estimate
Percentual discount              .834(l)                                 0.695      0.689               0.362
Absolute discount per product    .813(i)                                 0.662      0.654               0.382
Absolute discount per offer      .812(j)                                 0.659      0.651               0.384

When the absolute discount per product cannot be used, the non-promo price might be a good replacement predictor to include in the model. However, the coefficient table further below shows that the non-promo price is insignificant in a full model in which the absolute discount per product is excluded as a variable. The coefficients depicted in that table are for the full model for all promotions of 2009 without Magnum products (data set 4).

[Figure, two panels: lift factor (vertical axis) plotted against the absolute discount per offer in € (horizontal axis). Panel 1, "All type of promotions": discount from 0 to 8, lift factor from 0 to 18. Panel 2, "SPO promotions": discount from 0.1 to 0.9, lift factor from 0 to 12.]



Coefficients full model(a,b)

                                     Unstandardized Coefficients   Standardized Coefficients
Model                                B         Std. Error          Beta                        t         Sig.
1   (Constant)                       -3.270    .244                                            -13.384   .000
    Display_1                        .008      .001                .320                        14.984    .000
    Folder                           .467      .053                .180                        8.849     .000
    Promo_length                     .596      .056                .400                        10.560    .000
    Percentual_discount              .018      .001                .446                        15.257    .000
    SPO                              -.229     .058                -.118                       -3.923    .000
    Two_for                          -.313     .054                -.234                       -5.796    .000
    Three_for                        -.258     .056                -.165                       -4.570    .000
    Number_of_products_in_promotion  -.001     .000                -.089                       -2.264    .024
    C1000                            .172      .039                .098                        4.424     .000
    Plus                             -.137     .038                -.084                       -3.578    .000
    Kruidvat                         -.513     .061                -.363                       -8.412    .000
    log_growth_number_selling_points 3.771     .239                .329                        15.754    .000
    ln_LF_former_promotions_EAN      .593      .042                .333                        14.128    .000
    Preservability                   .001      .000                .162                        5.308     .000
    log_size_of_product              .103      .059                .053                        1.750     .080
    Frequency_of_purchase            -.053     .019                -.079                       -2.767    .006
    Ice_and_beverages                -.158     .048                -.066                       -3.316    .001
    SCC_and_vitality_shots           .308      .077                .098                        3.990     .000
    Non_promo_price                  -.022     .011                -.050                       -1.638    .085




Appendix 7: Correlation matrix of full model (data set 4)

Columns (1) to (14): (1) Promotion_pressure, (2) log_size_of_product, (3) Promo_length, (4) Savoury_and_dressings, (5) Free_product, (6) Percentage_repeat_buyers, (7) Three_for, (8) Ice_and_beverages, (9) Preservability, (10) Market_penetration, (11) Personalcare, (12) Number_of_products_in_promotion, (13) Kruidvat, (14) Percentual_discount

                                      (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)   (11)   (12)   (13)   (14)
summer_products_temp                 .051  -.117  -.029   .150  -.009  -.045  -.069  -.675   .199   .046  -.003   .000   .013  -.074
Frequency_of_purchase                .021  -.040   .013  -.202  -.034  -.604   .010  -.027   .001  -.246   .156  -.026  -.059   .051
Holiday_products                    -.007  -.019   .006  -.027  -.014  -.002   .025  -.455  -.055  -.025   .000  -.005   .000  -.018
Folder                              -.017   .052   .092   .041  -.062  -.060  -.067  -.030   .097  -.096  -.091  -.105  -.106  -.044
TV_support                           .019  -.016   .007  -.084   .065   .014  -.109  -.015  -.134   .030   .122  -.175   .043   .000
log_absolute_discount               -.292  -.117  -.015   .396   .054  -.078  -.143   .216  -.013   .260   .188   .100  -.128  -.759
Plus                                 .015  -.026   .040  -.041  -.046  -.024   .019   .038  -.137   .034   .046   .228   .178   .180
Two_for                              .032  -.007   .149  -.078   .111   .101   .625  -.050  -.030  -.134  -.114   .255   .165   .019
Display_1                           -.170  -.027  -.043   .098   .022  -.038  -.096   .129   .079   .042   .164   .111  -.146  -.030
winter_products_temp                 .072   .031   .018   .016   .037   .172   .117   .093   .215  -.256  -.068   .013  -.014   .100
C1000                               -.045  -.001   .031   .002   .010  -.033  -.022   .136  -.131   .058   .099   .134   .154   .061
log_growth_number_selling_points    -.107  -.031  -.006   .036   .064   .131   .052  -.001   .044   .001  -.145   .005   .023  -.012
Premiaat                            -.070   .016  -.050   .067   .682   .012   .164   .036   .047  -.019  -.007  -.272  -.227   .196
SPO                                  .037   .011   .050  -.117   .094   .152   .686  -.164   .035  -.188  -.160   .276   .093   .137
ln_LF_former_promotions_EAN         -.254  -.212   .048   .086  -.066  -.013   .029  -.237   .059  -.281  -.001   .030   .061   .035
SCC_and_vitality_shots              -.066   .339  -.008   .678   .046   .261   .029   .257   .571   .100   .297   .057   .011  -.111
Promotion_pressure                  1.000  -.257  -.009  -.132  -.051   .041  -.010  -.040  -.248  -.190  -.182  -.051   .019   .164
log_size_of_product                 -.257  1.000   .016   .322   .024   .054  -.009   .137   .456  -.144   .486  -.027   .027   .153
Promo_length                        -.009   .016  1.000   .000  -.211   .066   .133   .014  -.021  -.087  -.012  -.058  -.455   .016
Savoury_and_dressings               -.132   .322   .000  1.000   .054   .085  -.022   .356   .516  -.162   .528   .008  -.044  -.276
Free_product                        -.051   .024  -.211   .054  1.000   .025   .055   .025   .064  -.009   .006  -.412  -.142   .183
Percentage_repeat_buyers             .041   .054   .066   .085   .025  1.000   .022  -.003   .108  -.348  -.210  -.009   .080   .074
Three_for                           -.010  -.009   .133  -.022   .055   .022  1.000   .033  -.013  -.130  -.062   .267   .130   .016
Ice_and_beverages                   -.040   .137   .014   .356   .025  -.003   .033  1.000   .125  -.036   .311   .027  -.049  -.133
Preservability                      -.248   .456  -.021   .516   .064   .108  -.013   .125  1.000  -.157   .226  -.057  -.025   .021
Market_penetration                  -.190  -.144  -.087  -.162  -.009  -.348  -.130  -.036  -.157  1.000   .019   .023   .022  -.193
Personalcare                        -.182   .486  -.012   .528   .006  -.210  -.062   .311   .226   .019  1.000  -.090  -.195  -.158
Number_of_products_in_promotion     -.051  -.027  -.058   .008  -.412  -.009   .267   .027  -.057   .023  -.090  1.000  -.037   .023
Kruidvat                             .019   .027  -.455  -.044  -.142   .080   .130  -.049  -.025   .022  -.195  -.037  1.000   .115
Percentual_discount                  .164   .153   .016  -.276   .183   .074   .016  -.133   .021  -.193  -.158   .023   .115  1.000


Columns (15) to (30): (15) summer_products_temp, (16) Frequency_of_purchase, (17) Holiday_products, (18) Folder, (19) TV_support, (20) log_absolute_discount, (21) Plus, (22) Two_for, (23) Display_1, (24) winter_products_temp, (25) C1000, (26) log_growth_number_selling_points, (27) Premiaat, (28) SPO, (29) ln_LF_former_promotions_EAN, (30) SCC_and_vitality_shots

                                     (15)   (16)   (17)   (18)   (19)   (20)   (21)   (22)   (23)   (24)   (25)   (26)   (27)   (28)   (29)   (30)
summer_products_temp                1.000   .018   .364   .038  -.063   .052  -.045  -.021  -.084  -.031  -.137   .036  -.015   .050   .308   .122
Frequency_of_purchase                .018  1.000   .014   .048  -.024  -.095  -.054  -.034   .013  -.152  -.043  -.018  -.023  -.033   .235  -.559
Holiday_products                     .364   .014  1.000  -.029   .056  -.002   .035   .021   .003   .000  -.069  -.024  -.011   .002   .002  -.043
Folder                               .038   .048  -.029  1.000  -.081  -.036  -.140  -.084  -.300  -.031  -.216  -.051  -.061   .031   .109   .003
TV_support                          -.063  -.024   .056  -.081  1.000  -.027   .190  -.046   .068   .019   .196  -.005   .035  -.046  -.111  -.036
log_absolute_discount                .052  -.095  -.002  -.036  -.027  1.000   .027  -.118   .147  -.103   .096  -.096   .082  -.231  -.096   .201
Plus                                -.045  -.054   .035  -.140   .190   .027  1.000   .059  -.052   .023   .449  -.236   .018   .020  -.100   .020
Two_for                             -.021  -.034   .021  -.084  -.046  -.118   .059  1.000  -.051   .060   .031   .067   .210   .646   .049   .021
Display_1                           -.084   .013   .003  -.300   .068   .147  -.052  -.051  1.000   .016   .006   .052   .035  -.008  -.129   .102
winter_products_temp                -.031  -.152   .000  -.031   .019  -.103   .023   .060   .016  1.000   .028   .010   .041   .040  -.086   .212
C1000                               -.137  -.043  -.069  -.216   .196   .096   .449   .031   .006   .028  1.000  -.024   .034  -.159  -.056   .034
log_growth_number_selling_points     .036  -.018  -.024  -.051  -.005  -.096  -.236   .067   .052   .010  -.024  1.000   .055   .055   .081   .037
Premiaat                            -.015  -.023  -.011  -.061   .035   .082   .018   .210   .035   .041   .034   .055  1.000   .181  -.055   .055
SPO                                  .050  -.033   .002   .031  -.046  -.231   .020   .646  -.008   .040  -.159   .055   .181  1.000   .081  -.028
ln_LF_former_promotions_EAN          .308   .235   .002   .109  -.111  -.096  -.100   .049  -.129  -.086  -.056   .081  -.055   .081  1.000  -.156
SCC_and_vitality_shots               .122  -.559  -.043   .003  -.036   .201   .020   .021   .102   .212   .034   .037   .055  -.028  -.156  1.000
Promotion_pressure                   .051   .021  -.007  -.017   .019  -.292   .015   .032  -.170   .072  -.045  -.107  -.070   .037  -.254  -.066
log_size_of_product                 -.117  -.040  -.019   .052  -.016  -.117  -.026  -.007  -.027   .031  -.001  -.031   .016   .011  -.212   .339
Promo_length                        -.029   .013   .006   .092   .007  -.015   .040   .149  -.043   .018   .031  -.006  -.050   .050   .048  -.008
Savoury_and_dressings                .150  -.202  -.027   .041  -.084   .396  -.041  -.078   .098   .016   .002   .036   .067  -.117   .086   .678
Free_product                        -.009  -.034  -.014  -.062   .065   .054  -.046   .111   .022   .037   .010   .064   .682   .094  -.066   .046
Percentage_repeat_buyers            -.045  -.604  -.002  -.060   .014  -.078  -.024   .101  -.038   .172  -.033   .131   .012   .152  -.013   .261
Three_for                           -.069   .010   .025  -.067  -.109  -.143   .019   .625  -.096   .117  -.022   .052   .164   .686   .029   .029
Ice_and_beverages                   -.675  -.027  -.455  -.030  -.015   .216   .038  -.050   .129   .093   .136  -.001   .036  -.164  -.237   .257
Preservability                       .199   .001  -.055   .097  -.134  -.013  -.137  -.030   .079   .215  -.131   .044   .047   .035   .059   .571
Market_penetration                   .046  -.246  -.025  -.096   .030   .260   .034  -.134   .042  -.256   .058   .001  -.019  -.188  -.281   .100
Personalcare                        -.003   .156   .000  -.091   .122   .188   .046  -.114   .164  -.068   .099  -.145  -.007  -.160  -.001   .297
Number_of_products_in_promotion      .000  -.026  -.005  -.105  -.175   .100   .228   .255   .111   .013   .134   .005  -.272   .276   .030   .057
Kruidvat                             .013  -.059   .000  -.106   .043  -.128   .178   .165  -.146  -.014   .154   .023  -.227   .093   .061   .011
Percentual_discount                 -.074   .051  -.018  -.044   .000  -.759   .180   .019  -.030   .100   .061  -.012   .196   .137   .035  -.111



Appendix 8: Results linear regression model Promo mechanism

Model Summary

Model   R (All_2009_w_o_Magnum = 1, Selected)   R Square   Adjusted R Square   Std. Error of the Estimate
1       .371(a)                                 .138       .134                .60483142

a. Predictors: (Constant), Premiaat, SPO, Free_product, Three_for, Two_for

ANOVA(b,c)

Model            Sum of Squares   df    Mean Square   F        Sig.
1   Regression   56.338           5     11.268        30.801   .000(a)
    Residual     351.920          962   .366
    Total        408.258          967

a. Predictors: (Constant), Premiaat, SPO, Free_product, Three_for, Two_for

b. Dependent Variable: ln_LF_promotions

c. Selecting only cases for which All_2009_w_o_Magnum = 1

Coefficients(a,b)

                     Unstandardized Coefficients   Standardized Coefficients
Model                B         Std. Error          Beta                        t        Sig.
1   (Constant)       1.660     .075                                            22.039   .000
    SPO              -.176     .090                -.091                       -1.970   .049
    Two_for          -.167     .081                -.125                       -2.061   .040
    Three_for        .219      .083                .140                        2.648    .008
    Free_product     -.421     .081                -.234                       -5.178   .000
    Premiaat         -.560     .096                -.260                       -5.805   .000
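The Adjusted R Square in the model summary can be recomputed from the ANOVA table with the standard correction for the number of predictors. The Python sketch below does this for the promo mechanism model above and, as a second check, for the TV_support model of Appendix 5. The number of cases is taken as the total degrees of freedom plus one; the function name is illustrative.

```python
def adjusted_r_square(r_square, n_cases, n_predictors):
    """Adjusted R Square: penalises R Square for the number of predictors used."""
    return 1.0 - (1.0 - r_square) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Promo mechanism model (this appendix): sums of squares and degrees of freedom
# copied from the ANOVA table; five predictors, 967 total degrees of freedom.
r_square_promo = 56.338 / 408.258
adj_promo = adjusted_r_square(r_square_promo, n_cases=967 + 1, n_predictors=5)

# TV_support model (Appendix 5): one predictor, 430 total degrees of freedom.
r_square_tv = 4.718 / 238.805
adj_tv = adjusted_r_square(r_square_tv, n_cases=430 + 1, n_predictors=1)

print(f"Promo mechanism model: R Square={r_square_promo:.3f}, adjusted={adj_promo:.3f}")
print(f"TV_support model     : R Square={r_square_tv:.3f}, adjusted={adj_tv:.3f}")
```

Because both models have many cases relative to the number of predictors, the adjustment lowers R Square only slightly in each case.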





Appendix 9: Results full linear regression model with P(LF) as dependent variable

Model Summary (P(LF) as dependent variable)

Model   R (All_2009_w_o_magnum = 1, Selected)   R Square   Adjusted R Square   Std. Error of the Estimate
1       .829(a)                                 .687       .677                .15748
2       .829(b)                                 .687       .677                .15740
3       .829(c)                                 .687       .677                .15732
4       .829(d)                                 .687       .678                .15724
5       .829(e)                                 .687       .678                .15718
6       .829(f)                                 .686       .678                .15712
7       .828(g)                                 .686       .678                .15708
8       .828(h)                                 .686       .678                .15705
9       .828(i)                                 .686       .679                .15702
10      .828(j)                                 .686       .679                .15699
11      .828(k)                                 .685       .679                .15700
12      .827(l)                                 .685       .678                .15707
13      .827(m)                                 .684       .678                .15716

Statistics P(LF)

                          P(LF)
N Valid                   1235
N Missing                 1
Std. Error of Mean        .00830
Std. Deviation            .29156
Variance                  .085
Skewness                  .122
Std. Error of Skewness    .070
Kurtosis                  -1.121
Std. Error of Kurtosis    .139
Range                     1.00
Minimum                   .00
Maximum                   1.00
