MACROECONOMIC INDICATORS AND FMCG SALES: A CASE STUDY
Submitted by: Ayan Adhikari
University of Calcutta
Department of Statistics
We are given data on FMCG Sales, which is mainly a function of four macroeconomic indicators, namely:
• GDP (Gross Domestic Product)
• CPI (Consumer Price Index)
• PPI (Producer Price Index)
• IPI (Industrial Production Index)
In addition, FMCG Sales is affected by Crude Oil Prices and Sugar Prices, and by the distribution of all the above-mentioned factors.
In this case study our objective is twofold:
1. To determine how FMCG Sales is affected by the movement of all the above-mentioned factors.
2. To predict the sales for the next 3 quarters.
To achieve these objectives we carry out a statistical analysis of the data provided.
A. Description of the data
The data provided give information on FMCG Sales, which is the response variable. FMCG Sales is, however, a two-dimensional vector consisting of the components Value Offtake (in Rs 00,000) and Number of Stores. The data on FMCG Sales are given for each month from January 2012 to June 2014. Hence they form a time series.
The data on the macroeconomic indicators, Crude Oil Prices and Sugar Prices are also given on a monthly basis, although not all the covariates are supplied for the full span January 2012 to June 2014, as the response is.
The data on GDP are provided for each of the 10 quarters from January 2012 to June 2014.
The data on CPI (prices paid by consumers for a basket of goods and services) are given for the span August 2013 to July 2014. Similarly, the data on PPI (measuring the average change in price of goods and services sold by manufacturers) and IPI (measuring changes in output for manufacturing, mining and utilities) cover the same time span as CPI.
So, the consolidated data may look somewhat as below:
Month | Value Offtake | Number of Stores | Crude Oil Prices | Sugar Prices | GDP | CPI | PPI | IPI
As seen from the snapshot, we face three main constraints while modelling the impact of the movement of all the factors on FMCG Sales:
A. The data contain a number of missing values in the columns of the covariates Sugar Prices, IPI, CPI and PPI.
B. The data on GDP are given on a quarterly basis.
C. The main problem, however, is that we were provided with the FMCG Sales data of January 2013 in place of January 2012. We have therefore treated the response data for January 2012 as missing as well.
i. To overcome these problems we first consider the response variable, FMCG Sales.
As a single value is missing among the 30 data points, we replace the values (of both Value Offtake and Number of Shops, i.e. Number of Stores) for January 2012 by the mean of the remaining observed data points.
Thus we obtain a dataset with responses for all thirty months.
Again, as the response FMCG Sales consists of two components, Value Offtake and Number of Shops, we would have to analyse how both of them are affected by the movement of the given macroeconomic factors. Alternatively, we can construct a new response variable:
Value Offtake per Shop (Rs 00,000) = Value Offtake (Rs 00,000) / Number of Shops
In this case study the single response variable, Value Offtake per Shop, has been considered. Thus we address one of the three constraints.
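The imputation and the derived response described above can be sketched as follows. This is only an illustration in Python: the helper name impute_with_mean and all the numbers are hypothetical, not the case-study data, and a missing month is assumed to be represented by None.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical figures for illustration only (not the case-study data).
# The first month plays the role of the missing January 2012 record.
value_offtake = [None, 120.0, 130.0, 125.0]  # in Rs 00,000
num_shops = [None, 400.0, 410.0, 420.0]

value_offtake = impute_with_mean(value_offtake)
num_shops = impute_with_mean(num_shops)

# Single response variable: Value Offtake per Shop (Rs 00,000).
value_per_shop = [v / s for v, s in zip(value_offtake, num_shops)]
```

Note that imputing both components first and then taking the ratio is one possible order of operations. Imputing the ratio directly would generally give a slightly different value for the missing month.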
ii. Next we try to obtain the GDP values on a monthly basis.
To address this problem we first plot the GDP data given on a quarterly basis.
We then fit a trend equation to these data so that monthly GDP values can be obtained. We initially fit a linear trend. However, the fit was not satisfactory, as can be seen from the graph below. This is also supported by the low Adjusted R-squared of the model, which comes out to be 71.83%.
So instead we fit a quadratic trend. In this case the Adjusted R-squared of the fitted model comes out as high as 93.5%. Hence the quadratic trend equation may be considered the appropriate one. The graph of both the linear and the quadratic trend superimposed on the original data is as below:
So we obtain a quarterly trend equation for GDP. We then carry out the necessary transformations and adjustments to obtain the monthly trend equation for GDP.
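A minimal sketch of this step is given below, assuming least-squares polynomial trends compared by Adjusted R-squared. The quarterly GDP figures are invented for illustration, the helper names (solve, fit_poly, predict, adjusted_r2) are hypothetical, and Python is used in place of the software of the original analysis. The report does not spell out the transformations used to move from quarters to months. Here we simply assume that month m corresponds to the point (m + 1) / 3 on the quarterly time axis, so that the middle month of each quarter falls on that quarter's index.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poly(t, y, degree):
    """Least-squares polynomial trend; coefficients are returned lowest power first."""
    p = degree + 1
    xtx = [[sum(ti ** (i + j) for ti in t) for j in range(p)] for i in range(p)]
    xty = [sum((ti ** i) * yi for ti, yi in zip(t, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, t):
    """Evaluate the fitted trend at time t."""
    return sum(c * t ** k for k, c in enumerate(coef))


def adjusted_r2(t, y, coef):
    """Adjusted R-squared of a fitted trend with len(coef) - 1 slope terms."""
    n, p = len(y), len(coef) - 1
    ybar = sum(y) / n
    rss = sum((yi - predict(coef, ti)) ** 2 for ti, yi in zip(t, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1 - (rss / (n - p - 1)) / (tss / (n - 1))


# Hypothetical quarterly GDP series for 10 quarters (illustration only).
quarters = list(range(1, 11))
gdp = [7.32, 6.73, 6.31, 5.96, 5.68, 5.56, 5.49, 5.56, 5.71, 5.94]

linear = fit_poly(quarters, gdp, 1)
quadratic = fit_poly(quarters, gdp, 2)
adj_linear = adjusted_r2(quarters, gdp, linear)
adj_quadratic = adjusted_r2(quarters, gdp, quadratic)

# Assumed month-to-quarter mapping: month m sits at (m + 1) / 3 on the quarterly axis.
monthly_gdp = [predict(quadratic, (m + 1) / 3) for m in range(1, 31)]
```

If GDP were recorded as a quarterly total rather than a rate or a level, the monthly values would need a further rescaling. That choice depends on how the series is defined and is not made here.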
C. Analysis of the Completed Dataset and Fitting an Appropriate Model
Once we obtain the completed dataset we can analyse the data and study the impact of the movement of the macroeconomic indicators on FMCG Sales.
We first plot all the time series, i.e. the response variable along with the six covariates.
From the above graph and the graph attached below, it is evident that all six covariates and the response variable show movement over time, though not necessarily in the same direction. The graph of the response variable clearly shows an increasing trend with seasonality present (seasonality is natural, as the data are monthly).
Similarly, the plots of Oil Prices, CPI, IPI and PPI also show an increasing trend in the last few quarters, with certain fluctuations present. The plot of GDP shows that it is decreasing over time. Since the response is increasing over the same period, GDP might affect the response negatively. The plot of Sugar Prices also indicates a decreasing trend with fluctuations in between. Hence Sugar Prices may also affect FMCG Sales negatively.
However, nothing can be said with certainty about the impact of the movement of each covariate on FMCG Sales without a proper statistical analysis of the data. Hence, to reach a firm conclusion we must carry out a regression analysis.
We now check, using a Q-Q plot, whether the response variable can be assumed to follow a Normal distribution. Here we compare the sample quantiles with the theoretical quantiles. Looking at the graph below, we conclude that Value Offtake per Shop may well be assumed to follow a Normal distribution, as all the points lie on the Q-Q line with a few exceptions.
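The quantile comparison behind such a Q-Q plot can be sketched as follows. This is a minimal illustration using only the Python standard library. The sample values are hypothetical, the helper names qq_points and qq_correlation are our own, and the plotting positions (i - 0.5) / n are one common convention among several. The correlation between the two sets of quantiles is used here as a rough numerical stand-in for "the points lie close to the line".

```python
from statistics import NormalDist, mean


def qq_points(sample):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(sample):
    """Correlation between theoretical and sample quantiles (near 1 suggests normality)."""
    pts = qq_points(sample)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical Value Offtake per Shop figures (illustration only).
value_per_shop = [0.28, 0.31, 0.29, 0.33, 0.30, 0.32,
                  0.27, 0.31, 0.30, 0.34, 0.29, 0.30]

points = qq_points(value_per_shop)
r = qq_correlation(value_per_shop)
```

Plotting the pairs in points against a straight line gives the usual Q-Q plot. The visual check remains the primary tool, and the correlation is only a summary of it.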
Similarly, we check the normality assumption for all six covariates. It may be assumed that all the covariates also follow approximately a Normal distribution, as is evident from the above Q-Q plots.

Coefficient estimates for Model I:
Coefficient | Estimate | Standard Error | p-value | Decision taken
Intercept | 0.6681 | 0.179 | 0.00123 | Reject H0
GDP | 0.1131 | 0.03588 | 0.00482 | Reject H0
Value Offtake per Shop (preceding time point) | -0.4719 | 0.1387 | 0.00268 | Reject H0
GDP (preceding time point) | -0.1165 | 0.03338 | 0.00219 | Reject H0
Sugar Prices (preceding time point) | 0.003514 | 0.0009913 | 0.00192 | Reject H0
PPI | -0.002009 | 0.0009601 | 0.4867 | Do not reject H0
IPI | -0.001435 | 0.000766 | 0.07503 | Do not reject H0
Oil Prices (preceding time point) | -0.000007861 | 0.000004268 | 0.07962 | Do not reject H0
As the response follows a Normal distribution, we may fit a Generalized Linear Model with the identity link function. However, as all the data under consideration are time series, we consider a time series regression model and use the dyn package (dynamic regression) to carry out the necessary analysis.
"dyn" enables regression functions that were not written to handle time series to
handle them. Both the dependent and independent variables may be time series and they may
have different time indexes (in which case they are automatically aligned).
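To make the idea of aligned, lagged regressors concrete, the sketch below builds the design by hand and fits it by ordinary least squares. It is written in plain Python and does not use or reproduce the dyn package. The helper names (solve, ols, lagged_design), the single covariate x and the coefficient values are all hypothetical. The data are generated without noise from known coefficients so that the fit can be checked.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def lagged_design(y, x):
    """Align y[t] with 1, x[t], y[t-1], x[t-1]; the first time point is dropped."""
    rows = [[1.0, x[t], y[t - 1], x[t - 1]] for t in range(1, len(y))]
    return rows, y[1:]


# Hypothetical covariate and coefficients (illustration only).
x = [1.0, 2.5, 0.7, 3.1, 1.9, 0.4, 2.8, 1.2, 3.6, 0.9, 2.2, 1.6]
true_beta = [0.5, 0.2, -0.4, -0.1]  # intercept, x[t], y[t-1], x[t-1]

# Generate a noise-free response that depends on its own previous value.
y = [1.0]
for t in range(1, len(x)):
    y.append(true_beta[0] + true_beta[1] * x[t]
             + true_beta[2] * y[t - 1] + true_beta[3] * x[t - 1])

rows, target = lagged_design(y, x)
beta = ols(rows, target)
```

Each lag costs one usable observation: 12 time points give 11 aligned rows here, and in the same way 30 monthly observations with a one-period lag would leave 29.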
We go on adding one variable at a time to obtain the most parsimonious model that might explain the impact of the movement of the covariates on the response.
But we face a problem in our search for the best predictive equation. The best model that we obtain in terms of minimum AIC and residual deviance does not include all the factors, but is a function of GDP, the response at the preceding time point, GDP at the preceding time point, Sugar Prices at the preceding time point, PPI, IPI and Oil Prices at the preceding time point.
The best model obtained by incorporating all the covariates also has a low AIC, but it is higher than that of the above-mentioned model.
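The AIC used in this comparison can be related to the residual deviance. As a hedged sketch, assume a Gaussian model with identity link, for which the residual deviance equals the residual sum of squares, and count the error variance as one extra parameter. The function name gaussian_aic is our own. The number of usable observations, 29, is an assumption inferred from the 21 residual degrees of freedom and the 8 coefficients reported for Model I further below.

```python
import math


def gaussian_aic(residual_deviance, n_obs, n_coef):
    """AIC of a Gaussian linear model; the error variance counts as one extra parameter."""
    # Maximised log-likelihood with the variance estimated as deviance / n_obs.
    log_lik = -0.5 * n_obs * (math.log(2 * math.pi * residual_deviance / n_obs) + 1)
    return -2 * log_lik + 2 * (n_coef + 1)


# Figures reported for Model I: residual deviance 0.00082608 on 21 df,
# with 8 estimated coefficients (intercept plus seven covariate terms).
n_coef = 8
n_obs = 21 + n_coef  # assumption: 29 usable months after one lag

aic_model_1 = gaussian_aic(0.00082608, n_obs, n_coef)
print(round(aic_model_1, 2))
```

Under these assumptions the computed value agrees with the AIC of -203.22 reported for Model I to two decimal places. A smaller AIC indicates the preferred model, which is why Model I is chosen over the model containing all the covariates.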
Statistical tables for these two models are presented here:
I. The best model: Model I
Here the null hypothesis is H0: the particular coefficient is 0. So we conclude from the above table that all the covariates except PPI, IPI and Oil Prices at the preceding time point are significant at the 5% level of significance. The goodness of fit is measured by the AIC and the residual deviance, which come out as:
AIC: -203.22
Residual Deviance: 0.00082608 on 21 df
Hence the model comes out to be: