A real world example of using predictive
analytics in large corporations for
forecasting, budgeting and planning
By Terry Simmonds
Founder of Endure Data Science and Business Intelligence
Much of Terry’s career has been involved with developing data information
stores, business forecasting, analytics and business modelling, undertaking
internal stakeholder management, training, coaching staff on the use of
analytics, presenting and consulting.
Recently, at Australia’s largest telecommunications company, he directed the
rollout of a national forecasting and analytics platform for 140+ users.
In February this year, he presented on how Predictive Analytics is used in
Telstra at the Certified Practicing Accountants (CPA) forecasting conference
in Sydney, Melbourne and Brisbane. He has developed and presented many
courses on forecasting, analytics and project management over his 30-year
career.
He is a financial mathematician/data scientist by trade with a Bachelor of
Science degree (1988) from the University of Queensland and postgraduate
studies in actuarial mathematics, investments and management.
Terry has Coaching for Performance and Front Line Management qualifications
and he is currently finalising his Australian national accreditation in Workplace
Training and Assessment.
www.EndureDSandBI.com
Endure Data Science and Business Intelligence Page 2
Terry Simmonds – Principal Consultant and Founder
Endure Data Science and Business Intelligence
Introduction
Table of Contents
Acknowledgement and Thanks
Predictive Analytics @ Work – Summary
What is Predictive Analytics?
Why use Predictive Analytics?
Part 1 – Predictive Analytics – General
Data Collection in General
Knowledge Base
Predictive Models - General
Part 2 – Predictive Analytics @ Work
Model Example
Customer Adds Statistical Models
So which one to choose?
Model – sample error distributions
Selected Customer Sales Forecast
Collaboration
Final Collaborated Forecast
Prediction System Features
Connect with us
Thank you!
Thank you to all the people who inspired this e-book.
Thanks for your help whether it was in person or via
your presentations, books, videos, interviews,
tweets and blogs!
Teresa Simmonds Martin Baker
John Turner Evan Stubbs
Lynley Vinton Tony Ward
Joseph Panagiotakis Adam Franklin
Toby Jenkins Tony Simmonds
Jesus Christ Gary Cokins
Burton Wu Richard Branson
Peter Edwards David Williams
Mike Loukides Charles Nyce
International Institute of Forecasters
Many other folk who over the years have provided
their insight and support both professionally and
personally.
Please Share
© 2012 by Endure Data Science and Business
Intelligence.
This e-book is free and licensed under the Creative
Commons License, Attribution 3.0
http://creativecommons.org/licenses/by/3.0/
If you find this e-book useful please feel free to blog
about it, tweet it, email it to a friend and otherwise
share it with the world. The author just asks that you
don’t alter, transform, or build upon it without prior
consent.
Version 1.0
Please feel free to forward any of your content ideas
for this document to us via www.EndureDSandBI.com
Full disclosure
Affiliate links may have been used in this e-book, which
means if you click through and buy something, we are
likely to earn a small commission. Plus we may have
business relationships with people and organisations
mentioned in this e-book.
Predictive analytics assists with developing forecasts for business decision making.
It will NOT work well, however, when used in a slavishly dictatorial way or as a mindless black box approach.
It works well when used to augment judgement-based and other relevant modelling and forecasting
approaches, delivering a collaborated view which is seen as more accurate and reliable than any one view on
its own.
When challenging seasoned executives’ “gut
feel”, talk to the method and the numbers rather
than opinions. This generates respect and
objectivity.
“Predictive Analytics is a broad term describing a variety of statistical and
analytical techniques used to develop models that predict future events or
behaviours.” Charles Nyce
Predictive Analytics incorporates a range of activities including visualising data, developing
assumptions and data models, overlaying modelling theory and mathematics, then estimating/predicting
future outcomes.
At its core it relies on capturing relationships between the explanatory variables and the predicted
variable from past data points, and exploiting those relationships to predict future outcomes.
Some techniques (to name just a few) include:
• Mathematical modelling of many different types to determine the true drivers;
• Data mining;
• Game theory;
• Time series analysis and forecasting.
Some of the “whys” for using Predictive Analytics include:
1. How will future profits be generated?
2. How will customer purchasing patterns drive future infrastructure demand?
3. How, where and when will your customers take up your new product(s)?
4. What is the likely future customer churn behaviour?
5. What is the future company financial position for market disclosure?
6. I am sure you can think of many more …
So how does this all work in practice for your organisation?
Part 1 of this eBook outlines a general predictive analytics approach, while Part 2 provides a more
concrete example for those interested in a little more detail.
Of course, if you are looking for how your organisation can implement a specific predictive analytics
or data science road map, then do not hesitate to contact us at www.EndureDSandBI.com.
Please forward any ideas you have regarding the content of future iterations of this eBook to us via
www.EndureDSandBI.com
4 general steps
1. Define the problem, hypothesis or objective.
2. Identify and collect relevant data and information.
3. Analyse the data, develop the knowledge base and design predictive models.
4. Utilise the predictive models to determine the most likely outcome(s).
[Diagram: Information/Data store, Knowledge base and Predictive models, supported by Tools, Processes and People]
When dealing with a large volume of predictive analytics work, a broad process is required.
Defining the problem will drive all aspects of your work, from data collection through to predictive
analytics modelling.
The next few pages outline the general approach, from gathering information to producing predictive
models, and bringing it all together with the people and the processes.
When developing the information/data stores and undertaking data collection in general, you will
need to consider your data needs in the following areas.
Big ticket information needs
1. How big is the market currently?
2. What has happened in the market historically?
3. What competitors are out there and what are their market shares, product offerings, etc?
4. Much more ...
More product centric information
1. What is the historical financial General Ledger (GL) information for the product?
2. What additional “driver” based information is available?
• Sales
• Channels to market
• Types of customers buying and using the products
• Customer usage patterns
• Social media “chatter” and other “big data” sources ...
What external factors influenced the historical results?
1. What marketing & pricing campaigns were implemented?
2. What investments were made in the network?
3. Much more ...
Information/Data store
Now comes the fun part
Once you have collected sufficient data, you can turn your attention to converting the facts and data into
knowledge, ie identify plausible relationships between the explanatory and predicted variables; assess the quality of the
information; and determine the extent to which the information at hand can be used to predict future outcomes.
Utilise all this information to develop a picture of:
• The market
• The customer base
• The delivery side of things, ie the channels to market, infrastructure requirements, product offerings, etc
• The significant drivers that influence profit
• The supply side and demand side price elasticity models where applicable
• Appropriate econometric models
• How the data looks, ie graph some data – “a picture paints a thousand words”
• The beginnings of predictive models based on what is now known
After all this you may need to go back and get more data if there is not enough
information to develop suitable models.
Knowledge base
Once the knowledge base is established, the next step is to generate the predictive models.
The very general predictive modelling structure used for statistical analysis is Y = f(x) + e; ultimately it can take many forms.
Y is often called the dependent variable, and it is generally the thing you want to predict, such as profit, revenue, phone sales, etc.
f(x) is the “model” that builds on the knowledge and assumptions about the driver “x” variables. The key is to get both the drivers “x” and their relationships with Y correct.
e is the error, risk, unknown or stochastic term, generally assumed to have a normal distribution with mean 0 and standard deviation s.
e can be time dependent as in time series forecasting or time independent as in multiple regression or econometric models and data mining models.
A statistical predictive model, Y = f(x) + e, for each month (t) might be:
Y(t) = Customer Sales(t) × {Average Customer Initial Purchase Yield(t) + ½ × Average Customer Monthly Spend(t)} + e
The full-year revenue forecast for sales would therefore be the sum of Y(t) over the months of the year.
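The monthly driver calculation and the full-year sum can be sketched as follows. The driver values are hypothetical, and the error term is set to its assumed mean of 0 so the result is a point forecast. Reading the ½ as allowing for customers who join part-way through the month is our interpretation, not something the model statement spells out.

```python
def monthly_revenue(sales, initial_yield, monthly_spend):
    """Point forecast of Y(t) = Customer Sales(t) *
    {Initial Purchase Yield(t) + 1/2 * Monthly Spend(t)}, with e set to 0."""
    return sales * (initial_yield + 0.5 * monthly_spend)


# Hypothetical driver forecasts for the 12 months of the year
sales = [1000] * 6 + [1200] * 6   # customer sales per month
initial_yield = [50.0] * 12        # average initial purchase yield per new customer ($)
monthly_spend = [40.0] * 12        # average monthly spend per customer ($)

revenue_by_month = [
    monthly_revenue(s, iy, ms) for s, iy, ms in zip(sales, initial_yield, monthly_spend)
]
full_year = sum(revenue_by_month)  # full-year forecast = sum of the monthly Y(t)
print(f"Full-year sales revenue forecast: ${full_year:,.0f}")  # $924,000
```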
The triangle that makes it all happen
While you can get the tools (the data and IT systems, etc) all established, you are not likely to gain the full
benefit from your predictive analytics without having the right People doing the right Processes with the
right Tools on an ongoing basis. This is where a Predictive Analytics Road Map adds a great deal of value.
Predictive models
In this case, utilise your organisation’s General Ledger (GL). This can provide many
thousands, if not hundreds of thousands, of financial and non-financial accounts. The data
elements which might exist in your GL (financial and physical) systems include financial
and non-financial account codes, organisation and customer codes and Time/Date
attributes.
The atomic level data will need to be aggregated into multi-level product, measure and
organisation/customer grouping hierarchies to begin with (using standard OLAP data
architecture and functionality), all in line with organisational structures and strategic
priorities.
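The rollup from atomic GL records to hierarchy levels is what an OLAP tool does for you. The toy sketch below only illustrates the idea: the account codes, product hierarchy and amounts are hypothetical, and a real implementation would sit in the OLAP platform rather than in hand-written code. Posting each amount once at every level is what keeps the totals consistent across the hierarchy.

```python
from collections import defaultdict

# Hypothetical mapping from atomic GL account codes to a product hierarchy:
# account -> (product line, product group)
HIERARCHY = {
    "4100": ("Mobile", "Prepaid"),
    "4110": ("Mobile", "Postpaid"),
    "4200": ("Fixed", "Broadband"),
}

# Hypothetical atomic GL records: (account code, month, amount)
gl_records = [
    ("4100", "2012-01", 120.0),
    ("4110", "2012-01", 300.0),
    ("4200", "2012-01", 250.0),
    ("4100", "2012-02", 130.0),
]


def roll_up(records, hierarchy):
    """Aggregate atomic amounts to every level of the hierarchy, per month."""
    totals = defaultdict(float)
    for account, month, amount in records:
        line, group = hierarchy[account]
        # Post the same amount at each level so all levels reconcile
        totals[("Total", month)] += amount
        totals[(line, month)] += amount
        totals[(f"{line}/{group}", month)] += amount
    return dict(totals)


cube = roll_up(gl_records, HIERARCHY)
print(cube[("Mobile", "2012-01")])  # 420.0
print(cube[("Total", "2012-01")])   # 670.0
```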
Also establish a series of measure hierarchies providing the physical and yield driver
elements for reports, models and forecast calculations.
Establish version control for audit and comparison purposes.
This approach provides the level of detail needed to make appropriate operational business
decisions, which is not available with top-down, market-driven or judgement-based
approaches.
Ensure aggregation is consistent across the organisation; otherwise collaboration cannot
occur and the effort to provide a robust and defensible forecast or predictive analytical
process fails.
The next page gives an example of a mathematical driver model which enables predictive
analytics to be used to drive the outcome.
Challenge, if you accept it: use predictive analytics to forecast detailed product revenue for the rest of this year and then for the next 3 years.
Warning: take care working with aggregated low-level variance.
[Figure: driver model building Customer Net Growth for Period and Gross Revenue from its driver boxes; annotation: "Predict this as an example"]
This simple model utilises customer sales and cancellations to determine net customer growth, and
combines Sales Yields to generate a value for gross revenue.
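The net customer growth part of the model can be sketched as below. Net growth as sales minus cancellations follows the description above; carrying a running customer base forward from a hypothetical opening balance is an added assumption for illustration, and the yield side of the model is left out.

```python
def customer_base(opening, sales, cancellations):
    """Net growth(t) = sales(t) - cancellations(t).
    Assumed extension: closing base(t) = previous closing base + net growth(t)."""
    base = opening
    net_growth, closing = [], []
    for s, c in zip(sales, cancellations):
        growth = s - c
        base += growth
        net_growth.append(growth)
        closing.append(base)
    return net_growth, closing


# Hypothetical opening base and three months of driver forecasts
net, closing = customer_base(10_000, sales=[500, 450, 600], cancellations=[300, 320, 280])
print(net)      # [200, 130, 320]
print(closing)  # [10200, 10330, 10650]
```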
Predictive Analytics can be applied to the relevant drivers of the model (the grey boxes), or it can be applied to
Customer Net Growth and Gross Revenue themselves.
For our purposes we will focus on the Customer adds to illustrate the Predictive Analytics approach.
[Figure: two time series forecasts of monthly customer adds (dummy data used). Additive Winters (exponential smoothing): Mean Absolute Percentage Error (MAPE) 5.76. AutoRegressive Integrated Moving Average (ARIMA), (p,d,q) = (2,0,0), (P,D,Q) = (1,0,0), no intercept: MAPE 7.71.]
What on Earth are these?
These graphs show a time series forecast for the
customer adds (or sales) driver in the model on the
previous page. The data is monthly sales from 2007
onwards, which is then forecast using two different
types of time series forecasting model.
For the purpose of this eBook we won’t go deeply into
the models themselves, other than to say they are a
type of predictive analytics model used to forecast
future values. In this case they are being used to
forecast monthly customer sales.
• The black dots are the historical data.
• The vertical dotted blue line is the forecast date.
• The red line is the model which has been “fitted” to
the historical data and then used to forecast future
months.
• The pink shaded areas are the 95% confidence
intervals generated by each model.
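For readers curious about what an Additive Winters model does, the following is a minimal sketch of additive Holt-Winters (triple exponential smoothing) in plain Python. It uses a simple textbook initialisation and caller-supplied smoothing parameters; statistical packages normally estimate the parameters by optimisation and may initialise differently, so this will not reproduce the MAPE figures in the graphs. The customer adds series is hypothetical.

```python
def additive_holt_winters(y, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters (triple exponential smoothing).

    alpha, beta and gamma smooth the level, trend and seasonal terms.
    They are supplied by the caller here, not optimised.
    Returns (one-step-ahead fitted values, forecasts for 1..horizon).
    """
    m = season_length
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of history")

    # Initial level: mean of the first season
    level = sum(y[:m]) / m
    # Initial trend: average per-period change between the first two seasons
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    # Initial seasonal terms: first-season deviations from the initial level
    season = [y[i] - level for i in range(m)]

    fitted = []
    for t, obs in enumerate(y):
        s = season[t % m]
        # Forecast for period t made before observing it
        fitted.append(level + trend + s)
        prev_level = level
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - level) + (1 - gamma) * s

    n = len(y)
    # h-step forecast: latest level + h trend steps + latest matching seasonal term
    forecasts = [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]
    return fitted, forecasts


# Hypothetical monthly customer adds: 3 years with a rising trend and a December peak
history = [200 + 2 * t + (40 if t % 12 == 11 else 0) for t in range(36)]
fitted, forecasts = additive_holt_winters(history, 12, alpha=0.3, beta=0.1, gamma=0.2, horizon=12)
print([round(f) for f in forecasts])
```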
Generally, the lower the MAPE, the smaller the
difference between the values of the fitted model and
the black dots (the actual historical data). The MAPE
is a statistical measure that helps determine how
closely the model lines up with the historical results.
Unfortunately, a low MAPE does not guarantee that
the best Predictive Analytics model has been selected.
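MAPE itself is simple to compute: the average of the absolute errors expressed as a percentage of the actual values. A minimal Python sketch, assuming no actual value is zero (where MAPE is undefined), with hypothetical numbers:

```python
def mape(actual, fitted):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("actual and fitted must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)


actual = [100.0, 200.0, 400.0]
fitted_values = [110.0, 190.0, 400.0]
print(round(mape(actual, fitted_values), 4))  # 5.0
```

Because every error is scaled by its own actual value, the same absolute miss counts for more in low-volume months, which is one reason MAPE alone should not decide the model.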
A black-box approach would be to accept the model with the smallest MAPE, ie the Additive Winters model.
However, business knowledge may drive an alternative outcome:
• Do you need to capture seasonality?
• Do you need to capture short, medium or long term trends?
Work to incorporate historical events in the modelling, e.g. step functions in June 2010 and January 2011.
If you want to capture past events, then you will need to use ARIMA-type models, and here that is not the
model with the smallest MAPE.
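Step functions for events such as those in June 2010 and January 2011 are usually supplied to the model as extra 0/1 regressor columns that switch on from the event month. The sketch below only builds those columns; the monthly range (January 2007 to December 2011) is hypothetical. Fitting the ARIMA model itself would be done in a forecasting package, for example by passing the columns as exogenous regressors to a seasonal ARIMA routine such as statsmodels' `SARIMAX` (its `exog` argument), which is not shown here.

```python
from datetime import date


def step_regressor(months, event_month):
    """0 before the event month, 1 from the event month onwards (a level shift)."""
    return [1 if m >= event_month else 0 for m in months]


# Hypothetical monthly index from Jan 2007 to Dec 2011 (first day of each month)
months = [date(2007 + i // 12, i % 12 + 1, 1) for i in range(60)]

step_jun_2010 = step_regressor(months, date(2010, 6, 1))
step_jan_2011 = step_regressor(months, date(2011, 1, 1))
print(sum(step_jun_2010), sum(step_jan_2011))  # 19 12
```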
An alternative approach is to consider other statistics, such as whether the errors are “white noise”, and the
distribution of the errors (are they normally distributed, etc).
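One common check for “white noise” is the Ljung-Box Q statistic, which summarises the autocorrelation of the errors over several lags. A large Q suggests the errors still contain a pattern the model has missed. The plain Python sketch below computes Q; as a rough guide, Q can be compared with a chi-square critical value (about 21.03 for 12 degrees of freedom at the 5% level), with the degrees of freedom usually reduced by the number of fitted ARMA parameters. The residual values are hypothetical.

```python
def autocorrelation(e, k):
    """Sample autocorrelation of the series e at lag k."""
    n = len(e)
    mu = sum(e) / n
    denom = sum((x - mu) ** 2 for x in e)
    num = sum((e[t] - mu) * (e[t - k] - mu) for t in range(k, n))
    return num / denom


def ljung_box_q(e, lags):
    """Ljung-Box Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n-k).
    Small Q is consistent with white noise; large Q means autocorrelated errors."""
    n = len(e)
    if lags >= n:
        raise ValueError("lags must be smaller than the number of errors")
    return n * (n + 2) * sum(autocorrelation(e, k) ** 2 / (n - k) for k in range(1, lags + 1))


# Hypothetical model errors (actual minus fitted) for 24 months
errors = [3.1, -2.4, 0.8, -1.5, 2.2, -0.3, -2.8, 1.9, 0.4, -1.1, 2.6, -0.9,
          -1.7, 0.6, 2.3, -2.0, 1.2, -0.5, 0.9, -2.6, 1.4, 0.2, -1.3, 1.8]
print(f"Q(12) = {ljung_box_q(errors, 12):.2f}  (compare with ~21.03)")
```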
As a starting point, go back to predictive analytics theory and revisit Y = f(x) + e:
• Y is the predicted (dependent) variable.
• f(x) is the function of the explanatory variables x.
• e is the error we get when we haven’t got the right function f() in place, when we have missed an
explanatory variable, or when we have not completely captured the correlation between the variables.
• The power of statistics is to get the function and explanatory variables to a position where the error has an
approximately normal distribution, an average value of 0 and a known variance.
This can only be checked against actual results, since we don’t yet know the actual
outcomes for future forecasts. However, the theory is that if we get the errors distributed
normally around an average value of 0, we have a statistically ‘good’ model.
Take care you don’t exceed a 4:1 ratio of historical data to forecast horizon.
[Figure: error distributions for the two models. Additive Winters (exponential smoothing): Mean Absolute Percentage Error (MAPE) 5.76. AutoRegressive Integrated Moving Average (ARIMA), (p,d,q) = (2,0,0), (P,D,Q) = (1,0,0), no intercept: MAPE 7.71.]
Now look at the error distributions for the same two models whose forecasts were shown earlier.
What on Earth are these?
These graphs are histograms of the errors of the
Prediction models for customer sales.
• The pink boxes are the histogram bars.
• The red line is the estimated normal distribution of
the results.
• The blue line is the “smoothed” distribution of the
errors.
• The closer the red and blue lines are to each other,
the closer the errors are to being “normally”
distributed.
You can also see that each graph has a peak
around 0, which shows that on average the prediction
models do come close to the historical data.
Why is it important to have the peak at 0?
It means that the model is pretty good at guessing the
past, which is a good indicator that it will be OK for
estimating possible future values.
So What!
You can see by these two graphs that the model with
the lowest MAPE – the Winters Additive Model - does
not generate the distribution of errors that is “closer”
to the normal (red) distribution.
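The visual comparison of the red and blue lines can be backed up with a few numbers: the mean of the errors (ideally close to 0), their skewness and kurtosis (0 and 3 for a normal distribution), and the Jarque-Bera statistic that combines the last two. The sketch below is a plain Python illustration using hypothetical residuals; under normality Jarque-Bera is approximately chi-square with 2 degrees of freedom, so values above about 5.99 suggest non-normal errors at the 5% level, though the test is weak on short series.

```python
def error_summary(e):
    """Mean, skewness, kurtosis and Jarque-Bera statistic of model errors."""
    n = len(e)
    mu = sum(e) / n
    m2 = sum((x - mu) ** 2 for x in e) / n  # population central moments
    m3 = sum((x - mu) ** 3 for x in e) / n
    m4 = sum((x - mu) ** 4 for x in e) / n
    if m2 == 0:
        raise ValueError("errors have zero variance")
    skew = m3 / m2 ** 1.5  # 0 for a normal distribution
    kurt = m4 / m2 ** 2    # 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return {"mean": mu, "skewness": skew, "kurtosis": kurt, "jarque_bera": jb}


# Hypothetical errors from two candidate models
candidate_errors = {
    "model_a": [1.2, -0.8, 0.3, -1.5, 0.9, 0.1, -0.4, 1.1, -1.0, 0.2],
    "model_b": [0.2, 0.1, -0.3, 0.4, -0.2, 0.1, 0.3, -0.1, 4.5, -0.2],
}
for name, errs in candidate_errors.items():
    summary = error_summary(errs)
    print(name, {k: round(v, 2) for k, v in summary.items()})
```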
There can be a degree of “professional judgement” in using
Predictive Analytics models.
In practice, the most appropriate statistical forecasts are selected
based on a combination of factors including:
• Business knowledge
• Statistical goodness of fit
• Forecast quality controls
For completeness, and to finish this example, the Predictive
Analytics model results for Customer Sales can be loaded into the
driver model shown earlier, along with the predicted values for the
other drivers, and the model can then calculate the resulting gross revenue.
[Figure: selected customer sales forecast using ARIMA, (p,d,q) = (2,0,0), (P,D,Q) = (1,0,0), no intercept, MAPE 7.71 (dummy data used)]
Challenge: what happens when you have to do this for thousands of predicted
variables and models?
International Institute of Forecasters research
It has been demonstrated many times by the international forecasting community (www.iif.com) that by
combining a number of independent and different forecasting and prediction techniques/models, the final
view will provide a more accurate and reliable forecast than any one view on its own.
What might be a simple way of combining the different views?
Final view = 1/3 × (Top-down market view + Bottom-up detailed view + Independent statistical
predictive analytics view)
Alternatively, an agreed position can be reached throughout the process, utilising inputs from the 3
major techniques.
Or it could be as complex as using a historical scoring process based on the level of bias of previous
predictions from various sources.
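The simple 1/3 average, and one possible historical scoring scheme, are sketched below. The inverse-bias weighting is a hypothetical example of scoring sources by the bias of their previous predictions; the weighting rule and all figures are assumptions for illustration.

```python
def equal_weight(views):
    """Simple average of the views: 1/3 of the sum when there are three."""
    return sum(views.values()) / len(views)


def bias_weights(abs_bias_pct):
    """Hypothetical scoring: weight each source by 1 / its historical absolute bias (%),
    normalised so the weights sum to 1. Less biased sources get more weight."""
    if any(b <= 0 for b in abs_bias_pct.values()):
        raise ValueError("historical biases must be positive for inverse weighting")
    inverse = {src: 1.0 / b for src, b in abs_bias_pct.items()}
    total = sum(inverse.values())
    return {src: w / total for src, w in inverse.items()}


def weighted_view(views, weights):
    """Weighted combination of the views."""
    return sum(views[src] * weights[src] for src in views)


# Hypothetical full-year revenue views ($m) from the three approaches
views = {"top_down": 900.0, "bottom_up": 960.0, "statistical": 930.0}
print(equal_weight(views))  # 930.0

# Hypothetical average absolute bias (%) of each source's past forecasts
past_bias = {"top_down": 8.0, "bottom_up": 4.0, "statistical": 4.0}
weights = bias_weights(past_bias)
print(weighted_view(views, weights))
```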
The following process has been used in practice to gain a collaborated final
position.
1. Summarise statistical predictive analytics forecasts.
2. Assess current predictive analytics views.
3. Evaluate in context with past, current and proposed business initiatives and external factors where relevant.
4. Modify forecasts to accommodate business initiatives.
5. Provide a draft position using these forecasts.
6. Present to subject matter experts and capture their feedback; modify explanatory variable forecasts where appropriate.
7. Update final views to reflect modified forecasts and assumptions, and consolidate results for the total organisational outlook.
One last thought before we finish
If you are looking to undertake a large number of predictive analytic modelling tasks on a regular
(say monthly) basis, then unless you have an army of highly skilled data scientists and analysts, you
will need to implement a platform that can assist in completing the work.
Some attributes the platform needs to have:
• Strong linkages between financial, customer and operational level data for predictive
analytics and forecasting purposes.
• Regular and timely automated, consistent, managed data extractions.
• Access to powerful trajectory and statistical based forecasting and analytical applications via
MS Excel and custom Graphical User Interfaces (GUIs).
• Access to sophisticated, large-volume (millions of records) data management and web-hosted
dashboard and business reporting applications.
• Seamless integration with Microsoft Office Excel, Word and PowerPoint.
• Genuine multiuser, workflow management application for as many concurrent users as
needed.
• Utilises OnLine Analytical Processing (OLAP) functionality for ease of dissecting data
information stores and building analytic driver models.
So how did you go with the challenge?
Contact us now to discuss the ways it can work for you and your organisation at
www.EndureDSandBI.com
We’d love to hear from you.
If you would like a hand with @work predictive analytics, business navigation or
putting together your Data Science road map then please get in contact with us at
Endure DS & BI.
You can phone us in Australia on +61 409 389 060,
write to us at [email protected]
or follow Terry on LinkedIn.
Feel free to share this eBook.
We wish you all the best with implementing your predictive
analytics needs @work.
Terry Simmonds
Principal Consultant and Founder
Endure Data Science and Business
Intelligence