Ridge Regression using PROC REG A Fixed Effect Model for Determining the Mixture of Acquisition-Subscription Cost Steven Matthew Anderson Century Link [email protected] m

Dec 26, 2015


Ridge Regression using PROC REG

A Fixed Effect Model for Determining the Mixture of Acquisition-Subscription Cost

Steven Matthew Anderson, Century Link

[email protected]

Outline

• A Case Study to Introduce Ridge Regression
– Description of the Business Problem
– Regression Model
– Problems with the Model
• Ridge Regression Model
– Description of the Method
– How Does it Work
• SAS’s PROC REG
– Code
– Output
• Simulation of the Model
• Summary
• Future work

A Case Study to Introduce Ridge Regression

• Terminology
– Fixed Cost
– Variable Cost
– Acquisition Expense
– Subscription Expense
– Mixtures of Acquisition and Subscription Expense
– Side Note: Some Examples of Analysis Using this Cost Structure
• The Business Problem
• The Regression Model
• Problems with the Model

Fixed Cost

• Fixed costs are business expenses that do not change in proportion to the activity of the business (within a relevant time period)
• Discretionary fixed costs
– Arise from annual decisions by management to spend on certain fixed cost items
• Committed fixed costs
– Costs that do not change significantly over time

• Staff Salaries
• Network Management
• Data/IP Strategy
• Sales Force Management
• Most Overhead expense

[Figure: Fixed Cost vs Time. x-axis: Time; y-axis: Expense; annotation: Adjustment]

Variable Cost

• Variable costs are expenses that change in proportion to the activities of the business.
• Semi-variable costs are fixed costs that are adjusted periodically to accommodate changes in business activity.
– Looks like a step function over time
• Semi-variable costs are considered in this study to be variable costs.

• Costs of goods sold
• Commissions
• Sales Headcount (minus commissions)
• Call Center Staffing
• Bad Debt

[Figure: Variable Cost vs Time. x-axis: Time; y-axis: Expense; annotation: Adjustment]

Acquisition Expense

• Can be interpreted as expenses incurred to “Make the Sale.”
• Positively Correlated with acquisition activities
– # Sales units (Gross Inwards)
– # Call Center employees

• Marketing incentives
• Sales Headcount
• Installation of Service
• Design Services (WAN)

[Figure: Acquisition Cost vs Sales Units. x-axis: Sales Units (AGI); y-axis: Expense]

Subscription Expense

• Can be interpreted as expenses incurred to “Keep the Customer.”
• Positively Correlated with Monthly Subscription Activity
– Monthly Revenue
– # of Revenue Generating Units (RGU)

• Repair of services
• Collections
• Network Monitoring

[Figure: Subscription Cost vs Revenue. x-axis: Revenue; y-axis: Expense]

Mixed Acquisition/Subscription Expense

• Expenses that are positively correlated with both Subscription and Acquisition Activity

• Fleet
• Construction
• Hosting Operations

Financial Analysis Examples using this Cost Structure

• Break Even Analysis
– Used to analyze the potential profitability of an expenditure in a sales-based business
– Need to find the break-even point (point where revenue is equal to expense)

$$\text{BEP} = \frac{\text{Fixed Cost}}{\text{Selling Price} - \text{Variable Cost}}$$

Picture stolen from Wikipedia
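As a quick check of the break-even formula, here is a minimal Python sketch. The function name and the dollar figures are invented for illustration and do not come from the deck.

```python
def break_even_units(fixed_cost, selling_price, variable_cost):
    """Units that must be sold before revenue covers total expense."""
    margin = selling_price - variable_cost  # contribution of one unit
    if margin <= 0:
        raise ValueError("selling price must exceed the variable cost per unit")
    return fixed_cost / margin


# Hypothetical inputs: 1,000 of fixed cost, price 25, variable cost 5 per unit.
units = break_even_units(fixed_cost=1000.0, selling_price=25.0, variable_cost=5.0)
print(units)  # 50.0
```

Every unit sold past that point adds its full contribution margin to profit, which is why the fixed/variable split of a cost pool matters for this analysis.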

Financial Analysis Examples using this Cost Structure

• Customer Lifetime Value
– Used in Marketing to determine how much each customer is “worth” over time
– R = Revenue
– E = Expense

Calculated by:

$$CLV_k = \sum_{t=0}^{T} \frac{R_{kt} - E_{kt}}{(1+i)^{t}} = \underbrace{(R_{k0} - E_{k0})}_{\text{Acquisition Margin}} + \underbrace{\sum_{t=1}^{T} \frac{R_{kt} - E_{kt}}{(1+i)^{t}}}_{\text{Subscription Margin}}$$
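A rough Python sketch of the Customer Lifetime Value split into Acquisition Margin and Subscription Margin follows. It assumes the usual discounted form, with period 0 treated as the acquisition period and `rate` as a per-period discount rate; both choices, the function name, and the sample figures are assumptions made for illustration.

```python
def customer_lifetime_value(revenue, expense, rate):
    """Return (acquisition_margin, subscription_margin, clv).

    revenue[t] and expense[t] are the values for period t = 0..T.
    Period 0 is treated as the acquisition period (an assumption).
    """
    if len(revenue) != len(expense):
        raise ValueError("revenue and expense must cover the same periods")
    # Discounted margin R - E for every period.
    margins = [
        (r - e) / (1.0 + rate) ** t
        for t, (r, e) in enumerate(zip(revenue, expense))
    ]
    acquisition_margin = margins[0]
    subscription_margin = sum(margins[1:])
    return acquisition_margin, subscription_margin, acquisition_margin + subscription_margin


# Hypothetical customer: 60 spent to make the sale, then two periods of 50 revenue
# against 20 of subscription expense, with no discounting.
acq, sub, total = customer_lifetime_value([0.0, 50.0, 50.0], [60.0, 20.0, 20.0], rate=0.0)
print(acq, sub, total)
```

The acquisition margin is typically negative, so the customer only becomes "worth" something once the subscription margin has paid it back.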

Description of the Business Problem

• Given a particular cost pool (i.e. bucket)
– What percentage of the cost pool can be classified as fixed or variable cost?
– What percentage of the cost pool can be classified as acquisition or subscription cost?

Regression Model

$$\text{Expense} = \beta_0 + \beta_1 A + \beta_2 S + \beta_3 (AS)$$

• Expense = Total expense in cost pool
• A = Acquisition Activity (AGI)
• S = Subscription Activity (RGU)
• (AS) = Cross Product Interaction Term

Regression Model

[Figure: two panels with axes Acquisition Activity and Subscription Activity, labeled "100% Subscription Expense" and "100% Acquisition Expense"]


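Regression Model
Answering the Fixed/Variable Expense Question

$$\text{Total Expense} = \beta_0 + \beta_1 A + \beta_2 S + \beta_3 AS$$

Let A = Acquisition Expense and S = Subscription Expense → 0, so that

$$\text{Average Fixed Expense} = \beta_0$$

$$\text{Variable Expense} = \text{Total Expense} - \beta_0$$

$$\text{Percentage of Variable Expense} = \frac{\text{Variable Expense}}{\text{Total Expense}}$$

$$\text{Percentage of Fixed Expense} = \frac{\text{Fixed Expense}}{\text{Total Expense}} = 1 - \frac{\text{Variable Expense}}{\text{Total Expense}}$$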
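The fixed/variable split can be sketched in a few lines of Python, reading the intercept β0 as the average fixed expense. The function name and the numbers are hypothetical.

```python
def fixed_variable_split(beta0, total_expense):
    """Return (percentage_fixed, percentage_variable) as fractions of total expense."""
    if total_expense == 0:
        raise ValueError("total expense must be non-zero")
    pct_fixed = beta0 / total_expense   # intercept is read as the fixed expense
    pct_variable = 1.0 - pct_fixed      # the rest moves with A and S
    return pct_fixed, pct_variable


# Hypothetical cost pool: intercept of 250 against a total expense of 1,000.
pct_fixed, pct_variable = fixed_variable_split(beta0=250.0, total_expense=1000.0)
print(pct_fixed, pct_variable)  # 0.25 0.75
```

The split is only as trustworthy as the intercept estimate, which is where the collinearity problems described later in the deck come in.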

Regression Model
Answering the Acquisition/Subscription Question

[Equations and figure: E (Total Expense) drawn against an Acquisition component and a Subscription component; the slide uses these components to define the Percentage of Acquisition Cost and the Percentage of Subscription Cost]

The Results from My Brilliant Model

• Variance Inflation Factors are HUGE!
• None of the parameter estimates are significant
• When parameter estimates were significant:
– the confidence intervals around them made the results useless!
– The signs were often wrong with respect to reality

The Problem: Reading the Log

• Extreme Cases
– SAS Note: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
– SAS Note: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.
  interaction = -105.877 * Intercept + 13.0209 * ln_agi + 8.13133 * ln_rgu

An Example

    ods graphics on;

    /* OLS fit with tolerance, VIF and collinearity diagnostics */
    proc reg data=sim_data outvif outest=bob;
       model total_expense = A S Interaction / tol vif collin;
    run;

    /* Inspect the saved parameter estimates */
    proc print data=bob;
    run;

    ods graphics off;

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3   43231154         14410385      74.77     <.0001
Error             46   8865802          192735
Corrected Total   49   52096956

Parameter Estimates

Variable      DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Tolerance   Variance Inflation
Intercept      1   14672                20592             0.71     0.4798     .           0
A              1   -4.55289             8.23521          -0.55     0.5830     0.00192     521.23743
S              1   -2.08466             4.09754          -0.51     0.6134     0.00330     302.85512
interaction    1   0.00176              0.00164           1.07     0.2898     0.00128     784.02140

Collinearity Diagnostics

                                        Proportion of Variation
Number   Eigenvalue   Condition Index   Intercept     A             S             interaction
1        3.99240      1.00000           5.692765E-7   5.758341E-7   5.725649E-7   5.794308E-7
2        0.00482      28.76909          0.00039475    0.00055150    0.00055094    0.00040402
3        0.00277      37.95309          0.00094557    0.00070160    0.00068843    0.00097339
4        0.00000230   1318.75978        0.99866       0.99875       0.99876       0.99862
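Two standard definitions tie these diagnostics together: the Variance Inflation factor is the reciprocal of the Tolerance, and each Condition Index is the square root of the largest eigenvalue divided by that row's eigenvalue. The Python sketch below (helper names are my own) recomputes both from the printed values; small differences from the SAS output are expected because the printed inputs are rounded.

```python
import math


def condition_index(largest_eigenvalue, eigenvalue):
    """Condition index of one row of the collinearity diagnostics."""
    return math.sqrt(largest_eigenvalue / eigenvalue)


def vif_from_tolerance(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance


# Values copied from the output above.
eigenvalues = [3.99240, 0.00482, 0.00277, 0.00000230]
tolerances = {"A": 0.00192, "S": 0.00330, "interaction": 0.00128}

indices = [condition_index(eigenvalues[0], ev) for ev in eigenvalues]
vifs = {name: vif_from_tolerance(tol) for name, tol in tolerances.items()}

for number, index in enumerate(indices, start=1):
    print(number, round(index, 2))
for name, vif in vifs.items():
    print(name, round(vif, 1))
```

A condition index in the thousands, with nearly all of the variation of every coefficient loading on the same tiny eigenvalue, is the signature of the near-singular design matrix discussed next.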

So What Happened?

$$
\begin{aligned}
(XB)^T (Y - XB^*) &= 0 \\
B^T X^T (Y - XB^*) &= 0 && \text{since } (AB)^T = B^T A^T \\
B^T (X^T Y - X^T X B^*) &= 0 && \text{distribution} \\
X^T Y - X^T X B^* &= 0 \\
X^T X B^* &= X^T Y \\
B^* &= (X^T X)^{-1} X^T Y
\end{aligned}
$$

If $X^T X$ is invertible, then B has a unique solution B = B*.

Basically, for $X^T X$ to be invertible each column must be a pivot column. If the design matrix X has one or more variables that are linear combinations of the other variables, then when you row reduce $X^T X$ you are going to get at least one row of zeros, and at least one of your columns isn’t going to be a pivot column. Ergo, you do not have a unique solution!

Near multicollinearity means that at least one column is approximately a linear combination of some or all of the others, making $X^T X$ nearly singular.
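The pivot-column argument can be checked on a toy example. The Python sketch below uses exact fractions so that a singular matrix shows up as a determinant of exactly zero; the two small design matrices and the helper names are invented for illustration.

```python
from fractions import Fraction


def gram(X):
    """Return X^T X for a design matrix given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(Fraction(a) * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def determinant(M):
    """Determinant by row reduction; returns 0 as soon as a pivot is missing."""
    M = [row[:] for row in M]
    n = len(M)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # column c is not a pivot column
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return det


a = [1, 2, 3, 4]
s = [2, 1, 4, 3]
# Fourth column is an exact linear combination of the others: 2*a + 3*s.
X_bad = [[1, ai, si, 2 * ai + 3 * si] for ai, si in zip(a, s)]
# Fourth column is a*a, which is not a linear combination of 1, a, s here.
X_ok = [[1, ai, si, ai * ai] for ai, si in zip(a, s)]

print(determinant(gram(X_bad)))  # 0, so X^T X has no inverse
print(determinant(gram(X_ok)))   # non-zero, so a unique solution exists
```

With real data the combination is rarely exact, so the determinant is tiny rather than zero and the inverse exists but is numerically unstable, which is the near-multicollinearity case.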

(Enter stage left) Ridge Regression

• Modify Least Squares Regression to allow biased estimators of the regression coefficients.
• Bias versus precision trade off

$$B = (X^T X)^{-1} X^T Y$$

$(X^T X)$ is modified to move away from near singularity and closer to the state of orthogonality among the columns:

$$B_R = (X^T X + k I_m)^{-1} X^T Y$$

Where k ≥ 0 and is known as the biasing or shrinkage parameter

We introduce bias by uniformly increasing the diagonal elements and leaving the off-diagonal elements invariant

Bias of $b_R$: $E(b_R) - E(b)$
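To make the effect of adding k to the diagonal concrete, here is a small Python sketch for two predictors and no intercept, solved in closed form. The data are invented: the second predictor is almost a copy of the first, and y is roughly their sum. The function name is my own.

```python
def ridge_two_predictors(x1, x2, y, k):
    """Solve (X^T X + k I) b = X^T y for two predictors, no intercept."""
    # Diagonal of X^T X gets k added; the off-diagonal entry is left alone.
    s11 = sum(a * a for a in x1) + k
    s22 = sum(b * b for b in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 2.0, 3.0, 4.1]   # nearly collinear with x1
y = [2.1, 3.9, 6.2, 8.0]    # roughly x1 + x2

b_ols = ridge_two_predictors(x1, x2, y, k=0.0)    # ordinary least squares
b_ridge = ridge_two_predictors(x1, x2, y, k=0.1)  # small amount of shrinkage

print("OLS:  ", b_ols)
print("Ridge:", b_ridge)
```

On this toy data the OLS fit gives the second predictor a negative coefficient, while the ridge fit returns two positive coefficients close to 1. That is the same wrong-sign symptom, and the same remedy, as in the cost model.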

Methods for Picking a Likely Value of k

• Graphically, using the Ridge Trace Graph: a plot of the parameters against k, estimating where the coefficients become “stable”
• Getting the VIFs as close to 1 as possible
• Staring at the errors and figuring out where the RMSE levels off
• Using the formula by Hoerl, Kennard, and Baldwin

$$k = \frac{(m+1)\,S^2}{B_{OLS}^T B_{OLS}}$$

Simulation

• 50 observations
• Intercept ~ N(1000,50)
• Acquisition ~ N(2500,50)
• Subscription = 0.7*Acquisition
• Interaction = acquisition*subscription

So “in theory” we should end up with 57% Acquisition and 43% Subscription

$$k = \frac{(m+1)\,S^2}{B_{OLS}^T B_{OLS}} = \frac{(4)(57651)}{18943761.8} = 0.01217$$
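The arithmetic behind this value of k can be retraced in Python from the OLS row of the output dataset shown later in the deck (RMSE 240.1072 and the four parameter estimates). The multiplier is taken to be the number of estimated coefficients including the intercept, which matches the 4 used on the slide; the function name is my own.

```python
def hkb_shrinkage(coefficients, rmse):
    """Shrinkage parameter: (number of coefficients) * S^2 / (B^T B)."""
    s_squared = rmse ** 2                    # S^2, the mean squared error
    btb = sum(b * b for b in coefficients)   # B_OLS^T B_OLS
    return len(coefficients) * s_squared / btb


# OLS estimates (Intercept, A, S, interaction) and RMSE from the k = 0 row.
b_ols = [4352.4418, 1.4511, -3.1776, 1.28e-03]
rmse = 240.1072

k = hkb_shrinkage(b_ols, rmse)
print(round(rmse ** 2))                     # 57651
print(round(sum(b * b for b in b_ols), 1))  # 18943761.8
print(round(k, 5))
```

Because the intercept dominates the sum of squared coefficients, the resulting k is small, and it lands next to the k = 0.012 row of the ridge output.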

SAS’s PROC REG

    ods graphics on;

    /* Refit for every ridge parameter k from 0 to 0.03 in steps of 0.001 */
    proc reg data=sim_data outvif outest=rb ridge=0 to 0.03 by .001;
       title 'Ridge Regression with PROC REG';
       model total_expense = A S Interaction / tol vif collin;
    run;

    ods graphics off;

SAS Ridge Plots

SAS Diagnostics

SAS Diagnostics II

SAS Output Dataset

Type of      Ridge regression   Root mean
statistics   control value      squared error   Intercept   A        S         interaction   difference in rmse
PARMS                           240.1072        4352.4418   1.4511   -3.1776   1.28E-03
RIDGE        0                  240.1072        4352.4418   1.4511   -3.1776   1.28E-03
RIDGE        0.001              240.4279        2518.0393   1.8645   -1.7268   8.74E-04      13.3446
RIDGE        0.009              242.0831        616.1069    1.6862   0.5524    4.71E-04      4.6013
RIDGE        0.01               242.1817        565.9577    1.6599   0.6410    4.61E-04      4.0718
RIDGE        0.011              242.2697        524.0733    1.6362   0.7175    4.52E-04      3.6324
RIDGE        0.012              242.3488        488.6401    1.6147   0.7842    4.45E-04      3.2640
RIDGE        0.013              242.4203        458.3412    1.5953   0.8428    4.38E-04      2.9523
RIDGE        0.014              242.4855        432.1970    1.5776   0.8948    4.33E-04      2.6867
RIDGE        0.015              242.5451        409.4631    1.5615   0.9412    4.28E-04      2.4585
RIDGE        0.028              243.0417        268.0123    1.4331   1.2765    3.94E-04      1.1248
RIDGE        0.029              243.0680        263.2177    1.4269   1.2911    3.92E-04      1.0824
RIDGE        0.03               243.0934        258.8830    1.4211   1.3048    3.91E-04      1.0441

SAS Output Dataset

Ridge regression   Type of
control value      statistics   A          S          interaction
0                  RIDGEVIF     244.8223   228.4689   530.7665
0.001              RIDGEVIF     113.7915   110.8910   164.5080
0.009              RIDGEVIF     14.5425    14.9128    8.0670
0.01               RIDGEVIF     12.5768    12.9119    6.7163
0.011              RIDGEVIF     10.9903    11.2939    5.6825
0.012              RIDGEVIF     9.6907     9.9662     4.8737
0.013              RIDGEVIF     8.6122     8.8629     4.2289
0.014              RIDGEVIF     7.7071     7.9359     3.7067
0.015              RIDGEVIF     6.9398     7.1492     3.2779
0.028              RIDGEVIF     2.5530     2.6368     1.0876
0.029              RIDGEVIF     2.4088     2.4880     1.0239
0.03               RIDGEVIF     2.2770     2.3519     0.9663
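One way to read this table mechanically is to take the smallest k at which every VIF drops below a cutoff. The Python sketch below does that for the rows shown; the cutoff of 10 is a common rule of thumb and is my assumption, not something the slides state, and the function name is hypothetical.

```python
# (k, VIF of A, VIF of S, VIF of interaction), copied from the table above.
RIDGE_VIF_ROWS = [
    (0.0, 244.8223, 228.4689, 530.7665),
    (0.001, 113.7915, 110.8910, 164.5080),
    (0.009, 14.5425, 14.9128, 8.0670),
    (0.01, 12.5768, 12.9119, 6.7163),
    (0.011, 10.9903, 11.2939, 5.6825),
    (0.012, 9.6907, 9.9662, 4.8737),
    (0.013, 8.6122, 8.8629, 4.2289),
    (0.014, 7.7071, 7.9359, 3.7067),
    (0.015, 6.9398, 7.1492, 3.2779),
    (0.028, 2.5530, 2.6368, 1.0876),
    (0.029, 2.4088, 2.4880, 1.0239),
    (0.03, 2.2770, 2.3519, 0.9663),
]


def smallest_k_below(rows, cutoff=10.0):
    """Smallest k whose VIFs are all under the cutoff, or None if there is none."""
    for k, *vifs in sorted(rows):
        if max(vifs) < cutoff:
            return k
    return None


chosen_k = smallest_k_below(RIDGE_VIF_ROWS)
print(chosen_k)  # 0.012
```

For these rows the rule picks k = 0.012, the row whose coefficients (488.64, 1.61, 0.784) appear in the ridge equation on the results slide.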

Simulation Results

Model: (57% Acquisition, 43% Subscription)
Expense = 1,000 + (Acquisition) + (Subscription) + (Interaction)

OLS: (184.1% Subscription, -84.1% Acquisition)
Expense = 4352.442 + 1.4511(Acquisition) - 3.1776(Subscription) + (1.28E-03)(Interaction)

SAS Ridge, k = 0.012: (67.3% Acquisition, 32.7% Subscription)
Expense = 488.64 + 1.61(Acquisition) + 0.784(Subscription) + (4.45E-04)(Interaction)
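The percentage pairs quoted for OLS and ridge can be reproduced, to within rounding, from the A and S coefficients in the output dataset: each share is that coefficient divided by the sum of the two. This ratio is inferred from the reported numbers rather than read off the slides, so treat it as an assumption; the function name is my own.

```python
def cost_shares(beta_a, beta_s):
    """Return (acquisition_share, subscription_share) from the A and S coefficients."""
    total = beta_a + beta_s
    if total == 0:
        raise ValueError("coefficients sum to zero, shares are undefined")
    return beta_a / total, beta_s / total


# Coefficients from the output dataset: k = 0 (OLS) and k = 0.012 (ridge).
ols_acq, ols_sub = cost_shares(1.4511, -3.1776)
ridge_acq, ridge_sub = cost_shares(1.6147, 0.7842)

print("OLS:   acquisition %.1f%%, subscription %.1f%%" % (100 * ols_acq, 100 * ols_sub))
print("Ridge: acquisition %.1f%%, subscription %.1f%%" % (100 * ridge_acq, 100 * ridge_sub))
```

The OLS shares fall outside the 0 to 100% range because the S coefficient has the wrong sign, whereas the ridge shares are both positive and sum to 100%.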

Summary

• Ridge Regression corrects for multicollinearity problems by modifying the method of least squares to allow biased estimators that are more precise.
• Allows me to perform Customer Lifetime Value and Breakeven Analysis with existing correlated regressors
• Not perfect but better than OLS Estimation
• SAS needs some additional functionality
– Confidence intervals for Bi’s
– Confidence intervals for k

Next Steps

• Implementing other methodology for choosing shrinkage parameter
– Dorugade and Kashid (2009)
– Mardikyan and Cetin (2008)
– Lawless and Wang (kLW) (1976)
• Add to SAS
– Confidence Intervals
  • Firinguetti & Bobadilla’s Asymptotic Confidence Intervals
  • Crivelli, Firinguetti & Montano’s Boot Strapping Confidence Intervals
  • Feig’s Monte Carlo method for Evaluating Confidence Intervals