8/3/2019 04. Regression Analysis
Regression
What Is Regression?
Regression is a measure of the relation between a dependent variable and a set of independent variables that affect the value of the dependent variable.
Example: preference for purchase vs. price, popularity of the brand, product performance.
The relationship derived is in the form of an equation:
$Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$
where Y = dependent variable and x_1, x_2, ... = independent variables.
Regression is usually done on variables that are measured on an interval scale.
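As a rough illustration of how the constant a and the coefficients b_1, b_2, ... can be estimated, the sketch below fits the equation by ordinary least squares using the normal equations. The data, the variable names and the helper functions `solve` and `fit_regression` are hypothetical and are not taken from the slides.

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_regression(rows, y):
    """Return [a, b1, b2, ...] minimising the squared error of Y = a + b1*x1 + b2*x2 + ..."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the constant a
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: (price, brand popularity) -> purchase preference,
# generated exactly from Y = 10 - 2*price + 0.5*popularity.
rows = [(1, 4), (2, 1), (3, 6), (4, 3), (5, 8)]
y = [10 - 2 * price + 0.5 * popularity for price, popularity in rows]

a, b1, b2 = fit_regression(rows, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the example data follow the equation exactly, the fit should recover the generating constant and coefficients up to rounding. Real survey data would leave residual error.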
Types of Regression
Linear: assumes a linear relationship between variables.
  Simple: when one independent variable is used to predict the value of the dependent variable.
  Multiple: when many independent variables are used to predict the value of the dependent variable.
Non-linear: when the relationship is non-linear.
  Many models are available: asymptotic, log-linear, log-logistic.
In most cases the underlying relationship is assumed to be linear.
How To Determine Which Variables To Include In Regression
Drop variables that are unlikely to affect the value of the dependent variable.
Several methods are available for eliminating variables from a regression analysis:
Eliminating independent variables that have a low correlation with the dependent variable.
Stepwise regression:
  Start with the independent variable with the highest predictive value.
  Enter variables one by one, examining at each stage the improvement in predictive power over the previous iteration.
  At each stage, all variables in the equation are examined to check whether they are still needed. Any that are found superfluous are dropped.
Forward selection: similar to stepwise regression, except that no variable is dropped once it has entered the equation.
Backward elimination: start with all independent variables and eliminate those that contribute the least, one by one.
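Forward selection can be sketched as below. This is a minimal version that assumes "improvement in predictive power" is measured as the gain in R² and that entry stops once the gain falls below a hypothetical threshold `min_gain`. Statistical packages typically use an F-to-enter test instead. The data and all function names are illustrative.

```python
def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus a constant."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def forward_selection(candidates, y, min_gain=0.01):
    """Enter the variable that raises R^2 most, one at a time; never drop one."""
    chosen, best_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        scores = {name: r_squared([candidates[c] for c in chosen] + [col], y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break  # no remaining variable improves the fit enough
        chosen.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return chosen, best_r2


# Hypothetical data: y depends on price and popularity only.
candidates = {
    "price": [1, 2, 3, 4, 5, 6, 7, 8],
    "popularity": [3, 1, 4, 1, 5, 9, 2, 6],
    "shelf_colour_code": [2, 7, 1, 8, 2, 8, 1, 8],  # unrelated variable
}
y = [20 - 2 * p + 1.5 * q
     for p, q in zip(candidates["price"], candidates["popularity"])]

chosen, best_r2 = forward_selection(candidates, y)
print(chosen, round(best_r2, 4))
```

Stepwise regression would add one more step inside the loop: re-checking the variables already in `chosen` and dropping any that no longer contribute. Backward elimination runs the same idea in reverse, starting from the full set.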
Linear Regression: Standard Output and Interpretation (1)
Model summary
R²            0.693
Adjusted R²   0.694
Interpretation
The share of the total variation of the dependent variable explained by the equation is 69%.
This is a good fit, and hence one can proceed to draw further inferences based on the assumption that the relationship is linear.
Adjusted R² is an improvement over R² in that it takes into account the number of variables used for predicting.
If R² is low, the relationship cannot be assumed to be linear, and further inferences should not be drawn in such cases.
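The usual adjustment can be written out in a few lines. The slides do not give the sample size or the number of predictors, so the values of `n_obs` and `n_predictors` below are assumptions chosen only to show the mechanics.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical sample: 200 respondents and 3 independent variables.
adj = adjusted_r_squared(0.693, n_obs=200, n_predictors=3)
print(round(adj, 3))
```

The factor (n - 1) / (n - p - 1) is at least 1 whenever the model has one or more predictors, so the adjustment can only lower R², and it lowers it more as variables are added without a matching gain in fit.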
Linear Regression: Standard Output and Interpretation (2)
ANOVA
Sum of squares
Degrees of freedom (df)
F
Significance test
Interpretation
The F statistic is a measure of whether any relationship exists between the dependent and independent variables.
If the (100 − significance test) value, expressed as a percentage, is high (95% or above), then the relationship exists and the model is robust for prediction.
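The quantities in the ANOVA table combine as in the sketch below. The sums of squares, sample size and significance value are hypothetical. The significance itself would normally be read from the package output, since computing it needs the F distribution, which is not in the Python standard library.

```python
def f_statistic(ss_regression, ss_residual, n_obs, n_predictors):
    """F = (regression mean square) / (residual mean square)."""
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    return (ss_regression / df_regression) / (ss_residual / df_residual)


def confidence_percent(significance):
    """Turn a reported significance (0..1) into the '100 - sig' percentage."""
    return 100 * (1 - significance)


# Hypothetical ANOVA table: 23 observations, 2 independent variables.
f_value = f_statistic(ss_regression=80.0, ss_residual=20.0, n_obs=23, n_predictors=2)

# Hypothetical significance of 0.003 as reported by a statistics package.
relationship_exists = confidence_percent(0.003) >= 95
print(f_value, relationship_exists)
```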
Linear Regression: Standard Output And Interpretation
Output: Interpretation
Constant: the constant to be used in the regression equation.
B: there is a B value for each independent variable. It is the coefficient of that variable in the equation. A unit change in the independent variable is associated with B units of change in the dependent variable, if all other independent variables are held constant.
Standard error: the standard error of the coefficient B.
Beta (β): the normalised value of B, which removes the effect of scale differences between the independent variables. It is a measure of relative importance because it indicates the expected change in the dependent variable, in standard deviations, per one-standard-deviation change in the independent variable.
Significance of t: if t is not significant, the independent variable is not a good predictor and should be removed from the analysis.
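The link between B and its standardised counterpart can be shown with the standard conversion: the standardised coefficient equals B multiplied by the standard deviation of the independent variable and divided by that of the dependent variable. The data and the function name `standardised_beta` below are hypothetical.

```python
from statistics import stdev


def standardised_beta(b, x_values, y_values):
    """Standardised coefficient = B * sd(x) / sd(y)."""
    return b * stdev(x_values) / stdev(y_values)


# Hypothetical data in which satisfaction = 10 - 0.5 * billing minutes exactly.
billing_minutes = [2, 4, 6, 8, 10]
satisfaction = [9, 8, 7, 6, 5]

beta = standardised_beta(-0.5, billing_minutes, satisfaction)
print(round(beta, 3))
```

Rescaling the independent variable (for example, seconds instead of minutes) changes B but leaves the standardised value unchanged, which is why it is the figure to compare across variables.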
Applications Of Regression
Estimating Relative Importance Of Variables In Choice
The β values can be used as a measure of the relative importance of independent variables in choice.
Dependent variable: overall satisfaction with the store
Parameter             β       Relative importance
Duration of billing   0.794   57%
Cleanliness           0.593   43%
The inference
Both cleanliness and duration of billing are important contributors to overall satisfaction with the store.
Duration of billing is a relatively more important contributor.
The β rather than the B value should be used, as it eliminates problems related to differences in the scale of measurement of the independent variables.
However, if all independent variables have been measured on the same scale, then it makes no difference whether β or B is used.
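The slide does not state how the relative-importance percentages were derived. One reading that is consistent with the figures shown is to divide each standardised coefficient by the sum of all of them, as sketched here. The function name `relative_importance` is an assumption.

```python
def relative_importance(betas):
    """Share of each standardised coefficient in the total, in percent."""
    total = sum(abs(b) for b in betas.values())
    return {name: 100 * abs(b) / total for name, b in betas.items()}


shares = relative_importance({"Duration of billing": 0.794, "Cleanliness": 0.593})
for name, share in shares.items():
    print(f"{name}: {share:.0f}%")
```

Absolute values are used so that a variable with a negative coefficient still counts towards importance. That choice is part of the sketch, not something the slide specifies.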
Forecasting
The regression equation can be used to predict the value of the dependent variable when the independent variable values are known.
$Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$
Data available
Awareness for brand A during the period of a campaign.
GRPs in TV for the ad campaign.
Can predict
The likely levels of awareness of brand A during the next campaign, for which estimates of GRP are available.
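For the awareness example, a forecast with a single independent variable might look like the following sketch. The GRP and awareness figures are invented for illustration, and `fit_simple` is a hypothetical helper that uses the closed-form least-squares formulas for one predictor.

```python
def fit_simple(x, y):
    """Least-squares constant a and slope b for Y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical past campaigns: TV GRPs and the brand awareness (%) measured.
grps = [100, 200, 300, 400, 500]
awareness = [22, 31, 39, 52, 61]

a, b = fit_simple(grps, awareness)

# Forecast for the next campaign, for which 350 GRPs are planned.
forecast = a + b * 350
print(round(a, 2), round(b, 3), round(forecast, 2))
```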
Some Caveats To Remember While Predicting
The prediction can be done only for the range of values on which the original estimation equation was based.
If the regression equation was obtained for the awareness of a brand vis-à-vis GRPs for a market leader, it cannot be extrapolated to a minor brand.
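The range caveat can be enforced mechanically. The sketch below refuses to predict outside the span of the data used for estimation. The coefficients, the observed GRP values and the function name `predict_within_range` are all hypothetical.

```python
def predict_within_range(a, b, x_new, x_observed):
    """Predict Y = a + b*x only if x lies inside the range used for fitting."""
    low, high = min(x_observed), max(x_observed)
    if not low <= x_new <= high:
        raise ValueError(f"x={x_new} is outside the fitted range [{low}, {high}]")
    return a + b * x_new


# Hypothetical equation: awareness = 11.3 + 0.099 * GRPs, fitted on 100 to 500 GRPs.
grps_observed = [100, 200, 300, 400, 500]

inside = predict_within_range(11.3, 0.099, 350, grps_observed)

try:
    predict_within_range(11.3, 0.099, 900, grps_observed)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True

print(round(inside, 2), extrapolation_blocked)
```

A check like this covers only the numeric range. Whether the new case is the same kind of brand or market as the one the equation was fitted on still has to be judged by the analyst.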
Is my model fit to predict sales?
[Chart: Actual Sales compared with Prediction 1 and Prediction 2]
What is Discriminant Analysis?
A modelling technique used when the dependent variable is a categorical variable and the independent variables are continuous variables.
Applications
Selection process for a job
Admission process of an educational program
Dividing a group into:
  potential buyers and non-buyers
  high risk and low risk
The relationship is derived in the form of an equation:
$Y = a + k_1 x_1 + k_2 x_2$
k_1 and k_2 are the coefficients of the independent variables.
k_1 and k_2 should maximise the separation between the two groups.
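One standard way to obtain coefficients that maximise the separation between two groups is Fisher's linear discriminant, sketched here for two independent variables. The slides do not say which estimation method is used, so this is an assumed technique. The applicant data and function names are hypothetical. The constant a only shifts every score by the same amount and is left out.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter(rows, mean):
    """Within-group sums of squares and cross-products: (sxx, sxy, syy)."""
    sxx = sum((r[0] - mean[0]) ** 2 for r in rows)
    sxy = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows)
    syy = sum((r[1] - mean[1]) ** 2 for r in rows)
    return sxx, sxy, syy


def discriminant_coefficients(group1, group2):
    """Fisher's rule: (k1, k2) = inverse(pooled scatter) * (mean2 - mean1)."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    a1, b1, c1 = scatter(group1, m1)
    a2, b2, c2 = scatter(group2, m2)
    sxx, sxy, syy = a1 + a2, b1 + b2, c1 + c2
    det = sxx * syy - sxy * sxy
    d0, d1 = m2[0] - m1[0], m2[1] - m1[1]
    # Multiply the 2x2 inverse [[syy, -sxy], [-sxy, sxx]] / det by (d0, d1).
    k1 = (syy * d0 - sxy * d1) / det
    k2 = (-sxy * d0 + sxx * d1) / det
    return k1, k2


# Hypothetical applicants: (written test score, interview score).
group1 = [(40, 5), (45, 4), (50, 6), (42, 5)]   # e.g. non-buyers / unsuccessful
group2 = [(60, 7), (65, 8), (70, 7), (62, 9)]   # e.g. buyers / successful

k1, k2 = discriminant_coefficients(group1, group2)
mean_score1 = sum(k1 * x1 + k2 * x2 for x1, x2 in group1) / len(group1)
mean_score2 = sum(k1 * x1 + k2 * x2 for x1, x2 in group2) / len(group2)
print(k1, k2, mean_score1, mean_score2)
```

With this sign convention Group 2 always ends up with the higher mean score, so larger values of Y point towards Group 2.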
Predicting the Group Membership
Model building is based on the linear discriminant equation.
Y, the discriminant score, is calculated for each case.
Cut-off point: the midpoint of the mean discriminant scores of the two groups.
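The cut-off rule can be sketched as follows, assuming the discriminant equation has already been estimated. The coefficients (a = 0, k1 = 0.1, k2 = 0.5), the two groups of cases and the function names are hypothetical.

```python
def discriminant_score(case, a, k1, k2):
    """Y = a + k1*x1 + k2*x2 for one case (x1, x2)."""
    return a + k1 * case[0] + k2 * case[1]


def cut_off(group1, group2, a, k1, k2):
    """Midpoint of the mean discriminant scores of the two groups."""
    mean1 = sum(discriminant_score(c, a, k1, k2) for c in group1) / len(group1)
    mean2 = sum(discriminant_score(c, a, k1, k2) for c in group2) / len(group2)
    return (mean1 + mean2) / 2


def classify(case, threshold, a, k1, k2):
    """Group 2 if the score lies above the cut-off, otherwise Group 1.

    Assumes Group 2 has the higher mean score."""
    return 2 if discriminant_score(case, a, k1, k2) > threshold else 1


# Hypothetical estimated equation and training cases (test score, interview score).
a, k1, k2 = 0.0, 0.1, 0.5
group1 = [(40, 5), (45, 4), (50, 6), (42, 5)]
group2 = [(60, 7), (65, 8), (70, 7), (62, 9)]

threshold = cut_off(group1, group2, a, k1, k2)
print(round(threshold, 4))
print(classify((55, 6), threshold, a, k1, k2))
print(classify((66, 8), threshold, a, k1, k2))
```

A plain midpoint treats both groups as equally likely and both kinds of misclassification as equally costly. If the groups differ greatly in size, packages often shift the cut-off, which this sketch does not do.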
Linear Discriminant Analysis: Standard Outputs and Interpretation
Classification / Confusion Matrix
Percent Correct / Wrong column: 94.44%
The model has correctly classified 94.44% of the cases.
This level of accuracy may not hold true for future predictions, but it is a good pointer towards the model being a good one.
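A classification matrix and its percent-correct figure can be computed as in this sketch. The slides do not give the number of cases behind 94.44%. The example assumes 18 cases with one misclassification, which happens to give the same percentage.

```python
def confusion_matrix(actual, predicted, labels=(1, 2)):
    """Count cases for every (actual group, predicted group) pair."""
    counts = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts


def percent_correct(counts):
    """Share of cases on the diagonal of the matrix, in percent."""
    total = sum(counts.values())
    correct = sum(n for (a, p), n in counts.items() if a == p)
    return 100 * correct / total


# Hypothetical result: 9 cases per group, one Group 2 case assigned to Group 1.
actual = [1] * 9 + [2] * 9
predicted = [1] * 9 + [2] * 8 + [1]

counts = confusion_matrix(actual, predicted)
accuracy = percent_correct(counts)
print(counts)
print(f"{accuracy:.2f}%")
```

Because the same cases were used to build and to score the model, a figure obtained this way tends to be optimistic. Scoring cases that were held back from estimation gives a fairer view of future accuracy.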
Linear Discriminant Analysis: Standard Outputs and Interpretation
Wilks' Lambda
A low value of Wilks' Lambda indicates high significance of the model.
F Test
The p-value is the decision criterion.
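For two groups and a single discriminant function, Wilks' Lambda can be read as the share of the variation in the discriminant scores that remains within the groups. The sketch below computes that ratio for two hypothetical sets of scores, one well separated and one overlapping. The function name `wilks_lambda` and the scores are assumptions.

```python
def wilks_lambda(scores1, scores2):
    """Within-group sum of squares divided by total sum of squares of the scores."""
    def sum_sq(values, centre):
        return sum((v - centre) ** 2 for v in values)

    all_scores = scores1 + scores2
    grand_mean = sum(all_scores) / len(all_scores)
    within = (sum_sq(scores1, sum(scores1) / len(scores1))
              + sum_sq(scores2, sum(scores2) / len(scores2)))
    return within / sum_sq(all_scores, grand_mean)


# Hypothetical discriminant scores.
separated = wilks_lambda([6.5, 6.5, 8.0, 6.7], [9.5, 10.5, 10.5, 10.7])
overlapping = wilks_lambda([6.5, 10.5, 8.0, 9.5], [6.7, 10.5, 6.5, 10.7])
print(round(separated, 3), round(overlapping, 3))
```

Values near 0 mean the group means differ strongly relative to the spread inside each group, and values near 1 mean the groups are barely distinguishable. The accompanying F test turns that ratio into the p-value used for the decision.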
Linear Discriminant Analysis: Standard Outputs and Interpretation
Relative Importance of Independent Variables
Standardized coefficients indicate the relative importance of the variables.
Classifying the cases
Means of Canonical Variables: computed based on the raw coefficient table.
A score to the right of the midpoint is Group 2.
A score to the left of the midpoint is Group 1.
Case Study
A business school selects its students every year through a written test, an interview and a group discussion. It then tracks the performance of students during the two-year program by means of GPA. Students with a GPA above 2.75/4.0 are defined as Successful and those below as Unsuccessful.
Can you develop a model that predicts whether a student would be potentially successful or not?
How good is the model?
Statistical significance of the model
Predictors
Classification of a new student