04. Regression Analysis

Apr 07, 2018

Ankit Saxena
Transcript
  • 8/3/2019 04. Regression Analysis


    Regression


    What Is Regression

    Regression is a measure of the relation between a dependent variable and a set of independent variables which affect the value of the dependent variable.

    Example: preference for purchase vs. price, popularity of the brand, product performance.

    The relationship derived is in the form of an equation:

    $Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$

    where $Y$ = dependent variable and $x_1, x_2, \dots$ = independent variables.

    Regression is usually done on variables that are measured on an interval scale.


    Types of Regression

    Linear: assumes a linear relationship between variables.

    Simple linear regression: when one independent variable is used to predict the value of the dependent variable.

    Multiple linear regression: when there are many independent variables used to predict the value of the dependent variable.

    Non-linear: when the relationship is non-linear. Many models are available: asymptotic, log-linear, log-logistic.

    In most cases the underlying relationship is assumed to be linear.


    How To Determine Which Variables To Include In Regression

    Drop variables that are unlikely to affect the value of the dependent variable.

    Several methods are available for eliminating variables from a regression analysis:

    Low-correlation screening: eliminate independent variables having a low correlation to the dependent variable.

    Stepwise regression: start with the independent variable with the highest predictive value, then enter variables one by one, examining at each stage the improvement over the predictive power in the previous iteration. At each stage all variables in the equation are examined to check if they are needed, and any that are found superfluous are dropped.

    Forward selection: similar to stepwise regression, except that no variable is dropped once it is entered into the equation.

    Backward elimination: start with all independent variables and eliminate the variables that contribute the least, one by one.
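The forward selection procedure described above can be sketched in plain Python. This is only an illustration under stated assumptions: the variable names, the coded data values, and the stopping rule (`min_gain`, a minimum improvement in R2) are hypothetical, since the slides do not specify an entry criterion.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def r_squared(columns, y):
    """R2 of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    coef = solve(XtX, Xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def forward_selection(data, y, min_gain=0.01):
    """Enter one variable at a time; a variable is never dropped once entered."""
    chosen, best_r2 = [], 0.0
    remaining = list(data)
    while remaining:
        scores = {name: r_squared([data[v] for v in chosen + [name]], y)
                  for name in remaining}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:  # hypothetical stopping rule
            break
        chosen.append(name)
        remaining.remove(name)
        best_r2 = scores[name]
    return chosen, best_r2


# Hypothetical data: three predictors coded as low (-1) / high (+1) levels.
data = {
    "price":       [-1, -1, -1, -1, 1, 1, 1, 1],
    "popularity":  [-1, -1, 1, 1, -1, -1, 1, 1],
    "performance": [-1, 1, -1, 1, -1, 1, -1, 1],
}
preference = [5.5, 6.5, 8.5, 7.5, 12.5, 11.5, 13.5, 14.5]

chosen, r2 = forward_selection(data, preference)
print(chosen, round(r2, 3))
```

In this made-up data set "performance" is unrelated to the dependent variable, so it never enters the equation. Stepwise regression would add a re-check of the already entered variables at each stage.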


    Linear Regression: Standard Output and Interpretation (1)

    Model summary

    R2             0.693
    Adjusted R2    0.694

    Interpretation

    The total variation of the dependent variable explained by the equation is 69%.

    This is a good fit, and hence one can proceed to draw further inferences based on the assumption that the relationship is linear.

    Adjusted R2 is an improvement over R2 in that it takes into account the number of variables used for predicting.

    If R2 is low then the model cannot be assumed to be linear; further inferences should not be drawn in such cases.
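As a small worked illustration of how adjusted R2 accounts for the number of predictors, the standard correction can be written as below. The sample size and predictor count are hypothetical, since the slide does not report them.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R2 is taken from the model summary; n_obs and n_predictors are assumptions.
print(round(adjusted_r2(0.693, n_obs=100, n_predictors=3), 3))  # 0.683
```

Because the correction factor (n - 1) / (n - p - 1) is at least 1, adjusted R2 cannot exceed R2 for a model with one or more predictors. The pair of values shown in the model summary therefore looks as if it may have been transposed or rounded somewhere along the way.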


    Linear Regression: Standard Output and Interpretation (2)

    ANOVA

    Output columns: sum of squares, degrees of freedom (df), F, significance test.

    Interpretation

    The F statistic is a measure of whether any relationship exists between the dependent variable and the independent variables.

    If the (100 - sig test) value, with the significance expressed as a percentage, is high (95% or above), then the relationship exists and the model is robust for prediction.
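The F statistic in the ANOVA output can be reproduced from the sums of squares and degrees of freedom. The sketch below uses hypothetical figures (they are chosen to be consistent with an R2 of 0.693, but the sample size and number of predictors are assumptions).

```python
def anova_f(ss_regression, ss_residual, n_obs, n_predictors):
    """F = (SS_regression / df_regression) / (SS_residual / df_residual)."""
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    f = (ss_regression / df_regression) / (ss_residual / df_residual)
    return f, df_regression, df_residual


# Hypothetical ANOVA inputs: 69.3 of 100 units of variation explained.
f_value, df1, df2 = anova_f(ss_regression=69.3, ss_residual=30.7,
                            n_obs=100, n_predictors=3)
print(round(f_value, 2), df1, df2)
```

Turning F into the significance value requires the F distribution, which the standard library does not provide; with SciPy installed, `scipy.stats.f.sf(f_value, df1, df2)` would give it.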


    Linear Regression: Standard Output And Interpretation

    Output: Interpretation

    Constant: the constant to be used in the regression equation.

    B: there is a B value for each independent variable. It is the coefficient of each independent variable in the equation. A unit change in the independent variable causes B units of change in the dependent variable, if all other independent variables are held constant.

    Standard error: the standard error of the coefficient B.

    Beta (β): the normalised value of B, which removes the effect of the scale differences in the independent variables. It is a measure of relative importance because it indicates the expected change in the dependent variable per unit change in the standardised independent variable.

    Significance of t: if t is not significant, then the independent variable is not a good predictor and should be removed from the analysis.
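To illustrate why the normalised coefficient removes scale effects, the sketch below rescales B by the ratio of standard deviations, which is the usual definition of a standardised coefficient. The data are hypothetical: the same billing duration is expressed once in minutes and once in seconds.

```python
from statistics import mean, stdev


def slope(x, y):
    """Least-squares B for a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_beta(b, x, y):
    """Normalised coefficient: B * sd(x) / sd(y)."""
    return b * stdev(x) / stdev(y)


# Hypothetical data: same durations in two units, one satisfaction score.
minutes = [2, 4, 6, 8, 10]
seconds = [m * 60 for m in minutes]
satisfaction = [9, 8, 6, 5, 2]

b_min = slope(minutes, satisfaction)   # depends on the unit of x
b_sec = slope(seconds, satisfaction)   # 60 times smaller
beta_min = standardized_beta(b_min, minutes, satisfaction)
beta_sec = standardized_beta(b_sec, seconds, satisfaction)
print(b_min, b_sec, round(beta_min, 4), round(beta_sec, 4))
```

B changes with the unit of measurement while the normalised value stays the same, which is what makes it usable for comparing variables.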


    Applications Of Regression


    Estimating Relative Importance Of Variables In Choice

    The β values can be used as a measure of the relative importance of independent variables in choice.

    Dependent variable: overall satisfaction with the store

    Parameter              β        Relative importance
    Duration of billing    0.794    57%
    Cleanliness            0.593    43%

    The inference: both cleanliness and duration of billing are important contributors to overall satisfaction with the store. Duration of billing is a relatively more important contributor.

    The β rather than the B value should be used, as it eliminates problems related to differences in the scale of measurement of the independent variables.

    However, if all independent variables have been measured on the same scale, then there would be no difference whether β or B is used.
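The slide does not state how the relative importance percentages were derived. One simple reading, shown here as an assumption, is that each coefficient is divided by the sum of the coefficients; this reproduces the 57% and 43% in the table.

```python
def relative_importance(betas):
    """Share of each coefficient in the sum of absolute coefficients, in percent."""
    total = sum(abs(b) for b in betas.values())
    return {name: 100.0 * abs(b) / total for name, b in betas.items()}


shares = relative_importance({"Duration of billing": 0.794, "Cleanliness": 0.593})
for name, share in shares.items():
    print(f"{name}: {share:.0f}%")
```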


    Forecasting

    The regression equation can be used to predict the value of the dependent variable when the independent variable values are known.

    $Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$

    Data available: awareness for brand A during the period of a campaign, and GRPs in TV for the ad campaign.

    Can predict: the likely levels of awareness of brand A during the next campaign, for which estimates of GRP are available.
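A minimal sketch of this forecasting use, with a single independent variable: fit awareness against GRPs from past campaigns, then plug in the GRP estimate for the next one. All figures below are invented for illustration.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line Y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical history: TV GRPs and brand awareness (%) for past campaigns.
grps = [100, 200, 300, 400, 500]
awareness = [21, 29, 41, 49, 60]

a, b = fit_simple_regression(grps, awareness)
next_campaign_grps = 350  # assumed estimate for the next campaign
forecast = a + b * next_campaign_grps
print(round(a, 2), round(b, 3), round(forecast, 1))
```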


    Some Caveats To Remember While Predicting

    The prediction can be done only for the range of values on which the original estimation equation was based.

    If the regression equation was obtained for the awareness of a brand vis-à-vis GRPs for a market leader, it cannot be extrapolated to a minority brand.
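One way to enforce the first caveat in practice is to refuse predictions outside the observed range. The helper below is a hypothetical sketch; the coefficients and the observed GRP values are assumptions made for the example.

```python
def predict_within_range(a, b, x_new, x_observed):
    """Predict Y = a + b * x_new only if x_new lies inside the observed range."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"x = {x_new} is outside the estimation range [{lo}, {hi}]"
        )
    return a + b * x_new


observed_grps = [100, 200, 300, 400, 500]  # hypothetical estimation data
a, b = 10.6, 0.098                         # hypothetical fitted coefficients

print(round(predict_within_range(a, b, 350, observed_grps), 1))
try:
    predict_within_range(a, b, 900, observed_grps)
except ValueError as err:
    print("Refused:", err)
```

The second caveat, applying a market leader's equation to a minority brand, cannot be caught by a range check; it depends on knowing which population the equation was estimated on.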


    Is my model fit to predict sales?

    [Chart: Actual Sales compared with Prediction 1 and Prediction 2]


    What is Discriminant Analysis?

    A modelling technique used when the dependent variable is a categorical variable and the independent variables are continuous variables.

    Applications:

    Selection process for a job

    Admission process of an educational program

    Dividing a group into potential buyers and non-buyers, or into high risk and low risk

    The relationship is derived in the form of an equation:

    $Y = a + k_1 x_1 + k_2 x_2$

    $k_1$ and $k_2$ are the coefficients of the independent variables. $k_1$ and $k_2$ should maximise the separation between the two groups.


    Predicting the Group Membership

    Model building is based on the linear discriminant equation.

    Y, the discriminant score, is calculated.

    Cut-off point: the midpoint of the mean discriminant scores of the two groups.
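The cut-off rule can be sketched as follows. The coefficients and the cases are hypothetical, and the sketch assumes that scores above the midpoint belong to Group 2; estimating the coefficients themselves is outside its scope.

```python
from statistics import mean


def discriminant_score(a, k, x):
    """Y = a + k1*x1 + k2*x2 + ... for one case."""
    return a + sum(ki * xi for ki, xi in zip(k, x))


# Hypothetical coefficients and training cases for the two groups.
a, k = 0.0, [0.6, 0.4]
group1 = [(2, 3), (3, 2), (2, 2)]
group2 = [(7, 8), (8, 7), (8, 8)]

mean1 = mean(discriminant_score(a, k, x) for x in group1)
mean2 = mean(discriminant_score(a, k, x) for x in group2)
cutoff = (mean1 + mean2) / 2  # midpoint of the two group means


def classify(x):
    """Assign Group 2 above the cut-off, Group 1 otherwise (assumed direction)."""
    return 2 if discriminant_score(a, k, x) > cutoff else 1


print(round(cutoff, 2), classify((4, 4)), classify((6, 7)))
```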


    Linear Discriminant Analysis: Standard Outputs and Interpretation

    Classification / Confusion Matrix

    Percent Correct / Wrong column: 94.44%

    The model has correctly classified 94.44% of the cases.

    This level of accuracy may not hold true for future predictions, but it is a good pointer towards the model being a good one.
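As an illustration of how the percent-correct figure comes out of a classification matrix, the sketch below tabulates actual against predicted groups. The slide does not give the underlying counts; the 18 cases here are invented so that the share matches 94.44%.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count cases for each (actual group, predicted group) pair."""
    return Counter(zip(actual, predicted))


def percent_correct(actual, predicted):
    """Share of cases whose predicted group equals the actual group."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


# Hypothetical cases: 9 per group, one Group 2 case misclassified.
actual = [1] * 9 + [2] * 9
predicted = [1] * 9 + [2] * 8 + [1]

matrix = confusion_matrix(actual, predicted)
accuracy = percent_correct(actual, predicted)
print(dict(matrix), round(accuracy, 2))
```

Since this accuracy is measured on the same cases used to build the model, it tends to be optimistic, which is the reason for the caution about future predictions.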


    Linear Discriminant Analysis: Standard Outputs and Interpretation

    Wilks' Lambda: a low value of Wilks' Lambda indicates high significance of the model.

    F test: the P value is the decision criterion.
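For a single discriminant score, Wilks' Lambda reduces to the ratio of within-group to total sum of squares, so a value near 0 means the groups are well separated. The sketch below uses hypothetical discriminant scores for two groups.

```python
def wilks_lambda(groups):
    """Within-group sum of squares divided by total sum of squares."""
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((s - group_mean) ** 2 for s in g)
    return ss_within / ss_total


# Hypothetical discriminant scores for Group 1 and Group 2.
scores_group1 = [2.4, 2.6, 2.0]
scores_group2 = [7.4, 7.6, 8.0]

lam = wilks_lambda([scores_group1, scores_group2])
print(round(lam, 4))
```

If the two groups had the same mean score, the within-group and total sums of squares would coincide and Lambda would be 1.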


    Linear Discriminant Analysis: Standard Outputs and Interpretation

    Relative importance of independent variables: the standardized coefficients indicate the relative importance of the variables.

    Classifying the cases: the means of the canonical variables are computed based on the raw coefficient table. The right side of the midpoint is Group 2; the left side of the midpoint is Group 1.


    Case Study

    A business school selects its students every year through a written test, interview and group discussion. It then tracks the performance of students during the two-year program by means of GPA. A GPA above 2.75/4.0 is defined as Successful and below as Unsuccessful.

    Can you develop a model that predicts whether a student would be potentially successful or not?


    How good is the model?

    Statistical significance of the model

    Predictors

    Classification of a new student