Chapter11 Slides

Apr 04, 2018

Transcript
  • 7/31/2019 Chapter11 Slides

    1/20

Welcome to PowerPoint slides for

Chapter 11: Discriminant Analysis for Classification and Prediction

Marketing Research: Text and Cases

by Rajendra Nargundkar


    Application Areas

1. The major application area for this technique is where we want to be able to distinguish between two or three sets of objects or people, based on the knowledge of some of their characteristics.

2. Examples include the selection process for a job, the admission process of an educational programme in a college, or dividing a group of people into potential buyers and non-buyers.

3. Discriminant analysis can be, and in fact is, used by credit rating agencies to rate individuals and classify them into good or bad lending risks. The detailed example discussed later shows you how to do that.

4. To summarise, we can use linear discriminant analysis when we have to classify objects into two or more groups based on the knowledge of some variables (characteristics) related to them. Typically, these groups would be users vs. non-users, potentially successful vs. potentially unsuccessful salesmen, high-risk vs. low-risk consumers, or similar.

    Slide 1


    Methods, Data etc.

1. Discriminant analysis is very similar to the multiple regression technique. The form of the equation in a two-variable discriminant analysis is:

Y = a + k1x1 + k2x2

2. This is called the discriminant function. Also, as in a regression analysis, Y is the dependent variable, and x1 and x2 are independent variables. k1 and k2 are the coefficients of the independent variables, and a is a constant. In practice, there may be any number of x variables.

3. Please note that Y in this case is a categorical variable (unlike in regression analysis, where it is continuous). x1 and x2 are, however, continuous (metric) variables. k1 and k2 are determined by appropriate algorithms in the computer package used, but the underlying objective is that these two coefficients should maximise the separation or differences between the two groups of the Y variable.

4. Y will have 2 possible values in a 2-group discriminant analysis, 3 values in a 3-group discriminant analysis, and so on.

    Slide 2
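As a quick illustration (not from the text), the discriminant function above can be written as a small Python helper; the coefficient values used below are arbitrary placeholders:

```python
def discriminant_score(a, coeffs, xs):
    """Compute Y = a + k1*x1 + k2*x2 + ... for one observation."""
    return a + sum(k * x for k, x in zip(coeffs, xs))

# Hypothetical values: a = 1.0, k1 = 0.5, k2 = -2.0, x1 = 4.0, x2 = 1.0
y = discriminant_score(1.0, [0.5, -2.0], [4.0, 1.0])  # 1.0 + 2.0 - 2.0 = 1.0
```

Once a and the k's have been estimated from data, the same function scores any new case.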


5. k1 and k2 are also called the unstandardised discriminant function coefficients.

6. As mentioned above, Y is a classification into 2 or more groups and is therefore a grouping variable, in the terminology of discriminant analysis. That is, groups are formed on the basis of existing data, and coded as 1 and 2, similar to dummy variable coding.

7. The independent (x) variables are continuous scale variables, and are used as predictors of the group to which the objects will belong. Therefore, to be able to use discriminant analysis, we need to have some data on Y and the x variables from experience and/or past records.

    Slide 2 contd...


    Building a Model for Prediction/Classification

Assuming we have data on both the Y and x variables of interest, we estimate the coefficients of the model, which is a linear equation of the form shown earlier, and use the coefficients to calculate the Y value (discriminant score) for any new data points that we want to classify into one of the groups. A decision rule is formulated for this process to determine the cut-off score, which is usually the midpoint of the mean discriminant scores of the two groups.

Accuracy of Classification:

Then, the classification of the existing data points is done using the equation, and the accuracy of the model is determined. This output is given by the classification matrix (also called the confusion matrix), which tells us what percentage of the existing data points is correctly classified by this model.

    Slide 3
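The midpoint decision rule described above can be sketched in a few lines of Python (the group scores below are made-up numbers, not from the example in this chapter):

```python
def cutoff_score(scores_group1, scores_group2):
    """Cut-off = midpoint of the two groups' mean discriminant scores."""
    m1 = sum(scores_group1) / len(scores_group1)
    m2 = sum(scores_group2) / len(scores_group2)
    return (m1 + m2) / 2

# Hypothetical discriminant scores for existing cases in each group
low_risk  = [-1.8, -1.2, -1.5]   # mean -1.5
high_risk = [ 1.1,  1.6,  1.5]   # mean  1.4
cut = cutoff_score(low_risk, high_risk)   # midpoint (-1.5 + 1.4) / 2 = -0.05
```

A new case is then assigned to whichever side of this cut-off its own discriminant score falls.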


This percentage is somewhat analogous to the R2 in regression analysis (the percentage of variation in the dependent variable explained by the model). Of course, the actual predictive accuracy of the discriminant model may be less than the figure obtained by applying it to the data points on which it was based.

Stepwise / Fixed Model:

Just as in regression, we have the option of entering one variable at a time (stepwise) into the discriminant equation, or entering all the variables we plan to use. Depending on the correlations between the independent variables and the objective of the study (exploratory or predictive/confirmatory), the choice is left to the student.

    Slide 3 contd...
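A minimal sketch of such a classification matrix, using made-up observed and predicted group labels rather than the chapter's data:

```python
def classification_matrix(observed, predicted, groups=(1, 2)):
    """Tally observed vs. predicted group labels and the percent classified correctly."""
    counts = {(o, p): 0 for o in groups for p in groups}
    for o, p in zip(observed, predicted):
        counts[(o, p)] += 1
    correct = sum(counts[(g, g)] for g in groups)
    return counts, 100.0 * correct / len(observed)

obs  = [1, 1, 1, 2, 2, 2]
pred = [1, 1, 2, 2, 2, 2]       # one group-1 case misclassified
counts, pct = classification_matrix(obs, pred)   # 5 of 6 correct, about 83.3 percent
```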


    Slide 4

    Relative Importance of Independent Variables

1. Suppose we have two independent variables, x1 and x2. How do we know which one is more important in discriminating between groups?

2. The coefficients of x1 and x2 are the ones which provide the answer, but not the raw (unstandardised) coefficients. To overcome the problem of different measurement units, we must obtain standardised discriminant coefficients. These are available from the computer output.

3. The higher the standardised discriminant coefficient of a variable, the higher its discriminating power.
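One common convention for obtaining standardised coefficients (assumed here; packages may differ in detail) is to multiply each raw coefficient by the pooled within-group standard deviation of its variable, making the results unit-free and comparable:

```python
def standardise_coefficients(raw_coeffs, within_group_sds):
    """Standardised coefficient = raw coefficient * pooled within-group SD."""
    return [k * s for k, s in zip(raw_coeffs, within_group_sds)]

# Hypothetical raw coefficients (per year, per rupee) and pooled SDs
std = standardise_coefficients([-0.25, -0.0001], [4.0, 1200.0])
# std = [-1.0, -0.12]: the first variable discriminates far more strongly
most_important = max(range(len(std)), key=lambda i: abs(std[i]))
```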


    Slide 5

    A Priori Probability of Classification into Groups

The discriminant analysis algorithm requires us to assign an a priori (before analysis) probability of a given case belonging to one of the groups. There are two ways of doing this.

• We can assign an equal probability of assignment to all groups. Thus, in a 2-group discriminant analysis, we can assign 0.5 as the probability of a case being assigned to any group.

• We can formulate any other rule for the assignment of probabilities. For example, the probabilities could be proportional to the group sizes in the sample data. If two-thirds of the sample is in one group, the a priori probability of a case being in that group would be 0.66 (two-thirds).
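The two assignment rules above can be expressed directly from the group counts (a small sketch, not package code):

```python
def a_priori_probabilities(group_counts, equal=True):
    """Equal priors across all groups, or priors proportional to group size."""
    if equal:
        return [1.0 / len(group_counts)] * len(group_counts)
    n = sum(group_counts)
    return [c / n for c in group_counts]

a_priori_probabilities([12, 6])               # equal: [0.5, 0.5]
a_priori_probabilities([12, 6], equal=False)  # proportional: [2/3, 1/3]
```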


    Slide 6

We will now turn to a complete worked example which will clarify many of the concepts explained earlier. We will begin with the problem statement and input data.

Problem

Suppose State Bank of Bhubaneswar (SBB) wants to start a credit card division. They want to use discriminant analysis to set up a system to screen applicants and classify them as either low risk or high risk (risk of default on credit card bill payments), based on information collected from their applications for a credit card.

Suppose SBB has managed to get from SBI, its sister bank, some data on SBI's credit card holders who turned out to be low risk (no default) and high risk (defaulting on payments) customers. These data on 18 customers are given in fig. 1.


    Slide 7

Fig. 1 (RISK: 1 = low risk, 2 = high risk)

Case  RISK  AGE  INCOME  YRS MARRIED
  1    1    35    4000       8
  2    1    33    4500       6
  3    1    29    3600       5
  4    2    22    3200       0
  5    2    26    3000       1
  6    1    28    3500       6
  7    2    30    3100       7
  8    2    23    2700       2
  9    1    32    4800       6
 10    2    24    1200       4
 11    2    26    1500       3
 12    1    38    2500       7
 13    1    40    2000       5
 14    2    32    1800       4
 15    1    36    2400       3
 16    2    31    1700       5
 17    2    28    1400       3
 18    1    33    1800       6


    Slide 8

We will perform a discriminant analysis and advise SBB on how to set up its system to screen potential good customers (low risk) from bad customers (high risk). In particular, we will build a discriminant function (model) and find out:

• The percentage of customers that it is able to classify correctly.

• The statistical significance of the discriminant function.

• Which variables (age, income, or years of marriage) are relatively better at discriminating between low- and high-risk applicants.

• How to classify a new credit card applicant into one of the two groups (low risk or high risk), by building a decision rule and a cut-off score.


    Slide 9

Input data are given in fig. 1.

Interpretation of Computer Output:

We will now find answers to all four questions we raised earlier.

Q1. How good is the model? How many of the 18 data points does it classify correctly?

To answer this question, we look at the computer output labelled fig. 3. This is a part of the discriminant analysis output from any computer package such as SPSS, SYSTAT, STATISTICA, SAS, etc. (There could be minor variations in the exact numbers obtained, and major variations could occur if the options chosen by the student are different. For example, if the a priori probabilities chosen for the classification into the two groups are equal, as we have assumed while generating this output, then you will very likely see similar numbers in your output.)

Fig. 3 : Classification Matrix

STAT. Classification Matrix (rows: observed group; columns: predicted group)
Group    Percent Correct    G_1    G_2
G_1          100.0000         9      0
G_2           88.8889         1      8
Total         94.4444        10      8
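For readers who want to reproduce the analysis, here is a compact Fisher-discriminant sketch in Python (numpy assumed available) on the fig. 1 data, using equal priors via the midpoint rule. The fitted coefficients differ from the package output by an arbitrary scaling, so compare classifications rather than raw numbers:

```python
import numpy as np

# Fig. 1 data: RISK (1 = low, 2 = high), AGE, INCOME, YRS MARRIED
data = np.array([
    [1, 35, 4000, 8], [1, 33, 4500, 6], [1, 29, 3600, 5], [2, 22, 3200, 0],
    [2, 26, 3000, 1], [1, 28, 3500, 6], [2, 30, 3100, 7], [2, 23, 2700, 2],
    [1, 32, 4800, 6], [2, 24, 1200, 4], [2, 26, 1500, 3], [1, 38, 2500, 7],
    [1, 40, 2000, 5], [2, 32, 1800, 4], [1, 36, 2400, 3], [2, 31, 1700, 5],
    [2, 28, 1400, 3], [1, 33, 1800, 6],
], dtype=float)
y, X = data[:, 0], data[:, 1:]

X1, X2 = X[y == 1], X[y == 2]
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Pooled within-group scatter, then discriminant direction w = Sw^-1 (m1 - m2)
Sw = np.cov(X1, rowvar=False) * (len(X1) - 1) + np.cov(X2, rowvar=False) * (len(X2) - 1)
w = np.linalg.solve(Sw, m1 - m2)

scores = X @ w
cutoff = (scores[y == 1].mean() + scores[y == 2].mean()) / 2  # midpoint rule
pred = np.where(scores > cutoff, 1, 2)  # group 1 projects above the cut-off here
accuracy = (pred == y).mean()
```

A classification matrix can then be tallied from `y` and `pred` exactly as in fig. 3.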


    Slide 10

This output (fig. 3) is called the classification matrix (also known as the confusion matrix), and it indicates that the discriminant function we have obtained is able to classify 94.44 percent of the 18 objects correctly. This figure is in the "percent correct" column of the classification matrix. More specifically, it also says that out of 10 cases predicted to be in group 1, 9 were observed to be in group 1 and 1 in group 2 (from column G_1). Similarly, from column G_2, we understand that out of 8 cases predicted to be in group 2, all 8 were found to be in group 2. Thus, on the whole, only 1 case out of 18 was misclassified by the discriminant model, giving us a classification (or prediction) accuracy level of (18-1)/18, or 94.44 percent.

As mentioned earlier, this level of accuracy may not hold for all future classification of new cases. But it is still a pointer towards the model being a good one, assuming the input data were relevant and scientifically collected. There are ways of checking the validity of the model, but these will be discussed separately.


    Slide 11

    Statistical Significance

Q2. How significant, statistically speaking, is the discriminant function?

This question is answered by looking at the Wilks' Lambda and the probability value for the F test given in the computer output, as a part of fig. 3 (shown below).

Discriminant Function Analysis Results
Number of variables in the model: 3
Wilks' Lambda: .3188764
approx. F (3, 14) = 9.968056
p < .00089

The value of Wilks' Lambda is 0.318. This value is between 0 and 1, and a low value (closer to 0) indicates better discriminating power of the model. Thus, 0.318 is an indicator of the model being good. The probability value of the F test indicates that the discrimination between the two groups is highly significant, because p (less than .00089) is far below the conventional .05 level.
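Wilks' Lambda itself is the ratio of the determinant of the within-group scatter matrix to the determinant of the total scatter matrix. A minimal numpy sketch on made-up data (not the chapter's):

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' Lambda = det(within-group scatter) / det(total scatter).
    Values near 0 indicate strong separation between the groups."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centred = X - X.mean(axis=0)
    T = centred.T @ centred               # total scatter
    W = np.zeros_like(T)
    for g in np.unique(y):
        d = X[y == g] - X[y == g].mean(axis=0)
        W += d.T @ d                      # pooled within-group scatter
    return np.linalg.det(W) / np.linalg.det(T)

# Two hypothetical, well-separated groups of 2-variable cases
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [5.0, 6.0], [5.1, 5.8], [4.9, 6.3]]
y = [1, 1, 1, 2, 2, 2]
lam = wilks_lambda(X, y)   # small value, consistent with good discrimination
```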


    Slide 12

Q3. We have 3 independent (or predictor) variables: Age, Income, and No. of Years Married. Which of these is a better predictor of a person being a low credit risk or a high credit risk?

To answer this question, we look at the standardised coefficients in the output. These are given in fig. 5 (shown below).

Fig. 5.

STAT. Standardized Coefficients
Variable       Root 1
AGE           -.923955
INCOME        -.77
YRS MARRIED   -.15
Eigenvalue     2.136012

This output shows that Age is the best predictor, with a coefficient of 0.92, followed by Income with a coefficient of 0.77; Years of Marriage is last, with a coefficient of 0.15. Please recall that the absolute value of the standardised coefficient of each variable indicates its relative importance.


    Slide 13

Q4. How do we classify a new credit card applicant into either a high-risk or a low-risk category, and make a decision on accepting or refusing him a credit card?

This is the most important question to be answered. Please remember why we started out with the discriminant analysis in this problem: State Bank of Bhubaneswar wished to have a decision model for screening credit card applicants.

The way to do this is to use the outputs in fig. 4 (raw or unstandardised coefficients of the discriminant function) and fig. 6 (means of canonical variables). Fig. 6, the means of canonical variables, gives us the new means for the transformed group centroids.

Fig. 6.

STAT. Means of Canonical Variables
Group    Root 1
G_1:1   -1.37793
G_2:2   +1.37792


Thus, the new mean for group 1 (low risk) is -1.37793, and the new mean for group 2 (high risk) is +1.37792. This means that the midpoint of these two is 0. This is clear when we plot the two means on a straight line and locate their midpoint, as shown below:

-1.37 ................ 0 ................ +1.37
Mean of Group 1                  Mean of Group 2
(Low Risk)                       (High Risk)

    Slide 13 contd...


    Slide 14

This also gives us a decision rule for classifying any new case. If the discriminant score of an applicant falls to the right of the midpoint, we classify him as high risk; if it falls to the left of the midpoint, we classify him as low risk. In this case, the midpoint is 0. Therefore, any positive (greater than 0) discriminant score leads to classification as high risk, and any negative (less than 0) discriminant score leads to classification as low risk. But how do we compute the discriminant score of an applicant?

We use the applicant's Age, Income and Years of Marriage (from his application) and plug these into the unstandardised discriminant function. This gives us his discriminant score.


Fig. 4.

STAT. Raw Coefficients
Variable       Root 1
AGE           -.24560
INCOME        -.00008
YRS MARRIED   -.08465
Constant      10.00335
Eigenvalue     2.13601

From fig. 4 (reproduced above), the unstandardised (or raw) discriminant function is

Y = 10.0036 - (.24560) Age - (.00008) Income - (.08465) Yrs. Married

where Y gives us the discriminant score of any person whose Age, Income and Yrs. Married are known.

    Slide 14 contd...


    Slide 15

Let us take the example of a credit card applicant to SBB who is aged 40, has an income of Rs. 25,000 per month and has been married for 15 years. Plugging these values into the discriminant function (model) above, we find his discriminant score Y to be

Y = 10.0036 - (.24560)(40) - (.00008)(25000) - (.08465)(15)
  = 10.0036 - 9.824 - 2.0 - 1.26975
  = -3.09015
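The arithmetic above can be reproduced with a one-line Python version of the raw discriminant function from fig. 4:

```python
def sbb_score(age, income, yrs_married):
    """Raw discriminant function from fig. 4 (constant rounded to 10.0036, as in the text)."""
    return 10.0036 - 0.24560 * age - 0.00008 * income - 0.08465 * yrs_married

score = sbb_score(40, 25000, 15)   # about -3.09015: negative, so classify as low risk
```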

According to our decision rule, any discriminant score to the left of the midpoint of 0 leads to a classification in the low-risk group. Therefore, we should give this person a credit card, as he is a low-risk customer. The same process is to be followed for any new applicant. If his discriminant score is to the right of the midpoint of 0, he should be denied a credit card, as he is a high-risk customer.

We have now answered all four questions raised earlier for State Bank of Bhubaneswar.