Factor Analysis

Factor analysis for data analysis: reducing many variables and identifying the structure of the data.

Interdependence Technique

An interdependence technique is one in which the variables cannot be classified as either dependent or independent. Instead, all of the variables are analyzed simultaneously, in an effort to find an underlying structure in the entire set of variables or subjects. If the structure of the variables is to be analyzed, factor analysis or confirmatory factor analysis is the appropriate technique. If cases or respondents are to be grouped to represent structure, cluster analysis is selected. Finally, if the interest is in the structure of objects, perceptual mapping should be applied. As with dependence techniques, the measurement properties of the data should be considered. Generally, factor analysis and cluster analysis are considered metric interdependence techniques; however, nonmetric data may be transformed through dummy-variable coding for use with special forms of factor analysis and cluster analysis. Both metric and nonmetric approaches to perceptual mapping have been developed. If the interdependence of objects measured by nonmetric data is to be analyzed, correspondence analysis is also an appropriate technique.

Types of Multivariate Techniques

Multivariate analysis is an ever-expanding set of techniques for data analysis that encompasses a wide range of possible research situations. As evidenced by the classification scheme just discussed, the more established as well as emerging techniques include the following:

a. Principal component and common factor analysis
b. Multiple regression and multiple correlation
c. Multiple discriminant analysis and logistic regression
d. Canonical correlation analysis
e. Multivariate analysis of variance and covariance (MANOVA)
f. Conjoint analysis
g. Cluster analysis
h. Perceptual mapping or multidimensional scaling
i. Correspondence analysis
j. Structural equation modeling and confirmatory factor analysis

Factor Analysis

Factor analysis is required in the following situations:

When there are many variables in a research design, it is often helpful to reduce them to a smaller set of factors.

When there is an interdependence relationship, in which no variable is specified as dependent or independent.

When the researcher is looking for the underlying structure of the data matrix. Ideally, the variables are normal and continuous, with at least three to five variables loading onto each factor. The sample size should be over 50 observations, with more than 5 observations per variable.

When multicollinearity exists in a data set, that is, when two or more variables are so interrelated that the effect of one cannot be isolated from the effect of another. Multicollinearity is a problem in multiple linear regression, but in factor analysis correlation between the variables is actually desirable, as the correlations are the key to data reduction.


Kaiser’s measure of sampling adequacy (MSA) is a measure of the degree to which every variable can be predicted by all other variables.

An overall MSA of .80 or higher is very good, while a measure under .50 is deemed poor. There are two main factor analysis methods: common factor analysis, which extracts factors based on the variance the variables share, and principal component analysis, which extracts factors based on the total variance of the variables. Common factor analysis is used to look for the latent (underlying) factors, whereas principal components analysis is used to find the fewest number of variables that explain the most variance.
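As an illustration, overall and per-variable MSA values can be computed from a correlation matrix using squared correlations and squared partial correlations. A minimal NumPy sketch of this textbook formula (not SPSS's implementation; the function name is ours):

import numpy as np

def kmo_msa(R):
    # Overall KMO/MSA and per-variable MSA for a correlation matrix R.
    Q = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
    P = -Q / d                      # matrix of partial correlations
    np.fill_diagonal(P, 0.0)
    R0 = R - np.diag(np.diag(R))    # correlations with the diagonal zeroed
    r2, p2 = R0 ** 2, P ** 2
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, msa

By the thresholds above, an overall value of .80 or more supports factoring, and variables with individual MSA values below .50 are candidates for removal.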

The first factor extracted explains the most variance. Typically, factors are extracted as long as their eigenvalues are greater than 1.0, or the scree test visually indicates how many factors to extract. The factor loadings are the correlations between the factors and the variables. Typically a factor loading of .4 or higher is required to attribute a specific variable to a factor. An orthogonal rotation assumes no correlation between the factors, whereas an oblique rotation is used when some relationship is believed to exist.

Factor Analysis

Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance that is observed in a much larger number of manifest variables. Factor analysis can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis (for example, to identify collinearity prior to performing a linear regression analysis).

The factor analysis procedure offers a high degree of flexibility:

• Seven methods of factor extraction are available.

• Five methods of rotation are available, including direct oblimin and promax for nonorthogonal rotations.

• Three methods of computing factor scores are available, and scores can be saved as variables for further analysis.

Example. What underlying attitudes lead people to respond to the questions on a political survey as they do? Examining the correlations among the survey items reveals that there is significant overlap among various subgroups of items: questions about taxes tend to correlate with each other, questions about military issues correlate with each other, and so on. With factor analysis, you can investigate the number of underlying factors and, in many cases, identify what the factors represent conceptually. Additionally, you can compute factor scores for each respondent, which can then be used in subsequent analyses. For example, you might build a logistic regression model to predict voting behavior based on factor scores.

Statistics. For each variable: number of valid cases, mean, and standard deviation. For each factor analysis: correlation matrix of variables, including significance levels, determinant, and inverse; reproduced correlation matrix, including anti-image; initial solution (communalities, eigenvalues, and percentage of variance explained); Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett's test of sphericity; unrotated solution, including factor loadings, communalities, and eigenvalues; and rotated solution, including rotated pattern matrix and transformation matrix. For oblique rotations: rotated pattern and structure matrices; factor score coefficient matrix and factor covariance matrix. Plots: scree plot of eigenvalues and loading plot of first two or three factors.

Data. The variables should be quantitative at the interval or ratio level. Categorical data (such as religion or country of origin) are not suitable for factor analysis. Data for which Pearson correlation coefficients can sensibly be calculated should be suitable for factor analysis.

Assumptions. The data should have a bivariate normal distribution for each pair of variables, and observations should be independent. The factor analysis model specifies that variables are determined by common factors (the factors estimated by the model) and unique factors (which do not overlap between observed variables); the computed estimates are based on the assumption that all unique factors are uncorrelated with each other and with the common factors.
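In standard notation (added here for clarity; the symbols are not from the original text), this common factor model for a vector $x$ of standardized observed variables is

$$x = \Lambda f + u, \qquad \Sigma = \Lambda \Phi \Lambda^{\top} + \Psi,$$

where $\Lambda$ is the matrix of factor loadings, $f$ the common factors, $u$ the unique factors, $\Psi$ the diagonal covariance matrix of the unique factors, and $\Phi$ the factor correlation matrix ($\Phi = I$ for uncorrelated common factors, as assumed above).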

To Obtain a Factor Analysis

This feature requires the Statistics Base option.

► From the menus choose:

Analyze > Dimension Reduction > Factor...

► Select the variables for the factor analysis.

Principal Components Extraction (PC) (factor analysis algorithms)

The matrix of factor loadings based on the first $m$ factors is

$$\Lambda_m = \Omega_m \Gamma_m^{1/2}$$

where

$$\Omega_m = (\omega_1, \omega_2, \ldots, \omega_m), \qquad \Gamma_m = \operatorname{diag}(|\gamma_1|, |\gamma_2|, \ldots, |\gamma_m|).$$

The communality of variable $i$ is given by

$$h_i = \sum_{j=1}^{m} |\gamma_j|\, \omega_{ij}^2.$$

Analyzing a Correlation Matrix

$\gamma_1 \ge \gamma_2 \ge \ldots \ge \gamma_m$ are the eigenvalues and $\omega_i$ the corresponding eigenvectors of $R$, where $R$ is the correlation matrix.

Analyzing a Covariance Matrix

$\gamma_1 \ge \gamma_2 \ge \ldots \ge \gamma_m$ are the eigenvalues and $\omega_i$ the corresponding eigenvectors of $\Sigma$, where $\Sigma = (\sigma_{ij})_{n \times n}$ is the covariance matrix.

The rescaled loadings matrix is $\Lambda_m^R = [\operatorname{diag}\,\Sigma]^{-1/2}\Lambda_m$.

The rescaled communality of variable $i$ is $h_i^R = \sigma_{ii}^{-1} h_i$.
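A minimal NumPy sketch of these formulas for the correlation-matrix case (illustrative only; the function and variable names are ours, not part of the FACTOR procedure):

import numpy as np

def pc_loadings(R, m):
    # Principal components loadings: Lambda_m = Omega_m * Gamma_m^(1/2)
    gamma, omega = np.linalg.eigh(R)             # eigenvalues in ascending order
    order = np.argsort(gamma)[::-1]              # reorder so gamma_1 >= gamma_2 >= ...
    gamma, omega = gamma[order], omega[:, order]
    loadings = omega[:, :m] * np.sqrt(np.abs(gamma[:m]))
    communalities = (loadings ** 2).sum(axis=1)  # h_i = sum_j |gamma_j| * w_ij^2
    return loadings, communalities

With m equal to the number of variables, every communality equals 1.0, which matches the initial communalities reported later for principal components extraction.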

Factor Analysis Select Cases

To select cases for your analysis:

► Choose a selection variable.

► Click Value to enter an integer as the selection value.

Only cases with that value for the selection variable are used in the factor analysis.

This feature requires the Statistics Base option.

► From the menus choose:

Analyze > Dimension Reduction > Factor...

► Click Select.

► Choose a selection variable and then click Value.

Factor Analysis Descriptives


Statistics. Univariate descriptives includes the mean, standard deviation, and number of valid cases for each variable. Initial solution displays initial communalities, eigenvalues, and the percentage of variance explained.

Correlation Matrix. The available options are coefficients, significance levels, determinant, KMO and Bartlett's test of sphericity, inverse, reproduced, and anti-image.

• KMO and Bartlett's Test of Sphericity. The Kaiser-Meyer-Olkin measure of sampling adequacy tests whether the partial correlations among variables are small. Bartlett's test of sphericity tests whether the correlation matrix is an identity matrix, which would indicate that the factor model is inappropriate.

• Reproduced. The estimated correlation matrix from the factor solution. Residuals (difference between estimated and observed correlations) are also displayed.

• Anti-image. The anti-image correlation matrix contains the negatives of the partial correlation coefficients, and the anti-image covariance matrix contains the negatives of the partial covariances. In a good factor model, most of the off-diagonal elements will be small. The measure of sampling adequacy for a variable is displayed on the diagonal of the anti-image correlation matrix.

This feature requires the Statistics Base option.

► From the menus choose:

Analyze > Dimension Reduction > Factor...

► In the Factor Analysis dialog box, click Descriptives.

Factor Analysis Extraction

Method. Allows you to specify the method of factor extraction. Available methods are principal components, unweighted least squares, generalized least squares, maximum likelihood, principal axis factoring, alpha factoring, and image factoring.

• Principal Components Analysis. A factor extraction method used to form uncorrelated linear combinations of the observed variables. The first component has maximum variance. Successive components explain progressively smaller portions of the variance and are all uncorrelated with each other. Principal components analysis is used to obtain the initial factor solution. It can be used when a correlation matrix is singular.

• Unweighted Least-Squares Method. A factor extraction method that minimizes the sum of the squared differences between the observed and reproduced correlation matrices (ignoring the diagonals).


• Generalized Least-Squares Method. A factor extraction method that minimizes the sum of the squared differences between the observed and reproduced correlation matrices. Correlations are weighted by the inverse of their uniqueness, so that variables with high uniqueness are given less weight than those with low uniqueness.

• Maximum-Likelihood Method. A factor extraction method that produces parameter estimates that are most likely to have produced the observed correlation matrix if the sample is from a multivariate normal distribution. The correlations are weighted by the inverse of the uniqueness of the variables, and an iterative algorithm is employed.

• Principal Axis Factoring. A method of extracting factors from the original correlation matrix, with squared multiple correlation coefficients placed in the diagonal as initial estimates of the communalities. Factors are extracted, and the resulting factor loadings are used to estimate new communalities that replace the old communality estimates in the diagonal. Iterations continue until the changes in the communalities from one iteration to the next satisfy the convergence criterion for extraction. (A sketch of this iteration appears after this list.)

• Alpha. A factor extraction method that considers the variables in the analysis to be a sample from the universe of potential variables. This method maximizes the alpha reliability of the factors.

• Image Factoring. A factor extraction method developed by Guttman and based on image theory. The common part of the variable, called the partial image, is defined as its linear regression on remaining variables, rather than a function of hypothetical factors.
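As referenced above, a minimal NumPy sketch of the principal axis iteration, assuming a correlation matrix R and a fixed number of factors m (illustrative only, not the SPSS algorithm; names are ours):

import numpy as np

def principal_axis(R, m, max_iter=50, tol=1e-4):
    # Initial communalities: squared multiple correlations (SMCs)
    h = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h)                  # reduced correlation matrix
        gamma, omega = np.linalg.eigh(Rr)
        idx = np.argsort(gamma)[::-1][:m]        # m largest eigenvalues
        L = omega[:, idx] * np.sqrt(np.clip(gamma[idx], 0.0, None))
        h_new = (L ** 2).sum(axis=1)             # updated communalities
        if np.max(np.abs(h_new - h)) < tol:      # convergence criterion
            break
        h = h_new
    return L, h_new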

Analyze. Allows you to specify either a correlation matrix or a covariance matrix.

• Correlation matrix. Useful if variables in your analysis are measured on different scales.

• Covariance matrix. Useful when you want to apply your factor analysis to multiple groups with different variances for each variable.

Extract. You can either retain all factors whose eigenvalues exceed a specified value, or you can retain a specific number of factors.

Display. Allows you to request the unrotated factor solution and a scree plot of the eigenvalues.

• Unrotated Factor Solution. Displays unrotated factor loadings (factor pattern matrix), communalities, and eigenvalues for the factor solution.

• Scree plot. A plot of the variance that is associated with each factor. This plot is used to determine how many factors should be kept. Typically the plot shows a distinct break between the steep slope of the large factors and the gradual trailing of the rest (the scree).
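A minimal matplotlib sketch of such a scree plot, computed from a correlation matrix R (illustrative; names are ours):

import numpy as np
import matplotlib.pyplot as plt

def scree_plot(R):
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, descending
    plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")
    plt.axhline(1.0, linestyle="--")                 # eigenvalues-greater-than-1 rule
    plt.xlabel("Component number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()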

Maximum Iterations for Convergence. Allows you to specify the maximum number of steps that the algorithm can take to estimate the solution.


This feature requires the Statistics Base option.

► From the menus choose:

Analyze > Dimension Reduction > Factor...

► In the Factor Analysis dialog box, click Extraction.

Factor Analysis Rotation

Method. Allows you to select the method of factor rotation. Available methods are varimax, direct oblimin, quartimax, equamax, or promax.

• Varimax Method. An orthogonal rotation method that minimizes the number of variables that have high loadings on each factor. This method simplifies the interpretation of the factors. (A sketch of this rotation appears after this list.)

• Direct Oblimin Method. A method for oblique (nonorthogonal) rotation. When delta equals 0 (the default), solutions are most oblique. As delta becomes more negative, the factors become less oblique. To override the default delta of 0, enter a number less than or equal to 0.8.

• Quartimax Method. A rotation method that minimizes the number of factors needed to explain each variable. This method simplifies the interpretation of the observed variables.

• Equamax Method. A rotation method that is a combination of the varimax method, which simplifies the factors, and the quartimax method, which simplifies the variables. The number of variables that load highly on a factor and the number of factors needed to explain a variable are minimized.

• Promax Rotation. An oblique rotation, which allows factors to be correlated. This rotation can be calculated more quickly than a direct oblimin rotation, so it is useful for large datasets.
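For concreteness, here is a sketch of the standard Kaiser varimax algorithm applied to a loadings matrix (a common SVD-based formulation, not SPSS's code; names are ours):

import numpy as np

def varimax(L, max_iter=100, tol=1e-6):
    # Rotate loadings L (variables x factors) to maximize the varimax criterion.
    p, k = L.shape
    T = np.eye(k)                    # accumulated orthogonal rotation
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        B = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        d_new = s.sum()
        if d_new < d * (1.0 + tol):  # criterion stopped improving
            break
        d = d_new
    return L @ T

Because T is orthogonal, the rotated solution explains the same total variance as the unrotated one, as noted elsewhere in this document.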

Display. Allows you to include output on the rotated solution, as well as loading plots for the first two or three factors.

• Rotated Solution. A rotation method must be selected to obtain a rotated solution. For orthogonal rotations, the rotated pattern matrix and factor transformation matrix are displayed. For oblique rotations, the pattern, structure, and factor correlation matrices are displayed.

• Factor Loading Plot. Three-dimensional factor loading plot of the first three factors. For a two-factor solution, a two-dimensional plot is shown. The plot is not displayed if only one factor is extracted. Plots display rotated solutions if rotation is requested.

Maximum Iterations for Convergence. Allows you to specify the maximum number of steps that the algorithm can take to perform the rotation.


This feature requires the Statistics Base option.

► From the menus choose:

Analyze > Dimension Reduction > Factor...

► In the Factor Analysis dialog box, click Rotation.

Factor Analysis Scores

Save as variables. Creates one new variable for each factor in the final solution.

Method. The alternative methods for calculating factor scores are regression, Bartlett, and Anderson-Rubin.

• Regression Method. A method for estimating factor score coefficients. The scores that are produced have a mean of 0 and a variance equal to the squared multiple correlation between the estimated factor scores and the true factor values. The scores may be correlated even when factors are orthogonal. (A sketch of this computation appears after this list.)

• Bartlett Scores. A method of estimating factor score coefficients. The scores that are produced have a mean of 0. The sum of squares of the unique factors over the range of variables is minimized.

• Anderson-Rubin Method. A method of estimating factor score coefficients; a modification of the Bartlett method which ensures orthogonality of the estimated factors. The scores that are produced have a mean of 0, have a standard deviation of 1, and are uncorrelated.
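As referenced above, for orthogonal factors the regression method reduces to a short computation: the factor score coefficient matrix is W = R⁻¹Λ and the scores are ZW, where Z is the standardized data matrix. A minimal NumPy sketch of this textbook formula (names are ours):

import numpy as np

def regression_factor_scores(Z, R, L):
    W = np.linalg.solve(R, L)   # W = R^{-1} * Lambda: score coefficient matrix
    return Z @ W                # one score per case per factor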

Display factor score coefficient matrix. Shows the coefficients by which variables are multiplied to obtain factor scores. Also shows the correlations between factor scores.

This feature requires the Statistics Base option.

► From the menus choose:

Analyze > Dimension Reduction > Factor...

► In the Factor Analysis dialog box, click Scores.

Factor Analysis Options

Missing Values. Allows you to specify how missing values are handled. The available choices are to exclude cases listwise, exclude cases pairwise, or replace with mean.


Coefficient Display Format. Allows you to control aspects of the output matrices. You can sort coefficients by size and suppress coefficients with absolute values that are less than the specified value.

This feature requires the Statistics Base option.

► From the menus choose:

Analyze > Dimension Reduction > Factor...

► In the Factor Analysis dialog box, click Options.


This command reads the active dataset and causes execution of any pending commands. See the topic Command Order for more information.

See FACTOR Algorithms for computational details for this procedure.


The Factor Analysis procedure has several extraction methods for constructing a solution.

For Data Reduction. The principal components method of extraction begins by finding a linear combination of variables (a component) that accounts for as much variation in the original variables as possible. It then finds another component that accounts for as much of the remaining variation as possible and is uncorrelated with the previous component, continuing in this way until there are as many components as original variables. Usually, a few components will account for most of the variation, and these components can be used to replace the original variables. This method is most often used to reduce the number of variables in the data file.

For Structure Detection. Other Factor Analysis extraction methods go one step further by adding the assumption that some of the variability in the data cannot be explained by the components (usually called factors in other extraction methods). As a result, the total variance explained by the solution is smaller; however, the addition of this structure to the factor model makes these methods ideal for examining relationships between the variables.

With any extraction method, the two questions that a good solution should try to answer are "How many components (factors) are needed to represent the variables?" and "What do these components represent?"


An industry analyst would like to predict automobile sales from a set of predictors. However, many of the predictors are correlated, and the analyst fears that this might adversely affect her results.

This information is contained in the file car_sales.sav. See the topic Sample Files for more information. Use Factor Analysis with principal components extraction to focus the analysis on a manageable subset of the predictors.

► To run a principal components factor analysis, from the menus choose:

Analyze > Dimension Reduction > Factor...

► If the variable list does not display variable labels in file order, right-click anywhere in the variable list and from the context menu choose Display Variable Labels and Sort by File Order.

► Select Vehicle type through Fuel efficiency as analysis variables.

► Click Extraction.

► Select Scree plot.

► Click Continue.

► Click Rotation in the Factor Analysis dialog box.

► Select Varimax in the Method group.

► Click Continue.

► Click Scores in the Factor Analysis dialog box.

► Select Save as variables and Display factor score coefficient matrix.

► Click Continue.

► Click OK in the Factor Analysis dialog box.

These selections generate the following command syntax:

FACTOR
  /VARIABLES type price engine_s horsepow wheelbas width length curb_wgt fuel_cap mpg
  /MISSING LISTWISE
  /ANALYSIS type price engine_s horsepow wheelbas width length curb_wgt fuel_cap mpg
  /PRINT INITIAL EXTRACTION ROTATION FSCORE AIC
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION
  /SAVE = REG(3).

• The procedure requests a principal components analysis of the variables type through mpg.

• The PRINT subcommand requests the initial communalities, extracted communalities, rotated communalities, factor score coefficient matrix, and anti-image covariance and correlation matrices.

• The PLOT subcommand requests a scree plot.

• The ROTATION subcommand requests a Varimax rotation of the solution for easier interpretation of the results.

• The SAVE subcommand requests the factor scores, computed using the regression method, to be saved to the active dataset. Three scores will be added.

• All other options are set to their default values.

Communalities indicate the amount of variance in each variable that is accounted for.

Initial communalities are estimates of the variance in each variable accounted for by all components or factors. For principal components extraction, this is always equal to 1.0 for correlation analyses.

Extraction communalities are estimates of the variance in each variable accounted for by the components. The communalities in this table are all high, which indicates that the extracted components represent the variables well. If any communalities are very low in a principal components extraction, you may need to extract another component.

The variance explained by the initial solution, extracted components, and rotated components is displayed. This first section of the table shows the Initial Eigenvalues.

The Total column gives the eigenvalue, or amount of variance in the original variables accounted for by each component.

The % of Variance column gives the ratio, expressed as a percentage, of the variance accounted for by each component to the total variance in all of the variables.

The Cumulative % column gives the percentage of variance accounted for by the first n components. For example, the cumulative percentage for the second component is the sum of the percentage of variance for the first and second components.

For the initial solution, there are as many components as variables, and in a correlations analysis, the sum of the eigenvalues equals the number of components. You have requested that eigenvalues greater than 1 be extracted, so the first three principal components form the extracted solution.


The second section of the table shows the extracted components. They explain nearly 88% of the variability in the original ten variables, so you can considerably reduce the complexity of the data set by using these components, with only a 12% loss of information.

The rotation maintains the cumulative percentage of variation explained by the extracted components, but that variation is now spread more evenly over the components. The large changes in the individual totals suggest that the rotated component matrix will be easier to interpret than the unrotated matrix.

The scree plot helps you to determine the optimal number of components. The eigenvalue of each component in the initial solution is plotted. Generally, you want to extract the components on the steep slope; the components on the shallow slope contribute little to the solution. The last big drop occurs between the third and fourth components, so using the first three components is an easy choice.

The rotated component matrix helps you to determine what the components represent.

The first component is most highly correlated with Price in thousands and Horsepower. Price in thousands is a better representative, however, because it is less correlated with the other two components.

The second component is most highly correlated with Length.

The third component is most highly correlated with Vehicle type.

This suggests that you can focus on Price in thousands, Length, and Vehicle type in further analyses, but you can do even better by saving component scores.

For each case and each component, the component score is computed by multiplying the case's standardized variable values (computed using listwise deletion) by the component's score coefficients.


The resulting three component score variables are representative of, and can be used in place of, the ten original variables with only a 12% loss of information.

Using the saved components is also preferable to using Price in thousands, Length, and Vehicle type because the components are representative of all ten original variables, and the components are not linearly correlated with each other. Although the linear correlation between the components is guaranteed to be 0, you should look at plots of the component scores to check for outliers and nonlinear associations between the components.

► To produce a scatterplot matrix of the component scores, from the menus choose:

Graphs > Chart Builder...

► Click the Gallery tab, select Scatter/Dot for the chart type, and drag and drop the Scatterplot Matrix icon to the canvas.

► Select REGR factor score 1 for analysis 1 through REGR factor score 3 for analysis 1 as the matrix variables.

► Click OK.

These selections generate the following command syntax:

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=FAC1_1 FAC2_1 FAC3_1 MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: FAC1_1=col(source(s), name("FAC1_1"))
  DATA: FAC2_1=col(source(s), name("FAC2_1"))
  DATA: FAC3_1=col(source(s), name("FAC3_1"))
  GUIDE: axis(dim(1.1), ticks(null()))
  GUIDE: axis(dim(2.1), ticks(null()))
  GUIDE: axis(dim(1), gap(0px))
  GUIDE: axis(dim(2), gap(0px))
  ELEMENT: point(position((FAC1_1/"FAC1_1"+FAC2_1/"FAC2_1"+FAC3_1/"FAC3_1")*(FAC1_1/"FAC1_1"+FAC2_1/"FAC2_1"+FAC3_1/"FAC3_1")))
END GPL.

The first plot in the first row shows the first component on the vertical axis versus the second component on the horizontal axis, and the order of the remaining plots follows from there.

The scatterplot matrix shows that the first component has a skewed distribution, which is because Price in thousands is skewed. A principal components extraction using a log-transformed price might give better results. The separation that you see in the third component is explained by the fact that Vehicle type is a binary variable. There appears to be a relationship between the first and third components, due to the fact that there are several expensive automobiles but no "luxury trucks." This problem may be alleviated by using a log-transformed price, but if this does not solve the problem, you may want to split the file on Vehicle type.

You can reduce the size of the data file from ten variables to three components by using Factor Analysis with a principal components extraction. Note that the interpretation of further analyses is dependent upon the relationships defined in the rotated component matrix. This step of "translation" complicates things slightly, but the benefits of reducing the data file and using uncorrelated predictors outweigh this cost.

Using Factor Analysis for Structure Detection

A telecommunications provider wants to better understand service usage patterns in its customer database. If services can be clustered by usage, the company can offer more attractive packages to its customers.

A random sample from the customer database is contained in telco.sav. See the topic Sample Files for more information. Use Factor Analysis to determine the underlying structure in service usage.

► To run a factor analysis, from the menus choose:

Analyze > Dimension Reduction > Factor...


► If the variable list does not display variable labels in file order, right-click anywhere in the variable list and from the context menu choose Display Variable Labels and Sort by File Order.

► Select Long distance last month through Wireless last month and Multiple lines through Electronic billing as analysis variables.

► Click Descriptives.

► Select Anti-image and KMO and Bartlett's test of sphericity.

► Click Continue.

► Click Extraction in the Factor Analysis dialog box.


► Select Principal axis factoring from the Method list.

► Select Scree plot.

► Click Continue.

► Click Rotation in the Factor Analysis dialog box.

► Select Varimax in the Method group.


► Select Loading plot(s) in the Display group.

► Click Continue.

► Click OK in the Factor Analysis dialog box.

These selections generate the following command syntax:

FACTOR
  /VARIABLES longmon tollmon equipmon cardmon wiremon multline voice ebill pager internet callid callwait forward confer
  /MISSING LISTWISE
  /ANALYSIS longmon tollmon equipmon cardmon wiremon multline voice pager internet callid callwait forward confer ebill
  /PRINT INITIAL KMO EXTRACTION ROTATION
  /PLOT EIGEN ROTATION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PAF
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.

• The procedure requests a principal axis factoring of the variables longmon through confer.

• The PRINT subcommand requests the initial communalities, extracted communalities, rotated communalities, and the Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett’s test of sphericity.

• The PLOT subcommand requests a scree plot and loadings plot.

• The ROTATION subcommand requests a Varimax rotation of the solution for easier interpretation of the results.

• All other options are set to their default values.

This table shows two tests that indicate the suitability of your data for structure detection.


The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is a statistic that indicates the proportion of variance in your variables that might be caused by underlying factors.

High values (close to 1.0) generally indicate that a factor analysis may be useful with your data. If the value is less than 0.50, the results of the factor analysis probably won't be very useful. Bartlett's test of sphericity tests the hypothesis that your correlation matrix is an identity matrix, which would indicate that your variables are unrelated and therefore unsuitable for structure detection.

Small values (less than 0.05) of the significance level indicate that a factor analysis may be useful with your data.
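For reference, the usual chi-square approximation for Bartlett's test of sphericity, given a p × p correlation matrix computed from n cases, can be sketched as follows (a standard formula, not necessarily SPSS's exact computation; names are ours):

import numpy as np
from scipy import stats

def bartlett_sphericity(R, n):
    p = R.shape[0]
    # chi-square statistic: -(n - 1 - (2p + 5)/6) * ln|R|
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2.0          # degrees of freedom
    return chi2, stats.chi2.sf(chi2, df)

A significance value below 0.05 from this test, together with a KMO value above 0.50, supports proceeding with the factor analysis.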

Initial communalities are, for correlation analyses, the proportion of variance accounted for in each variable by the rest of the variables.

Extraction communalities are estimates of the variance in each variable accounted for by the factors in the factor solution.

Small values indicate variables that do not fit well with the factor solution, and should possibly be dropped from the analysis.

The extraction communalities for this solution are acceptable, although the lower values of Multiple lines and Calling card show that they don't fit as well as the others.

The leftmost section of this table shows the variance explained by the initial solution.

Only three factors in the initial solution have eigenvalues greater than 1.


Together, they account for almost 65% of the variability in the original variables. This suggests that three latent influences are associated with service usage, but there remains room for a lot of unexplained variation.

The second section of this table shows the variance explained by the extracted factors before rotation.

The cumulative variability explained by these three factors in the extracted solution is about 55%, a difference of 10% from the initial solution.

Thus, about 10% of the variation explained by the initial solution is lost due to latent factors unique to the original variables and variability that simply cannot be explained by the factor model.

The rightmost section of this table shows the variance explained by the extracted factors after rotation.

Total Variance Explained

The rotated factor model makes some small adjustments to factors 1 and 2, but factor 3 is left virtually unchanged. Look for changes between the unrotated and rotated factor matrices to see how the rotation affects the interpretation of the first and second factors.


The scree plot confirms the choice of three components.

Factor Matrix

The relationships in the unrotated factor matrix are somewhat clear. The third factor is associated with Long distance last month. The second corresponds most strongly to Equipment last month, Internet, and Electronic billing. The first factor is associated with Toll free last month, Wireless last month, Voice mail, Paging service, Caller ID, Call waiting, Call forwarding, and 3-way calling. However, some of these "first factor" services are negatively associated with the second factor; some, positively. In general, there are a lot of services that have correlations greater than 0.2 with multiple factors, which muddies the picture. The rotated factor matrix should clear this up.

Rotated Factor Matrix

The factor transformation matrix describes the specific rotation applied to your factor solution. This matrix is used to compute the rotated factor matrix from the original (unrotated) factor matrix. Smaller off-diagonal elements correspond to smaller rotations, and larger off-diagonal elements correspond to larger rotations. The third factor is largely unaffected by the rotation, but the first two are now easier to interpret.

The first rotated factor is most highly correlated with Toll free last month, Caller ID, Call waiting, Call forwarding, and 3-way calling. These variables are not particularly correlated with the other two factors. The second factor is most highly correlated with Equipment last month, Internet, and Electronic billing. Thus, there are three major groupings of services, as defined by the services that are most highly correlated with the three factors. Given these groupings, you can make the following observations about the remaining services:

• Because of their moderately large correlations with both the first and second factors, Wireless last month, Voice mail, and Paging service bridge the "Extras" and "Tech" groups.

• Calling card last month is moderately correlated with the first and third factors, thus it bridges the "Extras" and "Long Distance" groups.

• Multiple lines is moderately correlated with the second and third factors, thus it bridges the "Tech" and "Long Distance" groups.

This suggests avenues for cross-selling. For example, customers who subscribe to extra services may be more predisposed to accepting special offers on wireless services than Internet services.


The factor loadings plot is a visual representation of the rotated factor matrix. If the relationships in the matrix are complex, this plot may be easier to interpret. Using a principal axis factoring extraction, you have uncovered three latent factors that describe relationships between your variables. These factors suggest various patterns of service usage, which you can use to more efficiently increase cross-sales.

Discriminant Analysis


Discriminant analysis builds a predictive model for group membership. The model is composed of a discriminant function (or, for more than two groups, a set of discriminant functions) based on linear combinations of the predictor variables that provide the best discrimination between the groups. The functions are generated from a sample of cases for which group membership is known; the functions can then be applied to new cases that have measurements for the predictor variables but have unknown group membership.

Note: The grouping variable can have more than two values. The codes for the grouping variable must be integers, however, and you need to specify their minimum and maximum values. Cases with values outside of these bounds are excluded from the analysis.

Example. On average, people in temperate zone countries consume more calories per day than people in the tropics, and a greater proportion of the people in the temperate zones are city dwellers. A researcher wants to combine this information into a function to determine how well an individual can discriminate between the two groups of countries. The researcher thinks that population size and economic information may also be important. Discriminant analysis allows you to estimate coefficients of the linear discriminant function, which looks like the right side of a multiple linear regression equation. That is, using coefficients a, b, c, and d, the function is:

D = a * climate + b * urban + c * population + d * gross domestic product per capita

If these variables are useful for discriminating between the two climate zones, the values of D will differ for the temperate and tropic countries. If you use a stepwise variable selection method, you may find that you do not need to include all four variables in the function.
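A short scikit-learn sketch of this idea in Python (rather than SPSS), using made-up data in which the groups are the two climate zones and the predictors are daily calories, percent urban, population, and GDP per capita; all numbers are purely illustrative:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical country-level data (rows: countries)
X = np.array([
    [3300, 75,  60, 31000],   # temperate-zone countries
    [3400, 80,  10, 42000],
    [3250, 72,  82, 38000],
    [2400, 35, 130,  2100],   # tropical countries
    [2500, 48,  90,  3400],
    [2350, 40,  25,  1800],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])    # known group membership

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_, lda.intercept_)     # coefficients of the discriminant function
print(lda.predict(X))                # predicted group for each country

As in the equation above, the fitted coefficients weight each predictor so that the resulting values of D separate the two groups as well as possible.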

Statistics. For each variable: means, standard deviations, univariate ANOVA. For each analysis: Box's M, within-groups correlation matrix, within-groups covariance matrix, separate-groups covariance matrix, total covariance matrix. For each canonical discriminant function: eigenvalue, percentage of variance, canonical correlation, Wilks' lambda, chi-square. For each step: prior probabilities, Fisher's function coefficients, unstandardized function coefficients, Wilks' lambda for each canonical function.

Data. The grouping variable must have a limited number of distinct categories, coded as integers. Independent variables that are nominal must be recoded to dummy or contrast variables.

Assumptions. Cases should be independent. Predictor variables should have a multivariate normal distribution, and within-group variance-covariance matrices should be equal across groups. Group membership is assumed to be mutually exclusive (that is, no case belongs to more than one group) and collectively exhaustive (that is, all cases are members of a group). The procedure is most effective when group membership is a truly categorical variable; if group membership is based on values of a continuous variable (for example, high IQ versus low IQ), consider using linear regression to take advantage of the richer information that is offered by the continuous variable itself.


To Obtain a Discriminant Analysis

This feature requires the Statistics Base option.

► From the menus choose:

Analyze > Classify > Discriminant...

► Select an integer-valued grouping variable and click Define Range to specify the categories of interest.

► Select the independent, or predictor, variables. (If your grouping variable does not have integer values, Automatic Recode on the Transform menu will create a variable that does.)

► Select the method for entering the independent variables.

• Enter independents together. Simultaneously enters all independent variables that satisfy tolerance criteria.

• Use stepwise method. Uses stepwise analysis to control variable entry and removal.

► Optionally, select cases with a selection variable.


This command reads the active dataset and causes execution of any pending commands. See the topic Command Order for more information.

See DISCRIMINANT Algorithms for computational details for this procedure.

Syntax for the DISCRIMINANT command can be generated from the Discriminant Analysis dialog.