Simple correspondence analysis (CA), multiple correspondence analysis (MCA), joint correspondence analysis (JCA), as well as all subset versions of these, using the R package ca
Oleg Nenadić & Michael Greenacre
University of Göttingen & Universitat Pompeu Fabra
Photo: View of the Aegean Sea and the island of Lesbos. Turkey, August 2010. Assos, venue for CARME in Assos.
Tutorial presented at CARME 2011 in Rennes, France, February 8, 2011
M. Greenacre, O. Nenadić
Correspondence analysis with ca
Introduction
In the practical part of this tutorial we demonstrate how to apply the ca package for simple, multiple and joint correspondence analysis in R.
R is a freely available statistical software environment. Since its introduction by R. Ihaka and R. Gentleman (1996), it has gained much popularity in the statistical community.
One advantage of R is its extension system, which allows R's capabilities to be extended by so-called packages.
Further information on R is available at the official R website: http://www.R-project.org .
The ca package, an overview
The ca package offers functions for the computation and visualization of correspondence analysis.
The core computations are done by the functions ca() (simple correspondence analysis) and mjca() (multiple and joint correspondence analysis).
Each function has corresponding print, summary and plot methods, which are used for presenting the numerical results of the analysis and for the graphical display.
Additional functions include auxiliary functions that are usually not called directly by the user (e.g. iterate.mjca(), which is used in joint correspondence analysis).
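To make the "core computations" of ca() concrete, here is a minimal base-R sketch of simple CA as a singular value decomposition of the standardized residuals. It is an illustration of the standard method, not the package's internal code; the helper simple_ca() is hypothetical, and the counts are those of the smoke table (staff groups by smoking category) that ships with ca, typed in by hand so the sketch runs without the package.

```r
# Minimal sketch of simple CA via the SVD (illustrative; not the ca package's code).
# Counts of the smoke table: staff groups (rows) by smoking category (columns).
smoke_tab <- matrix(c( 4,  2,  3,  2,
                       4,  3,  7,  4,
                      25, 10, 12,  4,
                      18, 24, 33, 13,
                      10,  6,  7,  2),
                    nrow = 5, byrow = TRUE,
                    dimnames = list(c("SM", "JM", "SE", "JE", "SC"),
                                    c("none", "light", "medium", "heavy")))

simple_ca <- function(N) {                         # hypothetical helper
  P     <- N / sum(N)                              # correspondence matrix
  rmass <- rowSums(P)                              # row masses
  cmass <- colSums(P)                              # column masses
  S     <- (P - rmass %o% cmass) / sqrt(rmass %o% cmass)  # standardized residuals
  dec   <- svd(S)
  k     <- min(dim(N)) - 1                         # number of non-trivial dimensions
  sv    <- dec$d[1:k]                              # singular values
  list(sv      = sv,
       inertia = sv^2,                             # principal inertias
       rowpc   = sweep(dec$u[, 1:k, drop = FALSE] / sqrt(rmass), 2, sv, "*"),
       colpc   = sweep(dec$v[, 1:k, drop = FALSE] / sqrt(cmass), 2, sv, "*"))
}

res <- simple_ca(smoke_tab)
round(res$inertia, 6)    # principal inertias
sum(res$inertia)         # total inertia = chi-square statistic / grand total
```

Up to the arbitrary signs of the singular vectors, the principal inertias should agree with those reported by summary(ca(smoke)).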
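The core functions of ca and their methods:
                 simple correspondence   multiple and joint
                 analysis                correspondence analysis
core function    ca()                    mjca()
print method     print.ca()              print.mjca()
summary method   summary.ca()            summary.mjca()
plot method      plot.ca()               plot.mjca()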
Extensions to simple correspondence analysis include supplementary rows and/or columns as well as subset analysis.
These extensions are handled by the optional arguments supcol / suprow and subsetcol / subsetrow:
> library(ca)
# Considering the first column (non-smokers) as supplementary:
> ca(smoke, supcol = 1)
# Considering the subset of smokers (i.e. columns 2, 3 and 4):
> ca(smoke, subsetcol = 2:4)
# Adding a supplementary column to a subset analysis:
> ca(smoke, subsetcol = 2:4, supcol = 1)
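A subset analysis is not the same as running a CA on the selected columns alone. As we understand the method, the standardized residuals are computed from the full table, so the masses of the complete analysis are kept, and only the columns in the subset enter the SVD. The sketch below illustrates this under that assumption; std_residuals() and subset_ca() are hypothetical helpers, not functions of the ca package.

```r
# Minimal sketch of subset CA (illustrative; not the ca package's code).
smoke_tab <- matrix(c( 4,  2,  3,  2,
                       4,  3,  7,  4,
                      25, 10, 12,  4,
                      18, 24, 33, 13,
                      10,  6,  7,  2),
                    nrow = 5, byrow = TRUE,
                    dimnames = list(c("SM", "JM", "SE", "JE", "SC"),
                                    c("none", "light", "medium", "heavy")))

# Standardized residuals, always computed from the FULL table,
# so row and column masses are those of the complete analysis.
std_residuals <- function(N) {                 # hypothetical helper
  P     <- N / sum(N)
  rmass <- rowSums(P)
  cmass <- colSums(P)
  (P - rmass %o% cmass) / sqrt(rmass %o% cmass)
}

subset_ca <- function(N, subsetcol) {          # hypothetical helper
  S   <- std_residuals(N)[, subsetcol, drop = FALSE]
  dec <- svd(S)                                # SVD of the selected columns only
  list(sv = dec$d, inertia = dec$d^2)
}

full <- std_residuals(smoke_tab)
sub  <- subset_ca(smoke_tab, subsetcol = 2:4)  # the three smoking categories
sum(sub$inertia)                               # the part of the total inertia due to columns 2:4
```

Because the masses are not recomputed, the inertias of the subset and of the excluded column add up to the total inertia of the full table.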
Simple correspondence analysis
The visualization of simple correspondence analysis is done with the corresponding plot method:
> plot(ca(smoke, supcol = 1))
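A supplementary column does not take part in the SVD; it is positioned afterwards. The standard way to do this is the transition formula: a column's principal coordinates are the average of the row standard coordinates, weighted by that column's profile. The sketch below assumes ca(smoke, supcol = 1) does the equivalent (possibly with a different scaling in its output); project_column() is a hypothetical helper. As a check, applying the same formula to an active column reproduces its principal coordinates.

```r
# Minimal sketch of a supplementary column (illustrative; not the ca package's code).
smoke_tab <- matrix(c( 4,  2,  3,  2,
                       4,  3,  7,  4,
                      25, 10, 12,  4,
                      18, 24, 33, 13,
                      10,  6,  7,  2),
                    nrow = 5, byrow = TRUE,
                    dimnames = list(c("SM", "JM", "SE", "JE", "SC"),
                                    c("none", "light", "medium", "heavy")))

active <- smoke_tab[, -1]                # "none" is supplementary: left out of the SVD
P      <- active / sum(active)
rmass  <- rowSums(P)
cmass  <- colSums(P)
S      <- (P - rmass %o% cmass) / sqrt(rmass %o% cmass)
dec    <- svd(S)
k      <- min(dim(active)) - 1            # 2 non-trivial dimensions
sv     <- dec$d[1:k]
phi    <- dec$u[, 1:k] / sqrt(rmass)      # row standard coordinates
colpc  <- sweep(dec$v[, 1:k] / sqrt(cmass), 2, sv, "*")  # active columns, principal coordinates

# Transition formula: weighted average of the row standard coordinates,
# using the column's profile as weights
project_column <- function(counts, phi) { # hypothetical helper
  profile <- counts / sum(counts)
  drop(profile %*% phi)
}

none_pc <- project_column(smoke_tab[, "none"], phi)
none_pc
```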
As with the core function, additional options are provided through optional arguments. For example, different map scalings are available with the option map:
option           description
"symmetric"      Rows and columns in principal coordinates (default)
"rowprincipal"   Rows in principal and columns in standard coordinates
"colprincipal"   Rows in standard and columns in principal coordinates
"symbiplot"      Row and column coordinates are scaled to have variances equal to the singular values
"rowgab"         Rows in principal coordinates and columns in standard coordinates times mass
"colgab"         Columns in principal coordinates and rows in standard coordinates times mass
                 (according to a proposal by Gabriel and Odoroff, 1990)
"rowgreen"       Rows in principal coordinates and columns in standard coordinates times the square root of the mass
"colgreen"       Columns in principal coordinates and rows in standard coordinates times the square root of the mass
                 (according to a proposal by Greenacre, 2006)
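The map scalings in the table differ only in how the row and column coordinates are stretched. The following sketch derives all of them from one SVD, following the table's descriptions literally; map_coords() is a hypothetical helper that only mimics what the map argument selects, it is not the code of the plot method.

```r
# Minimal sketch of the map scalings (illustrative; not the ca package's code).
smoke_tab <- matrix(c( 4,  2,  3,  2,
                       4,  3,  7,  4,
                      25, 10, 12,  4,
                      18, 24, 33, 13,
                      10,  6,  7,  2),
                    nrow = 5, byrow = TRUE,
                    dimnames = list(c("SM", "JM", "SE", "JE", "SC"),
                                    c("none", "light", "medium", "heavy")))

map_coords <- function(N, map = "symmetric") {     # hypothetical helper
  P     <- N / sum(N)
  rmass <- rowSums(P)
  cmass <- colSums(P)
  S     <- (P - rmass %o% cmass) / sqrt(rmass %o% cmass)
  dec   <- svd(S)
  k     <- min(dim(N)) - 1
  sv    <- dec$d[1:k]
  rs <- dec$u[, 1:k, drop = FALSE] / sqrt(rmass)   # row standard coordinates
  cs <- dec$v[, 1:k, drop = FALSE] / sqrt(cmass)   # column standard coordinates
  rp <- sweep(rs, 2, sv, "*")                      # row principal coordinates
  cp <- sweep(cs, 2, sv, "*")                      # column principal coordinates
  switch(map,
    symmetric    = list(rows = rp, cols = cp),
    rowprincipal = list(rows = rp, cols = cs),
    colprincipal = list(rows = rs, cols = cp),
    symbiplot    = list(rows = sweep(rs, 2, sqrt(sv), "*"),  # variance = singular value
                        cols = sweep(cs, 2, sqrt(sv), "*")),
    rowgab       = list(rows = rp, cols = cs * cmass),       # standard x mass
    colgab       = list(rows = rs * rmass, cols = cp),
    rowgreen     = list(rows = rp, cols = cs * sqrt(cmass)), # standard x sqrt(mass)
    colgreen     = list(rows = rs * sqrt(rmass), cols = cp),
    stop("unknown map: ", map))
}

str(map_coords(smoke_tab, "rowgab"))
```

One consequence visible in the sketch: under "rowgreen" the column coordinates are simply the right singular vectors, which is why that scaling keeps the columns on a common footing regardless of their masses.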
In addition, three-dimensional maps can be displayed using the rgl package (D. Murdoch, D. Adler):
> library(rgl)
> plot3d(ca(smoke))
Multiple and joint correspondence analysis
Multiple and joint correspondence analysis are computed with the function mjca().
The type of analysis is determined by the option lambda:
lambda = "indicator"   Multiple correspondence analysis based on the indicator matrix
lambda = "Burt"        Multiple correspondence analysis based on the Burt matrix
lambda = "JCA"         Joint correspondence analysis
By default, an adjusted MCA is performed, i.e. lambda = "adjusted".
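The indicator, Burt and adjusted approaches are closely related, which the following sketch illustrates on simulated, hypothetical responses. It relies on well-known identities rather than on mjca()'s code: the Burt principal inertias are the squares of the indicator ones, the adjusted inertias rescale the indicator inertias above 1/Q, and the adjusted percentages are taken relative to the average inertia of the off-diagonal Burt blocks. The helper ca_inertias() is hypothetical.

```r
# Minimal sketch relating the inertias behind lambda = "indicator", "Burt"
# and "adjusted" (illustrative; not mjca()'s code). Simulated responses.
set.seed(1)
n   <- 200
lev <- c("a", "b", "c")
q1  <- sample(lev, n, replace = TRUE)
q2  <- ifelse(runif(n) < 0.6, q1, sample(lev, n, replace = TRUE))
q3  <- ifelse(runif(n) < 0.4, q1, sample(lev, n, replace = TRUE))
resp <- data.frame(q1 = factor(q1), q2 = factor(q2), q3 = factor(q3))

# Non-trivial principal inertias of a simple CA of any table N
ca_inertias <- function(N) {                       # hypothetical helper
  P     <- N / sum(N)
  rmass <- rowSums(P)
  cmass <- colSums(P)
  S     <- (P - rmass %o% cmass) / sqrt(rmass %o% cmass)
  d     <- svd(S, nu = 0, nv = 0)$d
  d[d > 1e-10]^2                                   # drop numerically zero dimensions
}

Z <- do.call(cbind, lapply(resp, function(f)
  sapply(levels(f), function(l) as.numeric(f == l))))  # indicator matrix
B <- crossprod(Z)                                      # Burt matrix
Q <- ncol(resp)                                        # number of variables
J <- ncol(Z)                                           # number of categories

lambda_ind  <- ca_inertias(Z)   # indicator-based MCA
lambda_burt <- ca_inertias(B)   # Burt-based MCA: squares of lambda_ind
# Adjusted inertias: only dimensions with lambda_ind > 1/Q are retained
big        <- lambda_ind[lambda_ind > 1 / Q]
lambda_adj <- (Q / (Q - 1))^2 * (big - 1 / Q)^2
# Total used for adjusted percentages: average inertia of the off-diagonal Burt blocks
total_adj  <- Q / (Q - 1) * (sum(lambda_burt) - (J - Q) / Q^2)
round(100 * lambda_adj / total_adj, 1)   # adjusted percentages of inertia
```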
The input data for mjca() is a data frame whose columns are factors (the response pattern matrix).
Internally, computations are performed on the Burt matrix B, which is obtained from the indicator matrix Z as B = Z'Z.
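The step from a response pattern matrix to the indicator matrix Z and the Burt matrix B can be sketched in a few lines of base R. This is an illustration, not the package's internal code; indicator_matrix() and the six-respondent data frame are hypothetical. The diagonal blocks of B hold the category frequencies of each variable, and the off-diagonal blocks are the two-way contingency tables between variables.

```r
# Minimal sketch: indicator matrix Z and Burt matrix B = t(Z) %*% Z
# (illustrative; not the ca package's internal code).
indicator_matrix <- function(df) {        # hypothetical helper
  blocks <- lapply(names(df), function(v) {
    f <- df[[v]]
    m <- sapply(levels(f), function(l) as.integer(f == l))  # one 0/1 column per level
    colnames(m) <- paste(v, levels(f), sep = ":")
    m
  })
  do.call(cbind, blocks)
}

# Hypothetical response pattern matrix: six respondents, two questions
resp <- data.frame(
  A = factor(c("agree", "neutral", "agree", "disagree", "neutral", "agree")),
  B = factor(c("yes", "no", "no", "yes", "yes", "no"))
)

Z <- indicator_matrix(resp)   # 6 x 5; each row sums to the number of questions
B <- crossprod(Z)             # Burt matrix, 5 x 5
B
```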
An example: a multiple correspondence analysis of the wg93 dataset (four questions on attitudes towards science, each with responses on a five-point scale):
> mjca(wg93[,1:4])
The different approaches to MCA are specified with the optional argument lambda:
# MCA based on the indicator matrix:
> mjca(wg93[,1:4], lambda = "indicator")
# MCA based on the Burt matrix:
> mjca(wg93[,1:4], lambda = "Burt")
# MCA based on the adjusted approach:
> mjca(wg93[,1:4], lambda = "adjusted")
# lambda = "adjusted" is the default, hence the following
# gives the same result:
> mjca(wg93[,1:4])
As with simple CA, supplementary variables are specified with the option supcol. In mjca(), only supplementary variables (i.e. columns) can be specified.
Columns 5 to 7 of the wg93 dataset contain additional demographic information (sex, age and education). These are included as supplementary variables as follows:
> mjca(wg93, supcol = 5:7)
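How a supplementary category ends up on the map can be sketched for the indicator approach: its principal coordinates are the mean of the standard coordinates of the respondents who chose it (the transition formula applied to an indicator column). The sketch uses simulated data, and sex is a hypothetical stand-in for a demographic variable such as those in columns 5 to 7; the scaling mjca() applies under its default adjusted option may differ. As a check, the same formula applied to an active category reproduces its principal coordinates.

```r
# Minimal sketch (indicator approach, illustrative; not mjca()'s code):
# a supplementary category is placed at the mean of the standard
# coordinates of the respondents who chose it.
set.seed(2)
n   <- 150
lev <- c("lo", "mid", "hi")
q1  <- sample(lev, n, replace = TRUE)
q2  <- ifelse(runif(n) < 0.5, q1, sample(lev, n, replace = TRUE))
active <- data.frame(q1 = factor(q1), q2 = factor(q2))   # active variables
sex    <- factor(ifelse(runif(n) < 0.5, "f", "m"))       # hypothetical supplementary variable

# Indicator matrix of the active variables and its CA
Z <- do.call(cbind, lapply(active, function(f)
  sapply(levels(f), function(l) as.numeric(f == l))))
P     <- Z / sum(Z)
rmass <- rowSums(P)
cmass <- colSums(P)
S     <- (P - rmass %o% cmass) / sqrt(rmass %o% cmass)
dec   <- svd(S)
k     <- 2
phi   <- dec$u[, 1:k] / sqrt(rmass)                             # respondents' standard coordinates
colpc <- sweep(dec$v[, 1:k] / sqrt(cmass), 2, dec$d[1:k], "*")  # active categories, principal coordinates

category_coords <- function(f, phi) {                           # hypothetical helper
  t(sapply(levels(f), function(l) colMeans(phi[f == l, , drop = FALSE])))
}

sup_pc <- category_coords(sex, phi)   # supplementary categories "f" and "m"
sup_pc
```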
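The option subsetcol in mjca() refers to the column indices of the categories (i.e. the levels of the variables) to be included in the subset.
For example, the middle categories of the four questions in the wg93 dataset can be excluded from the analysis. Part of the resulting column output is shown below; categories marked (*) are supplementary: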
           sex1(*)   sex2(*)   age1(*)   age2(*)   age3(*)   age4(*)   age5(*)   age6(*)
Mass            NA        NA        NA        NA        NA        NA        NA        NA
ChiDist         NA        NA        NA        NA        NA        NA        NA        NA
Inertia         NA        NA        NA        NA        NA        NA        NA        NA
Dim. 1   -0.341876  0.328786 -0.405213 -0.243592 -0.033779 -0.030832  0.025808  0.666671
Dim. 2   -0.130770  0.125763 -0.319599  0.305108  0.075773 -0.016810 -0.190774 -0.146837
...
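The index vector that excludes the middle categories can be computed rather than typed. This sketch assumes, as a hypothetical layout, that the four active questions with five levels each occupy category columns 1 to 20, question q in columns 5*(q-1)+1 to 5*q; the variable names are illustrative.

```r
# Hypothetical layout (an assumption): four active questions with five
# response levels each occupy category columns 1-20.
n_vars   <- 4
n_levels <- 5
middle   <- seq(from = 3, by = n_levels, length.out = n_vars)  # middle level of each question
keep     <- setdiff(seq_len(n_vars * n_levels), middle)         # categories kept in the subset
middle
keep
```

Such a vector would then be supplied as subsetcol in the mjca() call, presumably together with supcol = 5:7 given the supplementary sex and age columns in the output; treat this as a plausible reconstruction, not the exact call behind the output.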
The plot method gives the graphical representation of the result as a map:
> plot(mjca(wg93[,1:4]))
Summary
The computation is done with two functions, ca() for simple CA and mjca() for multiple and joint CA.
The input data is a table of frequencies for simple CA and a response pattern matrix (i.e. a data frame with factors) for multiple and joint CA.
In mjca() the type of analysis is controlled by the option lambda.
Subsets and supplementary variables are specified with subsetcol and supcol (in simple CA also subsetrow and suprow).
Output (numerical and graphical) is managed by the corresponding methods (print, summary and plot).
All available options are listed in the manual / help files.
The End
The package is available from the CARME-N website (Correspondence Analysis and Related Methods Network): http://www.carme-n.org
The package is currently at version 0.50. This version includes a major revision of the mjca() part, in which all computations have been rewritten to follow a unified approach.
The next update will focus on the graphical output.