Dimension Reduction in R
Essex Summer School in Data Analysis
Lecture 3: Multidimensional Scaling

Dave Armstrong
Department of Political Science
University of Wisconsin - Milwaukee
e: [email protected]
w: http://www.quantoid.net/teachessex/dimension/

August 13, 2015

1 / 49
Dissimilarities

• Many data are inherently (or can be transformed into) dissimilarities or distances.
• Inherently - respondents can be asked explicitly to provide information about the (dis)similarity between stimuli. Or, the data are inherently distance-based (e.g., distances between European cities).
• We can calculate the profile dissimilarity among different observations based on existing data (e.g., how often legislators vote together).
• Multidimensional Scaling tries to estimate δ_jm = f(d_jm), where δ_jm is the observed dissimilarity between point j and point m and d_jm represents the distance between point j and point m in the low-dimensional representation of the data.

More than other techniques, MDS is used to provide a visual map of the stimuli (so applications where the number of latent dimensions is over 3 are rare).
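As a minimal sketch of the third bullet (with a hypothetical vote matrix, not data from these slides), base R's dist() turns a cases-by-variables matrix into pairwise profile dissimilarities:

```r
# Hypothetical vote matrix: 3 legislators (rows) by 4 roll calls (columns).
votes <- matrix(c(1, 0, 1, 1,
                  1, 0, 0, 1,
                  0, 1, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), NULL))
# Manhattan distance on 0/1 votes counts the number of disagreements per pair.
d <- dist(votes, method = "manhattan")
```

Here as.matrix(d)["A", "B"] is 1: legislators A and B disagree on exactly one roll call.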
2 / 49
Outline
Classical MDS
  Example: Metric MDS (Torgerson)
  Example: Metric MDS using SMACOF
Non-metric MDS
  Example: Non-metric MDS
Individual Differences Scaling
  Example: French Parties
3 / 49
Classical MDS
Torgerson’s solution for the simple metric MDS problem:
1. convert the similarities/dissimilarities to a symmetric q by q matrix of squared distances
2. double-center the matrix of squared distances to remove the squared terms
3. perform an eigenvalue/eigenvector decomposition of the double-centered matrix to recover the coordinates.
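Base R's cmdscale() implements exactly this classical (Torgerson) procedure; a quick sketch on the built-in eurodist road distances between European cities:

```r
# Classical MDS on the built-in eurodist data (21 European cities).
coords <- cmdscale(eurodist, k = 2)   # steps 1-3 in one call
# Inter-point distances in the 2-dimensional map should track the inputs.
cor(dist(coords), eurodist)
```

The correlation between the recovered inter-point distances and the input distances is very high, which is the usual check that the low-dimensional map is faithful.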
4 / 49
Technical Details I
We assume there exists in the world a matrix Z (which we do not observe directly) such that stimuli are in the rows and dimensions are in the columns.

Assuming that z̄1 = z̄2 = . . . = z̄s = 0, then Z = Z*.

• An eigen decomposition of Y, the double-centered distance matrix, gives
  Y = UΛU′
• The estimate of Z is:
  Z = UΛ^(1/2)
9 / 49
Outline
Classical MDS
  Example: Metric MDS (Torgerson)
  Example: Metric MDS using SMACOF
Non-metric MDS
  Example: Non-metric MDS
Individual Differences Scaling
  Example: French Parties
10 / 49
Example: National Similarities
Wish (1971) asked students to rate the similarity of nations from 1 (very different) to 9 (very similar).
> u <- url("http://www.quantoid.net/files/essex/nations.rda")
> load(u)
> close(u)
> nations[1:5, 1:5]
Brazil Congo Cuba Egypt France
Brazil 9.00 4.83 5.28 3.44 4.72
Congo 4.83 9.00 4.56 5.00 4.00
Cuba 5.28 4.56 9.00 5.17 4.11
Egypt 3.44 5.00 5.17 9.00 4.78
France 4.72 4.00 4.11 4.78 9.00
Since 9 represents maximum similarity, if we want a dissimilarities matrix, then we need to subtract the similarity values from some number ≥ 9.
> d <- (9-nations)^2
11 / 49
Creating the Double-centered Matrix
> doubleCenter <- function(x){
+   p <- dim(x)[1]
+   n <- dim(x)[2]
+   -(x - matrix(apply(x, 1, mean), nrow=p, ncol=n) -
+     t(matrix(apply(x, 2, mean), nrow=n, ncol=p)) + mean(x))/2
+ }
> D <- doubleCenter(d)
> ev <- eigen(D)
We can make the coordinates with the following function:
> makeCoords <- function(obj){
+   scaleFac <- sqrt(max((abs(obj$vec[,1]))^2 + (abs(obj$vec[,2]))^2))
+   x <- obj$vec[,1]*(1/scaleFac)*sqrt(obj$val[1])
+   y <- obj$vec[,2]*(1/scaleFac)*sqrt(obj$val[2])
+   coords <- cbind(x, y)
+   return(coords)
+ }
> coords <- makeCoords(ev)
> rownames(coords) <- rownames(D)
Note, the code in the book no longer works due to changes in the arguments to smacofSym(). The code in the slides does work.
16 / 49
Configuration
[Figure: "SMACOF (Metric) Solution" - two-dimensional configuration (both axes -1.0 to 1.0) with points labeled Brazil, Congo, Cuba, Egypt, France, India, Israel, Japan, China, USSR, USA, and Yugoslavia. Stress = 0.223]
17 / 49
Comparison of Torgerson and SMACOF Results
The comparison is really about how closely related the inter-point distances are:
> cor(dist(conf), dist(coords))
[1] 0.9247277
> plot(dist(conf), dist(coords))
[Figure: scatterplot of dist(coords) (y-axis, 1 to 6) against dist(conf) (x-axis, 0.5 to 1.5)]
18 / 49
Scree Plot of Stress
> stressPlot <- function(obj, maxdim=5){
+   s <- NULL
+   for(i in 1:maxdim){
+     s <- c(s, update(obj, ndim=i)$stress)
+   }
+   plot(1:maxdim, s, type="o", pch=16,
+     xlab = "# Dimensions", ylab = "Stress")
+ }
> stressPlot(smacof_metric_result)
[Figure: scree plot of Stress (roughly 0.40 down to 0.15) against # Dimensions (1 to 5)]
19 / 49
SMACOF with Missing Data
We can induce some missing data in the nations matrix as follows:
> d <- (9-nations)^2
> d[1,2] <- d[2,1] <- NA
> weightmat <- !is.na(d)
> d[is.na(d)] <- mean(d, na.rm=TRUE)
• Above, we make the weight matrix equal to 1 if d is observed and 0 otherwise.
• We replace missing values in d with observed values, but these contribute nothing to the solution; it just allows smacofSym() to work.
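Putting the pieces together, a self-contained sketch of the weighting idea (using the built-in eurodist data rather than the nations matrix, and assuming the smacof package is installed):

```r
library(smacof)

d <- as.matrix(eurodist)           # distances between 21 European cities
w <- matrix(1, nrow(d), ncol(d))   # weight matrix: 1 = observed
w[1, 2] <- w[2, 1] <- 0            # treat one dissimilarity as missing
d[1, 2] <- d[2, 1] <- mean(d)      # placeholder value; its weight of 0
                                   # keeps it out of the loss function
fit <- smacofSym(delta = d, ndim = 2, weightmat = w)
```

The zero-weighted cell can hold any placeholder value; it simply never enters the stress function being minimized.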
• In factor analysis, we used numerical criteria based on the configuration matrix itself to find an appropriate rotation.
• With MDS, when comparing solutions, we want to rotate one solution to bring it into maximal similarity with another.
• A procrustes rotation does this by specifying a matrix to be rotated and a target toward which the input is to be fit.
31 / 49
Technical Details of Procrustes Rotation
1. Center the columns of X (the target configuration) and X* (the matrix to be rotated) so they sum to zero.²
2. Calculate the product matrix X′X* and its singular value decomposition: X′X* = UDV′.
   The optimal rotation matrix is T = VU′.
   The optimal dilation factor is s = tr(X′X*T)/tr(X*′X*).³
   The optimal translation vector is t = n⁻¹(X − sX*T)′1.⁴
A Procrustes rotation function exists in the MCMCpack package in R.

² This normalization is done as a matter of course by most MDS software like smacof.
³ Dilation is stretching or shrinking the dimension (i.e., changing the variance).
⁴ Translation is changing the centroid (point of means) of the space.
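A minimal sketch of steps 1-2 (rotation only, no dilation or translation), written directly from the formulas above rather than taken from the slides:

```r
# Rotate Xstar to best match the target X: T = VU' from the SVD of X'X*.
procrustesRotate <- function(X, Xstar) {
  X <- scale(X, center = TRUE, scale = FALSE)       # step 1: column-center
  Xstar <- scale(Xstar, center = TRUE, scale = FALSE)
  s <- svd(crossprod(X, Xstar))                     # step 2: X'X* = UDV'
  Tmat <- s$v %*% t(s$u)                            # optimal rotation T = VU'
  Xstar %*% Tmat
}

# Sanity check: a configuration rotated by 30 degrees is recovered exactly.
set.seed(1)
X <- matrix(rnorm(20), ncol = 2)
R <- matrix(c(cos(pi/6), sin(pi/6), -sin(pi/6), cos(pi/6)), 2, 2)
rotated <- procrustesRotate(X, X %*% R)
```

Because the SVD-based T undoes any orthogonal rotation, `rotated` matches the column-centered X up to numerical precision.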
32 / 49
Rotating the Non-metric Solution
> library(MCMCpack)
> rot.nm <- procrustes(nm.conf, conf, translation=F, dilation=T)
> # correlation between raw n-m and m configurations
> diag(cor(nm.conf, conf))
        D1        D2
0.9905024 0.9397823
> # correlation between rotated n-m and m configurations
> diag(cor(rot.nm$X.new, conf))
[1] 0.9905579 0.9398426

The correlations are pretty similar, which suggests that there is not much rotation.
> acos.deg <- function(x) acos(x)*180/pi
> acos.deg(rot.nm$R[1,1])
[1] 0.4311835

The rotation is around 0.43°.
33 / 49
MDS with Agreement Matrix
We can do MDS with an agreement matrix as well (e.g., how often do legislators vote together).
• Sometimes these matrices exist already, but we can make them from scratch.

Using the dataset on Supreme Court votes, use Bootstrapped MDS to get stability estimates of the MDS stimuli configurations.
> u <- url("http://www.quantoid.net/files/essex/supreme_court_2010-2013.rda")
> load(u)
> close(u)

The file contains data on 316 cases voted on by the current 9 supreme court justices. They are scored 0 for conservative and 1 for liberal decisions.
• Perform the BS MDS
• Plot the configuration with confidence ellipses.
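A hypothetical sketch of building such a matrix from scratch out of a 0/1 vote matrix (synthetic votes here, not the Supreme Court data):

```r
set.seed(1)
votes <- matrix(rbinom(9 * 50, 1, 0.5), nrow = 9)  # 9 justices, 50 fake cases
# Agreement counts: times a pair voted the same way (both 1 or both 0).
agree <- votes %*% t(votes) + (1 - votes) %*% t(1 - votes)
# Disagreement proportion is a natural dissimilarity for MDS.
d <- 1 - agree / ncol(votes)
```

The result is symmetric with a zero diagonal (every justice always agrees with themselves), exactly the form MDS expects.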
40 / 49
Outline
Classical MDS
  Example: Metric MDS (Torgerson)
  Example: Metric MDS using SMACOF
Non-metric MDS
  Example: Non-metric MDS
Individual Differences Scaling
  Example: French Parties
41 / 49
Individual Differences Scaling

Individual Differences Scaling operates as follows:
• Use individual data on dissimilarities/distances to generate a low-dimensioned map for each individual.
• Generate weights that map each individual's configuration onto a global configuration.
• These weights increase variance on important dimensions and decrease variation on less important dimensions.
• This is a more flexible way of aggregating over individuals as it does not assume each individual uses each evaluative criterion in exactly the same way.

The input to the individual differences scaling algorithm is then a list of dissimilarity matrices.
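For instance (synthetic data, for illustration only), each respondent's stimuli-by-attributes ratings can be turned into a squared-distance matrix, and the matrices collected in a list:

```r
set.seed(1)
# 5 hypothetical respondents, each rating 3 stimuli on 4 attributes.
ratings <- lapply(1:5, function(i)
  matrix(rnorm(12), nrow = 3, dimnames = list(c("A", "B", "C"), NULL)))
dissims <- lapply(ratings, function(r) dist(r)^2)  # one matrix per person
```

A list like `dissims` is the shape of input that individual differences scaling consumes.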
42 / 49
INDSCAL
In the INDSCAL problem, we are given n (i = 1, ..., n) matrices of the form

D*_i = D_i + E_i = (diag ZW_iZ′)J′_q − 2ZW_iZ′ + J_q(diag ZW_iZ′)′ + E_i   (6)

• q (j = 1, ..., q) is the number of stimuli
• we are asked to find the q by s (k = 1, ..., s) (where s is the number of dimensions) matrix Z and the n diagonal matrices W_i such that:

Y*_i = ZW_iZ′ + E*_i   (7)

The algorithm distinguishes Z from Z′ in the following way:

Y*_i = Z(L)W_iZ′(R) + E*_i   (8)
43 / 49
INDSCAL Algorithm
1. Double center the n q by q symmetric matrices of squared distances to obtain the n Y*_i matrices.
2. Obtain starting estimates of Ẑ(L) and Ẑ(R) from an eigen decomposition of Ȳ*.
3. Use Ẑ(L) and Ẑ(R) to construct G (cross-products in Z).
4. Run OLS to obtain estimates of the Ŵ_i.
5. The Ŵ_i and Ẑ(R) are used to construct Ẑ(L).
6. The Ŵ_i and Ẑ(L) are used to construct Ẑ(R).
7. Repeat Steps 3-6 until convergence.
44 / 49
Outline
Classical MDS
  Example: Metric MDS (Torgerson)
  Example: Metric MDS using SMACOF
Non-metric MDS
  Example: Non-metric MDS
Individual Differences Scaling
  Example: French Parties
45 / 49
French Parties
> u <- url("http://www.quantoid.net/files/essex/french.parties.individuals.rda")
> load(u)
> close(u)
> fi <- na.omit(french.parties.individuals)
> parties <- lapply(1:50, function(x) dist(t(fi[x,]))^2 + .001)
> indscal.result.2dim <- smacofIndDiff(delta=parties, ndim=2,
+   constraint="indscal")
46 / 49
Plot of the Configuration
> plot.indscal(indscal.result.2dim)
[Figure: INDSCAL group configuration, Second Dimension against First Dimension (both axes -1.0 to 1.0), with points labeled extremeleft, leftparty, communist, socialist, greens, udfbayrou, umpsarkozy, and nationalfront]
47 / 49
Plot of Weights
> w <- t(sapply(indscal.result.2dim$cweights, diag))> plot(w)
[Figure: scatterplot of the individual dimension weights, D2 (y-axis, 0 to 8) against D1 (x-axis, 0.0 to 1.2)]
48 / 49
Exercise
The 2010 Chapel Hill Expert Survey (CHES) asked expert informants to place European parties on three 11-point scales: general left-right (leftright), economic left-right (econlr), and social/cultural left-right (galtan). CHES2010.France is a list that includes the raw placements of six French parties by seven experts (CHES2010.France$lr.placements) and dissimilarity matrices for each expert constructed from the sum of the absolute distances between the parties across the three scales (CHES2010.France$dissimilarity.matrices).
1. Use the smacofIndDiff() function to run Individual Differences Scaling (INDSCAL) on CHES2010.France$dissimilarity.matrices in two dimensions with the indscal constraint.
2. Which respondent has the most stress? Which respondent has the least stress?
3. Which party has the most stress? Which party has the least stress?
4. Plot the group configuration, clearly labeling the party names.