This note is for those familiar with principal components analysis (PCA) on a multivariate set of continuous variables. The terms loadings, scores, eigenvalues, and screeplot should be familiar.
The concept of PCA can be extended to nominal or ordinal categorical variables: that is, transforming a large set of variables to another set, ordered by decreasing amount of variability explained. The original variables must first be transformed to a numerical scale for this to work. This is referred to as Multivariate Analysis with Optimal Scaling (MVAOS) (De Leeuw et al., 2016).
MVAOS techniques quantify factors, replacing categorical values (in R terms, levels of factors) by real numbers, so that PCA can then be applied. Crucially, the transformations are computed along with the PCA, in order to maximize the variance explained by each component in turn, as explained in the references.
There are two kinds of categorical variables (R factors):
Ordinal variables: the order of the levels is meaningful, but the intervals between them are not necessarily equal. This is typical of the Likert scale, in which respondents to a survey question indicate their level of agreement or disagreement with a statement on a symmetric scale. In R, is.ordered(...) returns TRUE for these.
Nominal variables: the levels are not ordered. The nonlinear PCA procedure finds a transformation that best represents an ordering, based on the stepwise PCA; see the references for details. In R, is.ordered(...) returns FALSE for these. Logical variables, i.e., those for which is.logical(...) returns TRUE, are a special (binary) case of nominal variables. A short R sketch after this list illustrates the three cases.
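A minimal R sketch (with made-up data) illustrating the three cases:
# ordinal: an ordered factor, e.g. a 5-point Likert item
lik <- factor(c("1", "3", "5"), levels = as.character(1:5), ordered = TRUE)
is.ordered(lik)   # TRUE
# nominal: an unordered factor
eth <- factor(c("A", "B", "A"))
is.ordered(eth)   # FALSE
# logical: a binary special case of a nominal variable
own <- c(TRUE, FALSE, TRUE)
is.logical(own)   # TRUE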
In this technical note we use the Gifi R package for MVAOS. There are several other packages with somewhat different approaches; see the References.
“MVAOS is a linear multivariate technique in the sense that it makes linear combinations of transformed variables, and it is a nonlinear multivariate technique in the sense that these transformations are generally nonlinear.” (De Leeuw et al., 2016)
The aim is to minimize a loss function:
\[ \sigma(H, A) = \sum_{\ell = 1}^L \mathrm{SSQ} (\sum_{j=1}^m H_j A_{j \ell}) \] where \(\mathrm{SSQ}\) is the unweighted sum-of-squares for each of the \(L\) equation blocks, \(H\) is the matrix of transformed variables, and \(A\) is the pattern, containing the coefficients of the linear combinations.
The Gifi (“Multivariate Analysis with Optimal Scaling”) package follows the standard book (Gifi, 1990), re-implementing the Gifi FORTRAN programs for R.
require("Gifi")
## Loading required package: Gifi
The sample dataset is the “ABC Customer Satisfaction” survey from Kenett & Salini (2012). All questions are ordinal on a Likert scale.
Load the example data and display the variable names:
data(ABC)
names(ABC)
## [1] "satis" "satis2010" "best" "recom" "product" "equipment"
## [7] "sales" "technical" "training" "purchase" "pricing"
str(ABC)
## 'data.frame': 208 obs. of 11 variables:
## $ satis : Factor w/ 5 levels "1","2","3","4",..: 1 3 4 3 5 4 4 3 3 4 ...
## $ satis2010: Factor w/ 5 levels "1","2","3","4",..: 1 3 1 3 3 3 3 4 3 3 ...
## $ best : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
## $ recom : Factor w/ 5 levels "1","2","3","4",..: 1 4 3 4 4 4 4 4 2 4 ...
## $ product : Factor w/ 5 levels "1","2","3","4",..: 1 4 2 4 4 4 3 4 2 3 ...
## $ equipment: Factor w/ 5 levels "1","2","3","4",..: 1 4 4 3 4 4 4 3 3 1 ...
## $ sales : Factor w/ 5 levels "1","2","3","4",..: 1 3 3 4 5 4 3 4 3 4 ...
## $ technical: Factor w/ 5 levels "1","2","3","4",..: 1 3 4 2 4 5 4 3 4 3 ...
## $ training : Factor w/ 5 levels "1","2","3","4",..: 2 3 4 3 3 3 4 3 3 4 ...
## $ purchase : Factor w/ 5 levels "1","2","3","4",..: 2 3 4 4 4 5 5 4 4 4 ...
## $ pricing : Factor w/ 5 levels "1","2","3","4",..: 1 3 3 2 3 4 3 3 2 2 ...
Notice all are R unordered factors, i.e. nominal. These are in fact ordinal, since they come from Likert scales. We will see how to treat them as both ordinal and nominal.
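(If we wanted the data frame itself to carry the ordering, a one-liner such as the following sketch would convert every column to an ordered factor; this is not required for princals, which has its own ordinal argument.)
# optional: convert all columns to ordered factors -- not needed for princals
ABC.ord <- as.data.frame(lapply(ABC, as.ordered))
is.ordered(ABC.ord$satis)   # TRUE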
Make a subset with fewer variables, for demonstration:
ABC6 <- ABC[,6:11]
We compute the MVAOS PCA using Gifi::princals. The default is two components, and the factors are treated as ordinal no matter what the R data type, as is typical of a Likert scale.
We perform the MVAOS PCA asking for two components.
(fitord <- princals(ABC6)) ## default is ordinal=TRUE
## Call:
## princals(data = ABC6)
##
## Loss value: 0.683
## Number of iterations: 64
##
## Eigenvalues: 2.943 0.857
summary(fitord)
##
## Loadings (cutoff = 0.1):
## Comp1 Comp2
## equipment 0.784
## sales 0.639 0.611
## technical 0.674 -0.294
## training 0.673 -0.464
## purchase 0.681 -0.234
## pricing 0.741 0.355
##
## Importance (Variance Accounted For):
## Comp1 Comp2
## Eigenvalues 2.9434 0.8566
## VAF 49.0565 14.2772
## Cumulative VAF 49.0600 63.3300
Here we see the proportion of transformed variance explained by each PC, in this case about 49% for PC1, 14% for PC2. About 37% is still unexplained.
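As a quick check (a sketch, using only the loadings stored in the fitted object): each eigenvalue is the column sum of squared loadings, and the VAF is the eigenvalue as a percentage of the number of variables:
# eigenvalues as column sums of squared loadings; VAF = eigenvalue / m
evals <- colSums(fitord$loadings^2)
round(100 * evals / ncol(ABC6), 2)   # about 49.06 and 14.28, as in the summary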
The transformation plots show how each original variable is transformed for use in the PCA.
plot(fitord, plot.type = "transplot")
These are all step functions, because the default is ordinal=TRUE.
We can see the transformations for the first few objects:
ABC6[1:6,]
## equipment sales technical training purchase pricing
## 1 1 1 1 2 2 1
## 2 4 3 3 3 3 3
## 3 4 3 4 4 4 3
## 4 3 4 2 3 4 2
## 5 4 5 4 3 4 3
## 6 4 4 5 3 5 4
fitord$transform[1:6,]
## equipment sales technical training purchase pricing
## 1 -0.17733412 -0.19817402 -0.17400949 -0.09857960 -0.12797516 -0.17009979
## 2 0.03820579 0.01201815 -0.01075632 -0.01962519 0.01464401 0.01936495
## 3 0.03820579 0.01201815 0.01295460 0.04640791 0.03538021 0.01936495
## 4 -0.02320855 0.03235367 -0.11656669 -0.01962519 0.03538021 -0.08020186
## 5 0.03820579 0.07356952 0.01295460 -0.01962519 0.03538021 0.01936495
## 6 0.03820579 0.03235367 0.07087356 -0.01962519 0.08741712 0.05786234
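Each level of a variable is mapped to a single quantified value. For example (a sketch), the step values for equipment can be recovered by grouping the transformed scores by the original levels:
# the (monotone) quantification of each observed level of "equipment"
tapply(fitord$transform[, "equipment"], ABC6$equipment, unique)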
We can see all the loadings, including those \(< 0.1\) not shown in the model summary:
round(fitord$loadings,3)
## D1 D2
## equipment 0.784 0.020
## sales 0.639 0.611
## technical 0.674 -0.294
## training 0.673 -0.464
## purchase 0.681 -0.234
## pricing 0.741 0.355
And then plot them; these loading plots are similar to those we would see for continuous variables.
plot(fitord, "loadplot", main = "Loadings Plot ABC Data") ## aspect ratio = 1
Here, PC1 is a “size” component, i.e., more or less satisfaction with all measures. Since the arrows point the same way in the loadings plot, satisfactions on all items are positively correlated.
Here PC2 contrasts sales/pricing (positive PC2) with training/technical (negative PC2). That is, after accounting for overall satisfaction, these two aspects of satisfaction are opposed. Satisfaction with “equipment” has essentially no loading on PC2; it is not part of this contrast.
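These relations can also be checked numerically; a sketch, using the transformed variables stored in the fitted object:
# correlations among the optimally-scaled variables; all positive, as the
# common direction of the loading arrows suggests
round(cor(fitord$transform), 2)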
The biplot shows the loading vectors along with the PC scores in the space of the first two PCs. This helps find unusual observations that do not fit the overall pattern of inter-variable relations.
plot(fitord, "biplot",
main = "Biplot ABC Data")
This shows where the individual cases fall in the PC space. For example, respondent 135 is overall quite dissatisfied (a far negative score on PC1), and relatively more satisfied with sales/pricing than with training/technical.
For example:
ABC6[135,]
## equipment sales technical training purchase pricing
## 135 2 3 2 1 1 2
This respondent gave low scores to all variables except sales, which received a neutral score.
ABC6[51,]
## equipment sales technical training purchase pricing
## 51 3 1 5 4 4 1
This respondent was quite satisfied with technical, training, and purchase, which go together in PC1/PC2 space, but not at all satisfied with sales and pricing, which also go together but are opposed to the other three in PC2 space.
The screeplot shows the relative amount of variance explained by each component.
plot(fitord, "screeplot")
This shows that 2 components are probably all that is needed.
We can check the association between two factors with cross-classification tables. Here we pick two that seem highly correlated in PC1/PC2 space.
(t.actual <- table(ABC6$purchase, ABC6$technical))
##
## 1 2 3 4 5
## 1 1 3 2 1 1
## 2 5 8 4 14 2
## 3 1 8 10 29 14
## 4 3 7 12 35 31
## 5 1 0 1 4 11
t.purchase <- table(ABC6[,"purchase"])
t.technical <- table(ABC6[,"technical"])
# expected counts under independence: product of the margins, divided by n
(t.expected <- round(outer(t.purchase, t.technical) / sum(t.actual)))
##
## 1 2 3 4 5
## 1 0 1 1 3 2
## 2 2 4 5 13 9
## 3 3 8 9 25 18
## 4 5 11 12 35 25
## 5 1 2 2 7 5
Difference between actual and expected:
print(t.actual - t.expected)
##
## 1 2 3 4 5
## 1 1 2 1 -2 -1
## 2 3 4 -1 1 -7
## 3 -2 0 1 4 -4
## 4 -2 -4 0 0 6
## 5 0 -2 -1 -3 6
Very close agreement, although with some discrepancy in class 5, “highly satisfied”.
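A formal check of independence (a sketch; some cells have small expected counts, so R will warn that the approximation may be inaccurate):
# Pearson chi-squared test of independence between purchase and technical
chisq.test(t.actual)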
We now repeat the analysis treating the variables as nominal, i.e., as if we cannot assume an ordering of the levels.
(fitnom <- princals(ABC6, ordinal=FALSE)) ## nominal PCA
## Call:
## princals(data = ABC6, ordinal = FALSE)
##
## Loss value: 0.683
## Number of iterations: 218
##
## Eigenvalues: 2.944 0.856
summary(fitnom)
##
## Loadings (cutoff = 0.1):
## Comp1 Comp2
## equipment 0.786
## sales 0.637 -0.619
## technical -0.675 -0.283
## training 0.673 0.466
## purchase 0.682 0.221
## pricing 0.740 -0.356
##
## Importance (Variance Accounted For):
## Comp1 Comp2
## Eigenvalues 2.9442 0.8563
## VAF 49.0703 14.2713
## Cumulative VAF 49.0700 63.3400
summary(fitord)
##
## Loadings (cutoff = 0.1):
## Comp1 Comp2
## equipment 0.784
## sales 0.639 0.611
## technical 0.674 -0.294
## training 0.673 -0.464
## purchase 0.681 -0.234
## pricing 0.741 0.355
##
## Importance (Variance Accounted For):
## Comp1 Comp2
## Eigenvalues 2.9434 0.8566
## VAF 49.0565 14.2772
## Cumulative VAF 49.0600 63.3300
The loadings, eigenvalues and proportion of variance accounted for (VAF) are not the same as for the ordinal case, although very close. Notice that the signs of the loadings for PC2 are reversed. The loss value is the same.
Notice that many more iterations were required to find a solution than were required for the ordinal case.
plot(fitnom, plot.type = "transplot", stepvec=rep(FALSE,6))
These are no longer step functions, and not necessarily monotonic (see “equipment”).
Look at the transformations of two of the variables:
“Technical”: “Overall satisfaction level from technical support. (1 … very low, 5 … very high)”. Here we treat it as nominal, although it really is ordinal. The transformation is now reversed in sign, but the sequence of the levels remains the same.
“Equipment”: “Overall satisfaction level from the equipment. (1 … very low, 5 … very high)”. Curiously, the transformation rates “low” worse than “very low”.
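We can compare the two quantifications of technical directly (a sketch, grouping the transformed scores by level as before); this makes the sign reversal obvious:
# level-by-level quantifications of "technical": ordinal vs. nominal fit
cbind(ordinal = tapply(fitord$transform[, "technical"], ABC6$technical, unique),
      nominal = tapply(fitnom$transform[, "technical"], ABC6$technical, unique))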
We can see all the loadings, including those \(< 0.1\) not shown in the model summary:
round(fitnom$loadings,3)
## D1 D2
## equipment 0.786 0.003
## sales 0.637 -0.619
## technical -0.675 -0.283
## training 0.673 0.466
## purchase 0.682 0.221
## pricing 0.740 -0.356
And plot them:
plot(fitnom, "loadplot", main = "Loadings Plot ABC Data") ## aspect ratio = 1
PC1 is a “size” component (overall satisfaction), but “technical” is opposite to this because its transformation no longer preserves the order; PC2 contrasts sales/pricing with training/purchase.
plot(fitnom, "biplot", main = "Biplot ABC Data")
Again we see the anomalies, such as observations 51 and 135.
plot(fitnom, "screeplot")
This is very similar to the ordinal case.
This is a messier dataset: a random sample from a real study, of which only a few variables are reported here. We do not say where they came from, and sensitive information has been recoded so that it cannot be related to the original values. The variables are five demographic indicators: age group, education level, employment type, marital status, and self-reported ethnicity; plus car ownership status.
load("NonlinearPCA_example.Rdata", verbose=TRUE)
## Loading objects:
## ds.test
dim(ds.test)
## [1] 320 6
names(ds.test)
## [1] "AgeGroup" "Ethnicity" "Education" "Mari" "Employment"
## [6] "Car"
summary(ds.test)
## AgeGroup Ethnicity Education Mari Employment Car
## A1: 32 Eth1: 68 E1:162 Married: 97 Full_time:171 No :170
## A2:174 Eth2: 38 E2:108 Other : 16 Student : 23 Yes1: 17
## A3: 72 Eth3:131 E3: 50 Single :207 Other : 31 Yes2:133
## A4: 39 Eth4: 42 Part_time: 90
## A5: 3 Eth0: 41 Retired : 5
There are 320 survey responses and 6 variables. We have arranged AgeGroup and Education to have a logical order, but not the others, which are just nominal categories.
We use nonlinear PCA to see how much redundancy there is in these variables, and their inter-relation.
With so many variables we choose to start with three components.
We specify a Boolean vector for the ordinal argument to princals, with the two factors AgeGroup and Education as ordinal and the others as nominal.
names(ds.test)
## [1] "AgeGroup" "Ethnicity" "Education" "Mari" "Employment"
## [6] "Car"
(fitreal <- princals(ds.test, ndim=3, ordinal=c(T,F,T,F,F,F)))
## Call:
## princals(data = ds.test, ndim = 3, ordinal = c(T, F, T, F, F,
## F))
##
## Loss value: 0.766
## Number of iterations: 57
##
## Eigenvalues: 1.985 1.175 1.053
summary(fitreal)
##
## Loadings (cutoff = 0.1):
## Comp1 Comp2 Comp3
## AgeGroup 0.781 -0.220 0.337
## Education 0.620 0.495
## Employment 0.801 0.298
## Ethnicity 0.885
## Mari -0.425 0.290 0.588
## Car 0.408 -0.707
##
## Importance (Variance Accounted For):
## Comp1 Comp2 Comp3
## Eigenvalues 1.9853 1.1748 1.053
## VAF 33.0887 19.5800 17.550
## Cumulative VAF 33.0900 52.6700 70.220
About 70% of the variance (of the transformed variables) is explained. The loadings show which variables are represented by each component. For the two ordinal variables, we can interpret the loadings as “more” or “less”, although the sign is arbitrary. For the nominal variables, we cannot interpret them as “more” or “less”, just that the variables (after transformation) go together in this PC space.
So in PC1, AgeGroup, Education and Employment go together, opposite to the transformed Mari.
PC2 is dominated by Ethnicity, which is absent from PC1; this has some correlation with Education in this component.
PC3 is dominated by Car, which is negatively related to Employment and AgeGroup.
Examine the details of the transformations and PCA.
plot(fitreal, plot.type = "transplot", stepvec=c(T,F,T,F,F,F))
Note the major jump in the transformation of AgeGroup after A1 (the youngest group). Note also the large discrepancy between the transformed value of Student and the other categories of Employment.
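The quantified level values can be listed directly (a sketch, as for the ABC data above):
# quantification of each AgeGroup level; note the jump after A1
tapply(fitreal$transform[, "AgeGroup"], ds.test$AgeGroup, unique)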
plot(fitreal, plot.type = "screeplot")
It is clear that more components would be needed to account for the large majority of the variance; this shows that these variables are fairly independent, which would be good if they are used as predictors in a multivariate regression. However, with three PCs we have reached the “elbow” of the screeplot.
We can see all the loadings, including those \(< 0.1\) not shown in the model summary:
round(fitreal$loadings,3)
## D1 D2 D3
## AgeGroup 0.781 -0.220 0.337
## Ethnicity 0.029 0.885 -0.041
## Education 0.620 0.495 0.061
## Mari -0.425 0.290 0.588
## Employment 0.801 -0.090 0.298
## Car 0.408 0.084 -0.707
Then plot them:
plot(fitreal, "loadplot", main = "Loadings Plot") ## aspect ratio = 1
plot(fitreal, "loadplot", main = "Loadings Plot", plot.dim=2:3) ## aspect ratio = 1
These show graphically the numeric values of the loadings from the model summary.
plot(fitreal, "biplot", main = "Biplot transit data")
plot(fitreal, "biplot", main = "Biplot transit data", plot.dim=c(2,3))
The numbers are individuals; there are too many to see all of them clearly.
Some are “unusual” in the PC space, for example:
ds.test[92,]
## AgeGroup Ethnicity Education Mari Employment Car
## 92 A1 Eth1 E1 Single Student No
A young (group A1) single student with the least education and no car.
We can examine details of the transformation for some interesting individuals. The original values:
cases <- c(92, 208, 37, 207, 205, 126)
fitreal$data[cases,]
## AgeGroup Ethnicity Education Mari Employment Car
## 92 A1 Eth1 E1 Single Student No
## 208 A2 Eth1 E1 Married Student No
## 37 A2 Eth1 E3 Married Full_time Yes2
## 207 A3 Eth2 E2 Single Student Yes1
## 205 A3 Eth2 E3 Single Full_time Yes1
## 126 A2 Eth3 E1 Single Other Yes2
Converted to numbers (R factor levels):
fitreal$datanum[cases,]
## AgeGroup Ethnicity Education Mari Employment Car
## [1,] 1 1 1 3 2 1
## [2,] 2 1 1 1 2 1
## [3,] 2 1 3 1 1 3
## [4,] 3 2 2 3 2 2
## [5,] 3 2 3 3 1 2
## [6,] 2 3 1 3 3 3
The transformed values for these individuals:
fitreal$transform[cases,]
## AgeGroup Ethnicity Education Mari Employment Car
## 92 -0.16747360 -0.09710945 -0.05439992 0.03035302 -0.18398372 -0.05210407
## 208 0.01610293 -0.09710945 -0.05439992 -0.08151339 -0.18398372 -0.05210407
## 37 0.01610293 -0.09710945 0.07569963 -0.08151339 0.03339244 0.05542639
## 207 0.02243198 0.07968197 0.04655376 0.03035302 -0.18398372 0.08741075
## 205 0.02243198 0.07968197 0.07569963 0.03035302 0.03339244 0.08741075
## 126 0.01610293 0.02967712 -0.05439992 0.03035302 -0.02973029 0.05542639
The scores of each object in PC space:
fitreal$objectscores[cases,]
## D1 D2 D3
## 92 -3.1451703 -0.8350931 -0.9437769
## 208 -1.4247147 -1.9421143 -1.0066304
## 37 1.2674147 -1.1236661 -1.0671522
## 207 -0.6841850 1.8515626 -1.5617919
## 205 1.0477315 1.7677667 -0.4325850
## 126 -0.3094668 0.1831522 -0.4982306
These scores could be used as transformed variables in a regression analysis:
ds.test$D1 <- fitreal$objectscores[,"D1"]
ds.test$D2 <- fitreal$objectscores[,"D2"]
ds.test$D3 <- fitreal$objectscores[,"D3"]
Note that by design these are uncorrelated:
round(cor(ds.test$D1, ds.test$D2), 10)
## [1] 0
round(cor(ds.test$D1, ds.test$D3), 10)
## [1] 0
round(cor(ds.test$D2, ds.test$D3), 10)
## [1] 0
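For illustration only (a sketch; note the circularity, since Car itself entered the PCA), the scores could serve as predictors of, say, a yes/no car-ownership indicator:
# sketch: logistic regression of car ownership (Yes1/Yes2 collapsed to TRUE)
# on the three uncorrelated component scores -- illustration only
ds.test$has.car <- (ds.test$Car != "No")
summary(glm(has.car ~ D1 + D2 + D3, data = ds.test, family = binomial))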
To see which levels of the predictors are associated in bivariate space, we use a cross-classification matrix.
For example: ethnicity vs. education level:
# actual
(t.ee <- table(ds.test[,"Ethnicity"], ds.test[,"Education"]))
##
## E1 E2 E3
## Eth1 46 16 6
## Eth2 16 13 9
## Eth3 50 51 30
## Eth4 22 18 2
## Eth0 28 10 3
The expected values are computed from the marginal totals:
# expected
t.eth <- table(ds.test[,"Ethnicity"])
t.edu <- table(ds.test[,"Education"])
(t.ee.expected <- round(outer(t.eth, t.edu)/sum(t.edu)))
##
## E1 E2 E3
## Eth1 34 23 11
## Eth2 19 13 6
## Eth3 66 44 20
## Eth4 21 14 7
## Eth0 21 14 6
Difference between actual and expected:
(t.ee - t.ee.expected)
##
## E1 E2 E3
## Eth1 12 -7 -5
## Eth2 -3 0 3
## Eth3 -16 7 10
## Eth4 1 4 -5
## Eth0 7 -4 -3
There is quite some difference: the actual values show a lack of independence. Specifically, Eth1 has more respondents than expected with education status E1, whereas Eth3 has fewer. So these two variables are not independent. We saw in the loadings plots that they went together in PC2, but not the reason for it.
Test whether the expected is different from the actual by a \(\chi^2\) test:
# different?
chisq.test(t.ee)
##
## Pearson's Chi-squared test
##
## data: t.ee
## X-squared = 29.266, df = 8, p-value = 0.0002848
As expected, we reject the null hypothesis of independence, with a very small probability (p ≈ 0.0003) that this rejection is an error.
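To see which cells drive the dependence, we can inspect the Pearson residuals of the test (a sketch):
# Pearson residuals: large positive values mean more observations than
# expected under independence (e.g. Eth1 x E1)
round(chisq.test(t.ee)$residuals, 1)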
Gifi, A. (1990). Nonlinear Multivariate Analysis. Wiley.
De Leeuw, J., Mair, P., & Groenen, P. (2016, September 26). Multivariate Analysis with Optimal Scaling. https://bookdown.org/jandeleeuw6/gifi/
Kenett, R. S., & Salini, S. (2012). Modern Analysis of Customer Surveys with Applications in R. New York: Wiley.
Linting, M., Meulman, J. J., Groenen, P. J. F., & van der Kooij, A. J. (2007). Nonlinear principal components analysis: Introduction and application. Psychological Methods, 12(3), 336–358. https://doi.org/10.1037/1082-989X.12.3.336
Some packages not used here:
Le, S., Josse, J., & Husson, F. (2008). FactoMineR: An R package for multivariate analysis. Journal of Statistical Software, 25(1), 1–18.
Mair, P., & de Leeuw, J. (2010). A general framework for multivariate analysis with optimal scaling: The R package aspect. Journal of Statistical Software, 32(9), 1–23.
de Leeuw, J., & Mair, P. (2009). Gifi methods for optimal scaling in R: The package homals. Journal of Statistical Software, 31(4), 1–21.