Introduction
The complexity of social and educational phenomena often manifests across multiple dimensions simultaneously. Canonical Correlation Analysis (CCA) is a tool for identifying patterns of association between two blocks or sets of variables: it seeks the linear combinations of each set that are maximally correlated with each other.
What is CCA?
Canonical Correlation Analysis is a multivariate technique that goes beyond classic prediction or dimensionality reduction methods. While multiple regression seeks to explain a single outcome with several predictors and PCA reduces variables to summarize one block, CCA analyzes two blocks of variables simultaneously.
- Objective: To find pairs of canonical variates (U and V) — each a linear combination of variables from its block — such that the correlation between U and V is maximized.
- Application: Ideal when we want to relate, for example, psychological profiles with academic performance variables, or parenting styles with child development indicators.
Each pair of variates describes a distinct “dimension of association,” and by extracting up to \(\min(p, q)\) canonical functions, where \(p\) and \(q\) are the numbers of variables in each block, we can explore all the principal facets of the relationship.
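As a quick illustration of the \(\min(p, q)\) rule, the sketch below uses base R's `stats::cancor()` on simulated data. The data and the object names (`X`, `Y`, `res`, `n_functions`) are assumptions made up for this example, not part of any real study.

```r
# Hypothetical simulated blocks: p = 4 variables in X, q = 2 variables in Y
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 4), ncol = 4)
Y <- matrix(rnorm(n * 2), ncol = 2)

# cancor() ships with base R (package stats)
res <- cancor(X, Y)

# One canonical correlation is returned per canonical function
n_functions <- length(res$cor)
print(n_functions)
print(n_functions == min(ncol(X), ncol(Y)))
```

With four variables in one block and two in the other, only two canonical functions can be extracted.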
Mathematical Foundations of CCA (an accessible overview)
Variables
- Block \(X = (X_1, \dots, X_p)\)
- Block \(Y = (Y_1, \dots, Y_q)\)
Linear Combinations
\(U = a^{T} X,\quad V = b^{T} Y\)
with a (p×1) and b (q×1).
Maximize Correlation
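\(\rho = \frac{a^T \Sigma_{XY}\,b}{\sqrt{a^T \Sigma_{XX} a}\,\sqrt{b^T \Sigma_{YY} b}}\)
where \(\Sigma_{XX}\) and \(\Sigma_{YY}\) are the covariance matrices within each block and \(\Sigma_{XY} = \Sigma_{YX}^{T}\) is the cross-covariance matrix between blocks.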
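To make the formula concrete, the following sketch evaluates \(\rho\) for one arbitrary pair of weight vectors and checks that it matches the ordinary correlation between \(U\) and \(V\). The simulated data and the weights `a` and `b` are illustrative assumptions; they are not the optimal canonical weights.

```r
set.seed(2)
n <- 150
X <- matrix(rnorm(n * 3), ncol = 3)
Y <- cbind(X[, 1] + rnorm(n), rnorm(n))

# Arbitrary (not optimal) weights, chosen only for illustration
a <- c(1, -0.5, 0.2)
b <- c(0.7, 0.3)

# Sample covariance blocks
Sxx <- cov(X)
Syy <- cov(Y)
Sxy <- cov(X, Y)

# rho evaluated from the formula
rho_formula <- drop(t(a) %*% Sxy %*% b) /
  (sqrt(drop(t(a) %*% Sxx %*% a)) * sqrt(drop(t(b) %*% Syy %*% b)))

# rho computed directly as the correlation between the two variates
U <- X %*% a
V <- Y %*% b
rho_direct <- drop(cor(U, V))

print(all.equal(rho_formula, rho_direct))
```

CCA then searches over all possible `a` and `b` for the pair that makes this quantity as large as possible.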
Solution by Eigenanalysis
\[ (\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX})\,a = \rho^{2}\,a \]
\[ (\Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY})\,b = \rho^{2}\,b \]
- Eigenvalues \(\rho^{2}\) → squared canonical correlations
- Eigenvectors → weights a and b
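The eigenproblem above can be sketched directly in base R and compared with `stats::cancor()`. This is a minimal illustration on simulated data (all object names are assumptions); explicit matrix inversion is used for readability and is not how production routines compute CCA.

```r
set.seed(3)
n <- 200
X <- matrix(rnorm(n * 3), ncol = 3)
Y <- cbind(0.6 * X[, 1] + rnorm(n), 0.4 * X[, 2] + rnorm(n))

Sxx <- cov(X)
Syy <- cov(Y)
Sxy <- cov(X, Y)
Syx <- t(Sxy)

# Matrix of the first eigenproblem (p x p)
M <- solve(Sxx) %*% Sxy %*% solve(Syy) %*% Syx
eig <- eigen(M)

# M is not symmetric, so keep only the real parts of the eigenvalues
rho_sq <- sort(Re(eig$values), decreasing = TRUE)
k <- min(ncol(X), ncol(Y))
rho_eigen <- sqrt(rho_sq[1:k])

# Reference values from base R
rho_cancor <- cancor(X, Y)$cor
print(all.equal(rho_eigen, rho_cancor))

# First pair of weights: a1 is the leading eigenvector,
# b1 is proportional to Syy^{-1} Syx a1
a1 <- Re(eig$vectors[, which.max(Re(eig$values))])
b1 <- solve(Syy, Syx %*% a1)
rho_first <- drop(cor(X %*% a1, Y %*% b1))
print(rho_first)
```

The square roots of the largest eigenvalues reproduce the canonical correlations, and the leading eigenvector yields the first canonical variate up to scale.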
Interpretation:
- The first pair \((U_1, V_1)\) captures the largest correlation; subsequent pairs capture progressively smaller correlations, and each new pair of variates is uncorrelated with all previous ones.
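This property can be checked numerically. The sketch below, again on simulated data with assumed object names, builds all canonical variates from the `stats::cancor()` weights and inspects their correlation matrices.

```r
set.seed(4)
n <- 300
X <- matrix(rnorm(n * 3), ncol = 3)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] - X[, 3] + rnorm(n), rnorm(n))

res <- cancor(X, Y)

# Canonical variates: center each block, then apply the weights
Xc <- scale(X, center = res$xcenter, scale = FALSE)
Yc <- scale(Y, center = res$ycenter, scale = FALSE)
U <- Xc %*% res$xcoef
V <- Yc %*% res$ycoef

# Variates within a block are mutually uncorrelated (identity matrix)
print(round(cor(U), 3))

# Across blocks, only matching pairs correlate; the diagonal holds
# the canonical correlations in decreasing order
print(round(cor(U, V), 3))
print(round(res$cor, 3))
```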
Real-world Usage Examples
Parenting Styles and Child Development
- X = (Warmth, Control, Consistency)
- Y = (Self-esteem, Aggression, Social competence)
- Finding: Warmth and consistency are associated with higher self-esteem and social competence; excessive control is associated with aggression.
Motivation and Learning Strategies
- X = (Intrinsic Motivation, Extrinsic Motivation, Amotivation)
- Y = (Deep Strategy, Surface Strategy, Metacognitive Strategy)
- Finding: Intrinsic motivation is associated with deep and metacognitive strategies; extrinsic motivation is associated with surface strategies.
Personality and Job Satisfaction
- X = Big Five Traits
- Y = (Satisfaction, Commitment, Engagement)
- Finding: Conscientiousness and extraversion are positively associated with satisfaction and engagement; neuroticism is negatively associated with them.
Differences from Other Techniques
| Technique | Blocks | Objective | Output |
|---|---|---|---|
| CCA | 2 | Maximize association between X–Y | Pairs of canonical variates |
| PCA | 1 | Reduce dimensionality, preserve variance | Orthogonal principal components |
| Multiple Correlation (R²) | 1 Y + several X | Explain variance of a single Y | R² coefficient and regression betas |
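The link between the first and last rows of the table can be illustrated with a small sketch: when the Y block contains a single variable, the squared canonical correlation coincides with the R² of a multiple regression. The simulated data and object names below are assumptions for illustration only.

```r
set.seed(5)
n <- 200
X <- matrix(rnorm(n * 3), ncol = 3)
y <- 0.5 * X[, 1] - 0.3 * X[, 2] + rnorm(n)

# R-squared from multiple regression of y on the X block
r2_lm <- summary(lm(y ~ X))$r.squared

# CCA with a Y block made of a single variable
rho_cca <- cancor(X, y)$cor

print(c(r2_lm = r2_lm, rho_sq = rho_cca^2))
print(all.equal(rho_cca^2, r2_lm))
```

In that sense, multiple correlation can be read as the special case of CCA with \(q = 1\).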
Practical Example in R
#| eval: false
#| include: false
# 1. Install and load package
if (!requireNamespace("yacca", quietly = TRUE)) {
  install.packages("yacca")
}
library(yacca)
# 2. Simulate example data
set.seed(42)
n <- 200
X <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
Y <- data.frame(
  y1 = 0.5 * X$x1 + 0.3 * X$x2 + rnorm(n),
  y2 = -0.2 * X$x2 + 0.4 * X$x3 + rnorm(n),
  y3 = 0.1 * X$x1 - 0.3 * X$x3 + rnorm(n)
)
# 3. Run CCA (yacca::cca() takes the X and Y blocks as separate arguments)
cca_res <- cca(as.matrix(X), as.matrix(Y))
# 4. Inspect results
print(cca_res$corr) # Canonical correlations
print(cca_res$xcoef) # Weights for X
print(cca_res$ycoef) # Weights for Y
# 5. Validate the first canonical function
u1 <- as.matrix(X) %*% cca_res$xcoef[, 1]
v1 <- as.matrix(Y) %*% cca_res$ycoef[, 1]
cat("Corr(U1, V1) =", cor(u1, v1), "\n")
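For readers who prefer not to install an extra package, the same steps can be sketched with base R's `stats::cancor()`. This is an alternative under the same simulated data, not the yacca workflow itself; the names `base_res`, `u1_base`, and `v1_base` are assumptions introduced here.

```r
# Re-create the simulated data so this block runs on its own
set.seed(42)
n <- 200
X <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
Y <- data.frame(
  y1 = 0.5 * X$x1 + 0.3 * X$x2 + rnorm(n),
  y2 = -0.2 * X$x2 + 0.4 * X$x3 + rnorm(n),
  y3 = 0.1 * X$x1 - 0.3 * X$x3 + rnorm(n)
)

# Run CCA with base R
base_res <- cancor(as.matrix(X), as.matrix(Y))
print(base_res$cor)    # Canonical correlations
print(base_res$xcoef)  # Weights for X
print(base_res$ycoef)  # Weights for Y

# Validate the first canonical function
u1_base <- as.matrix(X) %*% base_res$xcoef[, 1]
v1_base <- as.matrix(Y) %*% base_res$ycoef[, 1]
cat("Corr(U1, V1) =", cor(u1_base, v1_base), "\n")
```

The weights may be scaled differently from those reported by other packages, but the canonical correlations should agree because correlation is unaffected by rescaling the variates.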
Website made with Quarto by Antonio Matas-Terrón, based on Beatriz Milz's original idea (https://beamilz.com/). License: CC BY-SA 2.0.