CCA for Finding Latent Relationships and Dimensionality Reduction

Canonical Correlation Analysis (CCA) is a powerful statistical technique. In machine learning and multimedia information retrieval, CCA plays a vital role in uncovering intricate relationships between different sets of variables. In this blog post, we will look into this technique and show how it can be used for finding hidden correlations as well as for dimensionality reduction.

To understand CCA’s capabilities, let’s take a look at two sets of observations, X and Y (the arrays are listed in the code block later in this post). These two sets of observations are made on the same set of objects, and each column represents a different variable.

When computing the pairwise correlation between the column vectors of X and Y, we obtain the following set of values, where the entry at (i,j) represents the correlation between the i-th column of X and the j-th column of Y.
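These pairwise correlations can be computed with numpy as sketched below (assuming the X and Y arrays defined in the code block later in this post):

import numpy as np

# Pairwise correlations between the columns of X and the columns of Y:
# np.corrcoef stacks the 3 columns of X and the 3 columns of Y into a
# 6x6 correlation matrix; the upper-right 3x3 block holds the cross-correlations.
corr_matrix = np.corrcoef(X.T, Y.T)[:3, 3:]
print(np.round(corr_matrix, 3))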

The resulting correlation values give us some insight into the relationship between the two sets of measurements. They show moderate to almost no correlation between the columns of the two datasets, except for a relatively higher correlation between the second column of X and the third column of Y.

Hidden Relationship

It looks like there is not much of a relationship between X and Y. Is that so? Let’s not rush to that conclusion.

Let’s transform X and Y into one-dimensional arrays, a and b, using the vectors [-0.427, -0.576, 0.696] and [0, 0, -1]:

a = X [-0.427, -0.576, 0.696]ᵀ

b = Y [0, 0, -1]ᵀ

Now, let’s calculate the correlation between a and b. We get a correlation value of 0.999, meaning that the two projections of X and Y are very strongly correlated. In other words, there is a very strong hidden relationship present in our two sets of observations. So how did we end up getting a and b? The answer is canonical correlation analysis.
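Here is a small snippet (again assuming the X and Y arrays defined later in this post) that reproduces the projections and the resulting correlation:

import numpy as np

# Project X and Y onto the two vectors to obtain one-dimensional arrays a and b
a = X @ np.array([-0.427, -0.576, 0.696])
b = Y @ np.array([0.0, 0.0, -1.0])

# Correlation between the two projections
print(np.round(np.corrcoef(a, b)[0, 1], 3))  # approximately 0.999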

What is Canonical Correlation Analysis?

Canonical correlation analysis is a technique that looks for pairs of basis vectors for two sets of variables, X and Y, such that the correlations between the projections of X and Y onto these basis vectors are mutually maximized. In other words, the transformed arrays show a much higher correlation, bringing out any hidden relationship. The number of such pairs of basis vectors is limited by the smaller of the two dimensionalities: if X is an array of size n×p and Y of size n×q, then the number of basis vectors cannot exceed min{p, q}.

Let wx and wy be a pair of basis vectors projecting X and Y into a and b, given by a = Xwx and b = Ywy. The projections a and b are called the scores or the canonical variates. The correlation between the projections, after some algebraic manipulation, can be expressed as:

ρ = (wxᵀ Cxy wy) / sqrt((wxᵀ Cxx wx)(wyᵀ Cyy wy))

where Cxx, Cxy, and Cyy are the covariance matrices of X with itself, of X with Y, and of Y with itself, respectively. The CCA algorithm looks for basis vectors that maximize the correlation in the above expression. We will not go into the details of how the solution for the basis vectors is computed; instead, we will rely on the sklearn library to find the solution vectors for the X and Y arrays considered earlier.

Finding Basis Vectors

The sklearn library provides a CCA class that we can use to obtain the vectors [-0.427 -0.576 0.696] and [0 0 -1] used earlier. This is shown in the code below.

import numpy as np
from sklearn.cross_decomposition import CCA

# Define X and Y
X = np.array([[1, 1, 3], [2, 3, 2], [1, 1, 1], [1, 1, 2],
              [2, 2, 3], [3, 3, 2], [1, 3, 2], [4, 3, 5], [5, 5, 5]])
Y = np.array([[4, 4, -1.078], [3, 3, 1.214], [2, 2, 0.307],
              [2, 3, -0.385], [2, 1, -0.078], [1, 1, 1.614],
              [1, 2, 0.814], [2, 1, -0.064], [1, 2, 1.535]])

# Fit CCA with as many components as the smaller dimensionality (3)
cca = CCA(n_components=3)
cca.fit(X, Y)
X_c, Y_c = cca.transform(X, Y)  # Get projections (canonical variates)

# Compute correlations between corresponding pairs of projections:
# the diagonal at offset 3 of the 6x6 correlation matrix pairs the
# i-th column of X_c with the i-th column of Y_c
corrcoefs = np.corrcoef(X_c.T, Y_c.T).diagonal(offset=3)
print(corrcoefs)

[0.99999997 0.51938345 0.09113514]

We note that CCA has obtained the canonical correlation values, and the first canonical correlation is almost 1. Now let’s look at the basis vectors wx and wy found by CCA.

print(cca.x_weights_[:,0])
print(cca.y_weights_[:,0])

[-0.42737099 -0.57643382  0.69647548]
[-4.97454902e-05  5.71166513e-06 -9.99999999e-01]

These basis vectors are the same ones used earlier. So now you know how those vectors were obtained.

Dimensionality Reduction using CCA

Often, we are interested in performing dimensionality reduction, either for visualization or for creating a reduced set of features. Principal component analysis (PCA) is one of the best-known techniques for dimensionality reduction. Fisher’s linear discriminant analysis (FLD) is another technique that can be used for this purpose.

It turns out that we can also use CCA for dimensionality reduction. I will demonstrate this using the Wine dataset from the sklearn library. Recall that this dataset consists of 13 features and three classes of wine. To perform dimensionality reduction, we break the dataset into two sets of observations, X and Y, by keeping the first six features in X and the remaining seven features in Y. We then apply CCA to X and Y to search for a single pair of basis vectors, creating a pair of canonical variates that serves as our reduced feature set. This is shown below.

from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cross_decomposition import CCA

wine = load_wine()
X = wine.data[:, :6]   # Form X using the first six features
Y = wine.data[:, 6:]   # Form Y using the remaining seven features

# Perform feature normalization
scaler = StandardScaler()
X = scaler.fit_transform(X)
Y = scaler.fit_transform(Y)

# Extract a single pair of canonical variates
cca = CCA(n_components=1)
cca.fit(X, Y)
X_c, Y_c = cca.transform(X, Y)
# Plot the resulting projections
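The plotting code is not part of the snippet above; a minimal matplotlib sketch along the following lines produces the scatter plot discussed next, with wine.target supplying the class labels used for coloring:

import matplotlib.pyplot as plt

# Scatter plot of the pair of canonical variates, colored by wine class
plt.scatter(X_c[:, 0], Y_c[:, 0], c=wine.target)
plt.xlabel('Canonical variate of X (first six features)')
plt.ylabel('Canonical variate of Y (remaining seven features)')
plt.title('Wine data mapped to two dimensions via CCA')
plt.show()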

When plotting, I have colored the projections by the corresponding wine category. We can see from the plot that the mapping from 13 dimensions down to 2 results in a fairly good separation, though we would probably need another pair of basis vectors if we wanted to use the canonical variates for classification.

For comparison, I show the two-dimensional mappings obtained using PCA and FLD for the same wine data. FLD/LDA gives the best separation, but CCA performs better than PCA. It should be noted that FLD/LDA is a supervised technique, while CCA and PCA are both unsupervised.
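The PCA and FLD/LDA mappings referred to above can be generated with sklearn along these lines (a sketch, standardizing all 13 wine features; note that LDA needs the class labels while PCA and CCA do not):

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Standardize all 13 features together for this comparison
features = StandardScaler().fit_transform(wine.data)

# PCA: unsupervised projection onto the first two principal components
wine_pca = PCA(n_components=2).fit_transform(features)

# FLD/LDA: supervised projection onto two linear discriminants (uses class labels)
wine_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(features, wine.target)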

Summary

Canonical Correlation Analysis (CCA) is a valuable statistical technique that enables us to uncover hidden relationships between two sets of variables. By identifying the most significant patterns and correlations, CCA helps us gain valuable insights with numerous potential applications, and it can also be used for dimensionality reduction. In machine and deep learning, CCA has been used for cross-modal learning and cross-modal retrieval. For example, consider a scenario where a user searches for images using a text query. CCA can assist in aligning the textual features of the query with the visual features of the image database, leading to more accurate and relevant search results. By finding the shared subspace between the text and image modalities, CCA enables effective information retrieval across different data types.
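As a rough illustration of this idea, the sketch below fits CCA on paired text and image features and retrieves images for a text query by similarity in the shared space. The feature matrices here are random placeholders standing in for real text and image embeddings, so this is only a hypothetical outline of the workflow:

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
text_features = rng.normal(size=(100, 50))    # placeholder text embeddings
image_features = rng.normal(size=(100, 80))   # placeholder image embeddings (paired with the text)

# Learn a shared subspace from the paired text/image features
cca = CCA(n_components=10)
cca.fit(text_features, image_features)
text_c, image_c = cca.transform(text_features, image_features)

# Project a text query into the shared space and rank images by cosine similarity
query_c = cca.transform(text_features[:1])
scores = cosine_similarity(query_c, image_c)
top5 = np.argsort(scores[0])[::-1][:5]
print("Indices of the five most similar images:", top5)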
