# Understanding Tensors and Tensor Decompositions: Part 3

This is my third post on tensors and tensor decompositions. The first post on this topic primarily introduced tensors and some related terminology. The second post was meant to illustrate a particularly simple tensor decomposition method, called the *CP decomposition*. In this post, I will describe another tensor decomposition method, known as the *Tucker decomposition*. While the CP decomposition’s chief merit is its simplicity, it is limited in its approximation capability and it requires the same number of components in each mode. The Tucker decomposition, on the other hand, is extremely efficient in terms of approximation and allows different number of components in different modes. Before going any further, lets look at factor matrices and *n*-mode product of a tensor and a matrix.

##### Factor Matrices

Recall the CP decomposition of an order three tensor expressed as

\(\boldsymbol{\mathcal{X}}\approx \sum\limits_{r=1}\limits^{R}\bf{a}^{r}\circ \bf{b}^{r}\circ \bf{c}^{r},\)

where (\(\circ\)) represents the outer product. We can also represent this decomposition in terms of organizing the vectors, \(\bf{a}^{r}, \bf{b}^{r}, \bf{c}^{r}, r = 1,\cdots R\), into three matrices, **A**, **B**, and **C**, as

\(\bf{A}=\big[\bf{a}^1 \bf{a}^2 \cdots \bf{a}^R \big],\)

\(\bf{B}=\big[\bf{b}^1 \bf{b}^2 \cdots \bf{b}^R \big],\text{and}\)

\(\bf{C}=\big[\bf{c}^1 \bf{c}^2 \cdots \bf{c}^R \big]\)

The CP decomposition is then expressed as

\(\boldsymbol{\mathcal{X}}\approx \big[\Lambda;\bf {A,B,C}\big],\)

where \(\Lambda \) is a super-diagonal tensor with all zero entries except the diagonal elements. The matrices **A**, **B**, and **C** are called the *factor matrices*.

Next, lets try to understand the *n*-mode product.

##### Multiplying a Tensor and a Matrix

How do you multiply a tensor and a matrix? The answer is via *n*-mode product. The *n-mode product* of a tensor \({\boldsymbol{\mathcal{X}}\in {\mathbb{R}^{I_1\times I_2 \times \cdots I_N}}}\) with a matrix \(\bf{U}\in {\mathbb{R}^{J_1\times I_n}}\) is a tensor of size \(I_1\times I_2 \times \cdots I_{n-1}\times J\times I_{n+1}\times\cdots\times I_{N},\) and is denoted by \(\boldsymbol{\mathcal{X}}\times_n\bf{U}\). The product is calculated by multiplying each mode-*n* fibre by the **U** matrix.

Lets look at an example to better understand the *n-*mode product. Lets consider a 2x2x3 tensor whose frontal slices are:

\(\bf X_1 = \begin{bmatrix}1 & 3\\2 & 4\end {bmatrix},\bf X_2 = \begin{bmatrix}5 & 7\\6 & 8\end {bmatrix},\text{ and }\bf X_3 = \begin{bmatrix}9 & 11\\10 & 12\end {bmatrix}\).

Let the **U** matrix be \(\begin{bmatrix}1 & 3\\2 & 4\end {bmatrix}\). To carry out the multiplication, we perform mode-1 unfolding/flattening of the tensor first. This results in the following mode-1 unfolded matrix:

\(\bf X_{(1)}= \begin{bmatrix}1 & 3 & 5 & 7 & 9 & 11\\2 & 4 &6 &8 &10 & 12\end {bmatrix}.\)

The 1-mode product is then calculated by carrying out the matrix multiplication \(\bf {UX_{(1)}}\). The result is a 2×6 unfolded matrix that corresponds to a 2x3x2 tensor, \({\boldsymbol{\mathcal{Y}}}\), with the frontal slices as:

\(\bf Y_1 = \begin{bmatrix}7 & 15\\10 & 22\end {bmatrix},\bf Y_2 = \begin{bmatrix}23 & 31\\34 & 46\end {bmatrix},\text{ and }\bf Y_3 = \begin{bmatrix}39 & 47\\58 & 70\end {bmatrix}.\)

Thus, the *n*-mode product of a tensor with a matrix yields a new tensor. You really do not have to worry about manually calculating *n-*mode product. The package *rTensor* in R provides a function *ttm*, tensor times matrix, for this task. Similar functions are present in *Tensorly* and *Tensor Toolbox*.

##### Tucker Decomposition

In Tucker decomposition, we express a tensor via a core tensor and *n*-mode products through factor matrices which transform the core tensor into the given tensor. Thus, given a tensor, \({\boldsymbol{\mathcal{X}}\in {\mathbb{R}^{I\times J \times \ K}}}\), its Tucker decomposition is given as

\(\boldsymbol{\mathcal{X}}\approx \boldsymbol{\mathcal{G}}\times_1 \bf {A} \times_2 \bf {B} \times_3 \bf {C } = \sum\limits_{p=1}\limits^{P}\sum\limits_{q=1}\limits^{Q}\sum\limits_{r=1}\limits^{R} g_{pqr}\, \bf{a}^{p}\,\circ\, \bf{b}^{q}\,\circ\, \bf{c}^{r}\),

where \(\boldsymbol{\mathcal{G}}\) is the *core tensor* of size \(P\times Q\times R\), and factor matrices **A**, **B**, and **C** are of size \(I\times P,\,J\times Q,\,\text{and }K\times R\), respectively. The columns of the factor matrices span the corresponding mode space. The factor matrices are usually orthogonal and represent the *principal components in each mode. The elements of the core tensor represent the interactions between the different modes. *The mode ranks of the core tensor* P, Q,* and *R* are generally chosen smaller than the mode ranks *I*, *J*, and *K* of the original tensor. Thus, the core tensor is often viewed as a *compressed representation* of the original tensor. The figure below shows a pictorial representation of the Tucker decomposition.

There are a few points worth mentioning here. First, the CP decomposition is a special case of the Tucker decomposition when the core tensor is superdiagonal and *P*, *Q*, and *R* are all identical. Second, the Tucker decomposition is not unique. This is because we can perform *n*-mode product with any nonsingular matrix, say **U**, on core tensor to transform it and still maintain the exact same level of approximation of the original tensor as long as we multiply the n-mode factor matrix with inverse of matrix **U**. Third, there are a few variations on the Tucker decomposition wherein certain constraints are imposed on factor matrices or core tensor. One of these variations is known as *higher-order SVD (HOSVD)* that is the basis for computing Tucker decomposition.

##### Higher-Order SVD (HOSVD) and Computing Tucker Decomposition

The HOSVD of a tensor is an extension of singular-value decomposition (SVD) of matrices. In this case, the resulting core tensor and all factor matrices are orthogonal, ensuring that variations in each mode are captured independent of the other modes. The HOSVD of a tensor is efficiently computed by performing SVD on *n*-mode unfolded matrices of the given tensor and using the left singular matrices as factor matrices in HOSVD. To calculate the core tensor, the *n*-mode product operation is performed on the original tensor with transposed factor matrices.

Since a numerical example is always valueable for better understanding, I am going to show the steps involved in HOSVD using the 2x2x3 tensor that we used earlier for *n*-mode product. The mode-1 unfolded matrix for this tensor was earlier obtained as

\(\bf X_{(1)}= \begin{bmatrix}1 & 3 & 5 & 7 & 9 & 11\\2 & 4 &6 &8 &10 & 12\end {bmatrix}.\)

The HOSVD method says that we use this unfolded matrix in a SVD factorization and use the resulting left singular matrix as the factor matrix **A** in our tensor decomposition. I perform this step in R using the following code:

[code language=”r”]

> tnsr <- as.tensor(array(1:12,dim=c(2,2,3)))

# Perform mode-1 unfolding

> xmat <-k_unfold(tnsr,1)@data

# compute SVD of the unfolded matrix

> s <-svd(xmat) # Obtain left singular matrix

> print(A <-s$u)

[,1] [,2]

[1,] -0.6632348 -0.7484114

[2,] -0.7484114 0.6632348

[/code]

The matrix at the end of the code block above is the factor matrix **A** in the HOSVD of the given tensor. In a similar manner, we can obtain unfolded matrices for mode-2 and mode-3 and do SVD to get **B** and **C** factor matrices. These are shown in the code block below:

[code language=”r”]

> xmat <-k_unfold(tnsr,2)@data

> s <-svd(xmat)

> print(B <-s$u)

[,1] [,2]

[1,] -0.6157085 -0.7879740

[2,] -0.7879740 0.6157085

> xmat <-k_unfold(tnsr,3)@data

> s <-svd(xmat)

> print(C <-s$u)

[,1] [,2] [,3]

[1,] -0.2067359 -0.8891533 0.4082483

[2,] -0.5182887 -0.2543818 -0.8164966

[3,] -0.8298416 0.3803896 0.4082483

[/code]

Next, lets obtain the core tensor. We do this by performing mode-n multiplications with the original tensor and transposed factor matrices as shown below.

[code language=”r”]

> g <- ttm(ttm(ttm(tnsr,t(A),m=1),t(B),m=2),t(C),m=3) > g

Numeric Tensor of 3 Modes

Modes: 2 2 3

Data:

[1] -25.436122 0.011316 0.005752 0.190090 0.002073 0.758965

[/code]

Just to check whether the *hosvd* function in R, meant for HOSVD, gives us the same result, I perform the decomposition of the same 2x2x3 tensor and display the factor matrices, returned as *U* matrices, below. The core tensor is returned as *Z*, and *est* represents the reconstructed/estimated tensor from decomposed components. Just to show that HOSVD yields an orthogonal core tensor, I show the unfolded mode-1 matrix whose rows can be verified to be orthogonal. You can also compare the entries of the unfolded mode-1 matrix with the elements of the core tensor computed earlier by tensor to matrix multiplication.

[code language=”r”]

> hosvdD <- hosvd(tnsr, c(2,2,3)) |=======================================================================| 100% > str(hosvdD)

List of 4

$ Z :Formal class ‘Tensor’ [package “rTensor”] with 3 slots

.. ..@ num_modes: int 3

.. ..@ modes : int [1:3] 2 2 3

.. ..@ data : num [1:2, 1:2, 1:3] -25.43612 0.01132 0.00575 0.19009 0.00207 …

$ U :List of 3

..$ : num [1:2, 1:2] -0.663 -0.748 -0.748 0.663

..$ : num [1:2, 1:2] -0.616 -0.788 -0.788 0.616

..$ : num [1:3, 1:3] -0.207 -0.518 -0.83 -0.889 -0.254 …

$ est :Formal class ‘Tensor’ [package “rTensor”] with 3 slots

.. ..@ num_modes: int 3

.. ..@ modes : int [1:3] 2 2 3

.. ..@ data : num [1:2, 1:2, 1:3] 1 2 3 4 5 …

$ fnorm_resid: num 3.43e-14

> hosvdD$U[[1]]

[,1] [,2]

[1,] -0.6632348 -0.7484114

[2,] -0.7484114 0.6632348

> hosvdD$U[[2]]

[,1] [,2]

[1,] -0.6157085 -0.7879740

[2,] -0.7879740 0.6157085

> hosvdD$U[[3]]

[,1] [,2] [,3]

[1,] -0.2067359 -0.8891533 0.4082483

[2,] -0.5182887 -0.2543818 -0.8164966

[3,] -0.8298416 0.3803896 0.4082483

> hmat <- k_unfold(hosvdD$Z,1)@data > hmat

[,1] [,2] [,3] [,4] [,5] [,6]

[1,] -25.43612 0.005752 0.002073 1.5352 -1.776e-15 -3.331e-16

[2,] 0.01132 0.190090 0.758965 0.1858 9.714e-17 1.735e-16

[/code]

You will notice that while using the *hosvd* function for decomposition above, I specified the mode ranks for the core tensor as 2, 2, and 3, identical to the mode ranks of the original tensor. When the mode ranks of the core tensor are smaller than the starting tensor, the resulting decomposition is known as the *truncated HOSVD. *It turns out that the truncated HOSVD makes an excellent starting solution for the Tucker decomposition of a tensor. This initial solution is iteratively refined through the alternating least square algorithm. This approach for Tucker decomposition is known as *Higher-Order Orthogonal Iteration (HOOI)*.

##### Tucker Decomposition Example

Lets look at a simple example of applying Tucker decomposition. For this, I will use a 256×256 RGB image which is a mode-3 tensor. Lets first decompose this mode-3 tensor with a core tensor having mode ranks as 128, 128, and 3, as shown below.

[code language=”r”]

> XS<- readPNG(“mendrl.png”) > tuckerD <-tucker(as.tensor(XS),rank = c(128,128,3)) |=======================================================================| 100% > str(tuckerD)

List of 7

$ Z :Formal class ‘Tensor’ [package “rTensor”] with 3 slots

.. ..@ num_modes: int 3

.. ..@ modes : int [1:3] 128 128 3

.. ..@ data : num [1:128, 1:128, 1:3] -223.82 -2.08 1.62 1.95 -1.23 …

$ U :List of 3

..$ : num [1:256, 1:128] -0.0559 -0.0542 -0.0543 -0.0523 -0.0531 …

..$ : num [1:256, 1:128] -0.0589 -0.0594 -0.0598 -0.0601 -0.0613 …

..$ : num [1:3, 1:3] -0.613 -0.585 -0.53 0.75 -0.221 …

$ conv : logi TRUE

$ est :Formal class ‘Tensor’ [package “rTensor”] with 3 slots

.. ..@ num_modes: int 3

.. ..@ modes : int [1:3] 256 256 3

.. ..@ data : num [1:256, 1:256, 1:3] 0.44 0.419 0.389 0.3 0.269 …

$ norm_percent: num 94.6

$ fnorm_resid : num 12.8

$ all_resids : num [1:6] 12.8 12.8 12.8 12.8 12.8 …

[/code]

The output, *$est*, represents the reconstructed/estimated tensor. The output quantity, *$fnorm_resid*, represents the error, using the Frobenius norm, between the original and the reconstructed tensor. The original tensor, *mendril image, *consists of 256x256x3 = 196,608 elements. The Tucker decomposition results in 128x128x3 elements of the core tensor, and (256x128x2 + 3×3) elements of factor matrices with a total of 114,697. This implies a compression of 41.66% with an approximation error of 12.8%. Just to relate the compression with the approximation error, I reran the decomposition for two other sets of mode ranks, 64x64x3 and 32×32,3. The results are shown below with the original and the three reconstructed images along with the % compression and error values. This should suggest to you that Tucker decomposition is a pretty good tool for compressing multidimensional data.

We can also look at the above example from the PCA/SVD perspective. Lets view *mendril image* as a collection of three gray images. Also note that I mentioned earlier that the columns of factor matrices span the corresponding mode space. Thus, we can extract a representation for each mode, called *eigenmodes*, similar to eigenvector representation performed with matrices. This can be done by performing tensor to matrix multiplications as shown below. (Think of forming eigenimages by multiplying left and right singular matrices of SVD decomposition of an image)

[code language=”r”]

> xaa <-ttm(tuckerD$Z,tuckerD$U[[1]],m=1)

> xaa1 <- ttm(xaa,tuckerD$U[[2]],m=2)

> writePNG(xaa1@data[,,1],”mode1eig.png”)

> writePNG(xaa1@data[,,2],”mode2eig.png”)

> writePNG(xaa1@data[,,3],”mode3eig.png”)

[/code]

The three eigenmode images generated via the above code are shown below in the second row. The first row shows the three input mode images (RGB channels of *mendril image*). The mode-1 eigenimage is shown inverted as the elements in this image are all negative. This viewpoint of Tucker decomposition has been used to create *tensor faces*, an extension of the *eigenfaces* concept.

Hopefully, my three-part series on tensors has been helpfull. If you want to explore further, I suggest looking up an excellent overview paper on tensors, and an application oriented paper. Recently, tensor decomposition and its capability to compress representations is being exploited in convolutional deep learning networks where a trained set of convolutional layers is treated as a tensor for decomposition to gain computational advantages. We should be expecting more work along these lines coming out from different research groups.

Benson Muite has created a Git repository for the code used in this and its two previous blog posts on tensors. Pl. visit the repository at https://gitlab.com/bkmgit/understanding-tensors-and-tensor-decomposition/. Many thanks to him.

Hi Krishan, nice article. I have a doubt in rank of a matrix. After performing outer product of the two vectors a and b, the matrix is of rank 2 I believe. Because both the columns are independent.

Hi Krishan, the example was good with R. An example of Tucker Decomposition using Python packages will be more beneficial to us I believe.

Will post soon with Python.

Hi Krishan, the example with R was good. The same example with Python coding also will be more beneficial to us I believe.

OK. I will get to it soon.

Hi Krishnan, thanks a lot for explaining nicely . i just want to know is it possible to use tensor decomposition to extract features for classification??please help me…

It can be used for dimensionality reduction which can be viewed as feature extraction.