Understanding Tensors and Tensor Decompositions: Part 2

Published on:
April 10, 2023
Published by:
Professor Ishwar Sethi
This post was originally published by one of our partners on:
https://iksinc.tech/understanding-tensors-and-tensor-decompositions-part-2/

In Part 1 of this blog, I explained what tensors are and introduced some related terminology. To briefly recap, a tensor is a multidimensional array of numbers. Tensors provide a useful representation for data that involves multiple modalities of sensing, or data that has multiple aspects in time or space or both. In this post and the next, we are going to look at tensor decomposition and suggest some of its applications. We are all familiar with matrix decompositions, for example PCA/SVD and non-negative matrix factorization. Such decompositions provide insights into the different components of the data as well as ways to compress it. Tensor decomposition yields similar benefits with multidimensional data.

The CP Decomposition

There are several ways of decomposing tensors. In this post, we will look at the CANDECOMP/PARAFAC decomposition, commonly referred to as the CP decomposition. Although this decomposition dates back to 1927, when it was introduced as the canonical polyadic decomposition (CPD), it became popular in the seventies under the names canonical decomposition (CANDECOMP) and parallel factors (PARAFAC), with applications in psychometrics and phonetics respectively. The CP decomposition factorizes a tensor into a linear combination of rank-one tensors. Thus, a tensor of order 3 can be decomposed as a sum of R rank-one tensors as

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where $\circ$ represents the outer product. This is illustrated in the figure below.

Often, the vectors in rank one tensors are normalized to unit length. In such cases, the CP decomposition is expressed as

$$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where $\lambda_r$ is a scalar accounting for the normalization. With such a decomposition, a tensor element

$x_{ijk}$

can be approximated as

$$x_{ijk} \approx \hat{x}_{ijk} = \sum_{r=1}^{R} \lambda_r \, a_{ir} b_{jr} c_{kr}.$$
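To make this elementwise formula concrete, here is a small R sketch; the names lambdas, A, B, and C are hypothetical stand-ins for the weight vector and for matrices whose columns are the vectors $\mathbf{a}_r$, $\mathbf{b}_r$, and $\mathbf{c}_r$.

[code language="r"]
# Sketch of the elementwise CP formula: x_ijk is approximated by the sum over r
# of lambda_r * a_ir * b_jr * c_kr, where A, B, C hold the factor vectors as columns.
xhat_ijk <- function(i, j, k, lambdas, A, B, C) {
  sum(lambdas * A[i, ] * B[j, ] * C[k, ])
}
[/code]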

The decomposition is computed by a minimization algorithm known as alternating least squares (ALS). The basic idea of this algorithm is to keep all but one of the factor components fixed, solve a least squares problem for the remaining one, and then iterate by cycling through which component is kept fixed.
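For the curious, here is a minimal base-R sketch of ALS for a third-order tensor, following the standard update formulas from Kolda and Bader's survey. The function names cp_als and khatri_rao are my own; the rTensor package used later in this post provides a tested implementation, so treat this only as an illustration of the idea.

[code language="r"]
# Minimal ALS sketch for a rank-R CP decomposition of a 3-way array.
# Illustrative only; rTensor's cp() is the tested implementation.
khatri_rao <- function(A, B) {
  # column-wise Kronecker product: the r-th column is kronecker(A[, r], B[, r])
  out <- matrix(0, nrow(A) * nrow(B), ncol(A))
  for (r in seq_len(ncol(A))) out[, r] <- kronecker(A[, r], B[, r])
  out
}

cp_als <- function(X, R, iters = 50) {
  I <- dim(X)[1]; J <- dim(X)[2]; K <- dim(X)[3]
  # mode-n unfoldings of the array (Kolda & Bader convention)
  X1 <- matrix(X, I, J * K)
  X2 <- matrix(aperm(X, c(2, 1, 3)), J, I * K)
  X3 <- matrix(aperm(X, c(3, 1, 2)), K, I * J)
  # random initialization of the factor matrices
  A <- matrix(rnorm(I * R), I, R)
  B <- matrix(rnorm(J * R), J, R)
  C <- matrix(rnorm(K * R), K, R)
  for (it in seq_len(iters)) {
    # update each factor in turn while keeping the other two fixed
    A <- X1 %*% khatri_rao(C, B) %*% solve((t(C) %*% C) * (t(B) %*% B))
    B <- X2 %*% khatri_rao(C, A) %*% solve((t(C) %*% C) * (t(A) %*% A))
    C <- X3 %*% khatri_rao(B, A) %*% solve((t(B) %*% B) * (t(A) %*% A))
  }
  # normalize factor columns to unit length and absorb the norms into lambdas
  la <- sqrt(colSums(A^2)); lb <- sqrt(colSums(B^2)); lc <- sqrt(colSums(C^2))
  list(lambdas = la * lb * lc,
       A = sweep(A, 2, la, "/"), B = sweep(B, 2, lb, "/"), C = sweep(C, 2, lc, "/"))
}
[/code]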

An Example of CP Decomposition

Let me use an example to illustrate the CP decomposition. Let's begin by generating 20 noisy binary matrices of size 7×5 for each of the digits 1 to 3. These matrices form a multidimensional array of size 60×7×5 that is treated as a tensor. I used the following code in R to generate the array.

[code language="r"]
> # 7x5 binary templates for digits 1, 2 and 3 (illustrative patterns; the
> # exact templates used originally are not reproduced here)
> one   <- matrix(c(0,0,1,0,0, 0,1,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,1,1,1,0), 7, 5, byrow = TRUE)
> two   <- matrix(c(0,1,1,1,0, 1,0,0,0,1, 0,0,0,0,1, 0,0,0,1,0, 0,0,1,0,0, 0,1,0,0,0, 1,1,1,1,1), 7, 5, byrow = TRUE)
> three <- matrix(c(1,1,1,1,0, 0,0,0,0,1, 0,0,0,0,1, 0,1,1,1,0, 0,0,0,0,1, 0,0,0,0,1, 1,1,1,1,0), 7, 5, byrow = TRUE)
> noise <- 0.05
> X <- array(0, dim = c(60, 7, 5))
> for(i in 1:20) {
+   X[i, , ]    <- (xor(one,   matrix(rbinom(7*5, 1, noise), 7, 5))) * 1
+   X[i+20, , ] <- (xor(two,   matrix(rbinom(7*5, 1, noise), 7, 5))) * 1
+   X[i+40, , ] <- (xor(three, matrix(rbinom(7*5, 1, noise), 7, 5))) * 1
+ }
[/code]

Three examples of generated noisy digits are shown below.
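As an aside, a generated digit can be displayed with base R's image() function; the transpose-and-reverse below is just the usual idiom for drawing a matrix in the orientation in which it prints.

[code language="r"]
# display the first noisy digit; 0 -> white, 1 -> black
image(t(apply(X[1, , ], 2, rev)), col = grey(c(1, 0)), axes = FALSE)
[/code]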

I used the rTensor package for tensor decomposition. If you prefer Python, you can use the TensorLy package. To start with, I decomposed the tensor X using a single rank-one component, as shown in the code below:

[code language="r"]
> library(rTensor)
> nc <- 1  # number of rank-one components
> xdecomp <- cp(as.tensor(X), num_components = nc)
> xdecomp$conv
[1] TRUE
> str(xdecomp$U)
List of 3
$ : num [1:60, 1] 0.0146 0.0171 0.0119 0.0152 0.0157 ...
$ : num [1:7, 1] -0.2276 -0.0986 -0.0681 -0.1522 -0.0913 ...
$ : num [1:5, 1] -0.108 -0.233 -0.307 -0.224 -0.128
> attributes(xdecomp)
$names
[1] "lambdas" "U" "conv" "est" "norm_percent" "fnorm_resid" "all_resids"
> attributes(xdecomp$U)
NULL
> xdecomp$lambdas
[1] 771.7821
> dim(xdecomp$U)
NULL
> summary(xdecomp$U)
Length Class Mode
[1,] 60 -none- numeric
[2,] 7 -none- numeric
[3,] 5 -none- numeric
> xdecomp$norm_percent
[1] 30.57773
> xdecomp$fnorm_resid
[1] 19.7213
[/code]

The decomposition results in three vectors. The mode one vector has 60 components with each component corresponding to one of the noisy digits. A plot of these 60 values is shown below. It shows that the values in this tensor mode are barely different from each other.
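A sketch of how such a plot can be produced from the fitted object is shown below; the color coding by digit class (three blocks of 20) is my own addition.

[code language="r"]
# plot the 60 mode-one values, colored by digit class (1, 2, 3)
plot(xdecomp$U[[1]][, 1], col = rep(1:3, each = 20), pch = 19,
     xlab = "pattern index", ylab = "mode-one value")
[/code]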

The other two mode vectors, of size 7×1 and 5×1, together define a single 7×5 pattern that is common to all 60 digit images. This means that the CP decomposition is telling us to view all 60 digit patterns as scaled versions of a single two-dimensional matrix. The value of xdecomp$norm_percent indicates the level of approximation achieved by the decomposition. In this case it is 30.57773%, a poor approximation; no surprise there, because we are using just one rank-one tensor for the approximation. To see what the approximated digits look like, I reconstructed two examples from each of the digit categories 1 to 3. These are shown below. Clearly, the reconstructed digits all look alike because of the poor approximation achieved by a single rank-one tensor.
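One convenient way to inspect such reconstructions is through the est component returned by cp(), which holds the rank-R approximation of the tensor; the 0.5 threshold used below for binarizing is my own choice.

[code language="r"]
xhat <- xdecomp$est@data        # the approximated 60x7x5 array
round(xhat[1, , ], 2)           # approximation of the first noisy "1"
(xhat[1, , ] > 0.5) * 1         # binarized version for visual comparison
[/code]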

Next, I decided to increase the number of rank-one tensors in the approximation to two. The approximation in this case accounts for 45.915% of the variability, which is better than the single-component case. The decomposition now suggests viewing each digit image as a linear combination of two 7×5 patterns, where the coefficients of the combination come from the mode-one matrix of size 60×2. Plotting these coefficients with color coding for the three digit categories, I generated the plot given below.
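A sketch of the two-component fit and the corresponding scatter plot, again color coded by digit class:

[code language="r"]
xdecomp2 <- cp(as.tensor(X), num_components = 2)   # two rank-one components
xdecomp2$norm_percent                              # about 46% in the run described above
plot(xdecomp2$U[[1]][, 1], xdecomp2$U[[1]][, 2],
     col = rep(1:3, each = 20), pch = 19,
     xlab = "component 1", ylab = "component 2")
[/code]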

The above plot shows that the values in the first mode of the decomposed tensor can differentiate digit 1 patterns (shown as black circles) from digit 2 and 3 patterns (shown as red and green circles). This is further supported by the reconstruction examples shown below, where the reconstructed patterns for digit 1 approximate it reasonably well while the reconstructed patterns for digits 2 and 3 are indistinguishable from each other.

Next, I decided to jump to five components in the decomposition. This results in a mode-one matrix of size 60×5, a mode-two matrix of size 7×5, and a mode-three matrix of size 5×5. Again, the mode-two and mode-three matrices together represent five patterns, each a 7×5 matrix. When we combine these five patterns with the weights/scales coming from the columns of the mode-one matrix, we obtain the approximated digit images. The scatter plots below show that it is possible to cluster/classify digits 1 to 3 using the columns of the mode-one matrix of the decomposed representation (see row 4 and column 3). This shows that the CP decomposition can be used for clustering/classification tasks.
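A sketch of the five-component fit and of the scatter-plot matrix of the mode-one columns:

[code language="r"]
xdecomp5 <- cp(as.tensor(X), num_components = 5)
feats <- xdecomp5$U[[1]]                            # 60 x 5 mode-one matrix
pairs(feats, col = rep(1:3, each = 20), pch = 19)   # pairwise scatter plots of the columns
[/code]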

The reconstructed images with five components, given below, clearly show patterns for digits 1 to 3. We can increase the number of components for better approximation.

As this example shows, we can view the CP decomposition as producing a set of patterns in modes two and three that together act analogously to the basis vectors of PCA. The mode-one matrix can then be viewed as supplying the weights with which these patterns are linearly combined to approximate the images/data in the original tensor. By treating the elements of the mode-one matrix of the resulting decomposition as features, it is possible to use the CP decomposition for classification/clustering tasks.
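As a quick illustration of this feature view, the mode-one matrix from the five-component fit above (xdecomp5) can be fed directly into a clustering routine; using kmeans here is my own choice and not something from the original analysis.

[code language="r"]
# cluster the 60 digit images using their mode-one coefficients as features
cl <- kmeans(xdecomp5$U[[1]], centers = 3, nstart = 10)
table(cl$cluster, rep(1:3, each = 20))   # compare clusters with the true digit labels
[/code]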

The CP decomposition has been applied in many signal processing and data mining applications. One application worth mentioning here is by the company Stitch Fix. In their application, a tensor of order 3 is created from the words customers use to describe clothing made by Stitch Fix and the words the company uses to describe its clothing styles. Word-embedding vectors are used to form the tensor. The company uses the CP decomposition to learn the relationships between the words used by customers and the styles it offers, which helps it return more appropriate results to customer queries for clothing. The blog post describing this application is worth checking out.

In Part 3, I am going to describe another popular tensor decomposition method, known as Tucker decomposition. Till then, happy reading and surfing.

Benson Muite has created a Git repository for the code used in this and the two related blog posts on tensors. Please visit the repository at https://gitlab.com/bkmgit/understanding-tensors-and-tensor-decomposition/. Many thanks to him.
