Understanding Tensors and Tensor Decompositions: Part 2

Published on:
April 10, 2023
Published by:
Professor Ishwar Sethi
This post was originally published by one of our partners on:
https://iksinc.tech/understanding-tensors-and-tensor-decompositions-part-2/

In Part 1 of this blog, I explained what tensors are and introduced some related terminology. To briefly recap, a tensor is a multidimensional array of numbers. Tensors provide a useful representation for data that involves multiple modalities of sensing, or data that has multiple aspects in time or space or both. In this post and the next, we are going to look at tensor decomposition and suggest some of its applications. We are all familiar with matrix decompositions, for example PCA/SVD and non-negative matrix factorization. Such decompositions provide insights into the different components of the data as well as ways to compress it. Tensor decomposition yields similar benefits with multidimensional data.

The CP Decomposition

There are several ways of decomposing tensors. In this post, we will look at the CANDECOMP/PARAFAC decomposition, commonly referred to as the CP decomposition. Although this decomposition dates back to 1927, when it was introduced as the canonical polyadic decomposition (CPD), it became popular in the seventies under the names canonical decomposition (CANDECOMP) and parallel factors (PARAFAC), with applications in psychometrics and phonetics respectively. The CP decomposition factorizes a tensor into a linear combination of rank-one tensors. Thus, a tensor of order 3 can be decomposed as a sum of R rank-one tensors as

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where $\circ$ represents the outer product. This is illustrated in the figure below.

Often, the vectors in rank one tensors are normalized to unit length. In such cases, the CP decomposition is expressed as

$$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where $\lambda_r$ is a scalar accounting for the normalization. With such a decomposition, a tensor element

$x_{ijk}$

can be approximated as

$$x_{ijk} \approx \hat{x}_{ijk} = \sum_{r=1}^{R} \lambda_r \, a_{ir} b_{jr} c_{kr}.$$
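To make this elementwise formula concrete, here is a small R sketch; the names lambdas, A, B, and C are hypothetical stand-ins for the weight vector and for matrices whose columns are the vectors $\mathbf{a}_r$, $\mathbf{b}_r$, and $\mathbf{c}_r$.

[code language="r"]
# Sketch of the elementwise CP formula: x_ijk is approximated by the sum over r
# of lambda_r * a_ir * b_jr * c_kr, where A, B, C hold the factor vectors as columns.
xhat_ijk <- function(i, j, k, lambdas, A, B, C) {
  sum(lambdas * A[i, ] * B[j, ] * C[k, ])
}
[/code]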

The decomposition is computed by a minimization algorithm known as alternating least squares (ALS). The basic idea of this algorithm is to keep all but one of the factor components fixed, solve a least squares problem for the remaining one, and then iterate by cycling through which component is kept fixed.
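For the curious, here is a minimal base-R sketch of ALS for a third-order tensor, following the standard update formulas from Kolda and Bader's survey. The function names cp_als and khatri_rao are my own; the rTensor package used later in this post provides a tested implementation, so treat this only as an illustration of the idea.

[code language="r"]
# Minimal ALS sketch for a rank-R CP decomposition of a 3-way array.
# Illustrative only; rTensor's cp() is the tested implementation.
khatri_rao <- function(A, B) {
  # column-wise Kronecker product: the r-th column is kronecker(A[, r], B[, r])
  out <- matrix(0, nrow(A) * nrow(B), ncol(A))
  for (r in seq_len(ncol(A))) out[, r] <- kronecker(A[, r], B[, r])
  out
}

cp_als <- function(X, R, iters = 50) {
  I <- dim(X)[1]; J <- dim(X)[2]; K <- dim(X)[3]
  # mode-n unfoldings of the array (Kolda & Bader convention)
  X1 <- matrix(X, I, J * K)
  X2 <- matrix(aperm(X, c(2, 1, 3)), J, I * K)
  X3 <- matrix(aperm(X, c(3, 1, 2)), K, I * J)
  # random initialization of the factor matrices
  A <- matrix(rnorm(I * R), I, R)
  B <- matrix(rnorm(J * R), J, R)
  C <- matrix(rnorm(K * R), K, R)
  for (it in seq_len(iters)) {
    # update each factor in turn while keeping the other two fixed
    A <- X1 %*% khatri_rao(C, B) %*% solve((t(C) %*% C) * (t(B) %*% B))
    B <- X2 %*% khatri_rao(C, A) %*% solve((t(C) %*% C) * (t(A) %*% A))
    C <- X3 %*% khatri_rao(B, A) %*% solve((t(B) %*% B) * (t(A) %*% A))
  }
  # normalize factor columns to unit length and absorb the norms into lambdas
  la <- sqrt(colSums(A^2)); lb <- sqrt(colSums(B^2)); lc <- sqrt(colSums(C^2))
  list(lambdas = la * lb * lc,
       A = sweep(A, 2, la, "/"), B = sweep(B, 2, lb, "/"), C = sweep(C, 2, lc, "/"))
}
[/code]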

An Example of CP Decomposition

Let me use an example to illustrate the CP decomposition. Let's begin by generating 20 noisy binary matrices of size 7×5 for each of the digits 1 to 3. These matrices form a multidimensional array of size 60×7×5 that is treated as a tensor. I used the following code in R to generate the array.

[code language="r"]
> # 7x5 binary templates for digits 1, 2 and 3 (illustrative patterns; the
> # exact templates used originally are not reproduced here)
> one   <- matrix(c(0,0,1,0,0, 0,1,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,1,1,1,0), 7, 5, byrow = TRUE)
> two   <- matrix(c(0,1,1,1,0, 1,0,0,0,1, 0,0,0,0,1, 0,0,0,1,0, 0,0,1,0,0, 0,1,0,0,0, 1,1,1,1,1), 7, 5, byrow = TRUE)
> three <- matrix(c(1,1,1,1,0, 0,0,0,0,1, 0,0,0,0,1, 0,1,1,1,0, 0,0,0,0,1, 0,0,0,0,1, 1,1,1,1,0), 7, 5, byrow = TRUE)
> noise <- 0.05
> X <- array(0, dim = c(60, 7, 5))
> for(i in 1:20) {
+   X[i, , ]    <- (xor(one,   matrix(rbinom(7*5, 1, noise), 7, 5))) * 1
+   X[i+20, , ] <- (xor(two,   matrix(rbinom(7*5, 1, noise), 7, 5))) * 1
+   X[i+40, , ] <- (xor(three, matrix(rbinom(7*5, 1, noise), 7, 5))) * 1
+ }
[/code]

Three examples of generated noisy digits are shown below.
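As an aside, a generated digit can be displayed with base R's image() function; the transpose-and-reverse below is just the usual idiom for drawing a matrix in the orientation in which it prints.

[code language="r"]
# display the first noisy digit; 0 -> white, 1 -> black
image(t(apply(X[1, , ], 2, rev)), col = grey(c(1, 0)), axes = FALSE)
[/code]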

I used the rTensor package for tensor decomposition. If you prefer Python, you can use the TensorLy package. To start with, I decomposed the tensor X using a single rank-one component, as shown in the code below:

[code language="r"]
> library(rTensor)
> nc <- 1  # number of rank-one components
> xdecomp <- cp(as.tensor(X), num_components = nc)
> xdecomp$conv
[1] TRUE
> str(xdecomp$U)
List of 3
$ : num [1:60, 1] 0.0146 0.0171 0.0119 0.0152 0.0157 ...
$ : num [1:7, 1] -0.2276 -0.0986 -0.0681 -0.1522 -0.0913 ...
$ : num [1:5, 1] -0.108 -0.233 -0.307 -0.224 -0.128
> attributes(xdecomp)
$names
[1] "lambdas" "U" "conv" "est" "norm_percent" "fnorm_resid" "all_resids"
> attributes(xdecomp$U)
NULL
> xdecomp$lambdas
[1] 771.7821
> dim(xdecomp$U)
NULL
> summary(xdecomp$U)
Length Class Mode
[1,] 60 -none- numeric
[2,] 7 -none- numeric
[3,] 5 -none- numeric
> xdecomp$norm_percent
[1] 30.57773
> xdecomp$fnorm_resid
[1] 19.7213
[/code]

The decomposition results in three vectors. The mode one vector has 60 components with each component corresponding to one of the noisy digits. A plot of these 60 values is shown below. It shows that the values in this tensor mode are barely different from each other.
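A sketch of how such a plot can be produced from the fitted object is shown below; the color coding by digit class (three blocks of 20) is my own addition.

[code language="r"]
# plot the 60 mode-one values, colored by digit class (1, 2, 3)
plot(xdecomp$U[[1]][, 1], col = rep(1:3, each = 20), pch = 19,
     xlab = "pattern index", ylab = "mode-one value")
[/code]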

The other two mode vectors, of size 7×1 and 5×1, together define a single 7×5 pattern that is common to all 60 digit images. This means that the CP decomposition is telling us to view all 60 digit patterns as scaled versions of a single two-dimensional matrix. The value of xdecomp$norm_percent indicates the level of approximation achieved by the decomposition. In this case it is 30.57773%, a poor approximation; no surprise there, because we are using just one rank-one tensor for the approximation. To see what the approximated digits look like, I reconstructed two examples from each of the digit categories 1 to 3. These are shown below. Clearly, the reconstructed digits all look alike because of the poor approximation achieved by a single rank-one tensor.
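One convenient way to inspect such reconstructions is through the est component returned by cp(), which holds the rank-R approximation of the tensor; the 0.5 threshold used below for binarizing is my own choice.

[code language="r"]
xhat <- xdecomp$est@data        # the approximated 60x7x5 array
round(xhat[1, , ], 2)           # approximation of the first noisy "1"
(xhat[1, , ] > 0.5) * 1         # binarized version for visual comparison
[/code]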

Next, I decided to increase the number of rank-one tensors in the approximation to two. The approximation in this case accounts for 45.915% of the variability, which is better than the single-component case. The decomposition now suggests viewing each digit image as a linear combination of two 7×5 patterns, where the coefficients of the combination come from the mode-one matrix of size 60×2. Plotting these coefficients with color coding for the three digit categories, I generated the plot given below.
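A sketch of the two-component fit and the corresponding scatter plot, again color coded by digit class:

[code language="r"]
xdecomp2 <- cp(as.tensor(X), num_components = 2)   # two rank-one components
xdecomp2$norm_percent                              # about 46% in the run described above
plot(xdecomp2$U[[1]][, 1], xdecomp2$U[[1]][, 2],
     col = rep(1:3, each = 20), pch = 19,
     xlab = "component 1", ylab = "component 2")
[/code]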

The above plot shows that the values in the first mode of the decomposed tensor can differentiate digit 1 patterns (shown as black circles) from digit 2 and 3 patterns (shown as red and green circles). This is further supported by the reconstruction examples shown below, where the reconstructed patterns for digit 1 approximate it reasonably well while the reconstructed patterns for digits 2 and 3 are indistinguishable from each other.

Next, I decided to jump to five components in the decomposition. This results in a mode-one matrix of size 60×5, a mode-two matrix of size 7×5, and a mode-three matrix of size 5×5. Again, the mode-two and mode-three matrices together represent five patterns, each a 7×5 matrix. When we combine these five patterns with the weights/scales coming from the columns of the mode-one matrix, we obtain the approximated digit images. The scatter plots below show that it is possible to cluster/classify digits 1 to 3 using the columns of the mode-one matrix of the decomposed representation (see row 4 and column 3). This shows that the CP decomposition can be used for clustering/classification tasks.
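A sketch of the five-component fit and of the scatter-plot matrix of the mode-one columns:

[code language="r"]
xdecomp5 <- cp(as.tensor(X), num_components = 5)
feats <- xdecomp5$U[[1]]                            # 60 x 5 mode-one matrix
pairs(feats, col = rep(1:3, each = 20), pch = 19)   # pairwise scatter plots of the columns
[/code]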

The reconstructed images with five components, given below, clearly show patterns for digits 1 to 3. We can increase the number of components for better approximation.

As this example shows, we can view the CP decomposition as producing a set of patterns in modes two and three that together act analogously to the basis vectors of PCA. The mode-one matrix can then be viewed as supplying the weights with which these patterns are linearly combined to approximate the images/data in the original tensor. By treating the elements of the mode-one matrix of the resulting decomposition as features, it is possible to use the CP decomposition for classification/clustering tasks.
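As a quick illustration of this feature view, the mode-one matrix from the five-component fit above (xdecomp5) can be fed directly into a clustering routine; using kmeans here is my own choice and not something from the original analysis.

[code language="r"]
# cluster the 60 digit images using their mode-one coefficients as features
cl <- kmeans(xdecomp5$U[[1]], centers = 3, nstart = 10)
table(cl$cluster, rep(1:3, each = 20))   # compare clusters with the true digit labels
[/code]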

The CP decomposition has been applied in many signal processing and data mining applications. One application worth mentioning here is by the company Stitch Fix. In their application, a tensor of order 3 is created from the words customers use to describe clothing made by Stitch Fix and the words the company uses to describe its clothing styles. Word-embedding vectors are used to form the tensor. The company uses the CP decomposition to learn the relationships between the words used by customers and the styles it offers, which helps it return more appropriate results to customer queries for clothing. The blog post describing this application is worth checking out.

In Part 3, I am going to describe another popular tensor decomposition method, known as Tucker decomposition. Till then, happy reading and surfing.

Benson Muite has created a Git repository for the code used in this and the two related blog posts on tensors. Please visit the repository at https://gitlab.com/bkmgit/understanding-tensors-and-tensor-decomposition/. Many thanks to him.
