Understanding Compression of Convolutional Neural Nets: Part 1

Published on: April 10, 2023
Published by: Professor Ishwar Sethi
This post was originally published by one of our partners on:
https://iksinc.tech/understanding-compression-of-convolutional-neural-nets-part-1/

There are neural network applications where resources, for example compute power, memory space, or battery power, are limited. In such cases, we may want to simplify a trained network to better fit the available resources. While simplifying, we obviously would like to maintain the network accuracy as much as possible. Such network simplification is termed compression of deep neural networks. In this three-part blog series, I am going to share with you some simple examples of neural network compression to help you better understand compressing deep neural nets. In Part 1, I will begin with a technique, known as singular value decomposition (SVD), which is the basis of many compression methods. Part 2 will show compression of feedforward neural networks, and Part 3 will demonstrate compression of convolution layers in convolutional neural networks (CNNs).

Singular Value Decomposition (SVD)

Let’s begin our discussion of SVD by recalling that the rank of a matrix is the number of linearly independent columns or rows. For example, the matrix

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}

has rank 2 because its two columns are linearly independent, i.e. neither column/row is a scaled version of the other. On the other hand, the following matrix has rank 1 because the second column/row is the same as the first column/row multiplied by 2.

A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
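As a quick check, here is a minimal sketch (using NumPy) that verifies the ranks of these two example matrices:

import numpy as np
from numpy.linalg import matrix_rank

A_full = np.array([[1, 2], [3, 4]])   # columns are linearly independent
A_low = np.array([[1, 2], [2, 4]])    # second column is 2 x the first column

print(matrix_rank(A_full))   # prints 2
print(matrix_rank(A_low))    # prints 1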

Given a matrix, sometimes you may want to obtain its approximation. Use of such approximations is common in numerous tasks such as dimensionality reduction, information retrieval and text mining, and image processing. The singular value decomposition (SVD) is one such method for obtaining low-rank approximations of a given matrix. Another popular matrix decomposition method is non-negative matrix factorization (NMF). You can read about it in my earlier blog post.

In SVD, a real-valued matrix A of size m x n is factorized as

A = U D V^T

where U is an orthogonal matrix of size m x m of left singular vectors and V is an orthogonal matrix of size n x n of right singular vectors. The matrix D is a diagonal matrix of size m x n of singular values. A low-rank approximation to a matrix A of rank r is obtained by using only a subset of the singular values and the corresponding left and right singular vectors, as given by the following expression. In other words, the approximation is obtained as a weighted sum of rank-one matrices.

\hat{A} = \sum_{j=1}^{k} d_{jj} U_j V_j^T, \quad k \le r

Let me give you an example by decomposing a matrix and obtaining its different approximations.

from scipy import linalg
import numpy as np

np.set_printoptions(precision=2)
np.set_printoptions(suppress=True)

# Create a 4x3 matrix
A = np.array([[6, 6, 2], [0, 1, 3], [4, 0, 1], [0, 6, 2]])

# Obtain SVD decomposition
U, d, Vt = linalg.svd(A)
print(' U matrix\n', U, '\n d array\n', d, '\n V-Transpose matrix\n', Vt)

This code creates a 4×3 matrix A and decomposes it to obtain the U and V matrices along with an array d of singular values, as shown below. Note that the array d of singular values represents the diagonal elements of matrix D in the SVD equation given earlier. In this case, there are three non-zero singular values; this means that the rank of matrix A is three.

 U matrix
 [[-0.82 -0.28  0.21 -0.46]
  [-0.16  0.2  -0.93 -0.26]
  [-0.24 -0.62 -0.28  0.69]
  [-0.5   0.7   0.1   0.5 ]]
 d array
 [10.51  5.06  2.63]
 V-Transpose matrix
 [[-0.56 -0.77 -0.32]
  [-0.83  0.54  0.16]
  [ 0.05  0.36 -0.93]]

Given U, V, and d, we can get back our original matrix A without any error by performing the matrix multiplication U D V^T.

D = np.zeros((4, 3))   # (4, 3) is the size of matrix A
for i in range(len(d)):
    D[i, i] = d[i]

Aa = U @ (D @ Vt)   # Aa is the reconstructed matrix
print(' Reconstructed Matrix A\n', Aa)

 Reconstructed Matrix A
 [[6. 6. 2.]
  [0. 1. 3.]
  [4. 0. 1.]
  [0. 6. 2.]]

The reconstructed matrix is identical to the original matrix. Suppose we reconstruct matrix A using only the first two singular values. We do that by replacing  ‘len(d)’ above with  ‘len(d)-1’ and get the following approximated matrix A.

 Reconstructed Matrix A
 [[ 5.97  5.8   2.51]
  [ 0.12  1.87  0.72]
  [ 4.04  0.26  0.31]
  [-0.01  5.91  2.25]]
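This rank-2 approximation can also be written explicitly as a weighted sum of two rank-one matrices. Here is a minimal sketch, reusing the U, d, and Vt computed above:

# Rank-2 approximation as a weighted sum of two rank-one matrices
A2 = d[0] * np.outer(U[:, 0], Vt[0, :]) + d[1] * np.outer(U[:, 1], Vt[1, :])
print(A2)   # same values as the reconstruction shown above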

Since this approximation is obtained by summing two rank-one matrices, you can easily verify that the rank of this matrix is 2. You can also calculate the approximation error by comparing the original and the reconstructed matrix element by element, squaring the differences, and adding them, as shown below. This measure of comparing two matrices is the squared Frobenius norm of their difference.

from numpy.linalg import matrix_rank

print(matrix_rank(Aa))
err = np.sum(np.square(A - Aa))
print(' Approximation Error = ', err)

2
 Approximation Error =  6.904768216213785

If you look at the approximation error, you will find that it equals the square of 2.63, the singular value not used in reconstructing the approximation. You can also approximate matrix A by using only one singular value. We do this by replacing 'len(d)' with 'len(d)-2' and get the following approximation.

 Reconstructed Matrix A
 [[4.79 6.57 2.75]
  [0.96 1.32 0.55]
  [1.43 1.95 0.82]
  [2.92 4.01 1.67]]

You can verify that the error in this case is the sum of the squares of the second and third singular values, which were not used in creating the approximation.
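A quick sketch to confirm this, reusing the matrix A and the U, d, and Vt computed above:

# Rank-1 approximation and its squared Frobenius error
A1 = d[0] * np.outer(U[:, 0], Vt[0, :])
err1 = np.sum(np.square(A - A1))
print(err1)                   # approximation error
print(d[1]**2 + d[2]**2)      # sum of squares of the unused singular values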

Truncated SVD

The singular value decomposition described above creates an approximate matrix whose size remains the same. In some cases, the matrix A to be decomposed is treated as a data matrix, with each row representing an observation and each column representing an observed feature. In such a scenario, the objective is not only to approximate A but also to achieve a reduction in the size of the approximated matrix, i.e. the resulting matrix has the same number of rows (observations) but fewer columns. Thus, the matrix approximation is aimed at achieving dimensionality reduction, similar to PCA. The use of SVD in this context is known as truncated SVD. To obtain a reduced approximation matrix, we use only the top k < r singular values and the first k columns of the U matrix, and calculate the reduced representation as

A_k = U_k D_k, \quad k \le r

where U_k is the sub-matrix formed by the first k columns of U, and D_k is the sub-matrix of D formed by its first k rows and columns.

An example of calculating the truncated SVD is shown below.

k = 2
Ak = U[:, 0:k] @ D[0:k, 0:k]
print('Reduced matrix Ak \n', Ak)

Reduced matrix Ak 
 [[-8.58 -1.43]
  [-1.73  1.02]
  [-2.55 -3.15]
  [-5.23  3.54]]
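Note that because V is orthogonal, the same reduced representation can be obtained by projecting the rows of A onto the first k right singular vectors, i.e. A_k = A V_k. A minimal check, reusing the variables above:

# Project A onto the first k right singular vectors
Ak_proj = A @ Vt[0:k, :].T
print(Ak_proj)   # same values as the reduced matrix Ak above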

The Scikit-learn library also provides an implementation of truncated SVD as shown below.

from sklearn.decomposition import TruncatedSVD

A = A.astype('float64')   # convert the integer matrix A to floating point
dim_red = TruncatedSVD(k, algorithm='arpack')
Ak = dim_red.fit_transform(A)

Before concluding this post, let me show you two examples of SVD applications. The first example is image compression, wherein I have performed singular value decomposition on an image and reconstructed it using the top 32, 16, and 8 singular values. The results of this reconstruction, along with the original image, are shown below. As you can see, the first two approximations provide good compression without sacrificing much of the image quality.
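Here is a minimal sketch of how such a reconstruction can be produced, assuming a grayscale image is already loaded as a 2-D NumPy array named img (the name and the loading step are assumptions for illustration):

from scipy import linalg
import numpy as np

# img is assumed to be a 2-D (grayscale) NumPy array
U_img, d_img, Vt_img = linalg.svd(img, full_matrices=False)

def reconstruct(num_sv):
    # Rebuild the image from the top num_sv singular values
    return (U_img[:, :num_sv] * d_img[:num_sv]) @ Vt_img[:num_sv, :]

img_32 = reconstruct(32)
img_16 = reconstruct(16)
img_8 = reconstruct(8)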

The second example is an application of truncated SVD. In this case, truncated SVD has been used to reduce the Iris data matrix to a two-dimensional space, thus performing dimensionality reduction.

from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

iris = load_iris()
iris_data = iris.data
iris_target = iris.target

iris_transformed = dim_red.fit_transform(iris_data)

plt.scatter(iris_transformed[:, 0], iris_transformed[:, 1], c=iris_target)
plt.title('Plot of Iris Data in 2-Dimensional Transformed Space')
plt.show()

The dimensionality reduction shows that Iris data can be well separated in the transformed two-dimensional space.

With the above introduction to SVD, we are ready to move on to compressing feedforward neural networks and convolutional neural networks in the coming posts.
