Understanding Compression of Convolutional Neural Nets: Part 1

Published on: April 10, 2023
Published by: Professor Ishwar Sethi
This post was originally published by one of our partners on:
https://iksinc.tech/understanding-compression-of-convolutional-neural-nets-part-1/

There are neural network applications where resources, for example compute power, memory space, or battery power, are limited. In such cases, we may want to simplify a trained network to better fit the available resources. While simplifying, we obviously would like to maintain the network accuracy as much as possible. Such network simplification is termed compression of deep neural networks. In this three-part blog series, I am going to share with you some simple examples of neural network compression to help you better understand compressing deep neural nets. In Part 1, I will begin with a technique, known as singular value decomposition (SVD), which is the basis of many compression methods. Part 2 will show compression of feedforward neural networks, and Part 3 will demonstrate compression of convolution layers in convolutional neural networks (CNNs).

Singular Value Decomposition (SVD)

Let’s begin our discussion of SVD by recalling that the rank of a matrix is the number of linearly independent columns or rows. For example, the matrix

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}

has rank 2 because its two columns are linearly independent, i.e. neither column/row is a scaled version of the other. On the other hand, the following matrix has rank 1 because the second column/row is the same as the first column/row multiplied by 2.

A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
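As a quick check, here is a minimal sketch (using NumPy) that verifies the ranks of these two example matrices:

import numpy as np
from numpy.linalg import matrix_rank

A_full = np.array([[1, 2], [3, 4]])   # columns are linearly independent
A_low = np.array([[1, 2], [2, 4]])    # second column is 2 x the first column

print(matrix_rank(A_full))   # prints 2
print(matrix_rank(A_low))    # prints 1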

Given a matrix, sometimes you may want to obtain its approximation. Use of such approximations is common in numerous tasks such as dimensionality reduction, information retrieval and text mining, and image processing. The singular value decomposition (SVD) is one such method for obtaining low-rank approximations of a given matrix. Another popular matrix decomposition method is non-negative matrix factorization (NMF). You can read about it in my earlier blog post.

In SVD, a real-valued matrix A of size m x n is factorized as

A = U D V^T

where U is an orthogonal matrix of size m x m of left singular vectors and V is an orthogonal matrix of size n x n of right singular vectors. The matrix D is a diagonal matrix of size m x n of singular values. A low-rank approximation to a matrix A of rank r is obtained by using only a subset of the singular values and the corresponding left and right singular vectors, as given by the following expression. In other words, the approximation is obtained as a weighted sum of rank-one matrices.

\hat{A} = \sum_{j=1}^{k} d_{jj} U_j V_j^T, \quad k \le r

Let me give you an example by decomposing a matrix and obtaining its different approximations.

from scipy import linalg
import numpy as np

np.set_printoptions(precision=2)
np.set_printoptions(suppress=True)

# Create a 4x3 matrix
A = np.array([[6, 6, 2], [0, 1, 3], [4, 0, 1], [0, 6, 2]])

# Obtain SVD decomposition
U, d, Vt = linalg.svd(A)
print(' U matrix\n', U, '\n d array\n', d, '\n V-Transpose matrix\n', Vt)

This code creates a 4×3 matrix A and decomposes it to obtain the U and V matrices along with an array d of singular values, as shown below. Note that the array d of singular values represents the diagonal elements of matrix D in the SVD equation given earlier. In this case, there are three non-zero singular values; this means that the rank of matrix A is three.

 U matrix
 [[-0.82 -0.28  0.21 -0.46]
  [-0.16  0.2  -0.93 -0.26]
  [-0.24 -0.62 -0.28  0.69]
  [-0.5   0.7   0.1   0.5 ]]
 d array
 [10.51  5.06  2.63]
 V-Transpose matrix
 [[-0.56 -0.77 -0.32]
  [-0.83  0.54  0.16]
  [ 0.05  0.36 -0.93]]

Given U, V, and d, we can get back our original matrix A without any error by performing the matrix multiplication U D V^T.

D = np.zeros((4, 3))   # (4, 3) is the size of matrix A
for i in range(len(d)):
    D[i, i] = d[i]

Aa = U @ (D @ Vt)   # Aa is the reconstructed matrix
print(' Reconstructed Matrix A\n', Aa)

 Reconstructed Matrix A
 [[6. 6. 2.]
  [0. 1. 3.]
  [4. 0. 1.]
  [0. 6. 2.]]

The reconstructed matrix is identical to the original matrix. Suppose we reconstruct matrix A using only the first two singular values. We do that by replacing  ‘len(d)’ above with  ‘len(d)-1’ and get the following approximated matrix A.

 Reconstructed Matrix A
 [[ 5.97  5.8   2.51]
  [ 0.12  1.87  0.72]
  [ 4.04  0.26  0.31]
  [-0.01  5.91  2.25]]
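This rank-2 approximation can also be written explicitly as a weighted sum of two rank-one matrices. Here is a minimal sketch, reusing the U, d, and Vt computed above:

# Rank-2 approximation as a weighted sum of two rank-one matrices
A2 = d[0] * np.outer(U[:, 0], Vt[0, :]) + d[1] * np.outer(U[:, 1], Vt[1, :])
print(A2)   # same values as the reconstruction shown above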

Since this approximation is obtained by summing two rank-one matrices, you can easily verify that the rank of this matrix is 2. You can also calculate the approximation error by comparing the original and the reconstructed matrix element by element, squaring the differences, and adding them, as shown below. This measure of comparing two matrices is the squared Frobenius norm of their difference.

from numpy.linalg import matrix_rank

print(matrix_rank(Aa))
err = np.sum(np.square(A - Aa))
print(' Approximation Error = ', err)

2
 Approximation Error =  6.904768216213785

If you look at the approximation error, you will find that it equals the square of 2.63, the singular value not used in reconstructing the approximation. You can also approximate matrix A by using only one singular value. We do this by replacing 'len(d)' with 'len(d)-2' and get the following approximation.

 Reconstructed Matrix A
 [[4.79 6.57 2.75]
  [0.96 1.32 0.55]
  [1.43 1.95 0.82]
  [2.92 4.01 1.67]]

You can verify that the error in this case is the sum of the squares of the second and third singular values, which were not used in creating the approximation.
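A quick sketch to confirm this, reusing the matrix A and the U, d, and Vt computed above:

# Rank-1 approximation and its squared Frobenius error
A1 = d[0] * np.outer(U[:, 0], Vt[0, :])
err1 = np.sum(np.square(A - A1))
print(err1)                   # approximation error
print(d[1]**2 + d[2]**2)      # sum of squares of the unused singular values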

Truncated SVD

The singular value decomposition described above creates an approximate matrix whose size remains the same. In some cases, the matrix A to be decomposed is treated as a data matrix, with each row representing an observation and each column representing an observed feature. In such a scenario, the objective is not only to approximate A but also to achieve a reduction in the size of the approximated matrix, i.e. the resulting matrix has the same number of rows (observations) but fewer columns. Thus, the matrix approximation is aimed at achieving dimensionality reduction, similar to PCA. The use of SVD in this context is known as truncated SVD. To obtain a reduced approximation matrix, we use only the top k < r singular values and the first k columns of the U matrix, and calculate the reduced representation as

A_k = U_k D_k, \quad k \le r

where U_k is the sub-matrix formed by the first k columns of U, and D_k is the sub-matrix of D formed by its first k rows and columns.

An example of calculating the truncated SVD is shown below.

k = 2
Ak = U[:, 0:k] @ D[0:k, 0:k]
print('Reduced matrix Ak \n', Ak)

Reduced matrix Ak 
 [[-8.58 -1.43]
  [-1.73  1.02]
  [-2.55 -3.15]
  [-5.23  3.54]]
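Note that because V is orthogonal, the same reduced representation can be obtained by projecting the rows of A onto the first k right singular vectors, i.e. A_k = A V_k. A minimal check, reusing the variables above:

# Project A onto the first k right singular vectors
Ak_proj = A @ Vt[0:k, :].T
print(Ak_proj)   # same values as the reduced matrix Ak above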

The Scikit-learn library also provides an implementation of truncated SVD as shown below.

from sklearn.decomposition import TruncatedSVD

A = A.astype('float64')   # convert the integer matrix A to floating point
dim_red = TruncatedSVD(k, algorithm='arpack')
Ak = dim_red.fit_transform(A)

Before concluding this post, let me show you two examples of SVD applications. The first example is image compression, wherein I have performed singular value decomposition on an image and reconstructed it using the top 32, 16, and 8 singular values. The results of this reconstruction, along with the original image, are shown below. As you can see, the first two approximations provide good compression without sacrificing much of the image quality.
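Here is a minimal sketch of how such a reconstruction can be produced, assuming a grayscale image is already loaded as a 2-D NumPy array named img (the name and the loading step are assumptions for illustration):

from scipy import linalg
import numpy as np

# img is assumed to be a 2-D (grayscale) NumPy array
U_img, d_img, Vt_img = linalg.svd(img, full_matrices=False)

def reconstruct(num_sv):
    # Rebuild the image from the top num_sv singular values
    return (U_img[:, :num_sv] * d_img[:num_sv]) @ Vt_img[:num_sv, :]

img_32 = reconstruct(32)
img_16 = reconstruct(16)
img_8 = reconstruct(8)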

The second example is an application of truncated SVD. In this case, truncated SVD has been used to reduce the Iris data matrix to a two-dimensional space, thus performing dimensionality reduction.

from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

iris = load_iris()
iris_data = iris.data
iris_target = iris.target

iris_transformed = dim_red.fit_transform(iris_data)

plt.scatter(iris_transformed[:, 0], iris_transformed[:, 1], c=iris_target)
plt.title('Plot of Iris Data in 2-Dimensional Transformed Space')
plt.show()

The dimensionality reduction shows that Iris data can be well separated in the transformed two-dimensional space.

With the above introduction to SVD, we are ready to move on to compressing feedforward neural networks and convolutional neural networks in the coming posts.
