
Groups Parameter of the Convolution Layer

Published on: April 7, 2023
Published by: Professor Ishwar Sethi
This post was originally published by one of our partners on:
https://iksinc.tech/groups-parameter-of-the-convolution-layer/

One of the parameters of the convolution layer in PyTorch is the groups parameter, which controls the connections between the input and output channels. In this post, I will describe different scenarios for the groups parameter to better understand it.

Groups = 1

This setting is the default setting. Under this setting, all inputs are convolved to all outputs. Suppose you have three input channels as in a color image. Also the number of output channels is three. If you want each output channel to be a function of all the input channels, then you will be using the default setting of the groups parameter. With a square mask of size 5×5, you will use the following specification where the values of the parameters not specified are their respective default settings:

import torch.nn as nn

con = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=5, bias=False)
print(con.weight.shape)

The above specification returns torch.Size([3, 3, 5, 5]) as the shape of the convolution masks. This implies 3 × 3 × 5 × 5 = 225 weight values are involved in the convolution. The pictorial representation given below captures how the input and output channels are related.
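As a quick check of that 225-weight count, the number of elements in the weight tensor can be read off directly (a minimal sketch, assuming PyTorch is installed):

```python
import torch.nn as nn

# Default groups=1: every output channel is convolved with all 3 input channels
con = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=5, bias=False)
print(con.weight.numel())  # 3 * 3 * 5 * 5 = 225
```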

Groups ≠ 1 and In_channels > Out_channels

Whenever a groups value other than the default of 1 is selected, the chosen value must be an integer that divides both the number of input channels and the number of output channels. A non-default groups value allows us to create multiple paths where each path connects only a subset of input channels to the output channels. As an example, suppose we have 8 channels coming out of an intermediate convolution layer and we want to convolve them in groups to produce four output channels. In this case, non-default values of 2 and 4 are possible for the groups parameter. Let’s see how the grouping of channels takes place with groups = 4. In this case, our convolution layer specification will be:

import torch.nn as nn

# bias is left at its default of True so that con.bias exists
con = nn.Conv2d(in_channels=8, out_channels=4, groups=4, kernel_size=5)
print(con.weight.shape)
print(con.bias.shape)

The weight.shape in this case turns out to be torch.Size([4, 2, 5, 5]) and the bias shape to be torch.Size([4]). We can visualize the convolution paths as shown below:

However, if we set groups to 2, we find that weight.shape is torch.Size([4, 4, 5, 5]), meaning four filters with four input channels each. Thus, we can set the groups parameter to a value that matches the paths we want to create.
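The groups = 2 case can be checked the same way (a small sketch, assuming PyTorch is available):

```python
import torch.nn as nn

# 8 input channels split into 2 groups of 4; each of the 4 filters
# convolves the 4 channels of its group
con = nn.Conv2d(in_channels=8, out_channels=4, groups=2, kernel_size=5, bias=False)
print(con.weight.shape)  # torch.Size([4, 4, 5, 5])
```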

Groups ≠ 1 and In_channels < Out_channels

Now let’s see how the groups value affects the resulting convolution paths when the number of output channels is greater than the number of input channels. Let’s assume we have 4 input channels and 8 output channels. With groups = 4, we find the weight shape to be torch.Size([8, 1, 5, 5]). This choice of groups results in 8 filters, each having only one input channel. On the other hand, a groups value of 2 results in 8 filters, each convolving 2 input channels.
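Both of these weight shapes can be verified with a short sketch (variable names are my own, assuming PyTorch is available):

```python
import torch.nn as nn

# 4 input channels, 8 output channels, groups = 4:
# each filter sees only 1 input channel
con4 = nn.Conv2d(in_channels=4, out_channels=8, groups=4, kernel_size=5, bias=False)
print(con4.weight.shape)  # torch.Size([8, 1, 5, 5])

# With groups = 2, each of the 8 filters convolves 2 input channels
con2 = nn.Conv2d(in_channels=4, out_channels=8, groups=2, kernel_size=5, bias=False)
print(con2.weight.shape)  # torch.Size([8, 2, 5, 5])
```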

Groups ≠ 1 and In_channels = Out_channels

When the number of input and output channels is the same and the groups parameter is set to the number of channels, each input channel is convolved separately to produce a corresponding output channel. This means a direct one-to-one connection is made between each input-output channel pair. When any other valid groups value is used, the number of input channels convolved together along any path equals the number of input channels divided by the groups value. Thus, with in_channels = out_channels = 4 and groups = 2, we will have 4 filters with two input channels convolved per filter.
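These two configurations can be sketched as follows (the layer names are my own; the groups = channels case is often called a depthwise convolution):

```python
import torch.nn as nn

# groups equal to the channel count: each input channel is filtered
# independently, giving a one-to-one input-output connection
depthwise = nn.Conv2d(in_channels=4, out_channels=4, groups=4, kernel_size=5, bias=False)
print(depthwise.weight.shape)  # torch.Size([4, 1, 5, 5])

# groups = 2: each of the 4 filters convolves 4 / 2 = 2 input channels
con = nn.Conv2d(in_channels=4, out_channels=4, groups=2, kernel_size=5, bias=False)
print(con.weight.shape)  # torch.Size([4, 2, 5, 5])
```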

To summarize, the ratio of the number of input channels to the groups value determines the number of input channels grouped per filter. The number of filters, of course, equals the number of output channels.
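This summary rule can be verified across all the scenarios above in one small sketch (assuming PyTorch is available): the weight shape is always (out_channels, in_channels // groups, kernel_size, kernel_size).

```python
import torch.nn as nn

# (in_channels, out_channels, groups) for each scenario discussed above
for in_ch, out_ch, g in [(3, 3, 1), (8, 4, 4), (8, 4, 2), (4, 8, 4), (4, 4, 2)]:
    con = nn.Conv2d(in_ch, out_ch, kernel_size=5, groups=g, bias=False)
    assert tuple(con.weight.shape) == (out_ch, in_ch // g, 5, 5)
```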

Convolutions performed with a non-default groups value are called grouped convolutions. These convolutions offer two advantages. First, by creating multiple independent paths, grouped convolutions allow multiple GPUs to be used in parallel during training, making training more efficient. The second advantage is the reduction in the number of parameters due to grouping. You may want to read about how these convolutions have been used in the AlexNet architecture, and in ResNeXt.
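The parameter reduction is easy to quantify: with g groups, the weight count shrinks by a factor of g. A minimal sketch comparing the same 8-in, 8-out layer with and without grouping (assuming PyTorch is available):

```python
import torch.nn as nn

full = nn.Conv2d(8, 8, kernel_size=5, groups=1, bias=False)
grouped = nn.Conv2d(8, 8, kernel_size=5, groups=4, bias=False)
print(full.weight.numel())     # 8 * 8 * 5 * 5 = 1600
print(grouped.weight.numel())  # 8 * 2 * 5 * 5 = 400, a 4x reduction
```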
