A convolution operation takes a patch of the image and applies a filter to it by computing a dot product. A convolutional layer is similar to a fully connected layer, but it performs a convolution on its input rather than a matrix multiplication.
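The convolutional layer takes an input volume described by:
- Number of inputs $N$
- Depth of the input $D$
- Height of the input $H$
- Width of the input $W$
These hyperparameters control the size of the output volume:
- Number of filters $K$
- Spatial extent $F$
- Stride length $S$
- Zero padding $P$
The spatial size of the output is given by
$$H_{out} = \frac{H - F + 2P}{S} + 1, \qquad W_{out} = \frac{W - F + 2P}{S} + 1$$
Note: When $S = 1$, setting $P = (F - 1)/2$ preserves the spatial size of the input volume.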
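The output-size rule, (input size − filter size + 2 × padding) / stride + 1 along each spatial axis, can be checked with a few lines of Python. This is only a minimal sketch; the helper name `conv_output_size` is made up for illustration.

```python
def conv_output_size(in_size, field, stride=1, padding=0):
    """Output size along one spatial axis: (in_size - field + 2*padding) / stride + 1."""
    span = in_size - field + 2 * padding
    if span < 0 or span % stride != 0:
        # The filter does not fit the padded input an integer number of times.
        raise ValueError("invalid combination of input size, filter, stride and padding")
    return span // stride + 1


# A 3x3 filter with stride 1 and padding (3 - 1) / 2 = 1 preserves a 10x10 input.
print(conv_output_size(10, field=3, stride=1, padding=1))  # 10
```

Raising on a non-integer result mirrors the dimension check that a layer constructor would typically perform before allocating its output.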
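As stated earlier, a convolutional layer replaces the matrix multiplication with a convolution operation. To compute the pre-nonlinearity $z^l_{ij}$ for the neuron at position $(i, j)$ on layer $l$, with filter weights $w^l_{mn}$ and bias $b^l$, we have:
$$z^l_{ij} = \sum_{m=0}^{F-1} \sum_{n=0}^{F-1} w^l_{mn} \, a^{l-1}_{(i+m)(j+n)} + b^l$$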
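Each output value is the dot product of one filter with the input patch it covers, plus a bias, so the forward pass can be written directly as nested loops. The following is an illustrative sketch rather than the implementation used in this post: the name `conv_forward_naive` and the argument layout (input as images × depth × height × width, filters as filters × depth × height × width, one bias per filter) are assumptions.

```python
import numpy as np


def conv_forward_naive(X, W, b, stride=1, padding=0):
    """Direct (slow) convolution. X: (N, D, H, W_in), W: (K, D, fh, fw), b: (K,)."""
    n, _, h, w = X.shape
    k, _, fh, fw = W.shape
    h_out = (h - fh + 2 * padding) // stride + 1
    w_out = (w - fw + 2 * padding) // stride + 1
    # Zero-pad only the two spatial axes.
    Xp = np.pad(X, ((0, 0), (0, 0), (padding, padding), (padding, padding)),
                mode="constant")
    out = np.zeros((n, k, h_out, w_out))
    for i in range(n):                  # each image
        for f in range(k):              # each filter
            for y in range(h_out):      # each vertical location
                for x in range(w_out):  # each horizontal location
                    ys, xs = y * stride, x * stride
                    patch = Xp[i, :, ys:ys + fh, xs:xs + fw]
                    # Dot product over every channel of the receptive field.
                    out[i, f, y, x] = np.sum(patch * W[f]) + b[f]
    return out
```

Four nested Python loops make this far too slow for real use, but it is a handy reference for checking a faster implementation.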
Naively, to perform the convolution we loop over each image and over each channel, and take a dot product at each location for each of our filters. For the sake of efficiency and computational simplicity, we instead gather all the locations where the filter has to be applied and compute the dot products at all of them at once. Let's examine this with a simple example.
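Suppose we have a single image of size $1 \times 1 \times 5 \times 5$ and a single filter of size $1 \times 1 \times 3 \times 3$, and are using $S = 1$ and $P = 1$. After padding, the shape of our image is $1 \times 1 \times 7 \times 7$.
Now we have $(7 - 3)/1 + 1 = 5$ locations along both width and height, so $5 \times 5 = 25$ possible locations at which to do our convolution. The locations along the top edge, for example, are the five $3 \times 3$ windows whose first row lies in the zero-padded top row of the image.
At each of these locations the filter covers a $3 \times 3$ receptive field, which we stretch out into a $9 \times 1$ column vector. Thus we have 25 of these column vectors, or a $9 \times 25$ matrix of all the stretched-out receptive fields.
Similarly, the weights are also stretched out. If we have 3 filters of size $1 \times 3 \times 3$, then we have a weight matrix of size $3 \times 9$.
The result of the convolution is now equivalent to performing a single matrix multiplication, np.dot(W_row, X_col), which computes the dot product of every filter with every receptive field. This gives a $3 \times 25$ result, which can be reshaped back to get an output volume of size $1 \times 3 \times 5 \times 5$.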
We use the im2col utility to reshape the input X into X_col.

```python
# X_col has shape (c_filter*h_filter*w_filter) by
# n_X * ((h_X - h_filter + 2*padding)/stride + 1)**2.
# Suppose X is 5x3x10x10, the filters are 3x3x3, and padding = stride = 1;
# X_col will then be 27x500.
X_col = im2col_indices(X, h_filter, w_filter, padding, stride)

# Suppose we have 10 filters, so W is 10x3x3x3; W_col will be 10x27.
W_col = W.reshape(n_filter, c_filter * h_filter * w_filter)

# The output will be 10x500 (b has shape 10x1 and broadcasts across columns).
output = np.dot(W_col, X_col) + b

# Reshape to 10x10x10x5, then transpose the axes to get 5x10x10x10.
output = output.reshape(n_filter, h_out, w_out, n_X).transpose(3, 0, 1, 2)
```
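The code above relies on an existing `im2col_indices` helper. To make the reshaping concrete, here is a minimal loop-based sketch of the same idea. `im2col_sketch` is a hypothetical stand-in, not the utility used above, and the example assumes a 5×5 single-channel image with a 3×3 filter, stride 1 and padding 1, which yields 25 receptive fields. Columns are ordered so that the image index varies fastest, matching the `reshape(n_filter, h_out, w_out, n_X)` step.

```python
import numpy as np


def im2col_sketch(X, h_filter, w_filter, padding=1, stride=1):
    """Stretch every receptive field of X (N, D, H, W) into one column.

    Returns an array of shape (D*h_filter*w_filter, h_out*w_out*N).
    """
    n, d, h, w = X.shape
    h_out = (h - h_filter + 2 * padding) // stride + 1
    w_out = (w - w_filter + 2 * padding) // stride + 1
    Xp = np.pad(X, ((0, 0), (0, 0), (padding, padding), (padding, padding)),
                mode="constant")
    cols = np.zeros((d * h_filter * w_filter, h_out, w_out, n))
    for y in range(h_out):
        for x in range(w_out):
            ys, xs = y * stride, x * stride
            patch = Xp[:, :, ys:ys + h_filter, xs:xs + w_filter]  # (N, D, fh, fw)
            cols[:, y, x, :] = patch.reshape(n, -1).T             # (D*fh*fw, N)
    return cols.reshape(d * h_filter * w_filter, -1)


X = np.arange(25, dtype=float).reshape(1, 1, 5, 5)
X_col = im2col_sketch(X, 3, 3, padding=1, stride=1)
print(X_col.shape)  # (9, 25)

# Three 1x3x3 filters stretched into a 3x9 matrix, then one matrix product.
W = np.random.randn(3, 1, 3, 3)
out = (W.reshape(3, -1) @ X_col).reshape(3, 5, 5, 1).transpose(3, 0, 1, 2)
print(out.shape)    # (1, 3, 5, 5)
```

A production helper would build the columns with index arrays instead of Python loops, but the resulting layout is the same.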
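We know the output error for the current layer, dout, which in our case is $\partial C / \partial z^l_{ij}$ (with $C$ the cost), since our layer only computes the pre-nonlinearity output $z^l$. We need to find the gradient $\partial C / \partial w^l_{mn}$ for each weight:
$$\frac{\partial C}{\partial w^l_{mn}} = \sum_i \sum_j \frac{\partial C}{\partial z^l_{ij}} \frac{\partial z^l_{ij}}{\partial w^l_{mn}} = \sum_i \sum_j \frac{\partial C}{\partial z^l_{ij}} \, a^{l-1}_{(i+m)(j+n)}$$
Notice that $\partial z^l_{ij} / \partial w^l_{mn} = a^{l-1}_{(i+m)(j+n)}$ follows from the forward propagation above, where $a^{l-1}$ is the output of the previous layer and the input to our current layer.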
```python
# Move the filter axis first, 5x10x10x10 -> 10x10x10x5, then flatten to 10x500.
dout_flat = dout.transpose(1, 2, 3, 0).reshape(n_filter, -1)

# Dot product: 10x500 . 500x27 = 10x27.
dW = np.dot(dout_flat, X_col.T)

# Reshape back to 10x3x3x3.
dW = dW.reshape(W.shape)
```
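For the bias gradient, we simply accumulate the gradient, as with backpropagation for fully connected layers. Each filter has a single bias shared by all of its output positions, so we sum the output error over all images and spatial locations:
$$\frac{\partial C}{\partial b^l} = \sum_i \sum_j \frac{\partial C}{\partial z^l_{ij}}$$

```python
# Sum over the image axis and both spatial axes, leaving one value per filter.
db = np.sum(dout, axis=(0, 2, 3))
db = db.reshape(n_filter, -1)
```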
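To see why the matrix product yields the weight gradient, the sketch below compares it with the explicit sum over output positions on a tiny case. The setup is an assumption chosen for brevity: one 5×5 single-channel input, one 3×3 filter, stride 1 and no padding, with random values standing in for the input and the output error.

```python
import numpy as np

rng = np.random.default_rng(0)
a_prev = rng.standard_normal((5, 5))  # input to the layer
delta = rng.standard_normal((3, 3))   # output error, one value per output position

# Explicit sum: each weight collects delta times the input value it touched.
dW_loop = np.zeros((3, 3))
for m in range(3):
    for n in range(3):
        for i in range(3):
            for j in range(3):
                dW_loop[m, n] += delta[i, j] * a_prev[i + m, j + n]

# Matrix form: stretch the 9 receptive fields into columns, then one product.
X_col = np.stack(
    [a_prev[i:i + 3, j:j + 3].ravel() for i in range(3) for j in range(3)],
    axis=1,
)  # shape (9 weights, 9 locations)
dW_mat = (delta.reshape(1, -1) @ X_col.T).reshape(3, 3)

# The bias is shared by every output position, so its gradient is a plain sum.
db = delta.sum()

print(np.allclose(dW_loop, dW_mat))  # True
```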
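Now, to backpropagate the errors to the previous layer, we need to compute the input gradient dX, which in our case is $\partial C / \partial a^{l-1}_{ij}$:
$$\frac{\partial C}{\partial a^{l-1}_{ij}} = \sum_m \sum_n \frac{\partial C}{\partial z^l_{(i-m)(j-n)}} \, w^l_{mn}$$
where terms whose indices fall outside the output are taken to be zero. Notice that this looks similar to our convolution operation from the forward propagation step, but instead of the indices $(i+m)(j+n)$ we have $(i-m)(j-n)$, which is simply a convolution using a filter $w$ that has been flipped along both axes.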
```python
# Flatten the filters: 10x3x3x3 -> 10x27.
W_flat = W.reshape(n_filter, -1)

# Dot product: 27x10 . 10x500 = 27x500.
dX_col = np.dot(W_flat.T, dout_flat)

# Map the gradients from the stretched-out columns back onto the image,
# i.e. from 27x500 to 5x3x10x10.
dX = col2im_indices(dX_col, X.shape, h_filter, w_filter, padding, stride)
```
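The equivalence between scattering the gradients back onto the image and convolving with a flipped filter can be checked on a small case. This is a sketch under simplifying assumptions (one 5×5 single-channel input, one 3×3 filter, stride 1, no padding in the forward pass); the scatter loop only imitates what a col2im step does and is not the `col2im_indices` helper itself.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))      # filter weights
delta = rng.standard_normal((3, 3))  # output error for the 3x3 output

# Scatter form: every output position adds delta * W onto its receptive field.
dX_scatter = np.zeros((5, 5))
for i in range(3):
    for j in range(3):
        dX_scatter[i:i + 3, j:j + 3] += delta[i, j] * W

# Flipped-filter form: zero-pad delta by F - 1 = 2 on every side, then slide
# the filter flipped along both axes over it.
delta_p = np.pad(delta, 2, mode="constant")  # 7x7
W_flip = W[::-1, ::-1]
dX_flip = np.zeros((5, 5))
for p in range(5):
    for q in range(5):
        dX_flip[p, q] = np.sum(delta_p[p:p + 3, q:q + 3] * W_flip)

print(np.allclose(dX_scatter, dX_flip))  # True
```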
Here is the source code for the convolutional layer, with the forward and backward API implemented.

```python
import numpy as np

# im2col_indices and col2im_indices are assumed to be in scope,
# for example the im2col helpers distributed with the CS231n assignments.


class Conv():

    def __init__(self, X_dim, n_filter, h_filter, w_filter, stride, padding):
        # X_dim is (depth, height, width) of a single input.
        self.d_X, self.h_X, self.w_X = X_dim

        self.n_filter, self.h_filter, self.w_filter = n_filter, h_filter, w_filter
        self.stride, self.padding = stride, padding

        self.W = np.random.randn(
            n_filter, self.d_X, h_filter, w_filter) / np.sqrt(n_filter / 2.)
        self.b = np.zeros((self.n_filter, 1))
        self.params = [self.W, self.b]

        # Output size: (input - filter + 2*padding) / stride + 1.
        self.h_out = (self.h_X - h_filter + 2 * padding) / stride + 1
        self.w_out = (self.w_X - w_filter + 2 * padding) / stride + 1

        if not self.h_out.is_integer() or not self.w_out.is_integer():
            raise Exception("Invalid dimensions!")

        self.h_out, self.w_out = int(self.h_out), int(self.w_out)
        self.out_dim = (self.n_filter, self.h_out, self.w_out)

    def forward(self, X):
        # Number of images in the batch (X.shape itself is a tuple).
        self.n_X = X.shape[0]

        self.X_col = im2col_indices(
            X, self.h_filter, self.w_filter, stride=self.stride, padding=self.padding)
        W_row = self.W.reshape(self.n_filter, -1)

        out = W_row @ self.X_col + self.b
        out = out.reshape(self.n_filter, self.h_out, self.w_out, self.n_X)
        out = out.transpose(3, 0, 1, 2)
        return out

    def backward(self, dout):
        dout_flat = dout.transpose(1, 2, 3, 0).reshape(self.n_filter, -1)

        # Weight and bias gradients.
        dW = dout_flat @ self.X_col.T
        dW = dW.reshape(self.W.shape)
        db = np.sum(dout, axis=(0, 2, 3)).reshape(self.n_filter, -1)

        # Input gradient, mapped from column form back to image form.
        W_flat = self.W.reshape(self.n_filter, -1)
        dX_col = W_flat.T @ dout_flat
        shape = (self.n_X, self.d_X, self.h_X, self.w_X)
        dX = col2im_indices(dX_col, shape, self.h_filter,
                            self.w_filter, self.padding, self.stride)

        return dX, [dW, db]
```