Tensor Manipulation: Manual Convolutional Filter
Description
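Understand the mechanics of a 2D convolution by manually applying a 3x3 filter to a small, single-channel image. This exercise demystifies what happens inside a torch.nn.Conv2d layer. Note that, like torch.nn.Conv2d, the operation here does not flip the kernel, so strictly speaking it is a cross-correlation.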
Guidance
Slide the filter over the input image, extracting a 3x3 patch at each valid position. For each patch, perform an element-wise multiplication with the filter and then sum the results. This sum is the value for a single pixel in the output feature map. Assume no padding and a stride of 1, so an H x W image and a kH x kW kernel produce an (H - kH + 1) x (W - kW + 1) output.
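The patch-multiply-sum step can also be written as a single formula. The symbols O (output), I (image), and K (kernel) are notation chosen here for illustration, with zero-based indices, no padding, and a stride of 1:

```latex
O[i, j] = \sum_{m=0}^{k_H - 1} \sum_{n=0}^{k_W - 1} I[i + m,\; j + n] \, K[m, n],
\qquad 0 \le i < H - k_H + 1, \quad 0 \le j < W - k_W + 1
```

For a 5x5 image and a 3x3 kernel this gives i and j each ranging over 0, 1, 2, which is where the 3x3 output shape comes from.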
Starter Code
import torch

def manual_conv2d(image, kernel):
    # image: (H, W)
    # kernel: (kH, kW)
    h, w = image.shape
    kh, kw = kernel.shape
    # Output size with no padding and a stride of 1
    oh, ow = h - kh + 1, w - kw + 1
    output = torch.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Patch whose top-left corner is at (i, j)
            patch = image[i:i+kh, j:j+kw]
            # Element-wise product with the kernel, summed to one value
            output[i, j] = (patch * kernel).sum()
    return output
Verification
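Create a small 5x5 image tensor and a 3x3 kernel (e.g., an edge detector like [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]), using floating-point values for both. The output of your function should be a 3x3 tensor. You can compare your result to torch.nn.functional.conv2d, which expects an input of shape (N, C, H, W) and a weight of shape (C_out, C_in, kH, kW), so you'll need to add batch and channel dimensions to the inputs: the image becomes (1, 1, 5, 5) and the kernel (1, 1, 3, 3).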
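One possible way to run this check is sketched below. The test image (a bright 2x2 square on a dark background) is an arbitrary choice made here for illustration, and manual_conv2d is repeated in compact form so the snippet runs on its own.

```python
import torch
import torch.nn.functional as F

def manual_conv2d(image, kernel):
    # Same loop as the starter code, repeated so this snippet is self-contained
    h, w = image.shape
    kh, kw = kernel.shape
    oh, ow = h - kh + 1, w - kw + 1
    output = torch.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            output[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return output

# Illustrative 5x5 image (an assumption, any float image works):
# a bright 2x2 square on a dark background
image = torch.zeros(5, 5)
image[1:3, 1:3] = 1.0

# Edge-detector kernel from the exercise text
kernel = torch.tensor([[-1., -1., -1.],
                       [-1.,  8., -1.],
                       [-1., -1., -1.]])

manual = manual_conv2d(image, kernel)

# F.conv2d expects (N, C, H, W) input and (C_out, C_in, kH, kW) weight,
# so add two leading singleton dimensions, then strip them from the result
reference = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual.shape)                        # torch.Size([3, 3])
print(torch.allclose(manual, reference))   # True
```

Indexing with None adds a dimension of size 1, which is equivalent to calling unsqueeze(0) twice. The defaults of F.conv2d (stride 1, no padding) already match the assumptions of the exercise.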