ML Katas

Tensor Manipulation: Manual Convolutional Filter

Difficulty: medium (<10 mins). Tags: convolution, vision, tensor
yesterday by E

Description

Understand the mechanics of a 2D convolution by manually applying a 3x3 filter to a small, single-channel image. This exercise helps demystify what happens inside a torch.nn.Conv2d layer.

Guidance

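Slide the 3x3 filter over the image. For each output position (i, j), extract the 3x3 patch whose top-left corner is at (i, j), multiply it element-wise with the filter, and sum the products. That sum is the value of a single pixel in the output feature map. Assume no padding and a stride of 1, so an H x W image and a kH x kW kernel produce an output of size (H - kH + 1) x (W - kW + 1). The kernel is not flipped: this is the cross-correlation that torch.nn.Conv2d computes.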
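To make the per-patch arithmetic concrete, here is a minimal sketch that computes only the top-left output pixel. The 5x5 ramp image (values 0 to 24) is a hypothetical input chosen for illustration; the kernel is the edge detector suggested in the Verification section.

```python
import torch

# Hypothetical 5x5 ramp image with values 0..24, used only for illustration.
image = torch.arange(25, dtype=torch.float32).reshape(5, 5)

# Edge-detector kernel from the Verification section.
kernel = torch.tensor([[-1., -1., -1.],
                       [-1.,  8., -1.],
                       [-1., -1., -1.]])

# Top-left 3x3 patch: rows 0..2, columns 0..2.
patch = image[0:3, 0:3]

# Element-wise product, then sum: the value of output pixel (0, 0).
value = (patch * kernel).sum()
print(value.item())  # 0.0
```

The result is 0 because the image is a linear ramp: the centre value times 8 exactly cancels the sum of its eight neighbours, so the edge detector finds no edge here.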

Starter Code

import torch

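def manual_conv2d(image, kernel):
    # image: (H, W) single-channel image
    # kernel: (kH, kW) filter, applied without flipping (cross-correlation)
    h, w = image.shape
    kh, kw = kernel.shape
    # No padding, stride 1: the kernel fits at h - kh + 1 rows and w - kw + 1 columns.
    oh, ow = h - kh + 1, w - kw + 1
    # Match the promoted input dtype and the input device instead of defaulting to float32 on CPU.
    output = torch.zeros((oh, ow), dtype=torch.result_type(image, kernel), device=image.device)

    for i in range(oh):
        for j in range(ow):
            # Window whose top-left corner sits at (i, j).
            patch = image[i:i+kh, j:j+kw]
            # Element-wise product summed to a scalar: one output pixel.
            output[i, j] = (patch * kernel).sum()
    return output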

Verification

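Create a small 5x5 image tensor and a 3x3 kernel (e.g., an edge detector like [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]). The output of your function should be a 3x3 tensor. You can compare your result to torch.nn.functional.conv2d, which expects a 4D input of shape (batch, channels, H, W) and a 4D weight of shape (out_channels, in_channels, kH, kW), so add two leading dimensions of size 1 to both the image and the kernel. Use floating-point tensors for both.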
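One way to run this check is sketched below. It repeats the loop from the starter code so the snippet runs on its own; the random image, the seed, and the tolerance are arbitrary choices made for illustration, not part of the exercise.

```python
import torch
import torch.nn.functional as F

def manual_conv2d(image, kernel):
    # Same sliding-window loop as the starter code, repeated so this runs standalone.
    h, w = image.shape
    kh, kw = kernel.shape
    oh, ow = h - kh + 1, w - kw + 1
    output = torch.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            output[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return output

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
image = torch.randn(5, 5)
kernel = torch.tensor([[-1., -1., -1.],
                       [-1.,  8., -1.],
                       [-1., -1., -1.]])

manual = manual_conv2d(image, kernel)

# F.conv2d expects input (N, C_in, H, W) and weight (C_out, C_in, kH, kW).
# Indexing with None adds the two leading size-1 dimensions; [0, 0] removes them again.
reference = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual.shape)  # torch.Size([3, 3])
print(torch.allclose(manual, reference, atol=1e-5))
```

The comparison uses torch.allclose rather than exact equality because the two implementations may accumulate the floating-point sums in a different order.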