ML Katas

Simple Differentiable Renderer

medium (<30 mins) pytorch autograd diff rendering 3d
yesterday by E

Description

Modern 3D deep learning often relies on differentiable rendering, allowing gradients to flow from a 2D rendered image back to 3D scene parameters. [1] Your task is to implement a simplified differentiable renderer that can render a single 2D colored circle and backpropagate a loss to its center coordinates.

Guidance

The key to making a renderer differentiable is to replace hard, non-differentiable decisions with smooth ones. Instead of a hard, binary inside/outside mask for the circle, you'll create a soft mask using a function like the sigmoid. That way, a small change in the circle's position produces a small, smooth change in pixel values, so gradients can flow.
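
The soft-vs-hard distinction can be seen in a tiny sketch. Here the `sharpness` constant is an assumption that controls how steep the boundary is (a hard mask like `(d2 < r2).float()` would have zero gradient almost everywhere):

```python
import torch

d2 = torch.linspace(0.0, 2.0, 5)            # squared distances of a few pixels from the center
r2 = torch.tensor(1.0, requires_grad=True)  # squared radius, treated as learnable
sharpness = 10.0                            # assumed steepness constant

# Soft mask: ~1 inside the circle (d2 < r2), ~0 outside, smooth in between.
soft_mask = torch.sigmoid(sharpness * (r2 - d2))
soft_mask.sum().backward()
print(r2.grad)  # nonzero: the radius receives a gradient through the soft boundary
```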

Starter Code

import torch
import torch.nn as nn

def simple_circle_renderer(center, radius, color, image_size=64):
    """
    Renders a batch of circles. `center` is a (N, 2) tensor of (x, y) coords.
    `radius` is (N, 1) and `color` is (N, 3).
    """
    # 1. Create a grid of pixel coordinates of shape (image_size, image_size, 2).
    #    Hint: Use torch.meshgrid and torch.stack.

    # 2. Calculate the squared distance of each pixel in the grid from the circle's center.
    #    You'll need to reshape tensors to use broadcasting correctly. The goal is to get a
    #    distance tensor of shape (N, image_size, image_size).

    # 3. Create a soft, differentiable mask. A good choice is to use the sigmoid function
    #    applied to the difference between the circle's radius squared and the pixel distances.
    #    This will give values near 1 inside the circle and near 0 outside.

    # 4. Apply the mask to the circle's color to create the final image.
    #    The output should have shape (N, image_size, image_size, 3).

    pass
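
The four hinted steps can be sketched as follows. This is one possible implementation, not the only one; in particular, the `sharpness` constant is an assumption that trades gradient smoothness against edge crispness and is worth tuning:

```python
import torch

def simple_circle_renderer(center, radius, color, image_size=64):
    # 1. Pixel coordinate grid of shape (image_size, image_size, 2).
    ys, xs = torch.meshgrid(
        torch.arange(image_size, dtype=torch.float32),
        torch.arange(image_size, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack([xs, ys], dim=-1)                    # (H, W, 2)

    # 2. Squared distance of every pixel from each circle's center.
    #    center: (N, 2) -> (N, 1, 1, 2) to broadcast against (H, W, 2).
    diff = grid.unsqueeze(0) - center.view(-1, 1, 1, 2)
    dist_sq = (diff ** 2).sum(dim=-1)                       # (N, H, W)

    # 3. Soft mask: sigmoid of (r^2 - d^2); `sharpness` is an assumed constant.
    sharpness = 0.05
    mask = torch.sigmoid(sharpness * (radius.view(-1, 1, 1) ** 2 - dist_sq))

    # 4. Broadcast the mask against the color to get (N, H, W, 3).
    return mask.unsqueeze(-1) * color.view(-1, 1, 1, 3)
```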

Verification

  1. Define a target image (e.g., an image of a circle at a known location).
  2. Initialize center coordinates as a learnable nn.Parameter tensor with a random starting position.
  3. In a training loop, render an image using your current center.
  4. Calculate an L2 loss between your rendered image and the target image.
  5. Call loss.backward() and optimizer.step(). If implemented correctly, the rendered circle's center coordinates should move over iterations to match the target circle's location.
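
The verification steps above can be sketched as a short training loop. The renderer here is a compact stand-in (one possible soft-mask implementation); the sharpness constant, starting position, learning rate, and step count are all assumptions:

```python
import torch
import torch.nn as nn

def simple_circle_renderer(center, radius, color, image_size=64):
    # Minimal soft-mask renderer; `0.05` is an assumed sharpness constant.
    ys, xs = torch.meshgrid(
        torch.arange(image_size, dtype=torch.float32),
        torch.arange(image_size, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack([xs, ys], dim=-1)
    dist_sq = ((grid.unsqueeze(0) - center.view(-1, 1, 1, 2)) ** 2).sum(-1)
    mask = torch.sigmoid(0.05 * (radius.view(-1, 1, 1) ** 2 - dist_sq))
    return mask.unsqueeze(-1) * color.view(-1, 1, 1, 3)

radius = torch.tensor([[10.0]])
color = torch.tensor([[1.0, 0.0, 0.0]])

# 1. Target image: a circle at a known location.
target = simple_circle_renderer(torch.tensor([[40.0, 40.0]]), radius, color).detach()

# 2. Learnable center, started away from the target.
center = nn.Parameter(torch.tensor([[28.0, 30.0]]))
optimizer = torch.optim.Adam([center], lr=1.0)

for step in range(300):
    optimizer.zero_grad()
    rendered = simple_circle_renderer(center, radius, color)  # 3. render
    loss = ((rendered - target) ** 2).mean()                  # 4. L2 loss
    loss.backward()                                           # 5. backprop + update
    optimizer.step()

print(center.detach())  # should approach (40, 40)
```

Note that the circles' soft boundaries must overlap at the starting position for a useful gradient to exist; if they are far apart, the gradient on the center is nearly zero and a softer mask (smaller sharpness) helps.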

References

[1] PyTorch3D GitHub Repository.

[2] Kato, H., Ushiku, Y., & Harada, T. (2018). Neural 3D Mesh Renderer.