ML Katas

Tiny Neural Radiance Fields (NeRF)

medium (<30 mins) pytorch nerf implicit nets
yesterday by E

Description

Implement a simplified version of a Neural Radiance Field (NeRF) to represent a 2D image. [1] A NeRF learns a continuous mapping from spatial coordinates to pixel values. Instead of a full 3D scene, your task is to train an MLP to take 2D coordinates (x, y) and output an RGB color, effectively learning to "overfit" to a single image. This captures the core idea of coordinate-based neural networks.

Guidance

A key component of NeRF is Positional Encoding, which maps the low-dimensional input coordinates to a higher-dimensional space, making it easier for the MLP to learn high-frequency details. You must implement this encoding before passing the coordinates to a standard MLP.

Positional Encoding Formula for a coordinate p:

PE(p) = (sin(2^0 πp), cos(2^0 πp), ..., sin(2^(L-1) πp), cos(2^(L-1) πp))
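As a shape sanity check, here is one possible vectorized sketch of this encoding (the `positional_encoding` helper below is illustrative, not the expected kata solution; it assumes the sin/cos terms are flattened into one feature vector per point):

```python
import torch

def positional_encoding(p, L=10):
    # p: (N, D) tensor of coordinates; returns (N, D * 2 * L).
    # Frequencies 2^0, 2^1, ..., 2^(L-1), each paired with a sin and a cos term.
    freqs = 2.0 ** torch.arange(L, dtype=p.dtype)                    # (L,)
    angles = p[..., None] * freqs * torch.pi                         # (N, D, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # (N, D, 2L)
    return enc.flatten(start_dim=1)                                  # (N, D * 2L)

coords = torch.rand(4, 2) * 2 - 1            # four (x, y) points in [-1, 1]
print(positional_encoding(coords).shape)     # torch.Size([4, 40])
```

With D = 2 coordinates and L = 10 frequencies, each point expands to 2 * 2 * 10 = 40 features, which is the `input_dim` the starter MLP needs.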

Starter Code

import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, pos_emb_dim=10):
        super().__init__()
        self.L = pos_emb_dim
        # 1. Calculate the input dimension for the MLP after positional encoding.
        #    Each of the two coordinates (x, y) expands into 2 * L sin/cos terms.
        input_dim = ...

        # 2. Define a standard MLP. It should take the encoded coordinates
        #    and output 3 values (R, G, B).
        self.mlp = nn.Sequential(...)

    def positional_encoding(self, x):
        # x is a tensor of (N, 2) coordinates
        # 1. Implement the positional encoding formula shown above.
        # 2. The output should be a tensor of shape (N, input_dim).
        pass

    def forward(self, x):
        # x is a tensor of shape (N, 2) holding (x, y) coordinates
        encoded_x = self.positional_encoding(x)
        return self.mlp(encoded_x)

Verification

Create a training dataset by taking the pixel coordinates (normalized to [-1, 1]) and corresponding RGB values of a target image. Train the TinyNeRF model to minimize the Mean Squared Error between its output and the true pixel colors. After training, generate an image by passing a grid of all coordinates through the model. The output image should be a clear reconstruction of the target image.
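The verification loop above can be sketched as follows. This is a minimal outline, assuming an (H, W, 3) target image with values in [0, 1]; a plain `nn.Sequential` MLP stands in for the completed `TinyNeRF`, and the random `image` is a stand-in for a real target:

```python
import torch
import torch.nn as nn

H, W = 32, 32
image = torch.rand(H, W, 3)  # stand-in target; load a real image in practice

# Build the full coordinate grid, normalized to [-1, 1].
ys = torch.linspace(-1, 1, H)
xs = torch.linspace(-1, 1, W)
grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
coords = torch.stack([grid_x, grid_y], dim=-1).reshape(-1, 2)  # (H*W, 2)
targets = image.reshape(-1, 3)                                 # (H*W, 3)

# Placeholder MLP; swap in a completed TinyNeRF model here.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(coords), targets)
    loss.backward()
    optimizer.step()

# Render by evaluating the model on every grid coordinate.
recon = model(coords).detach().reshape(H, W, 3).clamp(0, 1)
```

With the positional encoding in place, the reconstruction should recover fine detail that a raw-coordinate MLP like the placeholder above will blur.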

References

[1] Mildenhall, B., et al. (2020). NeRF: Representing scenes as neural radiance fields for view synthesis.

[2] Tancik, M. (2020). NeRF: Neural Radiance Fields Project Website.