-
Building a Transformer Encoder from Scratch
Implement a single layer of a **Transformer Encoder** from scratch, without using `torch.nn.TransformerEncoderLayer`. This requires implementing a multi-head self-attention module and a position-wise feed-forward network, tied together with residual connections and layer normalization.
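A minimal sketch of such a layer, assuming a post-norm arrangement and illustrative dimensions (`d_model=512`, 8 heads, no masking):

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Single Transformer encoder layer: multi-head self-attention + feed-forward."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Projections for queries, keys, values, and the attention output.
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Position-wise feed-forward network.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq_len, d_head) for per-head attention.
        q, k, v = (y.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for y in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, d)
        x = self.norm1(x + self.drop(self.out(ctx)))   # residual + layer norm
        x = self.norm2(x + self.drop(self.ff(x)))      # residual + layer norm
        return x

x = torch.randn(2, 10, 512)
print(EncoderLayer()(x).shape)   # torch.Size([2, 10, 512])
```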
-
Implementing a Siamese Network for Similarity Learning
Build and train a **Siamese network** on a dataset like MNIST. The network takes pairs of images as input and learns to determine if they belong to the same class (a positive pair) or different classes (a negative pair), typically by minimizing a contrastive loss over the distance between the two embeddings.
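A sketch of the shared-weight (twin) portion, assuming 28x28 grayscale inputs and an illustrative embedding size:

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Shared-weight embedding network applied to both images of a pair."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, embed_dim),
        )

    def forward(self, x1, x2):
        # The same encoder (same parameters) embeds both inputs.
        return self.encoder(x1), self.encoder(x2)

net = SiameseNet()
a, b = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
z1, z2 = net(a, b)
dist = torch.nn.functional.pairwise_distance(z1, z2)   # per-pair embedding distance
print(dist.shape)   # torch.Size([8])
```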
-
Generative Adversarial Network (GAN) on MNIST
Implement and train a simple **Generative Adversarial Network (GAN)**. The network consists of a generator and a discriminator. The generator takes a random noise vector and tries to generate a realistic digit image, while the discriminator learns to distinguish real MNIST images from generated ones.
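A compact sketch of one adversarial training step, using fully connected networks and a random tensor as a stand-in for a batch of real MNIST images:

```python
import torch
import torch.nn as nn

latent_dim = 100

# Generator: noise vector -> flattened 28x28 image in [-1, 1].
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)
# Discriminator: flattened image -> probability of being real.
D = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(64, 784)          # stand-in for a batch of real MNIST images
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Discriminator step: push real images toward 1 and generated images toward 0.
fake = G(torch.randn(64, latent_dim))
d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator predict 1 for fakes.
g_loss = bce(D(fake), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```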
-
Distributed Data Parallel Training
Set up a **distributed data parallel training** script using `torch.nn.parallel.DistributedDataParallel` and `torch.distributed`. You'll need to use `torch.multiprocessing.spawn` to launch one process per device, initialize the process group in each process, and wrap the model in DDP so gradients are synchronized across processes.
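A CPU-only sketch using the `gloo` backend; the model, the dummy data, and the master address/port are illustrative placeholders:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # Each spawned process joins the same process group.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(10, 1)
    ddp_model = DDP(model)                      # gradients are all-reduced across ranks
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(ddp_model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```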
-
Implementing a Graph Neural Network (GNN) Layer
Implement a simple **Graph Neural Network (GNN) layer** for node classification on a small graph. The layer should aggregate features from a node's neighbors and combine them with the node's own features before applying a learnable transformation and nonlinearity.
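One possible formulation, sketched with mean aggregation over a dense adjacency matrix:

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """Mean-aggregates neighbor features and combines them with the node's own features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.self_lin = nn.Linear(in_dim, out_dim)
        self.neigh_lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim), adj: (num_nodes, num_nodes) binary adjacency matrix.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)   # avoid division by zero
        neigh_mean = (adj @ x) / deg                      # mean of neighbor features
        return torch.relu(self.self_lin(x) + self.neigh_lin(neigh_mean))

# Toy graph: 4 nodes, undirected edges 0-1, 1-2, 2-3.
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
x = torch.randn(4, 8)
print(GraphConvLayer(8, 16)(x, adj).shape)   # torch.Size([4, 16])
```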
-
Building a Neural Ordinary Differential Equation (NODE) Layer
Implement a simple **Neural Ordinary Differential Equation (NODE) layer**. This involves defining a `torch.nn.Module` that represents the derivative function $f(t, z(t))$ and then using an ODE solver (such as `torchdiffeq.odeint`) to integrate it over time as the layer's forward pass.
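A sketch assuming the third-party `torchdiffeq` package supplies the solver; the derivative network and integration interval are illustrative:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint   # third-party: pip install torchdiffeq

class ODEFunc(nn.Module):
    """Parameterizes the derivative dz/dt = f(t, z)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, z):
        return self.net(z)

class NODELayer(nn.Module):
    """Integrates the ODE from t=0 to t=1 and returns the final state."""
    def __init__(self, dim=16):
        super().__init__()
        self.func = ODEFunc(dim)
        self.t = torch.tensor([0.0, 1.0])

    def forward(self, z0):
        zt = odeint(self.func, z0, self.t)   # shape: (len(t), batch, dim)
        return zt[-1]

z0 = torch.randn(8, 16)
print(NODELayer()(z0).shape)   # torch.Size([8, 16])
```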
-
Implementing a Simple VAE for Text (Sentence VAE)
Implement a **Variational Autoencoder (VAE)** for text, often called a Sentence VAE. The encoder will be an RNN (e.g., GRU) that outputs a latent distribution, and the decoder will be another RNN that reconstructs the sentence from a sample of that latent code, trained with a reconstruction loss plus a KL divergence term.
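A minimal sketch with GRU encoder and decoder; the vocabulary size and dimensions are illustrative, and decoding is teacher-forced on the input tokens:

```python
import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    """GRU encoder -> latent Gaussian -> GRU decoder over token ids."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.latent_to_hidden = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                       # tokens: (batch, seq_len) int64
        emb = self.embed(tokens)
        _, h = self.encoder(emb)                     # h: (1, batch, hidden_dim)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        h0 = self.latent_to_hidden(z).unsqueeze(0)
        dec_out, _ = self.decoder(emb, h0)           # teacher-forced reconstruction
        return self.out(dec_out), mu, logvar

model = SentenceVAE()
tokens = torch.randint(0, 1000, (4, 12))
logits, mu, logvar = model(tokens)
# Loss = reconstruction (cross-entropy) + KL divergence to the standard normal prior.
recon = nn.functional.cross_entropy(logits.view(-1, 1000), tokens.view(-1))
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon + kl
```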
-
Differentiating Through a Non-differentiable Function with `torch.autograd.Function`
Implement a **custom `torch.autograd.Function`** for a non-differentiable operation, such as a custom quantization function. The `forward` method will perform the non-differentiable operation, and the `backward` method will define a surrogate gradient, for example a straight-through estimator that passes the incoming gradient through unchanged.
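A sketch using rounding as the quantization step and a straight-through (identity) backward pass:

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Rounds in the forward pass; passes gradients straight through in backward."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)            # non-differentiable step

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output               # straight-through estimator: identity gradient

x = torch.randn(5, requires_grad=True)
y = RoundSTE.apply(x).sum()
y.backward()
print(x.grad)   # all ones, as if rounding were the identity
```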
-
Batch Normalization From Scratch
Implement 1D batch normalization manually (without using `nn.BatchNorm1d`). Steps: 1. Compute the batch mean and variance. 2. Normalize the inputs. 3. Scale and shift with learnable parameters $\gamma$ and $\beta$.
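A sketch covering training mode only (no running statistics), checked against `nn.BatchNorm1d`:

```python
import torch
import torch.nn as nn

class ManualBatchNorm1d(nn.Module):
    """Training-mode 1D batch norm over the batch dimension."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift
        self.eps = eps

    def forward(self, x):                 # x: (batch, num_features)
        mean = x.mean(dim=0)              # 1. batch statistics
        var = x.var(dim=0, unbiased=False)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)   # 2. normalize
        return self.gamma * x_hat + self.beta             # 3. scale and shift

x = torch.randn(32, 10)
manual = ManualBatchNorm1d(10)(x)
reference = nn.BatchNorm1d(10)(x)        # fresh module in training mode
print(torch.allclose(manual, reference, atol=1e-5))   # True
```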
-
Debug Exploding Gradients
Create a deep feedforward net (20 layers, ReLU). Train it on dummy data. Track gradient norms across layers. Observe whether gradients explode. Experiment with: - Smaller learning rate. - Gradient clipping (`torch.nn.utils.clip_grad_norm_`).
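A sketch of the setup; the large initialization scale is deliberate so the growth in gradient norms is easy to see:

```python
import torch
import torch.nn as nn

# 20-layer ReLU MLP; a deliberately large init scale makes gradient growth visible.
layers = []
for _ in range(20):
    lin = nn.Linear(128, 128)
    nn.init.normal_(lin.weight, std=0.5)
    layers += [lin, nn.ReLU()]
model = nn.Sequential(*layers, nn.Linear(128, 1))

x, y = torch.randn(64, 128), torch.randn(64, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Per-layer gradient norms, from input-side layers to output-side layers.
for i, layer in enumerate(model):
    if isinstance(layer, nn.Linear):
        print(f"layer {i:2d} grad norm: {layer.weight.grad.norm():.3e}")

# One possible remedy: clip the global gradient norm before the optimizer step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```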
-
Implement a Siamese Network
Implement a Siamese network for MNIST digit similarity: - Two identical CNNs sharing weights. - Contrastive loss function. - Train on pairs of digits (same/different). Evaluate on test pairs.
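The contrastive loss itself might look like the sketch below, assuming label 1 for same-class pairs and 0 for different-class pairs, with an illustrative margin:

```python
import torch

def contrastive_loss(z1, z2, label, margin=1.0):
    """label = 1 for same-class pairs, 0 for different-class pairs."""
    dist = torch.nn.functional.pairwise_distance(z1, z2)
    # Same-class pairs are pulled together; different-class pairs are pushed
    # apart until their distance exceeds the margin.
    pos = label * dist.pow(2)
    neg = (1 - label) * torch.clamp(margin - dist, min=0).pow(2)
    return 0.5 * (pos + neg).mean()

z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
label = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(z1, z2, label))
```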
-
Create a Transformer Encoder Block
Implement a single Transformer encoder block: - Multi-head self-attention. - Layer normalization. - Feedforward network. Compare output with `nn.TransformerEncoderLayer`.
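For the comparison step, the built-in layer can be instantiated as below (matching your custom block's output exactly would also require copying its weights across, which is omitted here):

```python
import torch
import torch.nn as nn

# Reference layer with dropout disabled so the comparison is deterministic.
ref = nn.TransformerEncoderLayer(d_model=512, nhead=8, dim_feedforward=2048,
                                 dropout=0.0, batch_first=True)
ref.eval()

x = torch.randn(2, 10, 512)
with torch.no_grad():
    out = ref(x)
print(out.shape)   # torch.Size([2, 10, 512])
```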
-
DataParallel Basics
Simulate training with `torch.nn.DataParallel`: - Define a simple CNN. - Run it on 2 GPUs (if available). - Verify that the batch is split across devices. Inspect `model.module` usage.
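A sketch with a CPU fallback; the print inside `forward` shows the per-replica batch size when the module is replicated across GPUs:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, 3, padding=1)
        self.fc = nn.Linear(8 * 28 * 28, 10)

    def forward(self, x):
        # Each replica sees only its slice of the batch.
        print("replica batch size:", x.shape[0], "on", x.device)
        return self.fc(torch.relu(self.conv(x)).flatten(1))

model = SmallCNN()
if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(model).cuda()       # splits the batch across GPUs
    x = torch.randn(32, 1, 28, 28).cuda()
else:
    x = torch.randn(32, 1, 28, 28)              # CPU fallback: no splitting happens

out = model(x)
# With DataParallel, the underlying module is reached via model.module.
underlying = model.module if isinstance(model, nn.DataParallel) else model
print(type(underlying).__name__, out.shape)
```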