-
Model Compression with Pruning
Implement **model pruning** to reduce the size and computational cost of a trained model. Start with a simple, over-parameterized model (e.g., a fully-connected network on MNIST). Train it to a...
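A minimal sketch of one way to approach this with `torch.nn.utils.prune`; the layer sizes and the 50% pruning amount are illustrative assumptions, not part of the task statement.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical over-parameterized MLP for MNIST-sized inputs (sizes are assumptions).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# After training, prune 50% of the smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# Report overall sparsity of the pruned Linear weights.
zeros = sum((m.weight == 0).sum().item() for m in model.modules() if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model.modules() if isinstance(m, nn.Linear))
print(f"sparsity: {zeros / total:.2%}")

# Make the pruning permanent by removing the reparameterization masks.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
```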
-
Implementing Gradient Clipping
Implement **gradient clipping** in your training loop. This technique is used to prevent exploding gradients, which can be a problem in RNNs and other deep networks. After the backward pass...
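A minimal sketch of where the clipping call sits in the loop, assuming a toy regression model and dummy data; `max_norm=1.0` is an illustrative choice.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 50), nn.Tanh(), nn.Linear(50, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy data

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Rescale gradients in place so their total norm does not exceed max_norm;
    # the return value is the norm measured before clipping.
    total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```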
-
Adversarial Training for Robustness
Implement **adversarial training** on a simple classification model like a small CNN on MNIST. The goal is to make the model robust to adversarial attacks. You'll need to generate adversarial...
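A compact sketch using an FGSM attack, assuming a small MLP instead of a CNN and random tensors standing in for MNIST images in [0, 1]; `epsilon=0.1` and the clean/adversarial loss mix are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.1):
    """Perturb x along the sign of the input gradient (FGSM)."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch standing in for MNIST images.
x, y = torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,))

for step in range(10):
    x_adv = fgsm_attack(model, x, y)   # adversarial examples for the current model
    optimizer.zero_grad()              # also clears gradients left over from the attack
    # Train on a mix of clean and adversarial examples.
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
```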
-
Implementing a Custom Learning Rate Scheduler
Implement a **custom learning rate scheduler** that follows a cosine annealing schedule. The learning rate starts high and decreases smoothly to a minimum value, then resets and repeats. Your...
-
Generative Adversarial Network (GAN) on MNIST
Implement and train a simple **Generative Adversarial Network (GAN)**. The network consists of a generator and a discriminator. The generator takes a random noise vector and tries to generate a...
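A compact sketch of the alternating GAN updates, assuming flattened 28x28 images scaled to [-1, 1]; the `real` batch here is random data standing in for MNIST, and the layer sizes and learning rates are assumptions.

```python
import torch
import torch.nn as nn

latent_dim = 64

# Generator: noise vector -> flattened 28x28 "image" in [-1, 1].
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
# Discriminator: image -> single real/fake logit.
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(128, 784) * 2 - 1            # stand-in for real MNIST batches
ones, zeros = torch.ones(128, 1), torch.zeros(128, 1)

for step in range(100):
    # Discriminator: push real images toward 1, generated images toward 0.
    fake = G(torch.randn(128, latent_dim)).detach()   # don't backprop into G here
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output 1 on generated images.
    g_loss = bce(D(G(torch.randn(128, latent_dim))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```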
-
Distributed Data Parallel Training
Set up a **distributed data parallel training** script using `torch.nn.parallel.DistributedDataParallel` and `torch.distributed`. You'll need to use `torch.multiprocessing.spawn` to launch...
-
Implementing Weight Initialization Schemes
Implement **different weight initialization schemes** (e.g., Xavier/Glorot, He) for a simple neural network. Create a function that iterates through a model's parameters and applies a chosen...
-
Implementing Layer Normalization from Scratch
Implement **Layer Normalization** as a custom `torch.nn.Module`. Unlike `BatchNorm`, `LayerNorm` normalizes across the features of a single sample, not a batch. Your implementation should...
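One possible sketch, normalizing over the last dimension with learnable scale and shift, $$y = \gamma \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$, and comparing against the built-in `nn.LayerNorm` as a sanity check.

```python
import torch
import torch.nn as nn

class MyLayerNorm(nn.Module):
    """Normalize each sample over its feature dimension, then scale and shift."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)   # biased variance, as in nn.LayerNorm
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta

# Sanity check against the built-in implementation.
x = torch.randn(4, 16)
print(torch.allclose(MyLayerNorm(16)(x), nn.LayerNorm(16)(x), atol=1e-6))
```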
-
Gradient Clipping Example
Write code to: 1. Train a small RNN on dummy data. 2. Add gradient clipping using `torch.nn.utils.clip_grad_norm_`. 3. Print gradient norms before and after clipping. Show that exploding gradients...
-
Implement a Linear Regression Model
Build a simple linear regression model using `nn.Module`. Requirements: - One input feature, one output. - Train it on synthetic data $$y = 3x + 2 + \epsilon$$. - Use `MSELoss` and `SGD`. Check...
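A minimal sketch of the full task, assuming standard-normal inputs and a small noise scale; the learned weight and bias should approach 3 and 2.

```python
import torch
import torch.nn as nn

# Synthetic data: y = 3x + 2 + noise.
x = torch.randn(200, 1)
y = 3 * x + 2 + 0.1 * torch.randn(200, 1)

model = nn.Linear(1, 1)                 # one input feature, one output
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())   # expected to be close to 3 and 2
```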
-
Checkpointing with torch.save
Train a simple feedforward model for 1 epoch. Save: 1. Model state dict. 2. Optimizer state dict. 3. Epoch number. Then load the checkpoint and resume training seamlessly.
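A minimal sketch of the save/load round trip, assuming a small feedforward model and the filename `checkpoint.pt`; the training epoch itself is elided.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... train for one epoch ...

# Save everything needed to resume later.
torch.save({
    "epoch": 1,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}, "checkpoint.pt")

# Later: rebuild model and optimizer the same way, then restore their state.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"]       # resume the loop from this epoch
```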
-
Weight Initialization Techniques
Initialize a neural network's weights using different schemes: - Xavier initialization. - Kaiming initialization. Show histograms of weight distributions before and after initialization.
-
Debug Exploding Gradients
Create a deep feedforward net (20 layers, ReLU). Train it on dummy data. Track gradient norms across layers. Observe whether gradients explode. Experiment with: - Smaller learning rate. - Gradient...
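A minimal sketch of the instrumentation, assuming a 20-layer MLP on random data; it prints the smallest and largest per-parameter gradient norms each step so growth by orders of magnitude is easy to spot.

```python
import torch
import torch.nn as nn

# 20 hidden ReLU layers, deliberately deep so gradient-scale issues are visible.
layers = []
for _ in range(20):
    layers += [nn.Linear(64, 64), nn.ReLU()]
model = nn.Sequential(*layers, nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(32, 64), torch.randn(32, 1)   # dummy data

for step in range(50):
    optimizer.zero_grad()
    nn.functional.mse_loss(model(x), y).backward()
    # Per-parameter gradient norms across the depth of the network.
    norms = [p.grad.norm().item() for p in model.parameters() if p.grad is not None]
    print(f"step {step}: min grad norm {min(norms):.2e}, max grad norm {max(norms):.2e}")
    # Possible remedies: lower lr, or clip with nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
```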
-
Visualize Training with TensorBoard
Integrate TensorBoard into a training loop: - Log training loss and validation accuracy. - Add histograms of weights and gradients. - Write a few sample images. Open TensorBoard and verify logs.
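A minimal sketch of the logging calls, assuming a dummy classifier on random image batches and the log directory `runs/demo` (view with `tensorboard --logdir runs`).

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):
    x, y = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))   # dummy batch
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

    writer.add_scalar("train/loss", loss.item(), step)              # scalar curves
    if step % 20 == 0:
        for name, param in model.named_parameters():
            writer.add_histogram(f"weights/{name}", param, step)    # weight histograms
            writer.add_histogram(f"grads/{name}", param.grad, step) # gradient histograms
        writer.add_images("samples", x[:4], step)                   # a few sample images

writer.close()
```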
-
Gradient Accumulation Example
Simulate large-batch training using gradient accumulation: - Train with microbatches of size 4. - Accumulate gradients over 8 steps. - Update optimizer after accumulation. Verify final result...
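A minimal sketch of the accumulation pattern, assuming a toy regression model; dividing each microbatch loss by the number of accumulation steps keeps the accumulated gradient equivalent to averaging over the large batch.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
accum_steps = 8                          # 8 microbatches of size 4 ~ effective batch of 32

optimizer.zero_grad()
for step in range(64):
    x, y = torch.randn(4, 10), torch.randn(4, 1)     # microbatch of size 4
    loss = loss_fn(model(x), y) / accum_steps        # scale so accumulated grads average
    loss.backward()                                   # gradients add up in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                              # one update per accumulation window
        optimizer.zero_grad()
```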
-
Implement Early Stopping
Add early stopping to a training loop: - Monitor validation loss. - Stop training if validation loss has not improved for 5 epochs. - Save the best model checkpoint. Demonstrate on an MNIST subset.
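A minimal sketch of the stopping logic, assuming random tensors standing in for the MNIST subset and a patience of 5 epochs; the single-batch "epoch" is just to keep the example short.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy train/validation splits standing in for an MNIST subset.
x_tr, y_tr = torch.rand(256, 1, 28, 28), torch.randint(0, 10, (256,))
x_va, y_va = torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,))

best_val, patience, bad_epochs = float("inf"), 5, 0

for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_va), y_va).item()

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")   # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Early stopping at epoch {epoch}")
            break
```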
-
Implement Label Smoothing
Write a function to apply label smoothing for classification: - Replace one-hot targets with $$1-\epsilon$$ for the true class and $$\epsilon/(K-1)$$ for the others. - Use it in cross-entropy training. Show...
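A minimal sketch of the smoothing function and the corresponding soft-target cross-entropy, assuming $$\epsilon = 0.1$$ and $$K = 10$$ classes with dummy logits.

```python
import torch
import torch.nn.functional as F

def smooth_labels(targets, num_classes, epsilon=0.1):
    """Turn hard class indices into smoothed distributions:
    1 - epsilon on the true class, epsilon / (K - 1) on every other class."""
    smooth = torch.full((targets.size(0), num_classes), epsilon / (num_classes - 1))
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - epsilon)
    return smooth

logits = torch.randn(8, 10)                    # dummy model outputs
targets = torch.randint(0, 10, (8,))

# Cross-entropy against the soft targets, averaged over the batch.
log_probs = F.log_softmax(logits, dim=-1)
loss = -(smooth_labels(targets, 10) * log_probs).sum(dim=-1).mean()
```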
-
Distributed DataParallel Basics
Simulate training with `torch.nn.DataParallel`: - Define a simple CNN. - Run it on 2 GPUs (if available). - Verify batch is split across devices. Inspect `model.module` usage.
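A minimal sketch of wrapping and unwrapping the model, assuming a tiny CNN; it falls back to a single device when fewer than two GPUs are available.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 28 * 28, 10))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)      # each forward pass splits the batch across GPUs
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.rand(64, 1, 28, 28, device=device)   # dummy batch
out = model(x)                                  # replicated forward, outputs gathered on device 0

# When wrapped, the original module lives under model.module; save that, not the wrapper.
inner = model.module if isinstance(model, nn.DataParallel) else model
torch.save(inner.state_dict(), "cnn.pt")
```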
-
Mixed Precision Training with autocast
Modify a training loop to use `torch.cuda.amp.autocast`: - Wrap forward + loss in `autocast`. - Use `GradScaler` for backward. Compare training speed vs. full precision.
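A minimal sketch of the autocast/GradScaler pattern, assuming a CUDA device is available and a dummy classifier on random batches.

```python
import torch
import torch.nn as nn

device = "cuda"                                   # autocast here targets CUDA; requires a GPU
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    x = torch.rand(64, 1, 28, 28, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():               # run forward + loss in reduced precision where safe
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()                 # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)                        # unscales gradients, then calls optimizer.step()
    scaler.update()
```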