-
A simple MLP
Now that you have all of the basic building blocks, it's time to assemble a simple multi-layer perceptron (MLP). In this exercise, you will build a 2-layer MLP for a regression problem. 1. ...
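A minimal sketch of what a solution might look like, assuming a tanh hidden layer, MSE loss, and illustrative sizes (none of these choices are part of the exercise statement):

```python
import jax
import jax.numpy as jnp

def init_mlp(key, in_dim, hidden_dim, out_dim):
    # Two dense layers: in -> hidden -> out.
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (in_dim, hidden_dim)) * 0.1,
        "b1": jnp.zeros(hidden_dim),
        "W2": jax.random.normal(k2, (hidden_dim, out_dim)) * 0.1,
        "b2": jnp.zeros(out_dim),
    }

def mlp(params, x):
    h = jnp.tanh(x @ params["W1"] + params["b1"])  # hidden layer
    return h @ params["W2"] + params["b2"]         # linear output for regression

def mse_loss(params, x, y):
    return jnp.mean((mlp(params, x) - y) ** 2)

# One gradient-descent step on random data.
key = jax.random.PRNGKey(0)
kp, kx, ky = jax.random.split(key, 3)
params = init_mlp(kp, in_dim=3, hidden_dim=32, out_dim=1)
x = jax.random.normal(kx, (16, 3))
y = jax.random.normal(ky, (16, 1))
grads = jax.grad(mse_loss)(params, x, y)
params = jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)
```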
-
A simple CNN
In this exercise, you will implement a simple convolutional neural network (CNN) for a regression problem. You can use `jax.lax.conv_general_dilated` for the convolution itself. 1. Implement a...
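One way the convolution call can look, assuming NCHW inputs, OIHW kernels, and a global-average-pool head (all illustrative choices):

```python
import jax
import jax.numpy as jnp

def conv2d(x, w):
    # x: (N, C, H, W); w: (out_ch, in_ch, kH, kW); stride 1, SAME padding.
    return jax.lax.conv_general_dilated(
        x, w, window_strides=(1, 1), padding="SAME",
        dimension_numbers=("NCHW", "OIHW", "NCHW"))

def cnn(params, x):
    h = jax.nn.relu(conv2d(x, params["w1"]))
    h = jax.nn.relu(conv2d(h, params["w2"]))
    return h.mean(axis=(1, 2, 3))  # global average pool: one scalar per image

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {"w1": jax.random.normal(k1, (8, 1, 3, 3)) * 0.1,
          "w2": jax.random.normal(k2, (8, 8, 3, 3)) * 0.1}
x = jax.random.normal(key, (4, 1, 16, 16))
print(cnn(params, x).shape)  # (4,)
```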
-
Working with a NN library: Flax
While it is possible to build neural networks from scratch in JAX, it is often more convenient to use a library like Flax or Haiku. These libraries provide common neural network layers and...
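For example, assuming Flax's Linen API, a two-layer model is only a handful of lines (the sizes here are placeholders):

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    hidden: int = 32

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(self.hidden)(x))
        return nn.Dense(1)(x)

model = MLP()
x = jnp.ones((4, 3))
params = model.init(jax.random.PRNGKey(0), x)  # parameter pytree
y = model.apply(params, x)                     # forward pass
```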
-
Softmax's Numerical Stability: The Max Trick
While the standard softmax formula $\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$ is mathematically correct, a direct implementation can lead to numerical instability due to potential...
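The trick rests on the identity $\text{softmax}(z - c) = \text{softmax}(z)$ for any constant $c$; choosing $c = \max_j z_j$ keeps every exponent non-positive. A NumPy sketch:

```python
import numpy as np

def stable_softmax(z):
    # Shifting by max(z) leaves the result unchanged but prevents overflow:
    # every exponent is now <= 0, so exp() stays in (0, 1].
    exp = np.exp(z - np.max(z))
    return exp / np.sum(exp)

z = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(z))  # finite; a naive np.exp(1000) would overflow to inf
```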
-
The Gradients of Activation Functions
Activation functions introduce non-linearity into neural networks, but their derivatives are crucial for backpropagation. 1. **Sigmoid**: Given $\sigma(x) = \frac{1}{1 + e^{-x}}$, derive...
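As a reference for part 1, the sigmoid derivative follows from the chain rule together with the observation that $1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$:

$$\sigma'(x) = \frac{d}{dx}\bigl(1 + e^{-x}\bigr)^{-1} = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x) \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$

Its maximum value is $1/4$ at $x = 0$, which is one reason deep sigmoid networks suffer from vanishing gradients.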
-
L2 Regularization's Gradient Impact
L2 regularization (also known as weight decay) is a common technique to prevent overfitting. 1. **Loss Function**: Consider a simple linear regression loss with L2 regularization: $J(\mathbf{w},...
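Assuming the regularizer is written $\frac{\lambda}{2}\|\mathbf{w}\|_2^2$ (the $\frac{1}{2}$ is a common convention that cancels the exponent), the gradient gains a term linear in the weights, and the gradient-descent update with learning rate $\eta$ becomes:

$$\nabla_{\mathbf{w}} J = \nabla_{\mathbf{w}} J_{\text{data}} + \lambda \mathbf{w} \quad\Longrightarrow\quad \mathbf{w} \leftarrow (1 - \eta\lambda)\,\mathbf{w} - \eta\,\nabla_{\mathbf{w}} J_{\text{data}}$$

Hence the name weight decay: every step multiplicatively shrinks the weights by $(1 - \eta\lambda)$ before applying the usual data-gradient update.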
-
The Stabilizing Power of Batch Normalization
Batch Normalization (BatchNorm) is a crucial technique for stabilizing and accelerating deep neural network training. 1. **Normalization Step**: Given a mini-batch of activations $X = \{x_1, x_2,...
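A NumPy sketch of the training-time normalization step, assuming per-feature statistics over the batch axis (running statistics for inference are omitted):

```python
import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then rescale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta            # learnable scale and shift

x = np.random.randn(32, 4) * 10 + 5        # poorly scaled activations
y = batchnorm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # ~0 and ~1
```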
-
Riding the Momentum Wave in Optimization
Stochastic Gradient Descent (SGD) with momentum is a popular optimization algorithm that often converges faster and more stably than plain SGD. 1. **Update Rule**: The update rule for SGD with...
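One common form of the update is $v_t = \beta v_{t-1} + g_t$, $w_t = w_{t-1} - \eta v_t$ (some texts scale the gradient by $1 - \beta$ instead). Sketched on a toy quadratic:

```python
def sgd_momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # Velocity accumulates an exponentially weighted sum of past gradients,
    # damping oscillations and speeding progress along consistent directions.
    v = beta * v + grad
    return w - lr * v, v

w, v = 5.0, 0.0
for _ in range(200):
    grad = 2 * w                  # gradient of f(w) = w^2
    w, v = sgd_momentum_step(w, v, grad)
print(round(w, 4))                # ~0, the minimum of f
```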
-
Numerical Gradient Verification
Understanding and correctly implementing backpropagation is crucial in deep learning. A common way to debug backpropagation is numerical gradient checking. This involves approximating the...
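The usual approximation is the central difference $\frac{\partial J}{\partial \theta_i} \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2\epsilon}$, whose error is $O(\epsilon^2)$. A minimal sketch:

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-5):
    # Central-difference estimate, one coordinate at a time.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e.flat[i] = eps
        grad.flat[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad

f = lambda w: np.sum(w ** 3)   # analytic gradient: 3 * w**2
w = np.random.randn(5)
print(np.allclose(numerical_grad(f, w), 3 * w ** 2))  # True
```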
-
Linear Regression via Gradient Descent
Linear regression is a foundational supervised learning algorithm. Given a dataset of input features $X$ and corresponding target values $y$, the goal is to find a linear relationship $y =...
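A compact NumPy sketch, assuming an MSE loss and full-batch gradient descent (the learning rate and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.3
y = X @ true_w + true_b + 0.01 * rng.normal(size=100)

w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):
    err = X @ w + b - y                     # residuals
    w -= lr * (2 / len(y)) * (X.T @ err)    # d(MSE)/dw
    b -= lr * 2 * err.mean()                # d(MSE)/db
print(w.round(2), round(b, 2))              # ~[2. -1. 0.5] and ~0.3
```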
-
MoE Gating and Dispatch
A core component of a Mixture of Experts model is the 'gating network', which determines which expert(s) each token should be sent to. This is often a `top-k` selection. Your task is to implement...
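A PyTorch sketch of top-k gating, assuming the softmax is taken over only the selected experts' logits (softmaxing first and then selecting is an equally common convention):

```python
import torch
import torch.nn.functional as F

def top_k_gating(logits, k=2):
    # logits: (n_tokens, n_experts). Keep the k largest per token,
    # normalize their weights, and zero out all other experts.
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    gates = F.softmax(topk_vals, dim=-1)      # weights over the chosen experts
    dense = torch.zeros_like(logits)
    dense.scatter_(-1, topk_idx, gates)       # scatter back to (tokens, experts)
    return dense, topk_idx

gates, idx = top_k_gating(torch.randn(4, 8))  # 4 tokens, 8 experts
print(gates.sum(dim=-1))                      # each token's gates sum to 1
```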
-
Batched Expert Forward Pass with Einops
A naive implementation of an MoE layer might involve a loop over the experts. This is inefficient. A much better approach is to perform a single, batched matrix multiplication for all expert...
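For instance, once tokens are grouped per expert (the shapes below are illustrative), `einops.einsum` expresses the whole layer as one batched matmul:

```python
import torch
from einops import einsum

n_experts, tokens_per_expert, d_in, d_out = 4, 10, 8, 16
W = torch.randn(n_experts, d_in, d_out)              # one weight matrix per expert
x = torch.randn(n_experts, tokens_per_expert, d_in)  # tokens already dispatched

# One batched matrix multiplication covers every expert; no Python loop.
y = einsum(x, W, "e t d_in, e d_in d_out -> e t d_out")
print(y.shape)  # torch.Size([4, 10, 16])
```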
-
Implementing a Custom `nn.Module` for a Gated Recurrent Unit (GRU)
Implement a **custom GRU cell** as a subclass of `torch.nn.Module`. Your implementation should handle the reset gate, update gate, and the new hidden state computation from scratch, using...
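One possible sketch following the standard GRU equations (the fused 3-way projections are an implementation convenience, not a requirement):

```python
import torch
import torch.nn as nn

class GRUCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Project input and hidden state once for all three gates.
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x, h):
        xr, xz, xn = self.x2h(x).chunk(3, dim=-1)
        hr, hz, hn = self.h2h(h).chunk(3, dim=-1)
        r = torch.sigmoid(xr + hr)       # reset gate
        z = torch.sigmoid(xz + hz)       # update gate
        n = torch.tanh(xn + r * hn)      # candidate hidden state
        return (1 - z) * n + z * h       # interpolate old and new state

cell, h = GRUCell(8, 16), torch.zeros(4, 16)
for _ in range(5):                       # unroll over a short sequence
    h = cell(torch.randn(4, 8), h)
print(h.shape)                           # torch.Size([4, 16])
```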
-
Implementing a Custom Learning Rate Scheduler
Implement a **custom learning rate scheduler** that follows a cosine annealing schedule. The learning rate starts high and decreases smoothly to a minimum value, then resets and repeats. Your...
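The underlying schedule is $\eta_t = \eta_{\min} + \frac{1}{2}(\eta_{\max} - \eta_{\min})\bigl(1 + \cos(\pi t / T)\bigr)$, with $t$ taken modulo the period $T$ to get restarts. A minimal sketch (default values are placeholders):

```python
import math

def cosine_annealing_lr(step, lr_max=0.1, lr_min=1e-4, period=100):
    # Cosine decay from lr_max to lr_min over `period` steps, then restart.
    t = step % period
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))

print(cosine_annealing_lr(0), cosine_annealing_lr(50), cosine_annealing_lr(100))
# 0.1, ~0.05, 0.1 (restart)

# To drive a PyTorch optimizer, set the rate each step:
# for group in optimizer.param_groups:
#     group["lr"] = cosine_annealing_lr(step)
```

PyTorch also ships `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, which is handy for checking your implementation against.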
-
Transfer Learning with a Pre-trained Model
Fine-tune a **pre-trained model** (e.g., `resnet18` from `torchvision.models`) on a new, small image classification dataset (e.g., `CIFAR-10`). You'll need to freeze the weights of the initial...
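The freeze-and-replace pattern, sketched with torchvision's weights API (the string weight names require torchvision >= 0.13):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # pre-trained backbone
for p in model.parameters():
    p.requires_grad = False                        # freeze every existing layer

# Replacing the head creates fresh, trainable parameters for 10 classes.
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head is optimized:
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```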
-
Implementing a Simple Attention Mechanism
Implement a **simple attention mechanism** for a sequence-to-sequence model. Given a sequence of encoder outputs and a single decoder hidden state, your attention module should compute attention...
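A sketch using dot-product scoring (additive/Bahdanau scoring would be an equally valid choice for this exercise):

```python
import torch
import torch.nn.functional as F

def attention(decoder_h, encoder_outs):
    # decoder_h: (B, H); encoder_outs: (B, T, H)
    scores = torch.bmm(encoder_outs, decoder_h.unsqueeze(-1)).squeeze(-1)  # (B, T)
    weights = F.softmax(scores, dim=-1)              # attention distribution over T
    context = torch.bmm(weights.unsqueeze(1), encoder_outs).squeeze(1)     # (B, H)
    return context, weights

ctx, w = attention(torch.randn(2, 16), torch.randn(2, 7, 16))
print(ctx.shape, w.shape)  # torch.Size([2, 16]) torch.Size([2, 7])
```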
-
Implementing Weight Initialization Schemes
Implement **different weight initialization schemes** (e.g., Xavier/Glorot, He) for a simple neural network. Create a function that iterates through a model's parameters and applies a chosen...
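One shape such a function could take for `nn.Linear` layers, using PyTorch's built-in initializers:

```python
import torch.nn as nn

def init_weights(model, scheme="xavier"):
    # Walk the module tree and re-initialize every Linear layer in place.
    for m in model.modules():
        if isinstance(m, nn.Linear):
            if scheme == "xavier":
                nn.init.xavier_uniform_(m.weight)   # Glorot: suits tanh/sigmoid
            elif scheme == "he":
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")  # He: suits ReLU
            nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
init_weights(net, scheme="he")
```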
-
Custom Dataset for CSV Data
Write a PyTorch `Dataset` class that loads data from a CSV file containing tabular data (features + labels). Requirements:
- Use `pandas` to read the CSV.
- Convert features and labels to tensors. ...
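A sketch, assuming the labels live in a column named `label` (adjust to your CSV's schema) and an integer-class target:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class CSVDataset(Dataset):
    def __init__(self, path, label_col="label"):
        df = pd.read_csv(path)
        self.y = torch.tensor(df[label_col].values, dtype=torch.long)
        self.X = torch.tensor(df.drop(columns=[label_col]).values,
                              dtype=torch.float32)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# loader = DataLoader(CSVDataset("data.csv"), batch_size=32, shuffle=True)
```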
-
Implement Dropout Manually
Implement dropout as a function `my_dropout(x, p)`:
- Zero out elements of `x` with probability `p`.
- Scale survivors by $1/(1-p)$.
- Ensure deterministic behavior when `torch.manual_seed` is...
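A sketch of the inverted-dropout convention (scaling at train time so inference needs no correction):

```python
import torch

def my_dropout(x, p=0.5, training=True):
    if not training or p == 0.0:
        return x
    # Keep each element with probability 1 - p; scale survivors by 1/(1 - p)
    # so the expected activation is unchanged.
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1 - p)

torch.manual_seed(0)                      # rand_like draws from the global RNG,
a = my_dropout(torch.ones(5))
torch.manual_seed(0)                      # so re-seeding reproduces the same mask
b = my_dropout(torch.ones(5))
print(torch.equal(a, b))                  # True
```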
-
Custom Activation Function
Define a custom activation function called `Swish`: $f(x) = x \cdot \sigma(x)$.
- Implement it as a PyTorch `nn.Module`.
- Train a small MLP on random data with it.
- Compare with ReLU...
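A sketch of the module; note that PyTorch already ships this function as `nn.SiLU`, which is useful for checking your version:

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)   # f(x) = x * sigma(x)

# Drop-in replacement for ReLU in a small MLP:
mlp = nn.Sequential(nn.Linear(10, 64), Swish(), nn.Linear(64, 1))
print(mlp(torch.randn(4, 10)).shape)  # torch.Size([4, 1])
```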
-
Visualize Training with TensorBoard
Integrate TensorBoard into a training loop:
- Log training loss and validation accuracy.
- Add histograms of weights and gradients.
- Write a few sample images.

Open TensorBoard and verify logs.
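The relevant `SummaryWriter` calls, sketched with stand-in values (a real loop would log actual losses, weights, and batch images):

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/demo")
for step in range(100):
    loss = 1.0 / (step + 1)                            # stand-in training loss
    writer.add_scalar("train/loss", loss, step)
writer.add_scalar("val/accuracy", 0.9, 100)            # stand-in metric
writer.add_histogram("fc/weight", torch.randn(64, 10), 0)
writer.add_image("samples", torch.rand(3, 32, 32), 0)  # CHW image tensor
writer.close()
# Then inspect with: tensorboard --logdir runs
```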
-
Implement Early Stopping
Add early stopping to a training loop:
- Monitor validation loss.
- Stop training if there is no improvement after 5 epochs.
- Save the best model checkpoint.

Demonstrate on an MNIST subset.
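The bookkeeping, sketched with a hard-coded list standing in for real per-epoch validation losses (a full MNIST demo would compute these from a validation loader):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                                  # stand-in model
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]  # stand-in values

best, patience, bad_epochs = float("inf"), 5, 0
for epoch, val_loss in enumerate(val_losses):
    # ... one epoch of training would go here ...
    if val_loss < best:
        best, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")        # checkpoint the best model
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}")        # fires before the 0.6
            break
```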