-
A simple MLP
Now that you have all of the basic building blocks, it's time to assemble a simple multi-layer perceptron (MLP). In this exercise, you will build a 2-layer MLP for a regression problem. 1. ...
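A minimal sketch of what a solution might look like, assuming a tanh hidden layer, MSE loss, and illustrative sizes (none of these choices are part of the exercise statement):

```python
import jax
import jax.numpy as jnp

def init_mlp(key, in_dim, hidden_dim, out_dim):
    # Two dense layers: in -> hidden -> out.
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (in_dim, hidden_dim)) * 0.1,
        "b1": jnp.zeros(hidden_dim),
        "W2": jax.random.normal(k2, (hidden_dim, out_dim)) * 0.1,
        "b2": jnp.zeros(out_dim),
    }

def mlp(params, x):
    h = jnp.tanh(x @ params["W1"] + params["b1"])  # hidden layer
    return h @ params["W2"] + params["b2"]         # linear output for regression

def mse_loss(params, x, y):
    return jnp.mean((mlp(params, x) - y) ** 2)

# One gradient-descent step on random data.
key = jax.random.PRNGKey(0)
kp, kx, ky = jax.random.split(key, 3)
params = init_mlp(kp, in_dim=3, hidden_dim=32, out_dim=1)
x = jax.random.normal(kx, (16, 3))
y = jax.random.normal(ky, (16, 1))
grads = jax.grad(mse_loss)(params, x, y)
params = jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)
```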
-
A simple CNN
In this exercise, you will implement a simple convolutional neural network (CNN) for a regression problem. You can use `jax.lax.conv_general_dilated` for the convolution itself. 1. Implement a...
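One way the convolution call can look, assuming NCHW inputs, OIHW kernels, and a global-average-pool head (all illustrative choices):

```python
import jax
import jax.numpy as jnp

def conv2d(x, w):
    # x: (N, C, H, W); w: (out_ch, in_ch, kH, kW); stride 1, SAME padding.
    return jax.lax.conv_general_dilated(
        x, w, window_strides=(1, 1), padding="SAME",
        dimension_numbers=("NCHW", "OIHW", "NCHW"))

def cnn(params, x):
    h = jax.nn.relu(conv2d(x, params["w1"]))
    h = jax.nn.relu(conv2d(h, params["w2"]))
    return h.mean(axis=(1, 2, 3))  # global average pool: one scalar per image

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {"w1": jax.random.normal(k1, (8, 1, 3, 3)) * 0.1,
          "w2": jax.random.normal(k2, (8, 8, 3, 3)) * 0.1}
x = jax.random.normal(key, (4, 1, 16, 16))
print(cnn(params, x).shape)  # (4,)
```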
-
Working with a NN library: Flax
While it is possible to build neural networks from scratch in JAX, it is often more convenient to use a library like Flax or Haiku. These libraries provide common neural network layers and...
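For example, assuming Flax's Linen API, a two-layer model is only a handful of lines (the sizes here are placeholders):

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    hidden: int = 32

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(self.hidden)(x))
        return nn.Dense(1)(x)

model = MLP()
x = jnp.ones((4, 3))
params = model.init(jax.random.PRNGKey(0), x)  # parameter pytree
y = model.apply(params, x)                     # forward pass
```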
-
Softmax's Numerical Stability: The Max Trick
While the standard softmax formula $\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$ is mathematically correct, a direct implementation can lead to numerical instability due to potential...
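The trick rests on the identity $\text{softmax}(z - c) = \text{softmax}(z)$ for any constant $c$; choosing $c = \max_j z_j$ keeps every exponent non-positive. A NumPy sketch:

```python
import numpy as np

def stable_softmax(z):
    # Shifting by max(z) leaves the result unchanged but prevents overflow:
    # every exponent is now <= 0, so exp() stays in (0, 1].
    exp = np.exp(z - np.max(z))
    return exp / np.sum(exp)

z = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(z))  # finite; a naive np.exp(1000) would overflow to inf
```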
-
The Gradients of Activation Functions
Activation functions introduce non-linearity into neural networks, but their derivatives are crucial for backpropagation. 1. **Sigmoid**: Given $\sigma(x) = \frac{1}{1 + e^{-x}}$, derive...
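As a reference for part 1, the sigmoid derivative follows from the chain rule together with the observation that $1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$:

$$\sigma'(x) = \frac{d}{dx}\bigl(1 + e^{-x}\bigr)^{-1} = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x) \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$

Its maximum value is $1/4$ at $x = 0$, which is one reason deep sigmoid networks suffer from vanishing gradients.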
-
L2 Regularization's Gradient Impact
L2 regularization (also known as weight decay) is a common technique to prevent overfitting. 1. **Loss Function**: Consider a simple linear regression loss with L2 regularization: $J(\mathbf{w},...
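Assuming the regularizer is written $\frac{\lambda}{2}\|\mathbf{w}\|_2^2$ (the $\frac{1}{2}$ is a common convention that cancels the exponent), the gradient gains a term linear in the weights, and the gradient-descent update with learning rate $\eta$ becomes:

$$\nabla_{\mathbf{w}} J = \nabla_{\mathbf{w}} J_{\text{data}} + \lambda \mathbf{w} \quad\Longrightarrow\quad \mathbf{w} \leftarrow (1 - \eta\lambda)\,\mathbf{w} - \eta\,\nabla_{\mathbf{w}} J_{\text{data}}$$

Hence the name weight decay: every step multiplicatively shrinks the weights by $(1 - \eta\lambda)$ before applying the usual data-gradient update.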
-
The Stabilizing Power of Batch Normalization
Batch Normalization (BatchNorm) is a crucial technique for stabilizing and accelerating deep neural network training. 1. **Normalization Step**: Given a mini-batch of activations $X = \{x_1, x_2,...
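A NumPy sketch of the training-time normalization step, assuming per-feature statistics over the batch axis (running statistics for inference are omitted):

```python
import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then rescale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta            # learnable scale and shift

x = np.random.randn(32, 4) * 10 + 5        # poorly scaled activations
y = batchnorm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # ~0 and ~1
```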
-
Riding the Momentum Wave in Optimization
Stochastic Gradient Descent (SGD) with momentum is a popular optimization algorithm that often converges faster and more stably than plain SGD. 1. **Update Rule**: The update rule for SGD with...
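One common form of the update is $v_t = \beta v_{t-1} + g_t$, $w_t = w_{t-1} - \eta v_t$ (some texts scale the gradient by $1 - \beta$ instead). Sketched on a toy quadratic:

```python
def sgd_momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # Velocity accumulates an exponentially weighted sum of past gradients,
    # damping oscillations and speeding progress along consistent directions.
    v = beta * v + grad
    return w - lr * v, v

w, v = 5.0, 0.0
for _ in range(200):
    grad = 2 * w                  # gradient of f(w) = w^2
    w, v = sgd_momentum_step(w, v, grad)
print(round(w, 4))                # ~0, the minimum of f
```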
-
Numerical Gradient Verification
Understanding and correctly implementing backpropagation is crucial in deep learning. A common way to debug backpropagation is numerical gradient checking. This involves approximating the...
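The usual approximation is the central difference $\frac{\partial J}{\partial \theta_i} \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2\epsilon}$, whose error is $O(\epsilon^2)$. A minimal sketch:

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-5):
    # Central-difference estimate, one coordinate at a time.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e.flat[i] = eps
        grad.flat[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad

f = lambda w: np.sum(w ** 3)   # analytic gradient: 3 * w**2
w = np.random.randn(5)
print(np.allclose(numerical_grad(f, w), 3 * w ** 2))  # True
```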
-
Linear Regression via Gradient Descent
Linear regression is a foundational supervised learning algorithm. Given a dataset of input features $X$ and corresponding target values $y$, the goal is to find a linear relationship $y =...
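A compact NumPy sketch, assuming an MSE loss and full-batch gradient descent (the learning rate and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.3
y = X @ true_w + true_b + 0.01 * rng.normal(size=100)

w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):
    err = X @ w + b - y                     # residuals
    w -= lr * (2 / len(y)) * (X.T @ err)    # d(MSE)/dw
    b -= lr * 2 * err.mean()                # d(MSE)/db
print(w.round(2), round(b, 2))              # ~[2. -1. 0.5] and ~0.3
```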
-
MoE Gating and Dispatch
A core component of a Mixture of Experts model is the 'gating network', which determines which expert(s) each token should be sent to. This is often a `top-k` selection. Your task is to implement...
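A PyTorch sketch of top-k gating, assuming the softmax is taken over only the selected experts' logits (softmaxing first and then selecting is an equally common convention):

```python
import torch
import torch.nn.functional as F

def top_k_gating(logits, k=2):
    # logits: (n_tokens, n_experts). Keep the k largest per token,
    # normalize their weights, and zero out all other experts.
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    gates = F.softmax(topk_vals, dim=-1)      # weights over the chosen experts
    dense = torch.zeros_like(logits)
    dense.scatter_(-1, topk_idx, gates)       # scatter back to (tokens, experts)
    return dense, topk_idx

gates, idx = top_k_gating(torch.randn(4, 8))  # 4 tokens, 8 experts
print(gates.sum(dim=-1))                      # each token's gates sum to 1
```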
-
Batched Expert Forward Pass with Einops
A naive implementation of an MoE layer might involve a loop over the experts. This is inefficient. A much better approach is to perform a single, batched matrix multiplication for all expert...
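For instance, once tokens are grouped per expert (the shapes below are illustrative), `einops.einsum` expresses the whole layer as one batched matmul:

```python
import torch
from einops import einsum

n_experts, tokens_per_expert, d_in, d_out = 4, 10, 8, 16
W = torch.randn(n_experts, d_in, d_out)              # one weight matrix per expert
x = torch.randn(n_experts, tokens_per_expert, d_in)  # tokens already dispatched

# One batched matrix multiplication covers every expert; no Python loop.
y = einsum(x, W, "e t d_in, e d_in d_out -> e t d_out")
print(y.shape)  # torch.Size([4, 10, 16])
```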
-
Implementing a Custom `nn.Module` for a Gated Recurrent Unit (GRU)
Implement a **custom GRU cell** as a subclass of `torch.nn.Module`. Your implementation should handle the reset gate, update gate, and the new hidden state computation from scratch, using...
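One possible sketch following the standard GRU equations (the fused 3-way projections are an implementation convenience, not a requirement):

```python
import torch
import torch.nn as nn

class GRUCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Project input and hidden state once for all three gates.
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x, h):
        xr, xz, xn = self.x2h(x).chunk(3, dim=-1)
        hr, hz, hn = self.h2h(h).chunk(3, dim=-1)
        r = torch.sigmoid(xr + hr)       # reset gate
        z = torch.sigmoid(xz + hz)       # update gate
        n = torch.tanh(xn + r * hn)      # candidate hidden state
        return (1 - z) * n + z * h       # interpolate old and new state

cell, h = GRUCell(8, 16), torch.zeros(4, 16)
for _ in range(5):                       # unroll over a short sequence
    h = cell(torch.randn(4, 8), h)
print(h.shape)                           # torch.Size([4, 16])
```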
-
Implementing a Custom Learning Rate Scheduler
Implement a **custom learning rate scheduler** that follows a cosine annealing schedule. The learning rate starts high and decreases smoothly to a minimum value, then resets and repeats. Your...
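The underlying schedule is $\eta_t = \eta_{\min} + \frac{1}{2}(\eta_{\max} - \eta_{\min})\bigl(1 + \cos(\pi t / T)\bigr)$, with $t$ taken modulo the period $T$ to get restarts. A minimal sketch (default values are placeholders):

```python
import math

def cosine_annealing_lr(step, lr_max=0.1, lr_min=1e-4, period=100):
    # Cosine decay from lr_max to lr_min over `period` steps, then restart.
    t = step % period
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))

print(cosine_annealing_lr(0), cosine_annealing_lr(50), cosine_annealing_lr(100))
# 0.1, ~0.05, 0.1 (restart)

# To drive a PyTorch optimizer, set the rate each step:
# for group in optimizer.param_groups:
#     group["lr"] = cosine_annealing_lr(step)
```

PyTorch also ships `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, which is handy for checking your implementation against.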
-
Transfer Learning with a Pre-trained Model
Fine-tune a **pre-trained model** (e.g., `resnet18` from `torchvision.models`) on a new, small image classification dataset (e.g., `CIFAR-10`). You'll need to freeze the weights of the initial...
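The freeze-and-replace pattern, sketched with torchvision's weights API (the string weight names require torchvision >= 0.13):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # pre-trained backbone
for p in model.parameters():
    p.requires_grad = False                        # freeze every existing layer

# Replacing the head creates fresh, trainable parameters for 10 classes.
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head is optimized:
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```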
-
Implementing a Simple Attention Mechanism
Implement a **simple attention mechanism** for a sequence-to-sequence model. Given a sequence of encoder outputs and a single decoder hidden state, your attention module should compute attention...
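A sketch using dot-product scoring (additive/Bahdanau scoring would be an equally valid choice for this exercise):

```python
import torch
import torch.nn.functional as F

def attention(decoder_h, encoder_outs):
    # decoder_h: (B, H); encoder_outs: (B, T, H)
    scores = torch.bmm(encoder_outs, decoder_h.unsqueeze(-1)).squeeze(-1)  # (B, T)
    weights = F.softmax(scores, dim=-1)              # attention distribution over T
    context = torch.bmm(weights.unsqueeze(1), encoder_outs).squeeze(1)     # (B, H)
    return context, weights

ctx, w = attention(torch.randn(2, 16), torch.randn(2, 7, 16))
print(ctx.shape, w.shape)  # torch.Size([2, 16]) torch.Size([2, 7])
```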
-
Implementing Weight Initialization Schemes
Implement **different weight initialization schemes** (e.g., Xavier/Glorot, He) for a simple neural network. Create a function that iterates through a model's parameters and applies a chosen...
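One shape such a function could take for `nn.Linear` layers, using PyTorch's built-in initializers:

```python
import torch.nn as nn

def init_weights(model, scheme="xavier"):
    # Walk the module tree and re-initialize every Linear layer in place.
    for m in model.modules():
        if isinstance(m, nn.Linear):
            if scheme == "xavier":
                nn.init.xavier_uniform_(m.weight)   # Glorot: suits tanh/sigmoid
            elif scheme == "he":
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")  # He: suits ReLU
            nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
init_weights(net, scheme="he")
```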
-
Custom Dataset for CSV Data
Write a PyTorch `Dataset` class that loads data from a CSV file containing tabular data (features + labels). Requirements:
- Use `pandas` to read the CSV.
- Convert features and labels to tensors. ...
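A sketch, assuming the labels live in a column named `label` (adjust to your CSV's schema) and an integer-class target:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class CSVDataset(Dataset):
    def __init__(self, path, label_col="label"):
        df = pd.read_csv(path)
        self.y = torch.tensor(df[label_col].values, dtype=torch.long)
        self.X = torch.tensor(df.drop(columns=[label_col]).values,
                              dtype=torch.float32)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# loader = DataLoader(CSVDataset("data.csv"), batch_size=32, shuffle=True)
```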
-
Implement Dropout Manually
Implement dropout as a function `my_dropout(x, p)`:
- Zero out elements of `x` with probability `p`.
- Scale survivors by $1/(1-p)$.
- Ensure deterministic behavior when `torch.manual_seed` is...
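A sketch of the inverted-dropout convention (scaling at train time so inference needs no correction):

```python
import torch

def my_dropout(x, p=0.5, training=True):
    if not training or p == 0.0:
        return x
    # Keep each element with probability 1 - p; scale survivors by 1/(1 - p)
    # so the expected activation is unchanged.
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1 - p)

torch.manual_seed(0)                      # rand_like draws from the global RNG,
a = my_dropout(torch.ones(5))
torch.manual_seed(0)                      # so re-seeding reproduces the same mask
b = my_dropout(torch.ones(5))
print(torch.equal(a, b))                  # True
```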
-
Custom Activation Function
Define a custom activation function called `Swish`: $f(x) = x \cdot \sigma(x)$.
- Implement it as a PyTorch `nn.Module`.
- Train a small MLP on random data with it.
- Compare with ReLU...
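A sketch of the module; note that PyTorch already ships this function as `nn.SiLU`, which is useful for checking your version:

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)   # f(x) = x * sigma(x)

# Drop-in replacement for ReLU in a small MLP:
mlp = nn.Sequential(nn.Linear(10, 64), Swish(), nn.Linear(64, 1))
print(mlp(torch.randn(4, 10)).shape)  # torch.Size([4, 1])
```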
-
Visualize Training with TensorBoard
Integrate TensorBoard into a training loop:
- Log training loss and validation accuracy.
- Add histograms of weights and gradients.
- Write a few sample images.

Open TensorBoard and verify logs.
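The relevant `SummaryWriter` calls, sketched with stand-in values (a real loop would log actual losses, weights, and batch images):

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/demo")
for step in range(100):
    loss = 1.0 / (step + 1)                            # stand-in training loss
    writer.add_scalar("train/loss", loss, step)
writer.add_scalar("val/accuracy", 0.9, 100)            # stand-in metric
writer.add_histogram("fc/weight", torch.randn(64, 10), 0)
writer.add_image("samples", torch.rand(3, 32, 32), 0)  # CHW image tensor
writer.close()
# Then inspect with: tensorboard --logdir runs
```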
-
Implement Early Stopping
Add early stopping to a training loop:
- Monitor validation loss.
- Stop training if there is no improvement after 5 epochs.
- Save the best model checkpoint.

Demonstrate on an MNIST subset.
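The bookkeeping, sketched with a hard-coded list standing in for real per-epoch validation losses (a full MNIST demo would compute these from a validation loader):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                                  # stand-in model
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]  # stand-in values

best, patience, bad_epochs = float("inf"), 5, 0
for epoch, val_loss in enumerate(val_losses):
    # ... one epoch of training would go here ...
    if val_loss < best:
        best, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")        # checkpoint the best model
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}")        # fires before the 0.6
            break
```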