-
Implementing a Siamese Network with Triplet Loss
Building on the previous exercise, let's switch to **Triplet Loss**. This loss function is more powerful because it enforces a margin between the anchor-positive distance and the anchor-negative distance. The...
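A minimal sketch of the loss term itself, assuming a shared embedding network (here called `embedding_net`, a hypothetical name) that maps each image to a vector; `nn.TripletMarginLoss` is the built-in equivalent to compare against.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Euclidean distances between anchor-positive and anchor-negative embeddings.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    # Hinge on the margin: loss is zero once d_neg exceeds d_pos by at least `margin`.
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# Hypothetical usage with a shared embedding network `embedding_net`:
# emb_a, emb_p, emb_n = embedding_net(a), embedding_net(p), embedding_net(n)
# loss = triplet_loss(emb_a, emb_p, emb_n)
```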
-
Implementing Self-Supervised Learning with BYOL
Implement the core logic of **Bootstrap Your Own Latent (BYOL)**. BYOL is a self-supervised learning method that learns image representations without using negative pairs. It consists of two...
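A sketch of the two core pieces, the symmetric negative-cosine loss and the exponential-moving-average update of the target network; `online_encoder`, `predictor`, and `target_encoder` are hypothetical module names.

```python
import copy
import torch
import torch.nn.functional as F

def byol_loss(p, z):
    # Negative cosine similarity between online prediction p and target projection z.
    p = F.normalize(p, dim=-1)
    z = F.normalize(z, dim=-1)
    return 2 - 2 * (p * z).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(online, target, tau=0.996):
    # Target parameters are an exponential moving average of the online parameters.
    for po, pt in zip(online.parameters(), target.parameters()):
        pt.data.mul_(tau).add_(po.data, alpha=1 - tau)

# Hypothetical modules: `online_encoder` (backbone + projector) and `predictor` (small MLP).
# target_encoder = copy.deepcopy(online_encoder)   # no gradients flow to the target
# for p in target_encoder.parameters():
#     p.requires_grad_(False)
#
# Per step, with two augmented views v1, v2 of the same batch:
# loss = byol_loss(predictor(online_encoder(v1)), target_encoder(v2)) \
#      + byol_loss(predictor(online_encoder(v2)), target_encoder(v1))
# loss.backward(); optimizer.step(); ema_update(online_encoder, target_encoder)
```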
-
Implementing a Multi-Headed Attention Mechanism
Expand on the previous attention exercise by implementing a **Multi-Headed Attention mechanism** from scratch. A single attention head is the dot-product attention you have already implemented. Multi-head...
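One possible from-scratch layout, splitting the model dimension into `num_heads` independent heads and projecting back after attention; the optional `mask` argument is illustrative.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    # From-scratch multi-head self-attention; d_model must be divisible by num_heads.
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        B, T, _ = x.shape
        # Project and split into heads: (B, num_heads, T, d_head).
        q = self.q_proj(x).view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention, computed per head in parallel.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = scores.softmax(dim=-1)
        # Concatenate heads and apply the output projection.
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out)
```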
-
Implementing a Masked Language Model
Implement a **Masked Language Model (MLM)**, a technique at the heart of models like BERT. Given a sentence, you'll randomly mask some of the words and then train a model to predict those masked...
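A sketch of the BERT-style masking step, assuming integer token IDs and a cross-entropy loss with `ignore_index=-100`; `mask_token_id` and `vocab_size` come from whatever tokenizer is used.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    # Select ~15% of positions; labels are -100 everywhere else so the
    # cross-entropy loss (with ignore_index=-100) skips unmasked tokens.
    input_ids = input_ids.clone()
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100

    # Of the selected tokens: 80% -> [MASK], 10% -> random token, 10% -> unchanged.
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_token_id
    randomize = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[randomize] = torch.randint(vocab_size, input_ids.shape)[randomize]
    return input_ids, labels
```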
-
Building a Graph Autoencoder
Implement a **Graph Autoencoder (GAE)** for graph representation learning. The encoder will use a GNN to produce node embeddings, and the decoder will reconstruct the graph's adjacency matrix from...
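A dense-matrix sketch: a two-layer GCN-style encoder and an inner-product decoder; `adj_norm` is assumed to be the symmetrically normalized adjacency (with self-loops) precomputed elsewhere.

```python
import torch
import torch.nn as nn

class GraphAutoencoder(nn.Module):
    # Minimal GAE: GCN-style encoder producing node embeddings Z,
    # inner-product decoder reconstructing edge probabilities sigmoid(Z Z^T).
    def __init__(self, in_dim, hid_dim, z_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, z_dim, bias=False)

    def encode(self, x, adj_norm):
        h = torch.relu(adj_norm @ self.w1(x))
        return adj_norm @ self.w2(h)

    def decode(self, z):
        return torch.sigmoid(z @ z.t())

    def forward(self, x, adj_norm):
        z = self.encode(x, adj_norm)
        return self.decode(z), z

# Training would typically minimize binary cross-entropy between the reconstructed
# adjacency and the observed one, e.g. F.binary_cross_entropy(recon, adj_target).
```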
-
Adversarial Training for Robustness
Implement **adversarial training** on a simple classification model like a small CNN on MNIST. The goal is to make the model robust to adversarial attacks. You'll need to generate adversarial...
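A sketch using FGSM (one common choice of attack) to generate the adversarial examples inside the training loop; `model` and `optimizer` are assumed to exist and inputs are assumed to lie in [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.1):
    # Fast Gradient Sign Method: perturb the input in the direction that increases the loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

# Inside the training loop (hypothetical `model`, `optimizer`, MNIST batches x, y):
# x_adv = fgsm_attack(model, x, y)
# loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```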
-
Neural Style Transfer
Implement **Neural Style Transfer**. Given a content image and a style image, generate a new image that combines the content of the former with the style of the latter. Use a pre-trained VGG...
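A sketch of the loss side, assuming the VGG activations at the chosen layers have already been collected (e.g. with forward hooks on a frozen torchvision VGG19); the layer choices and the `alpha`/`beta` weights are illustrative.

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # Channel-wise correlations of a feature map, used for the style loss.
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_transfer_loss(gen_feats, content_feats, style_feats, alpha=1.0, beta=1e4):
    # Each argument is a list of VGG activations at the chosen layers;
    # here the last entry is taken as the content layer.
    content_loss = F.mse_loss(gen_feats[-1], content_feats[-1])
    style_loss = sum(F.mse_loss(gram_matrix(g), gram_matrix(s))
                     for g, s in zip(gen_feats, style_feats))
    return alpha * content_loss + beta * style_loss

# The generated image itself is the parameter being optimized:
# gen = content_img.clone().requires_grad_(True)
# optimizer = torch.optim.LBFGS([gen])   # or Adam
```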
-
Training a Variational Autoencoder (VAE)
Implement and train a **Variational Autoencoder (VAE)** on a dataset like MNIST. The encoder should map the input to a latent space distribution (mean and variance), and the decoder should...
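A minimal MLP sketch for flattened MNIST, showing the reparameterization trick and the ELBO-style loss; layer widths are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    # Minimal MLP VAE for 28x28 MNIST images flattened to 784 dimensions.
    def __init__(self, latent_dim=20):
        super().__init__()
        self.enc = nn.Linear(784, 400)
        self.mu = nn.Linear(400, latent_dim)
        self.logvar = nn.Linear(400, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 400), nn.ReLU(), nn.Linear(400, 784))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return torch.sigmoid(self.dec(z)), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```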
-
Implementing the Adam Optimizer from Scratch
Implement the **Adam optimizer from scratch** as a subclass of `torch.optim.Optimizer`. You'll need to manage the first-moment vector (moving average of gradients) and the second-moment vector...
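A sketch of the optimizer subclass (no weight decay or AMSGrad), keeping the two moment buffers in `self.state`:

```python
import torch
from torch.optim import Optimizer

class MyAdam(Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["m"] = torch.zeros_like(p)   # first moment (mean of gradients)
                    state["v"] = torch.zeros_like(p)   # second moment (mean of squared gradients)
                state["step"] += 1
                m, v, t = state["m"], state["v"], state["step"]
                m.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                # Bias-corrected estimates, then the parameter update.
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)
                p.addcdiv_(m_hat, v_hat.sqrt().add_(group["eps"]), value=-group["lr"])
```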
-
Building a Transformer Encoder from Scratch
Implement a single layer of a **Transformer Encoder** from scratch, without using `torch.nn.TransformerEncoderLayer`. This requires implementing a multi-head self-attention module and a...
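A post-norm sketch of the layer; `nn.MultiheadAttention` stands in here for brevity and would be replaced by the from-scratch attention module the exercise asks for (for example the sketch under the multi-headed attention entry above).

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    # Post-norm Transformer encoder layer: self-attention block + position-wise feedforward,
    # each wrapped in a residual connection and layer normalization.
    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        attn_out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + self.drop(attn_out))     # residual + layer norm
        x = self.norm2(x + self.drop(self.ff(x)))   # feedforward block
        return x
```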
-
Implementing a Siamese Network for Similarity Learning
Build and train a **Siamese network** on a dataset like MNIST. The network takes pairs of images as input and learns to determine if they belong to the same class (a positive pair) or different...
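A sketch of the contrastive loss, assuming a shared encoder `net` (hypothetical name) and a binary pair label where 1 means same class:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb1, emb2, label, margin=1.0):
    # label = 1 for a positive (same-class) pair, 0 for a negative pair.
    d = F.pairwise_distance(emb1, emb2)
    pos = label * d.pow(2)                                   # pull positives together
    neg = (1 - label) * torch.clamp(margin - d, min=0.0).pow(2)  # push negatives past the margin
    return 0.5 * (pos + neg).mean()

# Hypothetical usage with a shared encoder `net`:
# loss = contrastive_loss(net(img1), net(img2), same_class.float())
```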
-
Generative Adversarial Network (GAN) on MNIST
Implement and train a simple **Generative Adversarial Network (GAN)**. The network consists of a generator and a discriminator. The generator takes a random noise vector and tries to generate a...
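A sketch of one training step with the standard BCE formulation; `G`, `D`, `opt_g`, and `opt_d` are hypothetical generator, discriminator, and optimizer objects, and `D` is assumed to return a (batch, 1) logit.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

def gan_step(G, D, real, opt_g, opt_d, z_dim=100):
    b = real.size(0)
    ones = torch.ones(b, 1, device=real.device)
    zeros = torch.zeros(b, 1, device=real.device)

    # Discriminator step: real images should score 1, generated images 0.
    fake = G(torch.randn(b, z_dim, device=real.device)).detach()  # detach: no gradient to G here
    d_loss = criterion(D(real), ones) + criterion(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: fool the discriminator into scoring fakes as real.
    g_loss = criterion(D(G(torch.randn(b, z_dim, device=real.device))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```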
-
Distributed Data Parallel Training
Set up a **distributed data parallel training** script using `torch.nn.parallel.DistributedDataParallel` and `torch.distributed`. You'll need to use `torch.multiprocessing.spawn` to launch...
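A minimal single-node sketch, assuming one GPU per process and the NCCL backend; the model and data pipeline are placeholders.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # One process per GPU; all processes join the same process group.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 2).cuda(rank)        # placeholder model
    ddp_model = DDP(model, device_ids=[rank])
    # ... build a DataLoader with DistributedSampler, then train ddp_model as usual ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```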
-
Implementing a Graph Neural Network (GNN) Layer
Implement a simple **Graph Neural Network (GNN) layer** for node classification on a small graph. The layer should aggregate features from a node's neighbors and combine them with the node's own...
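A mean-aggregation sketch operating on a dense adjacency matrix; GCN-style normalization or sum aggregation would be easy substitutions.

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    # Combine each node's own features with the mean of its neighbors' features.
    # `adj` is a dense 0/1 adjacency matrix of shape (num_nodes, num_nodes).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.self_lin = nn.Linear(in_dim, out_dim)
        self.neigh_lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh_mean = (adj @ x) / deg                  # aggregate neighbor features
        return torch.relu(self.self_lin(x) + self.neigh_lin(neigh_mean))
```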
-
Building a Neural Ordinary Differential Equation (NODE) Layer
Implement a simple **Neural Ordinary Differential Equation (NODE) layer**. This involves defining a `torch.nn.Module` that represents the derivative function $f(t, z(t))$ and then using...
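A sketch using the third-party `torchdiffeq` package for the ODE solver; the derivative network and the integration interval [0, 1] are illustrative choices.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint   # third-party: pip install torchdiffeq

class ODEFunc(nn.Module):
    # The learned derivative f(t, z(t)); here a small MLP that ignores t.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, z):
        return self.net(z)

class NODELayer(nn.Module):
    # Integrates z from t=0 to t=1 and returns the final state as the layer output.
    def __init__(self, dim):
        super().__init__()
        self.func = ODEFunc(dim)
        self.register_buffer("t", torch.tensor([0.0, 1.0]))

    def forward(self, z0):
        return odeint(self.func, z0, self.t)[-1]
```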
-
Implementing a Simple VAE for Text (Sentence VAE)
Implement a **Variational Autoencoder (VAE)** for text, often called a Sentence VAE. The encoder will be an RNN (e.g., GRU) that outputs a latent distribution, and the decoder will be another RNN...
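A teacher-forced sketch with GRU encoder and decoder; vocabulary size and layer widths are placeholders, and KL annealing (usually needed in practice) is omitted.

```python
import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    # GRU encoder -> latent Gaussian -> GRU decoder whose initial hidden state comes from z.
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, z_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)
        self.z_to_h = nn.Linear(z_dim, hid_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):
        x = self.emb(tokens)
        _, h = self.encoder(x)                           # h: (1, B, hid_dim)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        dec_out, _ = self.decoder(x, h0)                 # teacher forcing on the input tokens
        return self.out(dec_out), mu, logvar
```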
-
Batch Normalization From Scratch
Implement 1D batch normalization manually (without using `nn.BatchNorm1d`). Steps: 1. Compute batch mean and variance. 2. Normalize inputs. 3. Scale and shift with learnable $\gamma, \beta$....
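A sketch covering the listed steps, with running statistics tracked for use in eval mode:

```python
import torch
import torch.nn as nn

class ManualBatchNorm1d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))    # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))    # learnable shift
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))
        self.eps, self.momentum = eps, momentum

    def forward(self, x):
        if self.training:
            # 1. Batch statistics over the batch dimension.
            mean = x.mean(dim=0)
            var = x.var(dim=0, unbiased=False)
            with torch.no_grad():
                self.running_mean.lerp_(mean, self.momentum)
                self.running_var.lerp_(var, self.momentum)
        else:
            mean, var = self.running_mean, self.running_var
        # 2. Normalize, then 3. scale and shift.
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```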
-
Debug Exploding Gradients
Create a deep feedforward net (20 layers, ReLU). Train it on dummy data. Track gradient norms across layers. Observe whether gradients explode. Experiment with: - Smaller learning rate. - Gradient...
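A sketch of the diagnostic loop on dummy data, logging per-layer gradient norms and applying global-norm clipping as one of the mitigations to experiment with:

```python
import torch
import torch.nn as nn

# 20-layer ReLU MLP on dummy data; a rapidly growing max gradient norm indicates explosion.
layers = []
for _ in range(20):
    layers += [nn.Linear(128, 128), nn.ReLU()]
model = nn.Sequential(*layers, nn.Linear(128, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(64, 128), torch.randn(64, 1)
for step in range(100):
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    # Per-parameter gradient norms, measured before any clipping.
    norms = [p.grad.norm().item() for p in model.parameters() if p.grad is not None]
    if step % 20 == 0:
        print(step, max(norms))
    # One mitigation to experiment with: clip the global gradient norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```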
-
Implement a Siamese Network
Implement a Siamese network for MNIST digit similarity: - Two identical CNNs sharing weights. - Contrastive loss function. - Train on pairs of digits (same/different). Evaluate on test pairs.
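The contrastive-loss sketch above covers the loss; what remains is pair construction. A hypothetical wrapper dataset that turns a labeled dataset into same/different pairs:

```python
import random
import torch
from torch.utils.data import Dataset

class MNISTPairs(Dataset):
    # Wraps a labeled dataset (e.g. torchvision MNIST) and yields (img1, img2, same) triples,
    # roughly half positive and half negative pairs.
    def __init__(self, base):
        self.base = base
        self.by_class = {}
        # Index samples by class (for torchvision MNIST, `base.targets` could be used
        # instead to avoid decoding every image here).
        for idx in range(len(base)):
            _, y = base[idx]
            self.by_class.setdefault(int(y), []).append(idx)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img1, y1 = self.base[idx]
        if random.random() < 0.5:                                  # positive pair
            idx2 = random.choice(self.by_class[int(y1)])
            same = 1.0
        else:                                                      # negative pair
            other = random.choice([c for c in self.by_class if c != int(y1)])
            idx2 = random.choice(self.by_class[other])
            same = 0.0
        img2, _ = self.base[idx2]
        return img1, img2, torch.tensor(same)
```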
-
Create a Transformer Encoder Block
Implement a single Transformer encoder block: - Multi-head self-attention. - Layer normalization. - Feedforward network. Compare output with `nn.TransformerEncoderLayer`.
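For the comparison step, a sketch that instantiates the built-in layer as a reference and checks shapes on random input; an exact numerical match additionally requires copying the custom block's weights into it.

```python
import torch
import torch.nn as nn

# Reference block from PyTorch; a custom encoder block (e.g. the EncoderLayer sketch
# earlier in this list) should produce the same output shape on the same input.
d_model, num_heads, d_ff = 64, 4, 256
reference = nn.TransformerEncoderLayer(d_model, num_heads, dim_feedforward=d_ff,
                                        batch_first=True)
x = torch.randn(2, 10, d_model)          # (batch, seq_len, d_model)
print(reference(x).shape)                # torch.Size([2, 10, 64])
# For a numerical comparison, copy the custom block's q/k/v projections, feedforward
# weights, and layer-norm parameters into `reference` before calling both.
```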
-
Distributed DataParallel Basics
Simulate training with `torch.nn.DataParallel`: - Define a simple CNN. - Run it on 2 GPUs (if available). - Verify batch is split across devices. Inspect `model.module` usage.
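A sketch that makes the per-device split visible by printing the batch size seen inside `forward`; it falls back to a single device when fewer than two GPUs are available.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, 3, padding=1)
        self.fc = nn.Linear(8 * 28 * 28, 10)

    def forward(self, x):
        # With DataParallel on 2 GPUs and batch 32, each replica prints 16.
        print("forward sees batch of", x.size(0), "on", x.device)
        return self.fc(torch.relu(self.conv(x)).flatten(1))

model = SmallCNN()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # replicates the module and splits the batch across GPUs
if torch.cuda.is_available():
    model = model.cuda()

x = torch.randn(32, 1, 28, 28)
if torch.cuda.is_available():
    x = x.cuda()
out = model(x)
# When wrapped, the original module (for saving weights or accessing attributes)
# lives at model.module; otherwise `model` itself is the module.
```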