-
Model Compression with Pruning
Implement **model pruning** to reduce the size and computational cost of a trained model. Start with a simple, over-parameterized model (e.g., a fully-connected network on MNIST). Train it to a...
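A minimal sketch of the pruning step using `torch.nn.utils.prune` (the training and fine-tuning loops are omitted; the layer sizes and the 50% sparsity target are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small over-parameterized fully-connected model (MNIST-sized input assumed).
model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Unstructured L1-magnitude pruning: zero out the 50% smallest weights per Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# Report global sparsity after pruning.
zeros = sum((m.weight == 0).sum().item() for m in model.modules() if isinstance(m, nn.Linear))
total = sum(m.weight.nelement() for m in model.modules() if isinstance(m, nn.Linear))
print(f"Global sparsity: {zeros / total:.1%}")

# Make the pruning permanent by removing the reparameterization masks.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
```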
-
Implementing a Custom Learning Rate Scheduler
Implement a **custom learning rate scheduler** that follows a cosine annealing schedule. The learning rate starts high and decreases smoothly to a minimum value, then resets and repeats. Your...
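One possible sketch, subclassing `torch.optim.lr_scheduler._LRScheduler`; the class name and the `period` argument are illustrative, and the scheduler is stepped once per epoch here:

```python
import math
import torch

class CosineAnnealingWithRestarts(torch.optim.lr_scheduler._LRScheduler):
    """Cosine-anneal from the base LR down to eta_min over `period` steps, then restart."""

    def __init__(self, optimizer, period, eta_min=0.0, last_epoch=-1):
        self.period = period
        self.eta_min = eta_min
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # Position within the current cycle, in [0, period).
        t = self.last_epoch % self.period
        cos_factor = (1 + math.cos(math.pi * t / self.period)) / 2
        return [self.eta_min + (base_lr - self.eta_min) * cos_factor
                for base_lr in self.base_lrs]

# Usage: one scheduler step per epoch (use per-batch steps by changing `period` units).
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = CosineAnnealingWithRestarts(optimizer, period=10, eta_min=1e-4)
for epoch in range(30):
    # ... training step would go here ...
    scheduler.step()
    print(epoch, scheduler.get_last_lr())
```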
-
Implementing Weight Initialization Schemes
Implement **different weight initialization schemes** (e.g., Xavier/Glorot, He) for a simple neural network. Create a function that iterates through a model's parameters and applies a chosen...
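A sketch of such a function, assuming a plain feed-forward model; the name `initialize_weights` and the scheme keywords are illustrative:

```python
import torch.nn as nn

def initialize_weights(model, scheme="xavier"):
    """Apply a chosen initialization scheme to every Linear/Conv layer in `model`."""
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            if scheme == "xavier":
                nn.init.xavier_uniform_(module.weight)
            elif scheme == "he":
                nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
            else:
                raise ValueError(f"Unknown scheme: {scheme}")
            if module.bias is not None:
                nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
initialize_weights(model, scheme="he")
```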
-
Implementing Layer Normalization from Scratch
Implement **Layer Normalization** as a custom `torch.nn.Module`. Unlike `BatchNorm`, `LayerNorm` normalizes across the features of a single sample, not a batch. Your implementation should...
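A minimal sketch that normalizes over the last dimension with a learnable affine transform, checked against the built-in `nn.LayerNorm`:

```python
import torch
import torch.nn as nn

class MyLayerNorm(nn.Module):
    """LayerNorm over the last dimension: per-sample statistics, learnable gamma/beta."""

    def __init__(self, normalized_shape, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(normalized_shape))
        self.beta = nn.Parameter(torch.zeros(normalized_shape))

    def forward(self, x):
        # Mean and variance are computed per sample over the feature dimension,
        # so they do not depend on the batch (unlike BatchNorm).
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Quick check against the built-in implementation.
x = torch.randn(4, 16)
assert torch.allclose(MyLayerNorm(16)(x), nn.LayerNorm(16)(x), atol=1e-5)
```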
-
Gradient Clipping Example
Write code to:
1. Train a small RNN on dummy data.
2. Add gradient clipping using `torch.nn.utils.clip_grad_norm_`.
3. Print gradient norms before and after clipping.

Show that exploding gradients...
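A rough sketch on dummy data. Note that `clip_grad_norm_` returns the total norm *before* clipping, so the post-clipping norm is recomputed by hand:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small RNN on dummy sequence data.
rnn = nn.RNN(input_size=8, hidden_size=32, num_layers=1, batch_first=True)
head = nn.Linear(32, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(16, 50, 8)  # (batch, seq_len, features)
y = torch.randn(16, 1)

for step in range(5):
    optimizer.zero_grad()
    out, _ = rnn(x)
    loss = criterion(head(out[:, -1]), y)
    loss.backward()

    # clip_grad_norm_ rescales gradients in place and returns the pre-clipping norm.
    norm_before = float(nn.utils.clip_grad_norm_(params, max_norm=1.0))
    norm_after = float(torch.sqrt(sum(p.grad.norm() ** 2 for p in params if p.grad is not None)))
    print(f"step {step}: grad norm before={norm_before:.3f}, after clipping={norm_after:.3f}")

    optimizer.step()
```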
-
Weight Initialization Techniques
Initialize a neural network's weights using different schemes:
- Xavier initialization.
- Kaiming initialization.

Show histograms of weight distributions before and after initialization.
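A possible sketch using matplotlib for the histograms; the layer sizes are arbitrary, and PyTorch's default `Linear` initialization serves as the "before" distribution:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

def linear_weights(model):
    # Flatten all Linear-layer weights into one vector for plotting.
    return torch.cat([m.weight.detach().flatten()
                      for m in model.modules() if isinstance(m, nn.Linear)])

def make_model():
    return nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

default_w = linear_weights(make_model())  # PyTorch's default (Kaiming-uniform) init

xavier = make_model()
for m in xavier.modules():
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)

kaiming = make_model()
for m in kaiming.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, (title, w) in zip(axes, [("default", default_w),
                                 ("xavier_uniform_", linear_weights(xavier)),
                                 ("kaiming_normal_", linear_weights(kaiming))]):
    ax.hist(w.numpy(), bins=100)
    ax.set_title(title)
plt.tight_layout()
plt.show()
```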
-
Visualize Training with TensorBoard
Integrate TensorBoard into a training loop:
- Log training loss and validation accuracy.
- Add histograms of weights and gradients.
- Write a few sample images.

Open TensorBoard and verify logs.
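A sketch of the logging calls on dummy batches standing in for MNIST (validation accuracy would be logged the same way with `add_scalar`); view the result with `tensorboard --logdir runs`:

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for step in range(100):
    # Dummy batch standing in for MNIST images and labels.
    images = torch.randn(32, 1, 28, 28)
    labels = torch.randint(0, 10, (32,))

    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

    writer.add_scalar("train/loss", loss.item(), step)
    if step % 25 == 0:
        # Histograms of weights and gradients, plus a few sample images.
        for name, param in model.named_parameters():
            writer.add_histogram(f"weights/{name}", param, step)
            writer.add_histogram(f"grads/{name}", param.grad, step)
        writer.add_images("samples", images[:4], step)

writer.close()
```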
-
Gradient Accumulation Example
Simulate large-batch training using gradient accumulation:
- Train with microbatches of size 4.
- Accumulate gradients over 8 steps.
- Update the optimizer after accumulation.

Verify final result...
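One way this might look; the loss is divided by the number of accumulation steps so the accumulated gradient matches a large-batch average:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

accum_steps = 8  # microbatches per optimizer update
micro_bs = 4     # microbatch size (effective batch = 32)

data = torch.randn(256, 20)
targets = torch.randint(0, 2, (256,))

optimizer.zero_grad()
for i in range(0, len(data), micro_bs):
    x, y = data[i:i + micro_bs], targets[i:i + micro_bs]
    loss = criterion(model(x), y)
    # Scale the loss so gradients sum to a large-batch average across microbatches.
    (loss / accum_steps).backward()

    if (i // micro_bs + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```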
-
Implement Early Stopping
Add early stopping to a training loop:
- Monitor validation loss.
- Stop training if there is no improvement for 5 epochs.
- Save the best model checkpoint.

Demonstrate on an MNIST subset.
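A sketch of the stopping logic; `train_fn` and `eval_fn` are hypothetical helpers standing in for the actual epoch loop and validation pass:

```python
import copy
import torch

def train_with_early_stopping(model, train_fn, eval_fn, max_epochs=100, patience=5):
    """Stop when validation loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_fn()               # one training epoch (assumed helper)
        val_loss = eval_fn()     # validation loss for this epoch (assumed helper)

        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
            torch.save(best_state, "best_model.pt")  # checkpoint the best model
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f"Early stopping at epoch {epoch} (best val loss {best_loss:.4f})")
                break

    model.load_state_dict(best_state)
    return model
```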
-
Implement Label Smoothing
Write a function to apply label smoothing for classification:
- Replace one-hot targets with $1-\epsilon$ for the true class and $\epsilon/(K-1)$ for the others.
- Use it in cross-entropy training.

Show...
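A minimal sketch of a smoothed cross-entropy loss built from `log_softmax`:

```python
import torch
import torch.nn.functional as F

def label_smoothing_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against smoothed targets: 1 - eps for the true class, eps/(K-1) elsewhere."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)

    # Build the smoothed target distribution.
    smooth = torch.full_like(log_probs, epsilon / (num_classes - 1))
    smooth.scatter_(-1, target.unsqueeze(-1), 1.0 - epsilon)

    # Cross-entropy with a soft target is -sum(q * log p).
    return -(smooth * log_probs).sum(dim=-1).mean()

# Usage in a training step (logits from any classifier, integer class targets).
logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
loss = label_smoothing_cross_entropy(logits, targets, epsilon=0.1)
loss.backward()
```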
-
Mixed Precision Training with autocast
Modify a training loop to use `torch.cuda.amp.autocast`:
- Wrap forward + loss in `autocast`.
- Use `GradScaler` for backward.

Compare training speed vs. full precision.
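A sketch of the AMP loop on dummy data (requires a CUDA device; newer PyTorch versions expose the same functionality under `torch.amp`):

```python
import torch
import torch.nn as nn

device = "cuda"  # autocast for CUDA mixed precision requires a GPU
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()
    # Forward pass and loss run in float16 where it is numerically safe.
    with torch.cuda.amp.autocast():
        loss = criterion(model(x), y)

    # Scale the loss to avoid float16 gradient underflow, then unscale before the step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Timing the same loop with and without `autocast`/`GradScaler` gives the full-precision baseline for the speed comparison.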