- 
                
                    MoE Gating and DispatchA core component of a Mixture of Experts model is the 'gating network' which determines which expert(s) each token should be sent to. This is often a `top-k` selection. Your task is to implement... 
- 
                
                    Batched Expert Forward Pass with EinopsA naive implementation of an MoE layer might involve a loop over the experts. This is inefficient. A much better approach is to perform a single, batched matrix multiplication for all expert... 
- 
                
                    Custom Dataset for CSV DataWrite a PyTorch `Dataset` class that loads data from a CSV file containing tabular data (features + labels). Requirements: - Use `pandas` to read the CSV. - Convert features and labels to tensors.... 
- 
                
                    Implement Dropout ManuallyImplement dropout as a function `my_dropout(x, p)`: - Zero out elements of `x` with probability `p`. - Scale survivors by $$1/(1-p)$$. - Ensure deterministic behavior when `torch.manual_seed` is... 
- 
                
                    Custom Activation FunctionDefine a custom activation function called `Swish`: $$f(x) = x \cdot \sigma(x)$$. - Implement it as a PyTorch `nn.Module`. - Train a small MLP on random data with it. - Compare with ReLU... 
- 
                
                    Visualize Training with TensorBoardIntegrate TensorBoard into a training loop: - Log training loss and validation accuracy. - Add histograms of weights and gradients. - Write a few sample images. Open TensorBoard and verify logs. 
- 
                
                    Implement Early StoppingAdd early stopping to a training loop: - Monitor validation loss. - Stop training if no improvement after 5 epochs. - Save best model checkpoint. Demonstrate on MNIST subset.