- **Build a Transformer Encoder Block from Scratch.** The Transformer architecture is built upon a fundamental component: the Encoder block [1]. Each block takes a sequence of embeddings and refines it. Your...
- **Implementing a Multi-Headed Attention Mechanism.** Expand on the previous attention exercise by implementing a **Multi-Headed Attention mechanism** from scratch. A single attention head is the dot-product attention you have already implemented. Multi-head...
- **Implementing a Masked Language Model.** Implement a **Masked Language Model (MLM)**, a technique at the heart of models like BERT. Given a sentence, you'll randomly mask some of the words and then train a model to predict those masked...
- **Implementing a Simple VAE for Text (Sentence VAE).** Implement a **Variational Autoencoder (VAE)** for text, often called a Sentence VAE. The encoder will be an RNN (e.g., GRU) that outputs a latent distribution, and the decoder will be another RNN...