### Build a Transformer Encoder Block from Scratch
The Transformer architecture is built upon a fundamental component: the encoder block. Each block processes a sequence of embeddings and refines it through a self-attention sub-layer and a position-wise feed-forward sub-layer, each wrapped in a residual connection and layer normalization. Your task is to implement one such block from scratch.
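A minimal sketch of such a block in PyTorch, assuming a post-norm layout and illustrative hyperparameters (`d_model=512`, 8 heads, `d_ff=2048`); it leans on the built-in `nn.MultiheadAttention` for the attention sub-layer, which the next exercise asks you to replace with your own:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One Transformer encoder block: self-attention + feed-forward,
    each with a residual connection and layer normalization (post-norm)."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads,
                                          dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Self-attention sub-layer, then add & norm.
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Position-wise feed-forward sub-layer, then add & norm.
        x = self.norm2(x + self.dropout(self.ff(x)))
        return x

# Smoke test: batch of 2 sequences, length 10, embedding size 512.
block = EncoderBlock()
print(block(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])
```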
### Implementing a Multi-Headed Attention Mechanism
Expand on the previous attention exercise by implementing a **Multi-Headed Attention mechanism** from scratch. A single head computes scaled dot-product attention, as you have already implemented. Multi-head attention runs several such heads in parallel, each with its own learned projections of the queries, keys, and values, and concatenates their outputs through a final linear layer, letting the model attend to different representation subspaces at once.
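One way the head splitting and recombination could look in PyTorch; the dimensions and the `mask == 0` convention here are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Multi-head scaled dot-product attention built from plain linear
    projections, without torch's built-in attention module."""

    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly into heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One full-width projection each for Q, K, V; split into heads later.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        b, t, _ = query.shape

        # Project, then reshape to (batch, heads, time, d_head).
        def split(x, proj):
            return proj(x).view(b, -1, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(query, self.w_q), split(key, self.w_k), split(value, self.w_v)
        # Scaled dot-product attention, computed per head in parallel.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v          # (batch, heads, time, d_head)
        # Concatenate heads and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(b, t, -1)
        return self.w_o(out)

x = torch.randn(2, 10, 512)
mha = MultiHeadAttention()
print(mha(x, x, x).shape)  # torch.Size([2, 10, 512])
```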
### Implementing a Masked Language Model
Implement a **Masked Language Model (MLM)**, the technique at the heart of models like BERT. Given a sentence, you'll randomly mask some of the words and then train a model to predict those masked words from the surrounding context.
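A hedged sketch of the masking and training step in PyTorch; the 80/10/10 replacement split follows the BERT recipe, while the vocabulary size, mask token id, and tiny encoder are placeholder assumptions:

```python
import torch
import torch.nn as nn

def mask_tokens(input_ids, vocab_size, mask_token_id, mask_prob=0.15):
    """BERT-style masking: ~15% of positions become prediction targets;
    of those, 80% -> [MASK], 10% -> a random token, 10% left unchanged."""
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, mask_prob)).bool()
    labels[~masked] = -100  # ignored by CrossEntropyLoss

    input_ids = input_ids.clone()
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_token_id
    rand = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[rand] = torch.randint(vocab_size, input_ids.shape)[rand]
    return input_ids, labels

# Placeholder model: embedding -> Transformer encoder -> vocabulary logits.
vocab_size, d_model, mask_id = 1000, 64, 999
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
        num_layers=2),
    nn.Linear(d_model, vocab_size),
)
loss_fn = nn.CrossEntropyLoss()  # ignore_index defaults to -100

tokens = torch.randint(vocab_size - 1, (8, 16))  # fake batch, mask id excluded
inputs, labels = mask_tokens(tokens, vocab_size, mask_id)
logits = model(inputs)                            # (batch, seq, vocab)
loss = loss_fn(logits.view(-1, vocab_size), labels.view(-1))
loss.backward()
print(loss.item())
```

Note the `-100` labels: the loss is computed only at masked positions, so the model is never penalized for tokens it can simply copy.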
### Implementing a Simple VAE for Text (Sentence VAE)
Implement a **Variational Autoencoder (VAE)** for text, often called a Sentence VAE. The encoder will be an RNN (e.g., a GRU) that outputs the parameters of a latent Gaussian distribution, and the decoder will be another RNN that reconstructs the original sentence from a sample drawn from that latent distribution.
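A compact sketch of the pieces in PyTorch; the hidden sizes are arbitrary assumptions, the decoder input is left unshifted for brevity (proper teacher forcing would shift it by one position), and a real training loop would typically anneal the KL term:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceVAE(nn.Module):
    """GRU encoder -> Gaussian latent -> GRU decoder over token logits."""

    def __init__(self, vocab_size, d_embed=64, d_hidden=128, d_latent=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_embed)
        self.encoder = nn.GRU(d_embed, d_hidden, batch_first=True)
        self.to_mu = nn.Linear(d_hidden, d_latent)
        self.to_logvar = nn.Linear(d_hidden, d_latent)
        self.latent_to_hidden = nn.Linear(d_latent, d_hidden)
        self.decoder = nn.GRU(d_embed, d_hidden, batch_first=True)
        self.out = nn.Linear(d_hidden, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        # Encode: the final GRU hidden state summarizes the sentence.
        _, h = self.encoder(x)                      # h: (1, batch, d_hidden)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Decode: the latent seeds the decoder's initial hidden state;
        # the decoder also sees the embedded tokens (unshifted, for brevity).
        h0 = torch.tanh(self.latent_to_hidden(z)).unsqueeze(0)
        dec_out, _ = self.decoder(x, h0)
        return self.out(dec_out), mu, logvar

vocab_size = 1000
model = SentenceVAE(vocab_size)
tokens = torch.randint(vocab_size, (8, 12))         # fake batch
logits, mu, logvar = model(tokens)
recon = F.cross_entropy(logits.view(-1, vocab_size), tokens.view(-1))
# KL divergence of N(mu, sigma^2) from the standard normal prior.
kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu**2 - logvar.exp(), dim=1))
loss = recon + kl                                    # negative ELBO
print(loss.item())
```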