- **Implementing a Multi-Headed Attention Mechanism**
Expand on the previous attention exercise by implementing a **Multi-Headed Attention mechanism** from scratch. A single attention head is the scaled dot-product attention you have already implemented; multi-head attention runs several such heads in parallel over learned linear projections of the queries, keys, and values, then concatenates their outputs and applies a final output projection.
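
As a rough guide, a solution could take the following shape. This is a minimal sketch, not a prescribed answer: the class name `MultiHeadAttention`, the `d_model`/`num_heads` parameter names, and the choice of four separate `nn.Linear` projections are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Minimal multi-head self-attention (names and layout are illustrative)."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection each for queries, keys, values, plus the output projection.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape

        def split(t):
            # Split the model dimension into (num_heads, d_head) and move heads forward.
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        # Scaled dot-product attention per head: scores are (batch, heads, seq, seq).
        scores = q @ k.transpose(-2, -1) / (self.d_head ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = attn @ v  # (batch, heads, seq, d_head)
        # Concatenate the heads back into d_model and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, -1)
        return self.w_o(out)
```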
- **Implementing the Adam Optimizer from Scratch**
Implement the **Adam optimizer from scratch** as a subclass of `torch.optim.Optimizer`. You'll need to manage the first-moment vector (moving average of gradients) and the second-moment vector (moving average of squared gradients) for each parameter, apply the bias corrections, and combine them in the update rule.
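
One possible sketch is below, under the assumption that the moment buffers are stored in the optimizer's per-parameter `state` dict; the state keys `m` and `v` and the default hyperparameters are illustrative choices.

```python
import torch
from torch.optim import Optimizer

class Adam(Optimizer):
    """Adam from scratch; hyperparameter names follow the usual convention."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        defaults = dict(lr=lr, betas=betas, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                # Lazily create the per-parameter moment buffers on first use.
                if len(state) == 0:
                    state["step"] = 0
                    state["m"] = torch.zeros_like(p)  # first moment (mean of gradients)
                    state["v"] = torch.zeros_like(p)  # second moment (mean of squared gradients)
                state["step"] += 1
                t, m, v = state["step"], state["m"], state["v"]
                # Exponential moving averages of the gradient and its square.
                m.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                # Bias-corrected estimates, then the parameter update.
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)
                p.add_(m_hat / (v_hat.sqrt() + group["eps"]), alpha=-group["lr"])
        return loss
```

Usage mirrors the built-in optimizers: construct it with `Adam(model.parameters(), lr=1e-3)` and drive it with the usual `zero_grad()` / `backward()` / `step()` loop.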
- **Building a Transformer Encoder from Scratch**
Implement a single layer of a **Transformer Encoder** from scratch, without using `torch.nn.TransformerEncoderLayer`. This requires implementing a multi-head self-attention module and a position-wise feed-forward network, and wiring them together with residual connections and layer normalization.
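
A sketch of one way the layer could be assembled is shown below. It assumes the `MultiHeadAttention` class from the first exercise's sketch is in scope; the post-norm ordering, the ReLU feed-forward block, and the `d_ff` parameter name are illustrative assumptions rather than required choices.

```python
import torch.nn as nn

class TransformerEncoderLayer(nn.Module):
    """Single encoder layer: self-attention and a position-wise feed-forward
    network, each wrapped in a residual connection followed by layer norm.
    Assumes the MultiHeadAttention class from the earlier sketch is defined."""

    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Sublayer 1: multi-head self-attention with residual connection + layer norm.
        attn_out = self.self_attn(x, mask=mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Sublayer 2: position-wise feed-forward with residual connection + layer norm.
        ffn_out = self.ffn(x)
        x = self.norm2(x + self.dropout(ffn_out))
        return x
```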