-
Implementing a Multi-Headed Attention Mechanism
Expand on the previous attention exercise by implementing a **Multi-Headed Attention mechanism** from scratch. A single attention head is the scaled dot-product attention you have already implemented. Multi-head...
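A minimal sketch of such a mechanism in PyTorch is shown below. It assumes a batch-first input of shape `(batch, seq_len, d_model)` and omits masking and dropout; the class and parameter names are illustrative, not part of the exercise statement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Minimal multi-head scaled dot-product self-attention (illustrative sketch)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection each for queries, keys, and values, plus an output projection.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.q_proj(x))
        k = split_heads(self.k_proj(x))
        v = split_heads(self.v_proj(x))

        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / (self.d_head ** 0.5)
        weights = F.softmax(scores, dim=-1)
        context = weights @ v  # (batch, num_heads, seq_len, d_head)

        # Concatenate the heads back to (batch, seq_len, d_model) and project.
        context = context.transpose(1, 2).contiguous().view(batch, seq_len, d_model)
        return self.out_proj(context)
```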
-
Building a Transformer Encoder from Scratch
Implement a single layer of a **Transformer Encoder** from scratch, without using `torch.nn.TransformerEncoderLayer`. This requires implementing a multi-head self-attention module and a...
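One possible shape for that layer is sketched below, using the post-norm arrangement (attention, add & norm, feedforward, add & norm). It substitutes `nn.MultiheadAttention` to keep the sketch short, whereas the exercise expects you to drop in your own attention module; hyperparameters and names are assumptions.

```python
import torch
import torch.nn as nn

class TransformerEncoderLayerScratch(nn.Module):
    """One post-norm encoder layer: self-attention -> add & norm -> FFN -> add & norm."""

    def __init__(self, d_model: int, num_heads: int, dim_feedforward: int, dropout: float = 0.1):
        super().__init__()
        # nn.MultiheadAttention is used here for brevity; the exercise expects
        # you to plug in the multi-head attention module you wrote yourself.
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        # Position-wise feedforward network applied to every token independently.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention sublayer with residual connection and layer norm.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # Feedforward sublayer with residual connection and layer norm.
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x
```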
-
Create a Transformer Encoder Block
Implement a single Transformer encoder block:
- Multi-head self-attention.
- Layer normalization.
- Feedforward network.

Compare the output with `nn.TransformerEncoderLayer`.
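A minimal comparison harness might look like the following, assuming the `TransformerEncoderLayerScratch` class sketched above. Because the two modules are initialized independently, only the output shapes will agree; copy the parameters across if you want numerical agreement.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters and a random batch of token embeddings.
d_model, num_heads, dim_ff = 64, 8, 256
x = torch.randn(2, 10, d_model)  # (batch, seq_len, d_model)

reference = nn.TransformerEncoderLayer(d_model, num_heads, dim_ff, batch_first=True)
custom = TransformerEncoderLayerScratch(d_model, num_heads, dim_ff)  # from the sketch above

with torch.no_grad():
    ref_out = reference(x)
    custom_out = custom(x)

# With independently initialized weights, only the shapes are expected to match.
assert ref_out.shape == custom_out.shape == x.shape
print("output shape:", custom_out.shape)
```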