All Katas - ML Katas

Sliding Window Attention Preparation

this year medium (<10 mins) | pytorch transformer attention einops

Full self-attention has quadratic complexity with respect to sequence length, which is prohibitive for very long sequences. Models like Longformer introduce **sliding window...
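The excerpt is cut off before it states the exact task, so the following is only a minimal sketch of the general idea, assuming a symmetric window where each position attends to neighbours at most `window` steps away. The helper name `sliding_window_mask` and the sizes are hypothetical.

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask that is True where |i - j| <= window."""
    idx = torch.arange(seq_len)
    return (idx[:, None] - idx[None, :]).abs() <= window

# Hypothetical sizes, chosen only for illustration.
mask = sliding_window_mask(seq_len=6, window=1)

# Apply the mask to raw attention scores before the softmax.
scores = torch.randn(6, 6)
scores = scores.masked_fill(~mask, float("-inf"))
weights = scores.softmax(dim=-1)  # positions outside the window get weight 0
```

Note that this sketch still materialises the full `(N, N)` score matrix, so it shows the attention pattern rather than the memory savings a real sliding-window implementation aims for.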

Multi-Head Attention: Splitting Heads

this year medium (<10 mins) | transformer attention einops

In multi-head attention, the query, key, and value tensors are split into multiple heads. Given a tensor of shape `(B, N, D)`, where `D` is the embedding dimension, you need to...
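One possible way to do the split, sketched in plain PyTorch and assuming the target layout is `(B, H, N, D//H)` (the shape the merging kata starts from). The function name `split_heads` is hypothetical.

```python
import torch

def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Split (B, N, D) into (B, H, N, D // H)."""
    b, n, d = x.shape
    if d % num_heads != 0:
        raise ValueError("embedding dimension must be divisible by num_heads")
    # Carve the last axis into (H, D // H), then move the head axis forward.
    # einops equivalent: rearrange(x, "b n (h d) -> b h n d", h=num_heads)
    return x.reshape(b, n, num_heads, d // num_heads).transpose(1, 2)

x = torch.arange(2 * 3 * 8, dtype=torch.float32).reshape(2, 3, 8)
heads = split_heads(x, num_heads=4)  # shape (2, 4, 3, 2)
```

Each head receives a contiguous slice of the embedding: head `h` of token `n` holds `x[b, n, h * (D // H) : (h + 1) * (D // H)]`.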

Multi-Head Attention: Merging Heads

this year medium (<10 mins) | transformer attention einops

This is the inverse of splitting heads. After computing attention for each head, you need to merge the heads back together. Given a tensor of shape `(B, H, N, D//H)`, you need to merge it back to `(B,...
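A minimal sketch of the merge, assuming the truncated target shape is `(B, N, D)`, i.e. the shape the split started from. The function name `merge_heads` is hypothetical.

```python
import torch

def merge_heads(x: torch.Tensor) -> torch.Tensor:
    """Merge (B, H, N, D_head) back into (B, N, H * D_head)."""
    b, h, n, d_head = x.shape
    # Put the sequence axis before the head axis, then flatten the last two axes.
    # einops equivalent: rearrange(x, "b h n d -> b n (h d)")
    return x.transpose(1, 2).reshape(b, n, h * d_head)

# Round trip: split a (B, N, D) tensor by hand, then merge it back.
original = torch.randn(2, 5, 8)
heads = original.reshape(2, 5, 4, 2).transpose(1, 2)  # (2, 4, 5, 2)
merged = merge_heads(heads)  # (2, 5, 8), identical to `original`
```

`reshape` is used instead of `view` because the transposed tensor is no longer contiguous in memory.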

Build a Transformer Encoder Block from Scratch

this year hard (<30 mins) | pytorch transformer attention nlp

The Transformer architecture is built upon a fundamental component: the Encoder block. [1] Each block is responsible for processing a sequence of embeddings and refining them. Your...

Implementing a Multi-Headed Attention Mechanism

this year hard (>1 hr) | transformer attention scratch nlp multi-head

Expand on the previous attention exercise by implementing a **Multi-Headed Attention mechanism** from scratch. A single attention head performs dot-product attention, as you implemented previously. Multi-head...
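As a rough illustration of what such a module could look like, here is a self-attention-only sketch with no masking or dropout. The class name, the separate `q_proj`/`k_proj`/`v_proj` layers, and the sizes are assumptions made for the example, not requirements of the kata.

```python
import math
import torch
from torch import nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-head self-attention (no mask, no dropout)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        if d_model % num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def _split(self, t: torch.Tensor) -> torch.Tensor:
        # (B, N, D) -> (B, H, N, D_head)
        b, n, _ = t.shape
        return t.reshape(b, n, self.num_heads, self.d_head).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        q = self._split(self.q_proj(x))
        k = self._split(self.k_proj(x))
        v = self._split(self.v_proj(x))
        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)  # (B, H, N, N)
        weights = scores.softmax(dim=-1)
        out = weights @ v  # (B, H, N, D_head)
        # Merge heads back to (B, N, D) and mix them with a final projection.
        out = out.transpose(1, 2).reshape(b, n, self.num_heads * self.d_head)
        return self.out_proj(out)

mha = MultiHeadAttention(d_model=16, num_heads=4)
y = mha(torch.randn(2, 5, 16))  # output keeps the input shape (2, 5, 16)
```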

Building a Transformer Encoder from Scratch

this year hard (>1 hr) | transformer attention scratch encoder neural

Implement a single layer of a **Transformer Encoder** from scratch, without using `torch.nn.TransformerEncoderLayer`. This requires implementing a multi-head self-attention module and a...

Create a Transformer Encoder Block

this year hard (>1 hr) | pytorch implementation transformer attention

Implement a single Transformer encoder block:

- Multi-head self-attention.
- Layer normalization.
- Feedforward network.

Compare output with `nn.TransformerEncoderLayer`.
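One way the comparison could be set up is sketched below. To keep it short, the sketch reuses `nn.MultiheadAttention` instead of a hand-written attention module, and it assumes the post-norm layout with ReLU and zero dropout, which matches the defaults of `nn.TransformerEncoderLayer` apart from dropout. The class name `EncoderBlock` and the sizes are hypothetical; the attribute names mirror the reference layer so its weights can be copied with `load_state_dict`.

```python
import torch
from torch import nn

class EncoderBlock(nn.Module):
    """Post-norm encoder block: attention, add & norm, feedforward, add & norm."""

    def __init__(self, d_model: int, nhead: int, dim_feedforward: int):
        super().__init__()
        # Swapped-in shortcut: the built-in attention module, not a from-scratch one.
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out = self.self_attn(x, x, x, need_weights=False)[0]
        x = self.norm1(x + attn_out)                      # residual + layer norm
        ff_out = self.linear2(torch.relu(self.linear1(x)))
        return self.norm2(x + ff_out)                     # residual + layer norm

torch.manual_seed(0)
reference = nn.TransformerEncoderLayer(
    d_model=16, nhead=4, dim_feedforward=64, dropout=0.0, batch_first=True
).eval()
block = EncoderBlock(d_model=16, nhead=4, dim_feedforward=64).eval()
block.load_state_dict(reference.state_dict())  # copy weights so outputs are comparable

x = torch.randn(2, 5, 16)
matches = torch.allclose(block(x), reference(x), atol=1e-5)
```

Setting `dropout=0.0` and calling `.eval()` removes randomness, so any remaining difference between the two outputs points to a structural mismatch rather than noise.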