### Einops: Multi-Head Attention Input Projection

In the Multi-Head Attention mechanism, the input tensor `(B, N, D)` is linearly projected to create the Query, Key, and Value matrices. These are then reshaped so that the attention heads occupy their own dimension, giving each of Q, K, and V the shape `(B, num_heads, N, head_dim)`.
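Below is a minimal sketch of this projection using PyTorch and einops. The fused `qkv_proj` linear layer and the concrete dimensions are illustrative assumptions, not part of the exercise statement.

```python
import torch
import torch.nn as nn
from einops import rearrange

B, N, D = 2, 16, 64          # batch, sequence length, model dimension (assumed values)
num_heads = 8
head_dim = D // num_heads

x = torch.randn(B, N, D)

# One linear layer produces Q, K and V at once: (B, N, 3*D)
qkv_proj = nn.Linear(D, 3 * D)
qkv = qkv_proj(x)

# Split into Q, K, V and move the head dimension out of the feature axis:
# (B, N, 3*D) -> 3 tensors of shape (B, num_heads, N, head_dim)
q, k, v = rearrange(qkv, "b n (three h d) -> three b h n d",
                    three=3, h=num_heads, d=head_dim)

print(q.shape)  # torch.Size([2, 8, 16, 8])
```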
### Tensor Manipulation: Causal Mask for Transformers

In decoder-style Transformers (like GPT), we need a "causal" or "look-ahead" mask to prevent positions from attending to subsequent positions. This is typically a lower-triangular matrix, so that each position can attend only to itself and earlier positions.
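A minimal sketch of building and applying such a mask in PyTorch follows; the `scores` tensor and its dimensions are assumptions used only for illustration.

```python
import torch

B, num_heads, N = 2, 8, 5
scores = torch.randn(B, num_heads, N, N)  # raw attention scores (assumed shape)

# Lower-triangular boolean matrix: position i may attend to positions j <= i.
causal_mask = torch.tril(torch.ones(N, N, dtype=torch.bool))

# Future positions get -inf so softmax assigns them zero weight.
masked_scores = scores.masked_fill(~causal_mask, float("-inf"))
attn = masked_scores.softmax(dim=-1)

print(causal_mask.int())  # 1 below/on the diagonal, 0 above it
```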
### Einops: Transpose for Attention Output

After the multi-head attention calculation, the output tensor typically has the shape `(B, num_heads, N, head_dim)`. To feed this into the next layer (usually a feed-forward network), the head and sequence dimensions must be transposed and the heads merged back into a single feature dimension, restoring the shape `(B, N, D)`.
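The sketch below shows one way to do this merge with a single einops `rearrange`; the tensor name and dimensions are illustrative assumptions.

```python
import torch
from einops import rearrange

B, num_heads, N, head_dim = 2, 8, 16, 8
attn_out = torch.randn(B, num_heads, N, head_dim)

# Transpose heads next to head_dim, then fold them into one feature axis:
# (B, num_heads, N, head_dim) -> (B, N, num_heads * head_dim) == (B, N, D)
out = rearrange(attn_out, "b h n d -> b n (h d)")

print(out.shape)  # torch.Size([2, 16, 64])
```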