All Katas - ML Katas

Sparse MoE Top-K Gating

this year easy (<10 mins) | pytorch einops gating moe

In a Mixture of Experts (MoE) model, the gating network is a crucial component that determines which 'expert' subnetworks process each token. [1] A common strategy is top-k...
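A minimal sketch of the top-k gating idea named in this teaser, assuming the gate emits one logit per expert for each token. The function name `top_k_gating`, the tensor shapes, and the choice to renormalise with a softmax over only the selected logits are illustrative assumptions, not taken from the kata itself.

```python
import torch
import torch.nn.functional as F


def top_k_gating(logits: torch.Tensor, k: int):
    """Pick k experts per token (hypothetical helper, for illustration).

    logits: (num_tokens, num_experts) raw gate scores.
    Returns (weights, indices), each of shape (num_tokens, k).
    """
    # Largest k logits per token, sorted in descending order.
    top_vals, top_idx = torch.topk(logits, k, dim=-1)
    # Softmax over the k selected logits only, so weights sum to 1 per token.
    weights = F.softmax(top_vals, dim=-1)
    return weights, top_idx


torch.manual_seed(0)
logits = torch.randn(4, 8)  # 4 tokens, 8 experts (made-up sizes)
weights, indices = top_k_gating(logits, k=2)
```

Here `indices` says which experts each token would be sent to and `weights` says how their outputs would be mixed. Other formulations apply the softmax over all experts before selecting, so treat this as one possible variant.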

Hierarchical Patch Merging with Einops

this year easy (<10 mins) | pytorch einops vision swin

In hierarchical vision transformers like the Swin Transformer, **patch merging** is used to downsample the feature map, effectively reducing the number of tokens while increasing...
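The regrouping behind patch merging can be sketched as follows, assuming a channels-last feature map of shape `(batch, H, W, C)` and a 2x2 merge window. The helper name `merge_patches` is hypothetical, and plain PyTorch reshapes are used here in place of einops; the comment shows the einops pattern this is meant to mirror.

```python
import torch


def merge_patches(x: torch.Tensor) -> torch.Tensor:
    """(batch, H, W, C) -> (batch, H/2, W/2, 4*C) by stacking each 2x2 block."""
    b, h, w, c = x.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    # Same regrouping as the einops pattern
    #   "b (h p1) (w p2) c -> b h w (p1 p2 c)"  with p1 = p2 = 2.
    x = x.reshape(b, h // 2, 2, w // 2, 2, c)  # b, h, p1, w, p2, c
    x = x.permute(0, 1, 3, 2, 4, 5)            # b, h, w, p1, p2, c
    return x.reshape(b, h // 2, w // 2, 4 * c)


x = torch.arange(2 * 4 * 4 * 3, dtype=torch.float32).reshape(2, 4, 4, 3)
merged = merge_patches(x)  # shape (2, 2, 2, 12)
```

The spatial grid shrinks by a factor of two per side (four times fewer tokens) while each token carries four times as many channels. In Swin-style models a linear projection usually follows to reduce the channel count; that step is omitted here, and the channel ordering inside each merged token is a choice of this sketch.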

Sliding Window Attention Preparation

this year medium (<10 mins) | pytorch transformer attention einops

Full self-attention has a quadratic complexity with respect to sequence length, which is prohibitive for very long sequences. Models like Longformer introduce sliding window...
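One way to picture the sliding-window pattern is as a banded boolean mask in which each token may attend only to neighbours within a fixed distance. The sketch below assumes `window` is a one-sided radius; the function name `sliding_window_mask` is hypothetical and this is not necessarily how the kata prepares its windows.

```python
import torch


def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: True where |i - j| <= window."""
    idx = torch.arange(seq_len)
    # Pairwise position distance between query i (rows) and key j (columns).
    distance = (idx[None, :] - idx[:, None]).abs()
    return distance <= window


mask = sliding_window_mask(seq_len=6, window=1)

# Illustrative use: block out-of-window positions before the softmax.
scores = torch.zeros(6, 6)
masked_scores = scores.masked_fill(~mask, float("-inf"))
```

A mask applied to the full score matrix only illustrates which pairs are allowed; it still materialises all `seq_len * seq_len` scores, so it does not by itself remove the quadratic cost described above.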

MoE Gating: Top-K Selection

this year medium (<10 mins) | pytorch deep learning MoE gating

In a Mixture of Experts (MoE) model, a gating network is responsible for routing each input token to a subset of 'expert' networks. [6, 14] A common strategy is Top-K gating, where...
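To make the routing view concrete, the sketch below builds a dense gate matrix with one row per token and one column per expert, where every expert outside a token's top k gets weight zero. The helper name `sparse_gate_matrix`, the example logits, and the softmax over the selected logits are assumptions made for illustration.

```python
import torch


def sparse_gate_matrix(logits: torch.Tensor, k: int) -> torch.Tensor:
    """(num_tokens, num_experts) gate weights, non-zero only for the top-k experts."""
    top_vals, top_idx = torch.topk(logits, k, dim=-1)
    weights = torch.softmax(top_vals, dim=-1)
    # Start from all zeros, then write each token's k weights into its chosen columns.
    gates = torch.zeros_like(logits)
    return gates.scatter(-1, top_idx, weights)


# Two tokens, four experts (made-up values).
logits = torch.tensor([[2.0, 0.5, 1.0, -1.0],
                       [0.0, 3.0, 0.0, 2.0]])
gates = sparse_gate_matrix(logits, k=2)
```

Each row of `gates` sums to one and has exactly `k` non-zero entries, so multiplying it against stacked expert outputs would mix only the selected experts for that token.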