## Sliding Window Attention Preparation

### Description

Full self-attention has quadratic complexity with respect to sequence length, which is prohibitive for very long sequences. Models like Longformer introduce **sliding window attention**...
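Below is a minimal sketch of the preparation step, assuming the task is to build a banded (sliding window) attention mask in which token *i* may attend only to tokens *j* with |i − j| ≤ w, and to apply it to scaled dot-product scores. The function names, the symmetric-window convention, and the use of PyTorch are illustrative assumptions, not part of the original problem statement.

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where mask[i, j] is True iff |i - j| <= window,
    i.e. a symmetric window of 2 * window + 1 positions per token."""
    idx = torch.arange(seq_len)
    # |i - j| via broadcasting a column index against a row index
    dist = (idx.unsqueeze(1) - idx.unsqueeze(0)).abs()
    return dist <= window

def masked_attention_scores(q: torch.Tensor, k: torch.Tensor, window: int) -> torch.Tensor:
    """Scaled dot-product scores with out-of-window positions set to -inf,
    so softmax assigns them zero weight."""
    seq_len, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    mask = sliding_window_mask(seq_len, window)
    return scores.masked_fill(~mask, float("-inf"))

# Usage: attention weights for an 8-token sequence, one-sided window of 2
q = torch.randn(8, 16)
k = torch.randn(8, 16)
attn = torch.softmax(masked_attention_scores(q, k, window=2), dim=-1)
```

Only the banded entries of each row are nonzero, so per-token cost grows with the window size rather than the full sequence length.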
## MoE Gating: Top-K Selection

### Description

In a Mixture of Experts (MoE) model, a gating network is responsible for routing each input token to a subset of 'expert' networks [6, 14]. A common strategy is Top-K gating, where...
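A minimal sketch of Top-K gating under common assumptions: the router is a single linear layer producing one logit per expert, and the gate weights are a softmax over the K selected logits only. The names, shapes, and normalization choice are illustrative, not taken from the original problem statement.

```python
import torch
import torch.nn.functional as F

def top_k_gating(x: torch.Tensor, router: torch.nn.Linear, k: int = 2):
    """Route each token to its k highest-scoring experts.

    x:      (num_tokens, d_model) token representations
    router: linear layer mapping d_model -> num_experts
    Returns (weights, expert_ids), both (num_tokens, k): the normalized gate
    weight and the index of each selected expert for every token.
    """
    logits = router(x)                               # (num_tokens, num_experts)
    top_logits, expert_ids = torch.topk(logits, k, dim=-1)
    weights = F.softmax(top_logits, dim=-1)          # normalize over the k chosen experts only
    return weights, expert_ids

# Usage: 4 tokens, 8 experts, top-2 routing
router = torch.nn.Linear(16, 8)
x = torch.randn(4, 16)
weights, expert_ids = top_k_gating(x, router, k=2)
# Each token's output would then be the weighted sum of its k selected experts:
# sum_j weights[t, j] * expert_{expert_ids[t, j]}(x[t])
```

Normalizing over only the selected logits (rather than all experts) is one common design choice; some routers instead softmax over all experts first and then keep the top K weights.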