-
MoE Gating and Dispatch
A core component of a Mixture of Experts model is the 'gating network' which determines which expert(s) each token should be sent to. This is often a `top-k` selection. Your task is to implement...
1