MoE Aggregator: Combining Expert Outputs
After tokens have been dispatched to and processed by their respective experts, the outputs need to be combined based on the weights from the gating network. This exercise focuses on that aggregation step: taking each expert's output for a token and summing them, weighted by the gating scores, into a single output vector per token.
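A minimal sketch of what this aggregation might look like in PyTorch. The function name, tensor shapes, and the assumption that gating weights are already normalised (and zero for experts a token was not routed to) are illustrative choices, not the exercise's required interface:

```python
import torch

def aggregate_expert_outputs(expert_outputs: torch.Tensor,
                             gate_weights: torch.Tensor) -> torch.Tensor:
    """Combine per-expert outputs into one output per token.

    expert_outputs: (num_tokens, num_experts, d_model) - each expert's output
                    for each token (zeros where a token was not routed).
    gate_weights:   (num_tokens, num_experts) - normalised gating weights
                    (zero for experts the token was not dispatched to).
    Returns:        (num_tokens, d_model) - weighted sum over experts.
    """
    # Broadcast the gate weights over the feature dimension, then sum over experts.
    return (gate_weights.unsqueeze(-1) * expert_outputs).sum(dim=1)


# Tiny usage example with made-up shapes.
tokens, experts, d_model = 4, 3, 8
outs = torch.randn(tokens, experts, d_model)
gates = torch.softmax(torch.randn(tokens, experts), dim=-1)
combined = aggregate_expert_outputs(outs, gates)
print(combined.shape)  # torch.Size([4, 8])
```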
Building a Simple Mixture of Experts (MoE) Layer
Now, let's combine the concepts of dispatching and aggregating into a full, albeit simplified, `torch.nn.Module` for a Mixture of Experts layer. This layer will replace a standard feed-forward network (FFN) sub-layer in a Transformer block; see the sketch below.
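To make the structure concrete, here is one possible sketch of such a layer in PyTorch with top-k routing. The class name, constructor arguments, and routing details are assumptions made for illustration, not the exercise's required API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    """A minimal top-k Mixture of Experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward network, mirroring the FFN it replaces.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a stream of tokens for routing.
        batch, seq, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Gating: pick the top-k experts per token and renormalise their scores.
        logits = self.gate(tokens)                            # (T, E)
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)   # (T, k)
        top_weights = F.softmax(top_vals, dim=-1)             # (T, k)

        # Dispatch each token to its selected experts, then aggregate the
        # weighted expert outputs back into a single vector per token.
        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            mask = top_idx == expert_id                       # (T, k) routing mask
            token_ids, slot_ids = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            expert_out = expert(tokens[token_ids])            # (n, d_model)
            out.index_add_(0, token_ids,
                           expert_out * top_weights[token_ids, slot_ids].unsqueeze(-1))

        return out.reshape(batch, seq, d_model)


# Usage sketch: drop-in replacement for a Transformer block's FFN sub-layer.
layer = SimpleMoELayer(d_model=16, d_hidden=32, num_experts=4, top_k=2)
y = layer(torch.randn(2, 5, 16))
print(y.shape)  # torch.Size([2, 5, 16])
```

Looping over experts rather than tokens keeps each expert's work as one batched matrix multiply, which is the usual way simple MoE implementations stay efficient without custom dispatch kernels.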