All Katas - ML Katas

Sparse MoE Top-K Gating

this year easy (<10 mins) | pytorch einops gating moe

### Description In a Mixture of Experts (MoE) model, the gating network is a crucial component that determines which 'expert' subnetworks process each token. [1] A common strategy is **top-k...
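
The teaser above is cut off before it defines the exact gating variant, so the following is only a minimal sketch of one common form of top-k gating: keep the `k` largest gate logits per token and apply a softmax over just those. The function name `top_k_gating`, the `(tokens, experts)` layout, and the use of NumPy in place of PyTorch are all assumptions made for illustration.

```python
import numpy as np

def top_k_gating(logits, k):
    """Hypothetical helper: sparse gate weights from (tokens, experts) logits."""
    # Indices of the k largest logits per token, highest first.
    top_idx = np.argsort(-logits, axis=-1)[:, :k]
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Numerically stable softmax over the selected logits only.
    shifted = top_logits - top_logits.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Scatter the weights back into a dense (tokens, experts) array of zeros.
    gates = np.zeros_like(logits)
    np.put_along_axis(gates, top_idx, weights, axis=-1)
    return gates, top_idx

logits = np.array([[1.0, 3.0, 2.0, 0.0],
                   [0.5, 0.1, 0.2, 4.0]])
gates, top_idx = top_k_gating(logits, k=2)
```

Each row of `gates` has exactly `k` nonzero entries that sum to 1; other variants (for example, softmax before selection) are equally plausible readings of the truncated description.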

Hierarchical Patch Merging with Einops

this year easy (<10 mins) | pytorch einops vision swin

### Description In hierarchical vision transformers like the Swin Transformer, **patch merging** is used to downsample the feature map, effectively reducing the number of tokens while increasing...

Sliding Window Attention Preparation

this year medium (<10 mins) | pytorch transformer attention einops

### Description Full self-attention has a quadratic complexity with respect to sequence length, which is prohibitive for very long sequences. Models like Longformer introduce **sliding window...

Batch-wise Matrix Transposition

this year easy (<10 mins) | linear algebra einops tensor manipulation

### Description Given a batch of matrices, transpose each matrix in the batch. The input tensor has a shape of `(B, H, W)`, and the output should be `(B, W, H)`. ### Starter Code ```python import...

Global Average Pooling

this year easy (<10 mins) | cnn einops reduction

### Description Implement global average pooling, a common operation in convolutional neural networks. For a batch of feature maps of shape `(B, C, H, W)`, you need to compute the mean of each...

Multi-Head Attention: Splitting Heads

this year medium (<10 mins) | transformer attention einops

### Description In multi-head attention, the query, key, and value tensors are split into multiple heads. Given a tensor of shape `(B, N, D)`, where `D` is the embedding dimension, you need to...

Multi-Head Attention: Merging Heads

this year medium (<10 mins) | transformer attention einops

### Description The inverse of splitting heads. After computing attention for each head, you need to merge them back. Given a tensor of shape `(B, H, N, D//H)`, you need to merge it back to `(B,...

Tile a Tensor

this year easy (<10 mins) | einops tensor manipulation broadcasting

### Description Given a tensor, repeat its values along one or more dimensions. For example, given a tensor of shape `(H, W)`, you might want to create a batch of `B` identical copies, resulting...

Concatenate Tensors Along a New Axis

this year easy (<10 mins) | einops tensor manipulation

### Description Given a list of tensors of the same shape, concatenate them along a new axis. For example, given 3 tensors of shape `(H, W)`, you want to create a single tensor of shape `(3, H,...

Channel-wise Max Pooling

this year easy (<10 mins) | einops reduction pooling

### Description Perform max pooling over the channel dimension. Given a tensor of shape `(B, C, H, W)`, find the maximum value across all channels for each spatial location. The output should have...

Swap Height and Width

this year easy (<10 mins) | einops tensor manipulation image processing

### Description For a batch of images or feature maps, swap the height and width dimensions. The input shape is `(B, C, H, W)` and the output shape should be `(B, C, W, H)`. ### Starter Code...

Permute Dimensions Cyclically

this year easy (<10 mins) | einops tensor manipulation

### Description Perform a cyclic permutation of the dimensions of a tensor. For a tensor of shape `(D1, D2, D3, D4)`, a cyclic permutation would result in `(D4, D1, D2, D3)`. ### Starter Code...

Flatten Leading Dimensions

this year easy (<10 mins) | einops reshaping

### Description Given a tensor with multiple leading dimensions, flatten them into a single dimension. For example, transform a tensor of shape `(D1, D2, D3, D4)` into `(D1*D2, D3, D4)`. ###...

Unflatten a Dimension

this year easy (<10 mins) | einops reshaping

### Description This is the inverse of flattening. Given a tensor where the first dimension is a product of two other dimensions, unflatten it. For example, transform a tensor of shape `(D1*D2,...

Pixel Unshuffle (Pixel to Channel)

this year medium (<10 mins) | einops super-resolution pixel shuffle

### Description This is another name for the space-to-depth operation, common in super-resolution models. It involves rearranging blocks of spatial data into the channel dimension. Given a tensor...

Pixel Shuffle (Channel to Pixel)

this year medium (<10 mins) | einops super-resolution pixel shuffle

### Description The inverse of pixel unshuffle, also known as depth-to-space. It is used to upscale an image by rearranging elements from the channel dimension into spatial blocks. Given a tensor...