ML Katas

Einops: Space-to-Depth Transformation

easy (<10 mins) einops vision tensor
today by E

Description

Space-to-depth is an operation that rearranges blocks of spatial data into the channel dimension. For an input of shape (B, C, H, W) and a block size S, the output has shape (B, C * S*S, H/S, W/S); H and W must both be divisible by S. This is common in models that trade spatial resolution for channel depth, such as some super-resolution architectures.

Guidance

Use einops.rearrange to perform this transformation. The key is to decompose the height and width axes into a reduced spatial axis and a block axis each, then fold the two block axes into the channel dimension.

Starter Code

import torch
from einops import rearrange

def space_to_depth(x, block_size):
    # x: (B, C, H, W); H and W must be divisible by block_size.
    # (h s1) splits H into H//block_size groups of size block_size (likewise
    # (w s2) for W); the block axes s1, s2 are then folded into the channels.
    output = rearrange(x, 'b c (h s1) (w s2) -> b (c s1 s2) h w', s1=block_size, s2=block_size)
    return output

Verification

For an input of shape (10, 3, 224, 224) and block_size=2, the output shape should be (10, 12, 112, 112): channels grow by a factor of S*S = 4 (3 * 4 = 12) while each spatial dimension shrinks by S (224 / 2 = 112).