ML Katas

Implementing Layer Normalization from Scratch

Difficulty: medium (<30 mins) · Tags: training, custom module, normalization layer

Implement Layer Normalization as a custom torch.nn.Module. Unlike BatchNorm, which normalizes each feature across the batch, LayerNorm normalizes across the features of a single sample. Your implementation should compute the mean and standard deviation for each sample in the batch independently, normalize the features, and then apply a learned scale (gamma) and shift (beta).

$$y = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \cdot \gamma + \beta$$
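A minimal sketch of one possible solution, assuming normalization over the last dimension; the class name `MyLayerNorm` is an illustrative choice, not part of the exercise statement:

```python
import torch
import torch.nn as nn


class MyLayerNorm(nn.Module):
    """Sketch of a custom LayerNorm; normalizes over the last dimension."""

    def __init__(self, normalized_shape: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Learned scale (gamma) and shift (beta), one value per feature.
        self.gamma = nn.Parameter(torch.ones(normalized_shape))
        self.beta = nn.Parameter(torch.zeros(normalized_shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-sample statistics over the feature dimension.
        mean = x.mean(dim=-1, keepdim=True)
        # Biased variance (unbiased=False) matches torch.nn.LayerNorm.
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.gamma + self.beta
```

Note the use of the biased variance estimator: torch.nn.LayerNorm divides by N rather than N-1, so using `unbiased=False` is what makes the outputs line up in the verification step.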

Verification: Compare the output of your custom LayerNorm module with torch.nn.LayerNorm for a random tensor. The outputs should be numerically very close, with only minor floating-point differences.
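One way to run that check, assuming the `MyLayerNorm` sketch above and a 2-D input of shape (batch, features):

```python
features = 16
x = torch.randn(4, features)

custom = MyLayerNorm(features)
reference = nn.LayerNorm(features)

# Both modules initialize gamma to ones and beta to zeros,
# so their outputs should agree up to floating-point error.
print(torch.allclose(custom(x), reference(x), atol=1e-6))  # expected: True
```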