ML Katas

Implementing Layer Normalization from Scratch

Difficulty: medium (<30 mins) · Tags: training, custom module, normalization layer

Implement Layer Normalization as a custom torch.nn.Module. Unlike BatchNorm, which normalizes each feature across the batch, LayerNorm normalizes across the features of a single sample. Your implementation should compute the mean and standard deviation for each sample in the batch independently, normalize the features, and then apply a learned scale (gamma) and shift (beta).

$$y = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \cdot \gamma + \beta$$
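A minimal sketch of one possible solution, assuming normalization over the last dimension; the class name `MyLayerNorm` is an illustrative choice, not part of the exercise statement:

```python
import torch
import torch.nn as nn


class MyLayerNorm(nn.Module):
    """Sketch of a custom LayerNorm; normalizes over the last dimension."""

    def __init__(self, normalized_shape: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Learned scale (gamma) and shift (beta), one value per feature.
        self.gamma = nn.Parameter(torch.ones(normalized_shape))
        self.beta = nn.Parameter(torch.zeros(normalized_shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-sample statistics over the feature dimension.
        mean = x.mean(dim=-1, keepdim=True)
        # Biased variance (unbiased=False) matches torch.nn.LayerNorm.
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.gamma + self.beta
```

Note the use of the biased variance estimator: torch.nn.LayerNorm divides by N rather than N-1, so using `unbiased=False` is what makes the outputs line up in the verification step.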

Verification: Compare the output of your custom LayerNorm module with torch.nn.LayerNorm for a random tensor. The outputs should be numerically very close, with only minor floating-point differences.
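One way to run that check, assuming the `MyLayerNorm` sketch above and a 2-D input of shape (batch, features):

```python
features = 16
x = torch.randn(4, features)

custom = MyLayerNorm(features)
reference = nn.LayerNorm(features)

# Both modules initialize gamma to ones and beta to zeros,
# so their outputs should agree up to floating-point error.
print(torch.allclose(custom(x), reference(x), atol=1e-6))  # expected: True
```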