Implementing Layer Normalization from Scratch
Implement Layer Normalization as a custom torch.nn.Module. Unlike BatchNorm, LayerNorm normalizes across the features of a single sample rather than across the batch. Your implementation should compute the mean and standard deviation of each sample's features independently, normalize the features with those statistics (adding a small epsilon for numerical stability), and then apply a learned scale and shift.
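A minimal sketch of such a module is shown below. It assumes the features to normalize are along the last dimension, and the names (MyLayerNorm, gamma, beta) are illustrative choices, not required by the exercise. Note that torch.nn.LayerNorm uses the biased variance, so the sketch does the same to keep the later comparison fair.

```python
import torch
import torch.nn as nn

class MyLayerNorm(nn.Module):
    """Layer normalization over the last dimension, with a learned scale and shift."""

    def __init__(self, normalized_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(normalized_dim))   # learned scale
        self.beta = nn.Parameter(torch.zeros(normalized_dim))   # learned shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Statistics are computed per sample, over the feature dimension only.
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)  # biased variance, as in torch.nn.LayerNorm
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```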
Verification: Compare the output of your custom LayerNorm module with torch.nn.LayerNorm for a random tensor. The outputs should be numerically very close, with only minor floating-point differences.
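One way to run this check, assuming the MyLayerNorm sketch above: both modules initialize the scale to ones and the shift to zeros, so their untrained outputs should agree up to floating-point error.

```python
torch.manual_seed(0)
x = torch.randn(4, 10)  # batch of 4 samples, 10 features each

custom_ln = MyLayerNorm(10)
reference_ln = nn.LayerNorm(10)

out_custom = custom_ln(x)
out_reference = reference_ln(x)

# Expected: True (differences only at floating-point precision)
print(torch.allclose(out_custom, out_reference, atol=1e-6))
```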