Tensor Manipulation: Implement Layer Normalization
Description
Layer Normalization is a key component in many modern deep learning models, especially Transformers. It normalizes the inputs across the feature dimension. Your task is to implement it from scratch using basic PyTorch tensor operations.
Equation
y = gamma * (x - mu) / sqrt(sigma^2 + epsilon) + beta
where mu and sigma^2 are the mean and variance of x computed over the last (feature) dimension.
Guidance
Your function should take a tensor x, a learnable gain gamma, and a learnable bias beta.
1. Calculate the mean and variance along the last dimension (dim=-1), keeping the dimension for broadcasting.
2. Use these to normalize x.
3. Include a small epsilon inside the square root for numerical stability.
4. Apply the learnable gain and bias.
Starter Code
import torch
def layer_norm(x, gamma, beta, epsilon=1e-5):
# x shape: (B, N, D)
# gamma, beta shape: (D,)
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
x_normalized = (x - mean) / torch.sqrt(var + epsilon)
return x_normalized * gamma + beta
Verification
Create a random tensor x and corresponding gamma and beta tensors. Pass x through your function and through an instance of torch.nn.LayerNorm whose weight and bias have been set to gamma and beta (by default, LayerNorm initializes its weight to ones and bias to zeros, so the comparison only works once you copy your parameters in). The outputs should be nearly identical (e.g., check with torch.allclose).
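As a sketch of this verification, the snippet below compares the starter-code implementation against torch.nn.LayerNorm. The shape values B, N, D and the random seed are arbitrary choices for the test:

```python
import torch

def layer_norm(x, gamma, beta, epsilon=1e-5):
    # Normalize over the last (feature) dimension, then apply gain and bias.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    x_normalized = (x - mean) / torch.sqrt(var + epsilon)
    return x_normalized * gamma + beta

torch.manual_seed(0)
B, N, D = 2, 4, 8          # arbitrary batch, sequence, and feature sizes
x = torch.randn(B, N, D)
gamma = torch.randn(D)
beta = torch.randn(D)

# Build the reference module and copy our parameters into it;
# nn.LayerNorm's default eps is 1e-5, matching the function above.
ref = torch.nn.LayerNorm(D, eps=1e-5)
with torch.no_grad():
    ref.weight.copy_(gamma)
    ref.bias.copy_(beta)

assert torch.allclose(layer_norm(x, gamma, beta), ref(x), atol=1e-6)
print("outputs match")
```

Note that torch.var with unbiased=False is essential here: nn.LayerNorm uses the biased (population) variance, so the default unbiased=True would produce a small but systematic mismatch.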