Tensor Manipulation: Implement Layer Normalization
Description
Layer Normalization is a key component in many modern deep learning models, especially Transformers. It normalizes the inputs across the feature dimension. Your task is to implement it from scratch using basic PyTorch tensor operations.
Equation
y = gamma * (x - mu) / sqrt(sigma^2 + epsilon) + beta
where mu and sigma^2 are the mean and variance of x computed over the last (feature) dimension.
Guidance
Your function should take a tensor x, a learnable gain gamma, and a learnable bias beta.
1. Calculate the mean and variance along the last dimension (dim=-1), keeping the dimension for broadcasting.
2. Use these to normalize x.
3. Include a small epsilon inside the square root for numerical stability.
4. Apply the learnable gain and bias.
Starter Code
import torch
def layer_norm(x, gamma, beta, epsilon=1e-5):
# x shape: (B, N, D)
# gamma, beta shape: (D,)
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
x_normalized = (x - mean) / torch.sqrt(var + epsilon)
return x_normalized * gamma + beta
Verification
Create a random tensor x and corresponding gamma and beta tensors. Pass x through your function and through an instance of torch.nn.LayerNorm whose weight and bias have been set to gamma and beta (by default, LayerNorm initializes its weight to ones and bias to zeros, so the comparison only works once you copy your parameters in). The outputs should be nearly identical (e.g., check with torch.allclose).
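As a sketch of this verification, the snippet below compares the starter-code implementation against torch.nn.LayerNorm. The shape values B, N, D and the random seed are arbitrary choices for the test:

```python
import torch

def layer_norm(x, gamma, beta, epsilon=1e-5):
    # Normalize over the last (feature) dimension, then apply gain and bias.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    x_normalized = (x - mean) / torch.sqrt(var + epsilon)
    return x_normalized * gamma + beta

torch.manual_seed(0)
B, N, D = 2, 4, 8          # arbitrary batch, sequence, and feature sizes
x = torch.randn(B, N, D)
gamma = torch.randn(D)
beta = torch.randn(D)

# Build the reference module and copy our parameters into it;
# nn.LayerNorm's default eps is 1e-5, matching the function above.
ref = torch.nn.LayerNorm(D, eps=1e-5)
with torch.no_grad():
    ref.weight.copy_(gamma)
    ref.bias.copy_(beta)

assert torch.allclose(layer_norm(x, gamma, beta), ref(x), atol=1e-6)
print("outputs match")
```

Note that torch.var with unbiased=False is essential here: nn.LayerNorm uses the biased (population) variance, so the default unbiased=True would produce a small but systematic mismatch.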