Tensor Manipulation: Sinusoidal Positional Encoding
Description
Implement the sinusoidal positional encoding from the "Attention Is All You Need" paper. This technique encodes where each token sits in a sequence by injecting a unique, deterministic signal for every position.
Equations
PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
where pos is the token position and i indexes pairs of feature dimensions.
Guidance
- Create a position tensor of shape (seq_len, 1) and a div_term tensor of shape (d_model/2,).
- The div_term is calculated as 1 / (10000^(2i/d_model)). This can be done efficiently using torch.exp and torch.arange (see the sketch after this list).
- Calculate the arguments for sin and cos by multiplying position and div_term; broadcasting yields a (seq_len, d_model/2) matrix.
- Assign the sin values to even indices and the cos values to odd indices of the final encoding matrix.
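The exp/log trick mentioned above computes the same values as raising 10000 to a negative power directly; a minimal sketch comparing the two forms (d_model=8 is just an illustrative choice):

import torch

d_model = 8
two_i = torch.arange(0, d_model, 2).float()  # 2i = 0, 2, 4, 6

# Direct form: 1 / 10000^(2i / d_model)
direct = 1.0 / torch.pow(torch.tensor(10000.0), two_i / d_model)

# exp/log form: exp(2i * -ln(10000) / d_model)
via_exp = torch.exp(two_i * (-torch.log(torch.tensor(10000.0)) / d_model))

print(torch.allclose(direct, via_exp))  # True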
Starter Code
import torch

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    # Positions 0..seq_len-1 as a (seq_len, 1) column vector
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    # Frequencies 1 / 10000^(2i/d_model) for 2i = 0, 2, ..., shape (d_model/2,)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-torch.log(torch.tensor(10000.0)) / d_model))
    # Broadcasting (seq_len, 1) * (d_model/2,) gives (seq_len, d_model/2)
    pe[:, 0::2] = torch.sin(position * div_term)  # even feature indices
    pe[:, 1::2] = torch.cos(position * div_term)  # odd feature indices
    return pe.unsqueeze(0)  # Add batch dimension: (1, seq_len, d_model)
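A quick usage check, as a minimal sketch (the expected shape matches the verification below):

pe = positional_encoding(50, 128)
print(pe.shape)    # torch.Size([1, 50, 128])
print(pe[0, 0])    # position 0: all sin terms are 0, all cos terms are 1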
Verification
The output shape for seq_len=50 and d_model=128 should be (1, 50, 128). You can visualize the resulting matrix: each row (position) should have a unique sinusoidal pattern, and the frequency should decrease across the feature dimension.
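One way to do that visualization, as a minimal sketch assuming matplotlib is installed and reusing positional_encoding from the starter code:

import matplotlib.pyplot as plt

pe = positional_encoding(50, 128)

# Rows are positions, columns are feature dimensions
plt.imshow(pe[0].numpy(), aspect='auto', cmap='viridis')
plt.xlabel('feature dimension')
plt.ylabel('position')
plt.colorbar()
plt.show()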