ML Katas

Tensor Manipulation: Sinusoidal Positional Encoding

hard (<10 mins) transformer attention nlp tensor

Description

Implement the sinusoidal positional encoding from the "Attention Is All You Need" paper. This technique adds information about the position of tokens in a sequence by injecting a unique, deterministic signal for each position.

Equations

\[
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
\]
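
A small worked example, assuming d_model = 4 and pos = 1, so i takes the values 0 and 1 and the denominators are 10000^0 = 1 and 10000^(2/4) = 100 (the numbers are just the formulas evaluated, not values from the paper):

\[
\begin{aligned}
PE_{(1,0)} &= \sin(1/1) \approx 0.8415, & PE_{(1,1)} &= \cos(1/1) \approx 0.5403,\\
PE_{(1,2)} &= \sin(1/100) \approx 0.0100, & PE_{(1,3)} &= \cos(1/100) \approx 0.99995.
\end{aligned}
\]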

Guidance

  1. Create a position tensor of shape (seq_len, 1) and a div_term tensor of shape (d_model/2,).
  2. The div_term is 1 / (10000^(2i/d_model)). It can be computed efficiently with torch.exp and torch.arange (see the sketch after this list).
  3. Calculate the arguments for sin and cos by multiplying position and div_term.
  4. Assign the sin values to even indices and cos values to odd indices of the final encoding matrix.
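
A minimal sketch of step 2, assuming an even d_model (the value 128 below is only an example); it checks that the exp/log form used in the starter code matches the direct power form:

import torch

d_model = 128                                     # assumed example value
two_i = torch.arange(0, d_model, 2).float()       # the "2i" values: 0, 2, 4, ...
direct = 1.0 / (10000.0 ** (two_i / d_model))     # 1 / 10000^(2i/d_model), shape (d_model/2,)
via_exp = torch.exp(two_i * (-torch.log(torch.tensor(10000.0)) / d_model))
print(torch.allclose(direct, via_exp))            # True: both formulations agree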

Starter Code

import math

import torch

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    # Column vector of positions 0..seq_len-1, shape (seq_len, 1).
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    # 1 / 10000^(2i/d_model) for each even index 2i, computed via exp/log, shape (d_model/2,).
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

    # Broadcasting (seq_len, 1) * (d_model/2,) gives the (seq_len, d_model/2) argument matrix.
    pe[:, 0::2] = torch.sin(position * div_term)  # even feature indices
    pe[:, 1::2] = torch.cos(position * div_term)  # odd feature indices

    return pe.unsqueeze(0)  # add batch dimension: (1, seq_len, d_model)
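
A hypothetical usage sketch, assuming token embeddings x of shape (batch, seq_len, d_model) with made-up sizes; the leading batch dimension of the returned encoding broadcasts over the batch:

x = torch.randn(8, 50, 128)                        # (batch, seq_len, d_model)
pe = positional_encoding(seq_len=50, d_model=128)  # (1, 50, 128)
x = x + pe                                         # broadcasts across the batch dimension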

Verification

The output shape for seq_len=50 and d_model=128 should be (1, 50, 128). You can visualize the resulting matrix. Each row (position) should have a unique sinusoidal pattern, and the frequency should decrease across the feature dimension.
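
A quick check along these lines (the visualization uses matplotlib, which is assumed to be installed):

pe = positional_encoding(50, 128)
assert pe.shape == (1, 50, 128)
# At position 0 every even feature is sin(0) = 0 and every odd feature is cos(0) = 1.
assert torch.allclose(pe[0, 0, 0::2], torch.zeros(64))
assert torch.allclose(pe[0, 0, 1::2], torch.ones(64))

import matplotlib.pyplot as plt
plt.imshow(pe[0], aspect="auto", cmap="viridis")
plt.xlabel("feature dimension")
plt.ylabel("position")
plt.show()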

References

Vaswani, A., et al. "Attention Is All You Need." NeurIPS 2017. https://arxiv.org/abs/1706.03762