ML Katas

Tensor Manipulation: Sinusoidal Positional Encoding

hard (<10 mins) transformer attention nlp tensor

Description

Implement the sinusoidal positional encoding from the "Attention Is All You Need" paper. This technique adds information about the position of tokens in a sequence by injecting a unique, deterministic signal for each position.

Equations

\[
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
\]
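
A small worked example, assuming d_model = 4 and pos = 1, so i takes the values 0 and 1 and the denominators are 10000^0 = 1 and 10000^(2/4) = 100 (the numbers are just the formulas evaluated, not values from the paper):

\[
\begin{aligned}
PE_{(1,0)} &= \sin(1/1) \approx 0.8415, & PE_{(1,1)} &= \cos(1/1) \approx 0.5403,\\
PE_{(1,2)} &= \sin(1/100) \approx 0.0100, & PE_{(1,3)} &= \cos(1/100) \approx 0.99995.
\end{aligned}
\]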

Guidance

  1. Create a position tensor of shape (seq_len, 1) and a div_term tensor of shape (d_model/2,).
  2. The div_term is 1 / (10000^(2i/d_model)). It can be computed efficiently with torch.exp and torch.arange (see the sketch after this list).
  3. Calculate the arguments for sin and cos by multiplying position and div_term.
  4. Assign the sin values to even indices and cos values to odd indices of the final encoding matrix.
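
A minimal sketch of step 2, assuming an even d_model (the value 128 below is only an example); it checks that the exp/log form used in the starter code matches the direct power form:

import torch

d_model = 128                                     # assumed example value
two_i = torch.arange(0, d_model, 2).float()       # the "2i" values: 0, 2, 4, ...
direct = 1.0 / (10000.0 ** (two_i / d_model))     # 1 / 10000^(2i/d_model), shape (d_model/2,)
via_exp = torch.exp(two_i * (-torch.log(torch.tensor(10000.0)) / d_model))
print(torch.allclose(direct, via_exp))            # True: both formulations agree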

Starter Code

import math

import torch

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    # Column vector of positions 0..seq_len-1, shape (seq_len, 1).
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    # 1 / 10000^(2i/d_model) for each even index 2i, computed via exp/log, shape (d_model/2,).
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

    # Broadcasting (seq_len, 1) * (d_model/2,) gives the (seq_len, d_model/2) argument matrix.
    pe[:, 0::2] = torch.sin(position * div_term)  # even feature indices
    pe[:, 1::2] = torch.cos(position * div_term)  # odd feature indices

    return pe.unsqueeze(0)  # add batch dimension: (1, seq_len, d_model)
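
A hypothetical usage sketch, assuming token embeddings x of shape (batch, seq_len, d_model) with made-up sizes; the leading batch dimension of the returned encoding broadcasts over the batch:

x = torch.randn(8, 50, 128)                        # (batch, seq_len, d_model)
pe = positional_encoding(seq_len=50, d_model=128)  # (1, 50, 128)
x = x + pe                                         # broadcasts across the batch dimension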

Verification

The output shape for seq_len=50 and d_model=128 should be (1, 50, 128). You can visualize the resulting matrix. Each row (position) should have a unique sinusoidal pattern, and the frequency should decrease across the feature dimension.
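
A quick check along these lines (the visualization uses matplotlib, which is assumed to be installed):

pe = positional_encoding(50, 128)
assert pe.shape == (1, 50, 128)
# At position 0 every even feature is sin(0) = 0 and every odd feature is cos(0) = 1.
assert torch.allclose(pe[0, 0, 0::2], torch.zeros(64))
assert torch.allclose(pe[0, 0, 1::2], torch.ones(64))

import matplotlib.pyplot as plt
plt.imshow(pe[0], aspect="auto", cmap="viridis")
plt.xlabel("feature dimension")
plt.ylabel("position")
plt.show()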

References

Vaswani, A., et al. "Attention Is All You Need." NeurIPS 2017. https://arxiv.org/abs/1706.03762