ML Katas

Create a Transformer Encoder Block

Difficulty: hard (>1 hr). Tags: pytorch, implementation, transformer, attention.

Implement a single Transformer encoder block from scratch (a sketch of one possible structure follows the list):

  • Multi-head self-attention.
  • Residual connections with layer normalization.
  • Position-wise feedforward network.

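A minimal sketch of one way to structure the block, assuming the post-norm layout and ReLU activation that nn.TransformerEncoderLayer uses by default; dropout is omitted for clarity, and the class names MultiHeadSelfAttention and EncoderBlock are mine:

```python
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Multi-head self-attention with a fused Q/K/V projection,
    mirroring the parameter layout of nn.MultiheadAttention."""

    def __init__(self, d_model: int, nhead: int):
        super().__init__()
        assert d_model % nhead == 0, "d_model must be divisible by nhead"
        self.nhead = nhead
        self.head_dim = d_model // nhead
        self.in_proj = nn.Linear(d_model, 3 * d_model)  # projects x to Q, K, V at once
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape  # (batch, seq, d_model)
        q, k, v = self.in_proj(x).chunk(3, dim=-1)
        # Reshape to (batch, nhead, seq, head_dim) so each head attends independently.
        q, k, v = (t.view(b, s, self.nhead, self.head_dim).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # scaled dot-product
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, d)  # merge heads back
        return self.out_proj(out)


class EncoderBlock(nn.Module):
    """Post-norm encoder block: attention -> add & norm -> feedforward -> add & norm."""

    def __init__(self, d_model: int, nhead: int, dim_feedforward: int):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, nhead)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Linear(dim_feedforward, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x + self.attn(x))  # residual around attention, then LayerNorm
        x = self.norm2(x + self.ff(x))    # residual around feedforward, then LayerNorm
        return x
```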
Verify your implementation by comparing its output with nn.TransformerEncoderLayer, e.g. by copying the reference layer's weights into your block and checking that the outputs agree.
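
One way to run the comparison, assuming the EncoderBlock sketched above (the helper name copy_weights is my own): copy the reference layer's parameters over, put both modules in eval mode with dropout=0.0 so they are deterministic, and check the outputs with torch.allclose.

```python
def copy_weights(dst: EncoderBlock, src: nn.TransformerEncoderLayer) -> None:
    """Copy parameters from the reference layer into the custom block.
    nn.MultiheadAttention's fused in_proj lines up with our in_proj Linear."""
    with torch.no_grad():
        dst.attn.in_proj.weight.copy_(src.self_attn.in_proj_weight)
        dst.attn.in_proj.bias.copy_(src.self_attn.in_proj_bias)
        dst.attn.out_proj.weight.copy_(src.self_attn.out_proj.weight)
        dst.attn.out_proj.bias.copy_(src.self_attn.out_proj.bias)
        dst.ff[0].weight.copy_(src.linear1.weight)
        dst.ff[0].bias.copy_(src.linear1.bias)
        dst.ff[2].weight.copy_(src.linear2.weight)
        dst.ff[2].bias.copy_(src.linear2.bias)
        dst.norm1.weight.copy_(src.norm1.weight)
        dst.norm1.bias.copy_(src.norm1.bias)
        dst.norm2.weight.copy_(src.norm2.weight)
        dst.norm2.bias.copy_(src.norm2.bias)


torch.manual_seed(0)
d_model, nhead, dim_ff = 64, 4, 256
ref = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=dim_ff,
                                 dropout=0.0, batch_first=True)
mine = EncoderBlock(d_model, nhead, dim_ff)
copy_weights(mine, ref)
ref.eval()
mine.eval()

x = torch.randn(2, 10, d_model)  # (batch, seq, d_model)
with torch.no_grad():
    # Expect agreement up to floating-point tolerance; the built-in layer may
    # take a fused fast path in eval mode, so results need not be bit-identical.
    print(torch.allclose(mine(x), ref(x), atol=1e-5))
```

If the outputs disagree, the usual suspects are norm placement (post-norm vs. norm_first=True), the activation function, and the 1/sqrt(head_dim) attention scaling.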