ML Katas

Create a Transformer Encoder Block

Difficulty: hard (>1 hr). Tags: pytorch, implementation, transformer, attention.

Implement a single Transformer encoder block from scratch (a sketch of one possible structure follows the list):

  • Multi-head self-attention.
  • Residual connections with layer normalization.
  • Position-wise feedforward network.

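A minimal sketch of one way to structure the block, assuming the post-norm layout and ReLU activation that nn.TransformerEncoderLayer uses by default; dropout is omitted for clarity, and the class names MultiHeadSelfAttention and EncoderBlock are mine:

```python
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Multi-head self-attention with a fused Q/K/V projection,
    mirroring the parameter layout of nn.MultiheadAttention."""

    def __init__(self, d_model: int, nhead: int):
        super().__init__()
        assert d_model % nhead == 0, "d_model must be divisible by nhead"
        self.nhead = nhead
        self.head_dim = d_model // nhead
        self.in_proj = nn.Linear(d_model, 3 * d_model)  # projects x to Q, K, V at once
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape  # (batch, seq, d_model)
        q, k, v = self.in_proj(x).chunk(3, dim=-1)
        # Reshape to (batch, nhead, seq, head_dim) so each head attends independently.
        q, k, v = (t.view(b, s, self.nhead, self.head_dim).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # scaled dot-product
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, d)  # merge heads back
        return self.out_proj(out)


class EncoderBlock(nn.Module):
    """Post-norm encoder block: attention -> add & norm -> feedforward -> add & norm."""

    def __init__(self, d_model: int, nhead: int, dim_feedforward: int):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, nhead)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Linear(dim_feedforward, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x + self.attn(x))  # residual around attention, then LayerNorm
        x = self.norm2(x + self.ff(x))    # residual around feedforward, then LayerNorm
        return x
```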
Verify your implementation by comparing its output with nn.TransformerEncoderLayer, e.g. by copying the reference layer's weights into your block and checking that the outputs agree.
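
One way to run the comparison, assuming the EncoderBlock sketched above (the helper name copy_weights is my own): copy the reference layer's parameters over, put both modules in eval mode with dropout=0.0 so they are deterministic, and check the outputs with torch.allclose.

```python
def copy_weights(dst: EncoderBlock, src: nn.TransformerEncoderLayer) -> None:
    """Copy parameters from the reference layer into the custom block.
    nn.MultiheadAttention's fused in_proj lines up with our in_proj Linear."""
    with torch.no_grad():
        dst.attn.in_proj.weight.copy_(src.self_attn.in_proj_weight)
        dst.attn.in_proj.bias.copy_(src.self_attn.in_proj_bias)
        dst.attn.out_proj.weight.copy_(src.self_attn.out_proj.weight)
        dst.attn.out_proj.bias.copy_(src.self_attn.out_proj.bias)
        dst.ff[0].weight.copy_(src.linear1.weight)
        dst.ff[0].bias.copy_(src.linear1.bias)
        dst.ff[2].weight.copy_(src.linear2.weight)
        dst.ff[2].bias.copy_(src.linear2.bias)
        dst.norm1.weight.copy_(src.norm1.weight)
        dst.norm1.bias.copy_(src.norm1.bias)
        dst.norm2.weight.copy_(src.norm2.weight)
        dst.norm2.bias.copy_(src.norm2.bias)


torch.manual_seed(0)
d_model, nhead, dim_ff = 64, 4, 256
ref = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=dim_ff,
                                 dropout=0.0, batch_first=True)
mine = EncoderBlock(d_model, nhead, dim_ff)
copy_weights(mine, ref)
ref.eval()
mine.eval()

x = torch.randn(2, 10, d_model)  # (batch, seq, d_model)
with torch.no_grad():
    # Expect agreement up to floating-point tolerance; the built-in layer may
    # take a fused fast path in eval mode, so results need not be bit-identical.
    print(torch.allclose(mine(x), ref(x), atol=1e-5))
```

If the outputs disagree, the usual suspects are norm placement (post-norm vs. norm_first=True), the activation function, and the 1/sqrt(head_dim) attention scaling.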