Implementing a Simple Attention Mechanism
Implement a simple attention mechanism for a sequence-to-sequence model. Given a sequence of encoder outputs and a single decoder hidden state, your attention module should compute attention weights and a context vector. The attention weights are calculated as a dot product between the decoder state and each encoder output, followed by a softmax. The context vector is a weighted sum of the encoder outputs.
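Below is a minimal sketch of one way to implement this in PyTorch, assuming batch-first tensors: encoder outputs of shape (batch, seq_len, hidden) and a decoder state of shape (batch, hidden). The class name `DotProductAttention` is illustrative, not prescribed by the task.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DotProductAttention(nn.Module):
    """Dot-product attention over encoder outputs for a single decoder state."""

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state:   (batch, hidden)
        # encoder_outputs: (batch, seq_len, hidden)

        # Dot product between the decoder state and each encoder output
        # gives one unnormalized score per time step: (batch, seq_len).
        scores = torch.bmm(encoder_outputs, decoder_state.unsqueeze(2)).squeeze(2)

        # Softmax across the sequence dimension yields attention weights
        # that sum to 1 for each batch element.
        attn_weights = F.softmax(scores, dim=1)

        # Context vector: weighted sum of the encoder outputs, shape (batch, hidden).
        context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs).squeeze(1)

        return context, attn_weights
```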
Verification: Test the module with random tensors. The attention weights should sum to 1 across the sequence dimension. The final context vector should be a weighted average of the encoder outputs, and its shape should match the decoder hidden state's shape.
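One possible set of checks, using the sketch above with arbitrary shapes chosen only for illustration:

```python
# Random inputs (shapes are arbitrary for this check).
batch, seq_len, hidden = 4, 7, 16
encoder_outputs = torch.randn(batch, seq_len, hidden)
decoder_state = torch.randn(batch, hidden)

attention = DotProductAttention()
context, weights = attention(decoder_state, encoder_outputs)

# Attention weights sum to 1 across the sequence dimension.
assert torch.allclose(weights.sum(dim=1), torch.ones(batch))

# Context vector has the expected shape (batch, hidden).
assert context.shape == (batch, hidden)

# A weighted average cannot exceed the per-dimension min/max of the
# encoder outputs along the sequence dimension.
assert torch.all(context <= encoder_outputs.max(dim=1).values + 1e-6)
assert torch.all(context >= encoder_outputs.min(dim=1).values - 1e-6)

print("All checks passed:", context.shape, weights.shape)
```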