Implementing a Simple Attention Mechanism
Implement a simple attention mechanism for a sequence-to-sequence model. Given a sequence of encoder outputs and a single decoder hidden state, your attention module should compute attention weights and a context vector. The attention weights are calculated as a dot product between the decoder state and each encoder output, followed by a softmax. The context vector is a weighted sum of the encoder outputs.
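Below is a minimal sketch of one way to implement this in PyTorch, assuming batch-first tensors: encoder outputs of shape (batch, seq_len, hidden) and a decoder state of shape (batch, hidden). The class name `DotProductAttention` is illustrative, not prescribed by the task.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DotProductAttention(nn.Module):
    """Dot-product attention over encoder outputs for a single decoder state."""

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state:   (batch, hidden)
        # encoder_outputs: (batch, seq_len, hidden)

        # Dot product between the decoder state and each encoder output
        # gives one unnormalized score per time step: (batch, seq_len).
        scores = torch.bmm(encoder_outputs, decoder_state.unsqueeze(2)).squeeze(2)

        # Softmax across the sequence dimension yields attention weights
        # that sum to 1 for each batch element.
        attn_weights = F.softmax(scores, dim=1)

        # Context vector: weighted sum of the encoder outputs, shape (batch, hidden).
        context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs).squeeze(1)

        return context, attn_weights
```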
Verification: Test the module with random tensors. The attention weights should sum to 1 across the sequence dimension. The final context vector should be a weighted average of the encoder outputs, and its shape should match the decoder hidden state's shape.
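One possible set of checks, using the sketch above with arbitrary shapes chosen only for illustration:

```python
# Random inputs (shapes are arbitrary for this check).
batch, seq_len, hidden = 4, 7, 16
encoder_outputs = torch.randn(batch, seq_len, hidden)
decoder_state = torch.randn(batch, hidden)

attention = DotProductAttention()
context, weights = attention(decoder_state, encoder_outputs)

# Attention weights sum to 1 across the sequence dimension.
assert torch.allclose(weights.sum(dim=1), torch.ones(batch))

# Context vector has the expected shape (batch, hidden).
assert context.shape == (batch, hidden)

# A weighted average cannot exceed the per-dimension min/max of the
# encoder outputs along the sequence dimension.
assert torch.all(context <= encoder_outputs.max(dim=1).values + 1e-6)
assert torch.all(context >= encoder_outputs.min(dim=1).values - 1e-6)

print("All checks passed:", context.shape, weights.shape)
```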