## The Elegant Gradient of Softmax-Cross-Entropy
One of the most satisfying derivations in deep learning is the gradient of the combined Softmax and Cross-Entropy loss. For a multi-class classification problem with $K$ classes, given true labels...
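The celebrated result of this derivation is that the gradient of the combined loss with respect to the logits collapses to $p - y$, where $p$ is the softmax output and $y$ is the one-hot label vector. A minimal NumPy sketch (the example logits and labels here are illustrative, not from the text) can verify this closed form against a finite-difference approximation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

# Cross-entropy with one-hot labels y: L(z) = -sum_i y_i * log(softmax(z)_i)
z = np.array([2.0, 1.0, 0.1])      # example logits (illustrative)
y = np.array([1.0, 0.0, 0.0])      # true class is index 0
p = softmax(z)

analytic = p - y                    # the elegant closed-form gradient

# Finite-difference check of dL/dz_i
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    loss_p = -(y @ np.log(softmax(zp)))
    loss_m = -(y @ np.log(softmax(zm)))
    numeric[i] = (loss_p - loss_m) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

The agreement between the two gradients is what makes this pairing so convenient in practice: frameworks fuse softmax and cross-entropy into a single op precisely so they can use the simple $p - y$ backward pass.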
## Softmax's Numerical Stability: The Max Trick
While the standard softmax formula $\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$ is mathematically correct, a direct implementation can lead to numerical instability due to potential...
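The standard fix is to subtract $\max_j z_j$ from every logit before exponentiating; this leaves the softmax output unchanged (the shift cancels in the ratio) while guaranteeing every exponent is at most zero, so `exp` cannot overflow. A small sketch contrasting the naive and stabilized versions (the large example logits are illustrative):

```python
import numpy as np

def softmax_naive(z):
    e = np.exp(z)                  # overflows for large logits
    return e / e.sum()

def softmax_stable(z):
    # Subtracting max(z) is mathematically a no-op for softmax,
    # but keeps every exponent <= 0, so exp() stays finite.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1000.0, 1001.0, 1002.0])  # illustrative large logits
print(softmax_naive(z))   # nan values from inf / inf
print(softmax_stable(z))  # well-behaved probabilities summing to 1
```

The stable variant is what every major framework implements internally; the naive formula should never appear in production code.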
## Deconstructing Self-Attention Scores
The self-attention mechanism is a core component of Transformers. Let's break down how attention scores are calculated. 1. **Query, Key, Value**: In self-attention, each input token (or its...
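The steps above can be sketched in NumPy as scaled dot-product attention. The projection matrices `Wq`, `Wk`, `Wv` and the random input are placeholders (in a real Transformer they are learned parameters and token embeddings):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    # Raw scores: dot product of each query with each key,
    # scaled by sqrt(d_k) to keep magnitudes in a reasonable range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Output: attention-weighted combination of the values.
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
X = rng.normal(size=(seq_len, d_k))           # placeholder token embeddings
Wq = rng.normal(size=(d_k, d_k))              # hypothetical learned projections
Wk = rng.normal(size=(d_k, d_k))
Wv = rng.normal(size=(d_k, d_k))

out, w = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)          # (4, 8)
print(w.sum(axis=-1))     # each row of weights sums to 1
```

Each row of `w` is one token's probability distribution over all tokens in the sequence, which is exactly the "who attends to whom" interpretation of the scores.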