- **The Elegant Gradient of Softmax-Cross-Entropy**
  One of the most satisfying derivations in deep learning is the gradient of the combined softmax and cross-entropy loss. For a multi-class classification problem with $K$ classes, given true labels...
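The punchline of this derivation is the remarkably clean gradient of the loss with respect to the logits, $\partial L / \partial z_i = p_i - y_i$, where $p = \mathrm{softmax}(z)$ and $y$ is the one-hot label vector. A minimal NumPy sketch can check this identity against finite differences (the example logits and labels here are illustrative, not from the original text):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    # y is a one-hot vector; loss is -sum(y * log(p)).
    return -np.sum(y * np.log(softmax(z)))

# Analytic gradient of the combined loss: p - y.
z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])
analytic = softmax(z) - y

# Central finite-difference check, one logit at a time.
eps = 1e-5
numeric = np.array([
    (cross_entropy(z + eps * np.eye(3)[i], y) -
     cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-6))
```

The simplicity of $p - y$ is exactly why deep-learning frameworks fuse softmax and cross-entropy into a single op: the combined gradient is cheaper and more numerically stable than chaining the two separately.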
- **The Gradients of Activation Functions**
  Activation functions introduce non-linearity into neural networks, and their derivatives are what backpropagation needs at every layer.
  1. **Sigmoid**: Given $\sigma(x) = \frac{1}{1 + e^{-x}}$, derive...
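For the sigmoid item, the derivation leads to the well-known closed form $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, which lets backpropagation reuse the already-computed activation. A short NumPy sketch verifying it numerically (the sample points are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Closed form: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-4.0, 4.0, 9)

# Central finite-difference approximation of the derivative.
eps = 1e-5
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.allclose(sigmoid_grad(x), numeric, atol=1e-8))
```

Note the practical payoff: since the forward pass already caches $\sigma(x)$, the backward pass costs only one multiply and one subtract per element.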