ML Katas

Softmax and its Jacobian

hard (>1 hr) classification gradient calculus numerical-stability softmax

The softmax function is a critical component in multi-class classification, converting a vector of arbitrary real values into a probability distribution. Given an input vector $\mathbf{z} = [z_1, z_2, \ldots, z_K]$, the softmax function is defined as:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
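For instance, $\sigma([1, 2, 3]) \approx [0.090, 0.245, 0.665]$: larger scores receive larger probabilities, and the entries sum to 1.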

Your task is twofold:

1. Implement the softmax function in Python using NumPy, ensuring it handles both 1D and 2D inputs (where rows are samples and columns are class scores). Pay attention to numerical stability when $z_i$ values are large: subtract the maximum value from $\mathbf{z}$ before exponentiation, $\mathbf{z}' = \mathbf{z} - \max(\mathbf{z})$, which is an implicit use of the log-sum-exp trick (a sketch follows this list).
2. Derive and implement the Jacobian matrix of the softmax function with respect to its input vector $\mathbf{z}$. The Jacobian $\mathbf{J}$ is a $K \times K$ matrix where $J_{ij} = \frac{\partial \sigma(\mathbf{z})_i}{\partial z_j}$:
   * When $i = j$: $\frac{\partial \sigma(\mathbf{z})_i}{\partial z_i} = \sigma(\mathbf{z})_i \, (1 - \sigma(\mathbf{z})_i)$
   * When $i \neq j$: $\frac{\partial \sigma(\mathbf{z})_i}{\partial z_j} = -\sigma(\mathbf{z})_i \, \sigma(\mathbf{z})_j$
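A minimal sketch of the numerically stable softmax, assuming NumPy and the conventions above (rows are samples for 2D input); the max-subtraction line implements the stabilization from task 1:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax for a 1D or 2D input (rows are samples)."""
    z = np.asarray(z, dtype=float)
    # Subtract the per-row maximum so the largest exponent is e^0 = 1,
    # which prevents overflow for large entries of z and leaves the result unchanged.
    z_shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(z_shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)
```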

Implementation Details:

* `softmax(z)`: Takes a NumPy array `z` (1D or 2D) and returns the softmax probabilities.
* `softmax_jacobian(z)`: Takes a 1D NumPy array `z` and returns the $K \times K$ Jacobian matrix.
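Both Jacobian cases collapse into the single identity $\mathbf{J} = \mathrm{diag}(\sigma(\mathbf{z})) - \sigma(\mathbf{z})\,\sigma(\mathbf{z})^\top$, so one possible sketch of `softmax_jacobian`, reusing the `softmax` above, is:

```python
import numpy as np

def softmax_jacobian(z):
    """Jacobian of softmax for a 1D input z, via J = diag(s) - s s^T."""
    s = softmax(np.asarray(z, dtype=float))
    # diag(s) contributes the i == j terms s_i (the s_i * (1 - s_i) diagonal,
    # after subtracting s_i^2); the outer product contributes -s_i * s_j for i != j.
    return np.diag(s) - np.outer(s, s)
```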

Verification:

1. Softmax: Test with various input vectors, including those with large positive/negative values. Compare your `softmax` implementation with `scipy.special.softmax`.
2. Jacobian: Use numerical gradient checking (from Exercise 1) to verify your analytical `softmax_jacobian` implementation. For a given input `z`, iterate through each element `softmax_output[k]` and calculate its numerical gradient with respect to `z`. Compare this numerical gradient vector to the k-th row of your `softmax_jacobian(z)` output (see the sketch after this list).
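One possible way to wire both checks together, assuming SciPy is installed and using a simple central-difference scheme in place of the Exercise 1 checker (which is not reproduced here):

```python
import numpy as np
from scipy.special import softmax as scipy_softmax

rng = np.random.default_rng(0)
z = rng.normal(size=5) * 10  # includes reasonably large positive/negative scores

# 1. Softmax: compare against the SciPy reference implementation.
assert np.allclose(softmax(z), scipy_softmax(z))

# 2. Jacobian: central-difference check against the analytical result.
#    Perturbing z_j gives column j of the Jacobian in one pass; since the
#    softmax Jacobian is symmetric, this also matches the row-wise check above.
eps = 1e-6
J_analytical = softmax_jacobian(z)
J_numerical = np.zeros_like(J_analytical)
for j in range(z.size):
    dz = np.zeros_like(z)
    dz[j] = eps
    J_numerical[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)
assert np.allclose(J_analytical, J_numerical, atol=1e-6)
```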