ML Katas

L2 Regularization's Gradient Impact

Difficulty: medium (<1 hr). Tags: Regularization, Overfitting, L2 Norm

L2 regularization (also known as weight decay) is a common technique to prevent overfitting.

  1. Loss Function: Consider a simple linear regression loss with L2 regularization: $J(\mathbf{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(y_i - (\mathbf{w}^T\mathbf{x}_i + b)\bigr)^2 + \frac{\lambda}{2}\lVert\mathbf{w}\rVert^2$.
  2. Gradient Derivation: Derive the gradient of this loss with respect to the weight vector 𝐰, and show how the regularization term modifies the gradient update rule for 𝐰 (a worked sketch follows this list).
  3. Intuition: Explain how this modification "pushes" weights towards zero. Why does it prevent weights from growing too large, and how does this help with overfitting?
  4. Bias Term: Why is the bias term b typically not regularized? Explain the intuition.
  5. Verification: Implement a simple linear regression model with and without L2 regularization on a small dataset, and compare the magnitudes of the learned weights in both cases (a minimal code sketch follows below).
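For step 2, one way the derivation can go (the learning rate $\eta$ below is an assumed symbol, not part of the exercise statement). Differentiating the loss gives

$$\frac{\partial J}{\partial \mathbf{w}} = -\frac{1}{m}\sum_{i=1}^{m}\bigl(y_i - (\mathbf{w}^T\mathbf{x}_i + b)\bigr)\mathbf{x}_i + \lambda\mathbf{w},$$

so the gradient-descent update

$$\mathbf{w} \leftarrow \mathbf{w} - \eta\,\frac{\partial J}{\partial \mathbf{w}} = (1 - \eta\lambda)\,\mathbf{w} + \frac{\eta}{m}\sum_{i=1}^{m}\bigl(y_i - (\mathbf{w}^T\mathbf{x}_i + b)\bigr)\mathbf{x}_i$$

multiplies the weights by the shrinkage factor $(1 - \eta\lambda)$ at every step, which is why L2 regularization is also called weight decay.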
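For step 5, here is a minimal verification sketch in Python with NumPy. The synthetic dataset, the choice of λ = 1.0, the learning rate, and the epoch count are arbitrary illustrative assumptions, not part of the exercise statement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small synthetic dataset: 20 samples, 5 features, only the first two matter.
m, d = 20, 5
X = rng.normal(size=(m, d))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=m)

def fit(X, y, lam=0.0, lr=0.1, epochs=2000):
    """Full-batch gradient descent on the L2-regularized squared-error loss."""
    m, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        residual = y - (X @ w + b)
        grad_w = -(X.T @ residual) / m + lam * w   # L2 penalty adds lam * w
        grad_b = -residual.mean()                  # bias term is not regularized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w_plain, _ = fit(X, y, lam=0.0)   # no regularization
w_reg, _ = fit(X, y, lam=1.0)     # with L2 regularization

print("||w|| without L2:", np.linalg.norm(w_plain))
print("||w|| with    L2:", np.linalg.norm(w_reg))
```

With a moderate λ the reported weight norm should come out noticeably smaller than in the unregularized run, since the gradient of the penalty continually pulls each weight toward zero.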