L2 Regularization's Gradient Impact
L2 regularization (also known as weight decay) is a common technique to prevent overfitting.
- Loss Function: Consider a simple linear regression loss with L2 regularization: $L(\mathbf{w}) = \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \mathbf{w}^\top \mathbf{x}_i\bigr)^2 + \frac{\lambda}{2}\lVert \mathbf{w}\rVert_2^2$.
- Gradient Derivation: Derive the gradient of this loss with respect to the weight vector $\mathbf{w}$. Show how the regularization term modifies the gradient update rule for $\mathbf{w}$ (a worked sketch follows this list).
- Intuition: Explain how this modification "pushes" weights towards zero. Why does it prevent weights from growing too large, and how does this help with overfitting?
- Bias Term: Why is the bias term typically not regularized? Explain the intuition.
- Verification: Implement a simple linear regression model with and without L2 regularization on a small dataset, and compare the magnitudes of the learned weights in both cases (a minimal Python sketch follows below).
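
A worked derivation sketch, assuming the mean-squared-error loss with L2 penalty stated above and a gradient-descent step size $\eta$ (the step size is an assumption, not fixed by the exercise):

```latex
\begin{align*}
L(\mathbf{w}) &= \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \mathbf{w}^\top \mathbf{x}_i\bigr)^2
                 + \frac{\lambda}{2}\lVert \mathbf{w}\rVert_2^2 \\
\nabla_{\mathbf{w}} L &= -\frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - \mathbf{w}^\top \mathbf{x}_i\bigr)\mathbf{x}_i
                 + \lambda \mathbf{w} \\
\mathbf{w} &\leftarrow \mathbf{w} - \eta\,\nabla_{\mathbf{w}} L
            = (1 - \eta\lambda)\,\mathbf{w}
              + \frac{\eta}{N}\sum_{i=1}^{N}\bigl(y_i - \mathbf{w}^\top \mathbf{x}_i\bigr)\mathbf{x}_i
\end{align*}
```

The $(1 - \eta\lambda)$ factor is the "decay": each step multiplicatively shrinks the weights toward zero before the data-fitting term pulls them back, which is the behaviour the intuition question asks you to explain.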
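
A minimal verification sketch in Python, assuming NumPy, a synthetic dataset, and illustrative hyperparameter values (`lam`, `lr`, `n_steps` are not prescribed by the exercise):

```python
import numpy as np

def fit_linear_regression(X, y, lam=0.0, lr=0.1, n_steps=2000):
    """Fit y ~ X @ w + b by gradient descent; lam is the L2 penalty on w (the bias b is not regularized)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_steps):
        resid = X @ w + b - y                # prediction error
        grad_w = X.T @ resid / n + lam * w   # data gradient plus L2 term (lambda * w)
        grad_b = resid.mean()                # bias gradient, no regularization
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Small synthetic dataset: two informative features, three irrelevant ones
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_w + 0.5 + rng.normal(scale=0.1, size=50)

w_plain, _ = fit_linear_regression(X, y, lam=0.0)
w_l2, _ = fit_linear_regression(X, y, lam=1.0)

print("||w|| without L2:", np.linalg.norm(w_plain))
print("||w|| with    L2:", np.linalg.norm(w_l2))   # expect a smaller norm with the penalty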