L2 Regularization Gradient
L2 regularization (also known as Ridge Regression or weight decay) is a common technique to prevent overfitting in machine learning models by adding a penalty proportional to the square of the magnitude of the model's weights to the loss function.
The L2 regularization term for a weight vector $\mathbf{w}$ (excluding the bias term) is given by:

$$R(\mathbf{w}) = \frac{\lambda}{2} \sum_{j} w_j^2 = \frac{\lambda}{2} \|\mathbf{w}\|_2^2$$

where $\lambda$ is the regularization strength.
Your task is to analytically derive the gradient of this L2 regularization term with respect to each weight and then implement a function to compute this gradient for a given weight vector.
Derivation (to be done by the user, not provided here): Compute $\frac{\partial R}{\partial w_j}$ for each $j$.
Implementation Details:
* l2_regularization_gradient(weights, lambda_val):
* weights: A NumPy array (vector) representing the model weights.
* lambda_val: A scalar representing the regularization strength $\lambda$.
* The function should return a NumPy array of the same shape as weights, containing the gradient of the L2 regularization term with respect to each weight (see the sketch after this list).
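A minimal reference sketch to check your own implementation against, assuming the $\frac{\lambda}{2}\|\mathbf{w}\|_2^2$ convention defined above (under that convention the gradient is $\lambda \mathbf{w}$; if your derivation drops the $\frac{1}{2}$ factor, a factor of 2 appears instead):

```python
import numpy as np

def l2_regularization_gradient(weights, lambda_val):
    """Gradient of R(w) = (lambda/2) * sum_j w_j**2 with respect to w.

    Assumes the lambda/2 convention used above, under which
    dR/dw_j = lambda * w_j, i.e. the gradient is lambda * w elementwise.
    """
    return lambda_val * np.asarray(weights, dtype=float)
```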
Verification:
1. Choose a simple weight vector, e.g., weights = np.array([1.0, -2.0, 3.0]), and lambda_val = 0.1.
2. Manually calculate the analytical gradient based on your derivation.
3. Compare the output of your function with your manual calculation. The values should agree to within floating-point precision (np.allclose is a robust way to compare them).
4. For a more rigorous check, you can use numerical gradient checking (from Exercise 1) on the function $R(\mathbf{w})$; see the sketch below.
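A sketch of this verification under the assumptions above. The helper numerical_gradient is an illustrative stand-in for the Exercise 1 routine, using a central-difference approximation:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    # Central-difference approximation; stands in for the Exercise 1 checker.
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return grad

weights = np.array([1.0, -2.0, 3.0])
lambda_val = 0.1

# Analytical gradient from the sketch above: lambda * w.
analytic = l2_regularization_gradient(weights, lambda_val)
print(analytic)  # [ 0.1 -0.2  0.3]

# Numerical check against R(w) = (lambda/2) * ||w||^2.
R = lambda w: 0.5 * lambda_val * np.sum(w ** 2)
print(np.allclose(analytic, numerical_gradient(R, weights)))  # True
```

Since $R$ is quadratic, the central-difference estimate is exact up to round-off, so the two gradients agree to well within the default np.allclose tolerances.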