ML Katas

L2 Regularization Gradient

Difficulty: easy (<10 mins) · Tags: regularization, gradient, linear-regression, calculus, overfitting

L2 regularization (also known as Ridge Regression or weight decay) is a common technique to prevent overfitting in machine learning models by adding a penalty proportional to the square of the magnitude of the model's weights to the loss function.
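For concreteness, the regularized objective has the following form, where $L_{\text{data}}$ is a generic data-fitting loss (the symbol is illustrative, not part of the exercise):

$$L_{\text{total}}(\mathbf{w}) = L_{\text{data}}(\mathbf{w}) + \frac{\lambda}{2} \lVert \mathbf{w} \rVert_2^2$$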

The L2 regularization term for a weight vector 𝐰 (excluding the bias term) is given by:

$$R(\mathbf{w}) = \frac{\lambda}{2} \sum_{j=1}^{D} w_j^2 = \frac{\lambda}{2} \lVert \mathbf{w} \rVert_2^2$$

where λ is the regularization strength and D is the number of weights.

Your task is to analytically derive the gradient of this L2 regularization term with respect to each weight $w_j$, and then implement a function to compute this gradient for a given weight vector.

Derivation (to be done by the user, not provided here): compute $\frac{\partial R(\mathbf{w})}{\partial w_j}$ for each $j$.

Implementation Details:

* `l2_regularization_gradient(weights, lambda_val)`:
  * `weights`: A NumPy array (vector) representing the model weights.
  * `lambda_val`: A scalar representing the regularization strength λ.
  * The function should return a NumPy array of the same shape as `weights`, containing the gradient of the L2 regularization term with respect to each weight (a sketch follows below).
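As a point of comparison once you have worked through the derivation yourself, here is one minimal NumPy sketch of the function described above. It relies on the fact that the derivative of $\frac{\lambda}{2} w_j^2$ with respect to $w_j$ is $\lambda w_j$:

```python
import numpy as np

def l2_regularization_gradient(weights, lambda_val):
    """Gradient of the L2 penalty R(w) = (lambda/2) * ||w||_2^2.

    Since d/dw_j [(lambda/2) * w_j^2] = lambda * w_j, the gradient is
    simply lambda_val times the weight vector, with the same shape.
    """
    weights = np.asarray(weights, dtype=float)
    return lambda_val * weights
```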

Verification:

1. Choose a simple weight vector, e.g., `weights = np.array([1.0, -2.0, 3.0])`, and a `lambda_val = 0.1`.
2. Manually calculate the analytical gradient based on your derivation.
3. Compare the output of your function with your manual calculation. The values should match exactly.
4. For a more rigorous check, you can use numerical gradient checking (from Exercise 1) on the function $f(\mathbf{w}) = \frac{\lambda}{2} \lVert \mathbf{w} \rVert_2^2$.
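A sketch of these verification steps, assuming the `l2_regularization_gradient` sketch above is in scope; the central-difference checker below is a stand-in for the one from Exercise 1, whose exact interface is not shown here:

```python
import numpy as np

weights = np.array([1.0, -2.0, 3.0])
lambda_val = 0.1

# Analytical gradient from the function under test.
analytic = l2_regularization_gradient(weights, lambda_val)
print(analytic)  # expected: [ 0.1 -0.2  0.3 ]

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference approximation of the gradient of f at w."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[j] += eps
        w_minus[j] -= eps
        grad[j] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

# f(w) = (lambda/2) * ||w||_2^2, the function whose gradient we derived.
f = lambda w: 0.5 * lambda_val * np.sum(w ** 2)
print(np.allclose(analytic, numerical_gradient(f, weights)))  # expected: True
```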