L2 Regularization Gradient
L2 regularization (also known as Ridge Regression or weight decay) is a common technique to prevent overfitting in machine learning models by adding a penalty proportional to the square of the magnitude of the model's weights to the loss function.
The L2 regularization term for a weight vector $\mathbf{w}$ (excluding the bias term) is given by:

$$R(\mathbf{w}) = \frac{\lambda}{2} \sum_{j} w_j^2 = \frac{\lambda}{2} \|\mathbf{w}\|_2^2$$

where $\lambda$ is the regularization strength.
Your task is to analytically derive the gradient of this L2 regularization term with respect to each weight and then implement a function to compute this gradient for a given weight vector.
Derivation (to be done by the user, not provided here): Compute $\frac{\partial R}{\partial w_j}$ for each $j$.
Implementation Details:
* l2_regularization_gradient(weights, lambda_val):
* weights: A NumPy array (vector) representing the model weights.
* lambda_val: A scalar representing the regularization strength $\lambda$.
* The function should return a NumPy array of the same shape as weights, containing the gradient of the L2 regularization term with respect to each weight (see the sketch after this list).
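A minimal reference sketch to check your own implementation against, assuming the $\frac{\lambda}{2}\|\mathbf{w}\|_2^2$ convention defined above (under that convention the gradient is $\lambda \mathbf{w}$; if your derivation drops the $\frac{1}{2}$ factor, a factor of 2 appears instead):

```python
import numpy as np

def l2_regularization_gradient(weights, lambda_val):
    """Gradient of R(w) = (lambda/2) * sum_j w_j**2 with respect to w.

    Assumes the lambda/2 convention used above, under which
    dR/dw_j = lambda * w_j, i.e. the gradient is lambda * w elementwise.
    """
    return lambda_val * np.asarray(weights, dtype=float)
```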
Verification:
1. Choose a simple weight vector, e.g., weights = np.array([1.0, -2.0, 3.0]), and lambda_val = 0.1.
2. Manually calculate the analytical gradient based on your derivation.
3. Compare the output of your function with your manual calculation. The values should agree to within floating-point precision (np.allclose is a robust way to compare them).
4. For a more rigorous check, you can use numerical gradient checking (from Exercise 1) on the function $R(\mathbf{w})$; see the sketch below.
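A sketch of this verification under the assumptions above. The helper numerical_gradient is an illustrative stand-in for the Exercise 1 routine, using a central-difference approximation:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    # Central-difference approximation; stands in for the Exercise 1 checker.
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return grad

weights = np.array([1.0, -2.0, 3.0])
lambda_val = 0.1

# Analytical gradient from the sketch above: lambda * w.
analytic = l2_regularization_gradient(weights, lambda_val)
print(analytic)  # [ 0.1 -0.2  0.3]

# Numerical check against R(w) = (lambda/2) * ||w||^2.
R = lambda w: 0.5 * lambda_val * np.sum(w ** 2)
print(np.allclose(analytic, numerical_gradient(R, weights)))  # True
```

Since $R$ is quadratic, the central-difference estimate is exact up to round-off, so the two gradients agree to well within the default np.allclose tolerances.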