ML Katas

Backpropagation for a Single-Hidden-Layer Network

hard (>1 hr) gradient deep-learning calculus backpropagation neural-networks

Backpropagation is the cornerstone algorithm for training neural networks. It efficiently calculates the gradients of the loss function with respect to all the weights and biases in the network by applying the chain rule.

Your task is to implement the forward and backward passes for a simple two-layer neural network (one hidden layer) using the sigmoid activation function for the hidden layer and a linear output layer (for regression, or you can extend to softmax for classification if you prefer).

Network Architecture:

* Input layer: $X$ (batch size $N$, input features $D_i$)
* Hidden layer: $H = \sigma(X W_1 + b_1)$ (hidden units $D_h$)
* Activation function: Sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$
* Derivative of Sigmoid: $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$ (checked numerically in the snippet after this list)
* Output layer: $\hat{Y} = H W_2 + b_2$ (output features $D_o$)
* Loss function: Mean Squared Error (MSE) $L = \frac{1}{2N} \sum (\hat{Y} - Y)^2$
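The derivative identity above is easy to sanity-check before building the full network. Below is a quick, self-contained check, assuming NumPy; the test point `x = 0.7` and the step `eps` are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.7
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # central difference
analytic = sigmoid(x) * (1.0 - sigmoid(x))                   # sigma'(x) = sigma(x) * (1 - sigma(x))
print(numeric, analytic)  # the two values should agree to several decimal places
```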

Derivations (to be done by the user): You need to derive the gradients for $W_1, b_1, W_2, b_2$ using the chain rule (a worked example of the first steps follows this list):

* $\frac{\partial L}{\partial \hat{Y}}$
* $\frac{\partial L}{\partial W_2}$, $\frac{\partial L}{\partial b_2}$
* $\frac{\partial L}{\partial H}$
* $\frac{\partial L}{\partial (X W_1 + b_1)}$ (the pre-activation input to the sigmoid)
* $\frac{\partial L}{\partial W_1}$, $\frac{\partial L}{\partial b_1}$
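As a spot check on the expected form (not a full solution), here is one consistent way the first items work out under the loss convention above, where the sum in $L$ runs over all entries of $\hat{Y} - Y$:

$$\frac{\partial L}{\partial \hat{Y}} = \frac{1}{N}\left(\hat{Y} - Y\right), \qquad \frac{\partial L}{\partial W_2} = H^\top \frac{\partial L}{\partial \hat{Y}}, \qquad \frac{\partial L}{\partial b_2} = \sum_{n=1}^{N} \left(\frac{\partial L}{\partial \hat{Y}}\right)_{n,:}$$

The remaining gradients follow the same pattern of chain-rule steps; each gradient should end up with the same shape as the parameter or activation it corresponds to, which is a useful check while deriving.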

Implementation Details: Implement a Python class `TwoLayerNN` with the following methods (a minimal sketch follows this list):

* `__init__(self, input_size, hidden_size, output_size)`: Initializes weights and biases with small random values.
* `forward(self, X)`: Performs the forward pass, returning predictions `Y_hat` and caching intermediate values needed for backpropagation (e.g., hidden-layer activations and the inputs to the activations).
* `backward(self, X, Y_true, Y_hat)`: Performs the backward pass, computing gradients for $W_1, b_1, W_2, b_2$.
* `update_parameters(self, learning_rate)`: Updates weights and biases using the computed gradients.
* `train(self, X, Y_true, learning_rate, n_epochs)`: A training loop that combines forward, backward, and update.
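The sketch below is one possible shape for this class, assuming NumPy and the MSE convention above; the cached names (`Z1`, `H`), the `self.grads` dictionary, the zero-initialized biases, and the optional `seed` argument are illustrative choices rather than part of the exercise spec.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoLayerNN:
    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights, zero biases.
        self.W1 = 0.01 * rng.standard_normal((input_size, hidden_size))
        self.b1 = np.zeros(hidden_size)
        self.W2 = 0.01 * rng.standard_normal((hidden_size, output_size))
        self.b2 = np.zeros(output_size)
        self.grads = {}

    def forward(self, X):
        # Cache pre-activations and activations for the backward pass.
        self.Z1 = X @ self.W1 + self.b1       # (N, Dh) input to the sigmoid
        self.H = sigmoid(self.Z1)             # (N, Dh) hidden activations
        Y_hat = self.H @ self.W2 + self.b2    # (N, Do) linear output
        return Y_hat

    def backward(self, X, Y_true, Y_hat):
        N = X.shape[0]
        dY_hat = (Y_hat - Y_true) / N          # dL/dY_hat for L = 1/(2N) * sum((Y_hat - Y)^2)
        self.grads["W2"] = self.H.T @ dY_hat
        self.grads["b2"] = dY_hat.sum(axis=0)
        dH = dY_hat @ self.W2.T
        dZ1 = dH * self.H * (1.0 - self.H)     # sigmoid derivative via the cached activations
        self.grads["W1"] = X.T @ dZ1
        self.grads["b1"] = dZ1.sum(axis=0)

    def update_parameters(self, learning_rate):
        self.W1 -= learning_rate * self.grads["W1"]
        self.b1 -= learning_rate * self.grads["b1"]
        self.W2 -= learning_rate * self.grads["W2"]
        self.b2 -= learning_rate * self.grads["b2"]

    def train(self, X, Y_true, learning_rate, n_epochs):
        losses = []
        for _ in range(n_epochs):
            Y_hat = self.forward(X)
            losses.append(0.5 * np.sum((Y_hat - Y_true) ** 2) / X.shape[0])
            self.backward(X, Y_true, Y_hat)
            self.update_parameters(learning_rate)
        return losses
```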

Verification:

1. Generate a small synthetic dataset for a regression task (e.g., $Y = \sin(X_1) + \cos(X_2) + \text{noise}$).
2. Initialize the network and perform a few forward/backward passes.
3. Crucially: Use numerical gradient checking (from Exercise 1) to verify each of your derived analytical gradients (`dW1`, `db1`, `dW2`, `db2`). This is the most important step for debugging backpropagation. For example, to check `dW1`, you would define a wrapper function that takes $W_1$ as input, uses the current $b_1, W_2, b_2$ (held constant), performs a forward pass, computes the loss, and returns the scalar loss. Then compare its numerical gradient with your analytical `dW1`. Repeat for all parameters. (A sketch of this check follows the list.)
4. Train the network for several epochs and observe whether the loss decreases.
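A possible sketch of step 3 for `dW1`, reusing the `TwoLayerNN` sketch above. The `numerical_gradient` helper below is a stand-in (central differences) for whatever you built in Exercise 1, and the dataset shapes and expected tolerance are arbitrary illustrative choices.

```python
import numpy as np

def numerical_gradient(f, P, eps=1e-5):
    """Central-difference estimate of the gradient of scalar-valued f with respect to array P."""
    grad = np.zeros_like(P)
    it = np.nditer(P, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        old = P[idx]
        P[idx] = old + eps
        f_plus = f(P)
        P[idx] = old - eps
        f_minus = f(P)
        P[idx] = old                                 # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
Y = np.sin(X[:, [0]]) + np.cos(X[:, [1]]) + 0.1 * rng.standard_normal((8, 1))

net = TwoLayerNN(input_size=3, hidden_size=5, output_size=1)

def loss_wrt_W1(W1):
    # Wrapper: W1 is the only variable; b1, W2, b2 are held at their current values.
    net.W1 = W1
    Y_hat = net.forward(X)
    return 0.5 * np.sum((Y_hat - Y) ** 2) / X.shape[0]

Y_hat = net.forward(X)
net.backward(X, Y, Y_hat)                            # analytical gradients
num_dW1 = numerical_gradient(loss_wrt_W1, net.W1.copy())
diff = np.abs(num_dW1 - net.grads["W1"]).max()
scale = np.abs(num_dW1).max() + np.abs(net.grads["W1"]).max() + 1e-12
print("relative difference for dW1:", diff / scale)  # expect roughly 1e-7 or smaller
```

The same wrapper pattern applies to `b1`, `W2`, and `b2`: vary one parameter array, hold the rest fixed, and compare.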