Backpropagation for a Network with a Single Hidden Layer
Backpropagation is the cornerstone algorithm for training neural networks. It efficiently calculates the gradients of the loss function with respect to all the weights and biases in the network by applying the chain rule.
Your task is to implement the forward and backward passes for a simple two-layer neural network (one hidden layer) using the sigmoid activation function for the hidden layer and a linear output layer (for regression, or you can extend to softmax for classification if you prefer).
Network Architecture:
* Input layer: $X$ of shape (batch size $N$, input features $D$)
* Hidden layer: $H$ hidden units
* Activation function: Sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$
* Derivative of Sigmoid: $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$
* Output layer: $K$ output features (linear)
* Loss function: Mean Squared Error (MSE), $L = \frac{1}{N}\sum_{i=1}^{N} \lVert \hat{y}_i - y_i \rVert^2$
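Writing the forward pass out under these conventions (the intermediate names $Z_1$ and $A_1$ below are labels introduced here for reference, not fixed by the exercise):

```latex
Z_1 = X W_1 + b_1, \qquad
A_1 = \sigma(Z_1), \qquad
\hat{Y} = A_1 W_2 + b_2, \qquad
L = \frac{1}{N} \sum_{i=1}^{N} \lVert \hat{y}_i - y_i \rVert^2
```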
Derivations (to be done by the user): You need to derive the gradients for $W_1$, $b_1$, $W_2$, $b_2$ using the chain rule, working backwards through:
* $\frac{\partial L}{\partial \hat{Y}}$
* $\frac{\partial L}{\partial W_2}$, $\frac{\partial L}{\partial b_2}$
* $\frac{\partial L}{\partial A_1}$ (hidden activations)
* $\frac{\partial L}{\partial Z_1}$ (input to sigmoid)
* $\frac{\partial L}{\partial W_1}$, $\frac{\partial L}{\partial b_1}$
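If you want to check your own derivations afterwards, one consistent set of results under the forward-pass conventions above is the following sketch (the exact scaling of the first term depends on how the MSE is normalized):

```latex
\frac{\partial L}{\partial \hat{Y}} = \frac{2}{N}\,(\hat{Y} - Y), \qquad
\frac{\partial L}{\partial W_2} = A_1^{\top}\,\frac{\partial L}{\partial \hat{Y}}, \qquad
\frac{\partial L}{\partial b_2} = \sum_{i}\Big(\frac{\partial L}{\partial \hat{Y}}\Big)_{i,:},
```
```latex
\frac{\partial L}{\partial A_1} = \frac{\partial L}{\partial \hat{Y}}\,W_2^{\top}, \qquad
\frac{\partial L}{\partial Z_1} = \frac{\partial L}{\partial A_1} \odot A_1 \odot (1 - A_1), \qquad
\frac{\partial L}{\partial W_1} = X^{\top}\,\frac{\partial L}{\partial Z_1}, \qquad
\frac{\partial L}{\partial b_1} = \sum_{i}\Big(\frac{\partial L}{\partial Z_1}\Big)_{i,:}.
```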
Implementation Details:
Implement a Python class TwoLayerNN with:
* __init__(self, input_size, hidden_size, output_size): Initializes weights and biases with small random values.
* forward(self, X): Performs the forward pass, returning predictions Y_hat and caching intermediate values needed for backpropagation (e.g., hidden layer activations, inputs to activations).
* backward(self, X, Y_true, Y_hat): Performs the backward pass, computing gradients for $W_1$, $b_1$, $W_2$, $b_2$.
* update_parameters(self, learning_rate): Updates weights and biases using the computed gradients.
* train(self, X, Y_true, learning_rate, n_epochs): A training loop that combines forward, backward, and update (a minimal sketch of the full class follows this list).
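A minimal NumPy sketch of such a class, assuming X has shape (N, input_size) and Y_true has shape (N, output_size); the cached attribute names (Z1, A1, dW1, ...) and the 0.01 weight-initialization scale are illustrative choices, not requirements of the exercise:

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

class TwoLayerNN:
    def __init__(self, input_size, hidden_size, output_size):
        # Small random weights, zero biases (illustrative initialization)
        self.W1 = 0.01 * np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = 0.01 * np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        # Cache intermediates needed by backward()
        self.Z1 = X @ self.W1 + self.b1      # pre-activation of the hidden layer
        self.A1 = sigmoid(self.Z1)           # hidden activations
        Y_hat = self.A1 @ self.W2 + self.b2  # linear output layer
        return Y_hat

    def backward(self, X, Y_true, Y_hat):
        N = X.shape[0]
        # L = (1/N) * sum_i ||y_hat_i - y_i||^2  =>  dL/dY_hat = 2 (Y_hat - Y_true) / N
        dY = 2.0 * (Y_hat - Y_true) / N
        self.dW2 = self.A1.T @ dY
        self.db2 = dY.sum(axis=0, keepdims=True)
        dA1 = dY @ self.W2.T
        dZ1 = dA1 * self.A1 * (1.0 - self.A1)  # sigmoid'(Z1) = A1 * (1 - A1)
        self.dW1 = X.T @ dZ1
        self.db1 = dZ1.sum(axis=0, keepdims=True)

    def update_parameters(self, learning_rate):
        # Plain gradient descent step
        self.W1 -= learning_rate * self.dW1
        self.b1 -= learning_rate * self.db1
        self.W2 -= learning_rate * self.dW2
        self.b2 -= learning_rate * self.db2

    def train(self, X, Y_true, learning_rate, n_epochs):
        for epoch in range(n_epochs):
            Y_hat = self.forward(X)
            loss = np.sum((Y_hat - Y_true) ** 2) / X.shape[0]
            self.backward(X, Y_true, Y_hat)
            self.update_parameters(learning_rate)
            if epoch % 100 == 0:
                print(f"epoch {epoch}: loss {loss:.6f}")
```

This sketch trains on the full batch every epoch, which is sufficient for a small synthetic dataset; mini-batching is an easy extension.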
Verification:
1. Generate a small synthetic dataset for a regression task.
2. Initialize the network and perform a few forward/backward passes.
3. Crucially: use numerical gradient checking (from Exercise 1) to verify each of your derived analytical gradients ($\frac{\partial L}{\partial W_1}$, $\frac{\partial L}{\partial b_1}$, $\frac{\partial L}{\partial W_2}$, $\frac{\partial L}{\partial b_2}$). This is the most important step for debugging backpropagation. For example, to check $\frac{\partial L}{\partial W_1}$, define a wrapper function that takes $W_1$ as input, uses the current $b_1$, $W_2$, $b_2$ (held constant), performs a forward pass, computes the loss, and returns the scalar loss. Then compare its numerical gradient with your analytical $\frac{\partial L}{\partial W_1}$. Repeat for all parameters (a sketch follows this list).
4. Train the network for several epochs and observe whether the loss decreases.
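A sketch of step 3 for $W_1$, built on the class sketch above; the helper name loss_for and the step size eps=1e-5 are illustrative choices, and the loss definition must match the one assumed in backward():

```python
import numpy as np

def loss_for(net, X, Y_true):
    """Scalar loss used for gradient checking; must match backward()'s convention."""
    Y_hat = net.forward(X)
    return np.sum((Y_hat - Y_true) ** 2) / X.shape[0]

def numerical_grad_W1(net, X, Y_true, eps=1e-5):
    """Central-difference estimate of dL/dW1, holding b1, W2, b2 fixed."""
    grad = np.zeros_like(net.W1)
    for idx in np.ndindex(*net.W1.shape):
        original = net.W1[idx]
        net.W1[idx] = original + eps
        loss_plus = loss_for(net, X, Y_true)
        net.W1[idx] = original - eps
        loss_minus = loss_for(net, X, Y_true)
        net.W1[idx] = original  # restore the entry
        grad[idx] = (loss_plus - loss_minus) / (2.0 * eps)
    return grad

# Example comparison against the analytical gradient from backward():
# net = TwoLayerNN(3, 5, 1)
# X, Y = np.random.randn(10, 3), np.random.randn(10, 1)
# net.backward(X, Y, net.forward(X))
# num = numerical_grad_W1(net, X, Y)
# print(np.max(np.abs(num - net.dW1)))
```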