ML Katas

Implementing Weight Initialization Schemes

Difficulty: medium (<1 hr). Tags: training, initialization, weight, he, xavier

Implement different weight initialization schemes (e.g., Xavier/Glorot, He) for a simple neural network. Create a function that iterates through a model's parameters and applies a chosen initialization method. You'll need to understand the theory behind why these schemes outperform naive random initialization: they scale the initial weight variance to each layer's fan-in (and fan-out), so Xavier sets Var(W) = 2 / (fan_in + fan_out) while He sets Var(W) = 2 / fan_in. This keeps activation and gradient magnitudes roughly constant from layer to layer, preventing vanishing or exploding gradients.
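Here is a minimal sketch of such a function, assuming PyTorch; the function name `init_weights` and the scheme keys are illustrative choices, not a fixed API:

```python
import torch.nn as nn

def init_weights(model: nn.Module, scheme: str = "he") -> None:
    """Apply a named initialization scheme to every Linear layer in-place."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            if scheme == "xavier":
                # Xavier/Glorot: Var(W) = 2 / (fan_in + fan_out),
                # derived for symmetric activations like tanh.
                nn.init.xavier_normal_(module.weight)
            elif scheme == "he":
                # He/Kaiming: Var(W) = 2 / fan_in, which compensates
                # for ReLU zeroing out half of the activations.
                nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
            elif scheme == "naive":
                # Naive baseline: unscaled standard normal draws.
                nn.init.normal_(module.weight, mean=0.0, std=1.0)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
```

An equivalent idiom is to pass a per-module function to `model.apply(...)`, which recurses over submodules for you.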

Verification: Build a deep network (e.g., 10 layers) and initialize it once with naive random initialization (unscaled standard normal draws) and once with a more robust scheme like He initialization. Plot the histograms of the activations for each layer after the first forward pass. A good initialization should result in a healthy distribution at every layer (roughly zero mean, with a variance that neither shrinks nor grows with depth), whereas a poor initialization will show activations collapsing to zero or exploding to large values in the deeper layers.
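A minimal sketch of this check, assuming PyTorch and matplotlib and reusing `init_weights` from above; `build_mlp` and `activation_histograms` are hypothetical helper names:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

def build_mlp(depth: int = 10, width: int = 256) -> nn.Sequential:
    """Stack of Linear + ReLU blocks, deep enough to expose bad inits."""
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), nn.ReLU()]
    return nn.Sequential(*layers)

def activation_histograms(model: nn.Sequential, title: str) -> None:
    """Run one forward pass on random input and plot per-layer activations."""
    acts = []
    # Forward hooks capture each ReLU's output during the pass.
    hooks = [m.register_forward_hook(lambda _m, _i, out: acts.append(out.detach()))
             for m in model if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(torch.randn(1024, 256))
    for h in hooks:
        h.remove()
    fig, axes = plt.subplots(1, len(acts), figsize=(2 * len(acts), 2), sharey=True)
    for i, (ax, a) in enumerate(zip(axes, acts)):
        ax.hist(a.flatten().numpy(), bins=50)
        ax.set_title(f"layer {i + 1}")
    fig.suptitle(title)
    plt.show()

model = build_mlp()
init_weights(model, scheme="he")   # rerun with scheme="naive" to compare
activation_histograms(model, "He initialization")
```

With `scheme="he"` the per-layer histograms should look similar at every depth; with `scheme="naive"` you should see the spread blow up (or, with a small std, shrink toward zero) as depth increases.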