ML Katas

Debug Exploding Gradients

Difficulty: hard (>1 hr) · Tags: pytorch, training, gradients, debugging
Posted this month by E

Create a deep feedforward network (20 layers, ReLU activations). Train it on dummy data and track the gradient norm of each layer. Observe whether the gradients explode, then experiment with:

  • Smaller learning rate.
  • Gradient clipping.
  • Better initialization (e.g., Kaiming/He initialization, which is designed for ReLU).
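A minimal PyTorch sketch of the setup. The layer width, step count, learning rate, and clipping threshold below are illustrative choices, not part of the kata; gradient norms are recorded before clipping so you can see the raw magnitudes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_mlp(depth=20, width=64, kaiming=False):
    """Deep ReLU MLP; optionally use Kaiming (He) initialization."""
    layers = []
    for _ in range(depth):
        lin = nn.Linear(width, width)
        if kaiming:
            nn.init.kaiming_normal_(lin.weight, nonlinearity="relu")
            nn.init.zeros_(lin.bias)
        layers += [lin, nn.ReLU()]
    layers.append(nn.Linear(width, 1))  # regression head
    return nn.Sequential(*layers)

def grad_norms(model):
    """Per-layer L2 norms of the weight gradients."""
    return [m.weight.grad.norm().item()
            for m in model if isinstance(m, nn.Linear)]

def train(model, steps=50, lr=1e-2, clip=None):
    x = torch.randn(256, 64)   # dummy inputs
    y = torch.randn(256, 1)    # dummy targets
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        history.append(grad_norms(model))  # record raw norms
        if clip is not None:
            nn.utils.clip_grad_norm_(model.parameters(), clip)
        opt.step()
    return history

model = make_mlp(kaiming=True)
hist = train(model, clip=1.0)
print("largest per-layer grad norm seen:", max(max(step) for step in hist))
```

Comparing runs with `kaiming=False`, `clip=None`, or a smaller `lr` makes the effect of each fix visible in the recorded norms: exploding runs show norms growing from the output layer back toward the input.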