Gradient Accumulation Example
Simulate large-batch training using gradient accumulation:
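- Train with microbatches of size 4.
- Accumulate gradients over 8 microbatches, giving an effective batch size of 4 × 8 = 32.
- Take one optimizer step after all 8 gradients have been accumulated, then reset the accumulated gradients before the next group.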
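The accumulated gradient has to be averaged over the 8 microbatches, or equivalently each microbatch loss scaled by 1/8. The derivation below assumes the training loss is a mean of per-sample losses \ell_i and that no layer depends on batch statistics. B_k denotes the k-th microbatch of 4 samples:

```latex
\begin{aligned}
L_k(\theta) &= \frac{1}{4} \sum_{i \in B_k} \ell_i(\theta), \qquad k = 1, \dots, 8 \\
L(\theta)   &= \frac{1}{32} \sum_{i=1}^{32} \ell_i(\theta)
             = \frac{1}{8} \sum_{k=1}^{8} \frac{1}{4} \sum_{i \in B_k} \ell_i(\theta)
             = \frac{1}{8} \sum_{k=1}^{8} L_k(\theta) \\
\nabla_\theta L &= \frac{1}{8} \sum_{k=1}^{8} \nabla_\theta L_k
\end{aligned}
```

Summing the 8 microbatch gradients without the 1/8 factor yields 8 times the full-batch gradient. Under plain SGD, this behaves like an 8× larger learning rate.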
Verify that the final parameters match those from training with a batch size of 32 (4 × 8), up to floating-point rounding.
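Below is a minimal pure-Python sketch of the check above. It rests on assumptions that are not in the original task: a hypothetical toy linear-regression model with mean-squared-error loss, plain SGD with learning rate 0.05, and 20 updates that reuse the same 32 generated samples. The helper names (`make_data`, `mean_grad`, `train_full_batch`, `train_accumulated`) are illustrative only.

```python
import random

# Hypothetical toy setup (assumption): y ~ w0*x0 + w1*x1 + b, MSE loss, plain SGD.
MICRO_BATCH = 4
ACCUM_STEPS = 8
FULL_BATCH = MICRO_BATCH * ACCUM_STEPS  # 32
LR = 0.05
NUM_UPDATES = 20


def make_data(n, seed=0):
    """Generate n noisy samples from a fixed linear function."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 3.0 * x[0] - 2.0 * x[1] + 0.5 + rng.gauss(0, 0.1)
        data.append((x, y))
    return data


def mean_grad(params, batch):
    """Gradient of the mean squared error over `batch` w.r.t. [w0, w1, b]."""
    w0, w1, b = params
    g = [0.0, 0.0, 0.0]
    for (x0, x1), y in batch:
        err = w0 * x0 + w1 * x1 + b - y
        g[0] += 2 * err * x0
        g[1] += 2 * err * x1
        g[2] += 2 * err
    return [gi / len(batch) for gi in g]


def sgd_step(params, grad):
    """One plain SGD update."""
    return [p - LR * g for p, g in zip(params, grad)]


def train_full_batch(data):
    """Reference run: one update per batch of 32 samples."""
    params = [0.0, 0.0, 0.0]
    for _ in range(NUM_UPDATES):
        params = sgd_step(params, mean_grad(params, data[:FULL_BATCH]))
    return params


def train_accumulated(data):
    """Microbatches of 4, gradients accumulated over 8 steps, then one update."""
    params = [0.0, 0.0, 0.0]
    for _ in range(NUM_UPDATES):
        accum = [0.0, 0.0, 0.0]  # reset the accumulated gradient before each update
        for k in range(ACCUM_STEPS):
            micro = data[k * MICRO_BATCH:(k + 1) * MICRO_BATCH]
            g = mean_grad(params, micro)  # params stay fixed while accumulating
            # Divide by ACCUM_STEPS so the sum of microbatch means equals the 32-sample mean.
            accum = [a + gi / ACCUM_STEPS for a, gi in zip(accum, g)]
        params = sgd_step(params, accum)  # one optimizer step per 8 microbatches
    return params


data = make_data(FULL_BATCH)
params_full = train_full_batch(data)
params_accum = train_accumulated(data)
for p_full, p_acc in zip(params_full, params_accum):
    # Equal up to rounding: the two runs add per-sample gradients in a different order.
    assert abs(p_full - p_acc) < 1e-9, (p_full, p_acc)
print("full batch: ", params_full)
print("accumulated:", params_accum)
```

The tolerance is needed because floating-point addition is not associative. The equivalence also relies on the loss being a per-sample mean. Layers that use batch statistics, such as batch normalization, see only 4 samples at a time and will not reproduce batch-32 results exactly.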