ML Katas

Distributed DataParallel Basics

hard (>1 hr) pytorch training distributed dataparallel
this month by E

Simulate training with torch.nn.DataParallel:

  • Define a simple CNN.
  • Run it on 2 GPUs (if available).
  • Verify that the input batch is split across devices.

Inspect model.module usage.
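One possible solution sketch for the steps above: a minimal CNN wrapped in torch.nn.DataParallel when two or more GPUs are present, a print inside forward() to show the per-device chunk of the batch, and a look at model.module (the class name SimpleCNN and the specific layer sizes are illustrative choices, not part of the kata):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        # Printing here reveals the split: with 2 GPUs and a batch of 16,
        # this runs twice, each time with batch=8 on a different device.
        print(f"forward: batch={x.shape[0]} device={x.device}")
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(x.flatten(1))

model = SimpleCNN()
if torch.cuda.device_count() >= 2:
    # DataParallel scatters the batch across GPUs and gathers outputs on device 0.
    model = nn.DataParallel(model).cuda()

batch = torch.randn(16, 3, 32, 32)
if isinstance(model, nn.DataParallel):
    batch = batch.cuda()

out = model(batch)
print("gathered output:", tuple(out.shape))  # (16, 10) regardless of device count

# model.module usage: DataParallel stores the original model under .module,
# so checkpoints are usually saved from model.module.state_dict() to avoid
# "module."-prefixed keys when reloading without the wrapper.
inner = model.module if isinstance(model, nn.DataParallel) else model
print("inner model:", type(inner).__name__)
```

On a CPU-only machine the model is never wrapped, so the script still runs end to end; the batch-split print simply fires once for the whole batch.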