Implementing the Adam Optimizer from Scratch
Implement the Adam optimizer from scratch as a subclass of torch.optim.Optimizer. You'll need to maintain the first-moment vector (an exponential moving average of gradients) and the second-moment vector (an exponential moving average of squared gradients) for each parameter. The update rule for each parameter p at step t, given gradient g_t, learning rate α, decay rates β1 and β2, and a small constant ε for numerical stability, is:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
p_t &= p_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$
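Below is a minimal sketch of such a subclass implementing exactly the update rule above. The class name MyAdam and the default hyperparameters are illustrative choices, and extras like weight decay and AMSGrad are deliberately omitted.

```python
import torch
from torch.optim import Optimizer


class MyAdam(Optimizer):
    """Minimal re-implementation of Adam (Kingma & Ba, 2015)."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        defaults = dict(lr=lr, betas=betas, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            lr = group["lr"]
            beta1, beta2 = group["betas"]
            eps = group["eps"]

            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad

                state = self.state[p]
                if len(state) == 0:
                    # Lazy state init: step counter plus both moment buffers.
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)      # first moment m
                    state["exp_avg_sq"] = torch.zeros_like(p)   # second moment v

                state["step"] += 1
                t = state["step"]
                m, v = state["exp_avg"], state["exp_avg_sq"]

                # Update biased moment estimates in place.
                m.mul_(beta1).add_(grad, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                # Bias correction.
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)

                # Parameter update: p <- p - lr * m_hat / (sqrt(v_hat) + eps)
                p.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)

        return loss
```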
Verification: Train a small model (e.g., a linear layer) on a simple regression task using both your custom Adam optimizer and torch.optim.Adam. The final loss values and parameter weights should be very close after a few epochs.
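One possible verification script, assuming the MyAdam class sketched above; the synthetic data, learning rate, step count, and tolerances are arbitrary illustrative values:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data: y = 3x - 2 plus a little noise.
X = torch.randn(256, 1)
y = 3 * X - 2 + 0.01 * torch.randn(256, 1)

# Two identical models so both optimizers start from the same weights.
model_ref = nn.Linear(1, 1)
model_custom = copy.deepcopy(model_ref)

opt_ref = torch.optim.Adam(model_ref.parameters(), lr=0.1)
opt_custom = MyAdam(model_custom.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for step in range(200):
    for model, opt in ((model_ref, opt_ref), (model_custom, opt_custom)):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()

print("torch.optim.Adam loss:", loss_fn(model_ref(X), y).item())
print("MyAdam loss:          ", loss_fn(model_custom(X), y).item())
print("weights close:", torch.allclose(model_ref.weight, model_custom.weight, atol=1e-4))
print("biases close: ", torch.allclose(model_ref.bias, model_custom.bias, atol=1e-4))
```

Copying the model before creating the optimizers guarantees both runs start from the same initial weights, so any divergence in the final losses or parameters points to a discrepancy in the update rule itself.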