Implementing Self-Supervised Learning with BYOL
Implement the core logic of Bootstrap Your Own Latent (BYOL). BYOL is a self-supervised learning method that learns image representations without using negative pairs. It consists of two interacting networks: an 'online' network and a 'target' network. Given two augmented views of the same image, the online network learns to predict the target network's projection of the other view. The target network's weights are an exponential moving average (EMA) of the online network's weights and receive no gradients. The loss is the mean squared error between the L2-normalized online prediction and the L2-normalized target projection. This is a great exercise in a modern self-supervised learning technique.
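The two pieces described above, the normalized-MSE loss and the EMA target update, can be sketched in a few lines. This is a minimal numpy illustration of the math only (the encoder, projector, and predictor networks are assumed to exist elsewhere); `byol_loss` and `ema_update` are hypothetical helper names, not part of any library:

```python
import numpy as np

def byol_loss(online_pred, target_proj):
    """MSE between L2-normalized vectors, which equals 2 - 2 * cosine similarity.

    online_pred: (batch, dim) output of the online predictor.
    target_proj: (batch, dim) output of the target projector (no gradient).
    """
    p = online_pred / np.linalg.norm(online_pred, axis=-1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=-1, keepdims=True)
    return 2.0 - 2.0 * (p * z).sum(axis=-1).mean()

def ema_update(target_params, online_params, tau=0.99):
    """Target weights track the online weights: t <- tau * t + (1 - tau) * o."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]
```

In a full implementation you would compute this loss symmetrically (each view predicts the other), backpropagate through the online network only, and call the EMA update after each optimizer step.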
Verification: While training, monitor the loss; it should decrease steadily. A more thorough check is linear evaluation: freeze the trained encoder, train a linear classifier on its features for a simple classification task (e.g., on a subset of ImageNet), and compare the accuracy against a randomly initialized encoder. A well-trained encoder should yield accuracy substantially above that baseline.
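The linear-evaluation step can be sketched as follows. This is a toy numpy illustration of the protocol only: the synthetic `feats` array stands in for frozen-encoder features (a real run would extract them from your trained encoder), and the probe is plain logistic regression trained by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen-encoder features: two synthetic, separable classes.
feats = np.concatenate([rng.normal(-2.0, 1.0, (50, 8)),
                        rng.normal(2.0, 1.0, (50, 8))])
labels = np.concatenate([np.zeros(50), np.ones(50)])

# Linear probe: logistic regression on the frozen features.
w, b = np.zeros(8), 0.0
for _ in range(200):
    probs = 1.0 / (1.0 + np.exp(-(feats @ w + b)))   # sigmoid
    grad = probs - labels                             # dL/dlogits
    w -= 0.1 * feats.T @ grad / len(labels)
    b -= 0.1 * grad.mean()

acc = ((feats @ w + b > 0) == labels).mean()
```

If the encoder learned meaningful representations, this probe's accuracy should clearly beat the same probe trained on features from an untrained encoder.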