Model Compression with Pruning
Implement magnitude-based pruning to reduce the size and computational cost of a trained model. Start with a simple, over-parameterized model (e.g., a fully-connected network on MNIST) and train it to good accuracy. Then use torch.nn.utils.prune to zero out a chosen percentage of the weights (e.g., 50-70%), removing those with the smallest magnitude. Finally, fine-tune the pruned model and evaluate its performance. The key observation: accuracy should drop only slightly, if at all. A sketch of the full pipeline follows.
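Below is a minimal sketch of the train-prune-fine-tune loop. The SimpleMLP architecture, the hyperparameters, and the 60% pruning amount are illustrative choices, not part of the exercise statement; any over-parameterized MNIST model and any ratio in the suggested range should behave similarly.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class SimpleMLP(nn.Module):
    """Deliberately over-parameterized fully-connected net for MNIST."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 1024)
        self.fc2 = nn.Linear(1024, 1024)
        self.fc3 = nn.Linear(1024, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

def run_epoch(model, loader, optimizer=None):
    """Train for one epoch if an optimizer is given, else evaluate accuracy."""
    training = optimizer is not None
    model.train(training)
    correct = 0
    with torch.set_grad_enabled(training):
        for x, y in loader:
            logits = model(x)
            if training:
                loss = nn.functional.cross_entropy(logits, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            correct += (logits.argmax(dim=1) == y).sum().item()
    return correct / len(loader.dataset)

tfm = transforms.ToTensor()
train_loader = DataLoader(datasets.MNIST(".", train=True, download=True, transform=tfm),
                          batch_size=128, shuffle=True)
test_loader = DataLoader(datasets.MNIST(".", train=False, download=True, transform=tfm),
                         batch_size=256)

model = SimpleMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# 1) Train the dense baseline.
for _ in range(3):
    run_epoch(model, train_loader, optimizer)
print(f"dense accuracy:      {run_epoch(model, test_loader):.4f}")

# 2) Globally prune the 60% smallest-magnitude weights across all linear layers.
params_to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
prune.global_unstructured(params_to_prune,
                          pruning_method=prune.L1Unstructured, amount=0.6)
print(f"pruned (no tuning):  {run_epoch(model, test_loader):.4f}")

# 3) Fine-tune. Pruning re-registers each weight as weight_orig behind a mask,
# so the existing optimizer still updates the right tensors, and the effective
# weights (weight_orig * mask) stay zero at the pruned positions.
for _ in range(2):
    run_epoch(model, train_loader, optimizer)
print(f"pruned + fine-tuned: {run_epoch(model, test_loader):.4f}")
```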
Verification: After pruning, check the sparsity of your model by counting zero entries in the weights. Note that torch.nn.utils.prune does not delete anything; it stores the dense tensor as weight_orig alongside a binary weight_mask and exposes their product as weight, so count zeros in weight (or call prune.remove first to make the pruning permanent). The measured sparsity should match your pruning ratio. Compare the accuracy of the original model, the pruned model without fine-tuning, and the pruned model with fine-tuning to see the effect of each step.
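A sketch of that check, continuing from the params_to_prune list defined above:

```python
# Count zeros in the effective (masked) weights of every pruned layer.
zeros, total = 0, 0
for module, name in params_to_prune:
    w = getattr(module, name)          # masked product weight_orig * weight_mask
    zeros += (w == 0).sum().item()
    total += w.numel()
print(f"global sparsity: {zeros / total:.2%}")   # should be close to the 60% amount

# Optionally fold the masks into the tensors and drop the reparameterization
# (weight_orig and weight_mask disappear; weight becomes a plain parameter).
for module, name in params_to_prune:
    prune.remove(module, name)
```

With global pruning, individual layers may end up more or less sparse than 60%; only the overall zero fraction is expected to match the chosen amount.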