ML Katas

Model Compression with Pruning

medium (>1 hr) training pruning model compression sparsify
this month by E

Implement model pruning to reduce the size and computational cost of a trained model. Start with a simple, over-parameterized model (e.g., a fully-connected network on MNIST) and train it to good accuracy. Then, use torch.nn.utils.prune to prune away the smallest-magnitude weights (e.g., 50-70% of them). Finally, fine-tune the pruned model and evaluate its performance. A key observation: accuracy should drop only slightly, if at all.
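A minimal sketch of the train → prune → fine-tune loop, using `prune.l1_unstructured` (which zeroes the smallest-magnitude weights). The layer sizes match MNIST shapes, but random tensors stand in for the real dataset here; swap in your MNIST DataLoader and a longer training schedule in practice.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Over-parameterized MLP; 784-in / 10-out match MNIST, but the data
# below is a random stand-in for illustration.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.randn(64, 784)            # stand-in batch (use MNIST loaders in practice)
y = torch.randint(0, 10, (64,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train(steps):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

train(50)  # 1) train the dense model

# 2) prune 60% of the smallest-magnitude weights in each Linear layer
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)

train(50)  # 3) fine-tune; the pruning mask keeps zeroed weights at zero

# 4) make pruning permanent (drops the mask, bakes the zeros into .weight)
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
```

During fine-tuning, `prune.l1_unstructured` reparameterizes each layer as `weight = weight_orig * mask`, so the optimizer keeps updating the surviving weights while the pruned ones stay exactly zero.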

Verification: After pruning, check the sparsity of your model by counting the zero-valued weights; the sparsity percentage should match your pruning ratio. Then compare the accuracy of the original model, the pruned model before fine-tuning, and the pruned model after fine-tuning to see the effect of each step.
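One way to sketch the sparsity check: a small helper that counts zero entries across the Linear layers. The `sparsity` helper name and the toy model are illustrative, not part of the exercise.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def sparsity(model):
    """Fraction of zero-valued weights across all Linear layers."""
    zeros, total = 0, 0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            zeros += (m.weight == 0).sum().item()
            total += m.weight.numel()
    return zeros / total

# Toy model pruned at 50% to demonstrate the check
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
for m in model.modules():
    if isinstance(m, nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.5)
        prune.remove(m, "weight")

print(f"sparsity: {sparsity(model):.2%}")  # should be ~50%, matching the pruning ratio
```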