Even though there are awesome collections of pruning and sparsity research (e.g., Awesome-Pruning; GitHub, GitHub), there is, to our knowledge, no open-source project for fair and comprehensive benchmarking (let us know if there is one!), which leaves first-time users confused. This raised a question: "What is the SOTA in a fair environment, and, more deeply, how can we profile it?"
Therefore, PyTorch-Pruning mainly focuses on implementing and applying a variety of pruning research, then benchmarking and profiling it against a fair baseline.
For example, in the LLaMA benchmarks, we use three evaluation metrics and prompts inspired by Wanda (Sun et al., 2023) and SparseGPT (ICML'23):
- Model (parameter) size
- Latency: Time to First Token (TTFT) and Time per Output Token (TPOT)
- Perplexity (PPL): computed in the same way as Wanda and SparseGPT (see the sketch after this list)
- Input prompt: we use databricks-dolly-15k, following Wanda and SparseGPT
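The PPL computation follows the usual causal-LM recipe. Below is a minimal sketch assuming a Hugging Face causal LM and a raw-text evaluation corpus; the model name, sequence length, and data loading here are illustrative placeholders, not this repository's exact configuration.

```python
# Hedged sketch: sequence-level perplexity for a causal LM, in the spirit of
# the Wanda/SparseGPT evaluation. Names and constants below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def perplexity(model, tokenizer, text, seq_len=2048, device="cuda"):
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    nlls, n_tokens = [], 0
    for i in range(0, ids.size(1), seq_len):
        chunk = ids[:, i : i + seq_len]
        if chunk.size(1) < 2:
            break
        out = model(chunk, labels=chunk)            # shifted CE loss inside
        nlls.append(out.loss * (chunk.size(1) - 1))  # mean loss -> total NLL
        n_tokens += chunk.size(1) - 1
    return torch.exp(torch.stack(nlls).sum() / n_tokens).item()

# Usage (illustrative; model/tokenizer names are placeholders):
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
#                                              torch_dtype=torch.float16).to("cuda").eval()
# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# print(perplexity(model, tokenizer, "\n\n".join(eval_texts)))
```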
Our main objectives (2025-Q3 roadmap) can be checked here. If you have any ideas, feel free to comment or open an issue. Every PR should target the main branch directly.
Since our goal is to apply more pruning (sparsity) research, we are not planning to integrate inference engines such as ONNX, TensorRT, vLLM, or TorchAO. That said, supporting those engines is definitely a long-term objective, and contributions are always welcome!
This project recommends uv for Python package management (PyPI).
```bash
# Set up the virtual environment
uv venv
source .venv/bin/activate
uv pip install ".[dev]"

# Run any pruning script
uv run experiments/pytorch_pruning.py
```
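For orientation before digging into the experiment scripts, here is a minimal, hypothetical example of one-shot global magnitude pruning using PyTorch's built-in `torch.nn.utils.prune` API. It only illustrates the general workflow (select parameters, prune globally, fold the masks in); it is not the method implemented in `experiments/pytorch_pruning.py`, and the toy model and 50% sparsity ratio are assumptions.

```python
# Hedged sketch: one-shot global L1 (magnitude) pruning with torch.nn.utils.prune.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Collect (module, parameter_name) pairs to prune globally.
params_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
]

# Remove the 50% smallest-magnitude weights across all listed layers.
prune.global_unstructured(
    params_to_prune, pruning_method=prune.L1Unstructured, amount=0.5
)

# Make the pruning permanent (fold each mask into its weight tensor).
for module, name in params_to_prune:
    prune.remove(module, name)

# Report the resulting global weight sparsity.
zeros = sum((m.weight == 0).sum().item() for m, _ in params_to_prune)
total = sum(m.weight.numel() for m, _ in params_to_prune)
print(f"Global weight sparsity: {zeros / total:.2%}")
```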
[1] TinyML and Efficient Deep Learning Computing (MIT-6.5940)
[2] A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations (IEEE'24)
[3] Pruning Deep Neural Networks from a Sparsity Perspective (ICLR'23, arXiv)
[4] APT: Adaptive Pruning and Tuning Pretrained LLM for Efficient Training and Inference (ICML'24)
[5] Fluctuation-based Adaptive Structured Pruning for Large Language Models (AAAI'24)
[6] Isomorphic Pruning for Vision Models (ECCV'24)
[7] How Well Do Sparse ImageNet Models Transfer? (CVPR'22)
[8] A Simple and Effective Pruning Approach for Large Language Models (ICLR'24)
[9] How to Quantize Transformer-based Models for TensorRT Deployment
[10] Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers (ICLR'25)
[11] DepGraph: Towards Any Structural Pruning (CVPR'23)
[12] Papers with Code : Pruning Benchmark
[13] Sparsity May Cry Benchmark (SMC-Bench, ICLR'23)
[14] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot (ICML'23)