Even though there are awesome collections of pruning and sparsity research (e.g., Awesome-Pruning; GitHub, GitHub), there is, to our knowledge, no open-source project for fair and comprehensive benchmarking (let us know if there is one!), which leaves first-time users confused. This raised a question: "What is the SOTA in a fair environment, and, more deeply, how can we profile it?"
Therefore, PyTorch-Pruning mainly focuses on implementing and applying a variety of pruning research, then benchmarking and profiling it against a fair baseline.
For example, in the LLaMA benchmarks, we use three evaluation metrics and prompts inspired by Wanda (Sun et al., 2023) and SparseGPT (ICML'23):
- Model (parameter) size
- Latency: Time to First Token (TTFT) and Time per Output Token (TPOT)
- Perplexity (PPL): computed in the same way as Wanda and SparseGPT (see the sketch after this list)
- Input prompt: we use databricks-dolly-15k, following Wanda and SparseGPT
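The PPL computation follows the usual causal-LM recipe. Below is a minimal sketch assuming a Hugging Face causal LM and a raw-text evaluation corpus; the model name, sequence length, and data loading here are illustrative placeholders, not this repository's exact configuration.

```python
# Hedged sketch: sequence-level perplexity for a causal LM, in the spirit of
# the Wanda/SparseGPT evaluation. Names and constants below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def perplexity(model, tokenizer, text, seq_len=2048, device="cuda"):
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    nlls, n_tokens = [], 0
    for i in range(0, ids.size(1), seq_len):
        chunk = ids[:, i : i + seq_len]
        if chunk.size(1) < 2:
            break
        out = model(chunk, labels=chunk)            # shifted CE loss inside
        nlls.append(out.loss * (chunk.size(1) - 1))  # mean loss -> total NLL
        n_tokens += chunk.size(1) - 1
    return torch.exp(torch.stack(nlls).sum() / n_tokens).item()

# Usage (illustrative; model/tokenizer names are placeholders):
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
#                                              torch_dtype=torch.float16).to("cuda").eval()
# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# print(perplexity(model, tokenizer, "\n\n".join(eval_texts)))
```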
Our main objectives (2025-Q3 roadmap) can be checked here. If you have any ideas, feel free to comment or open an issue. Every PR should target the main branch directly.
Since our goal is to apply more pruning (sparsity) research, we are not planning to integrate inference engines such as ONNX, TensorRT, vLLM, or TorchAO. That said, supporting those engines is definitely a long-term objective, and contributions are always welcome!
This project recommends uv for Python package management (PyPI).
```bash
# Set up the virtual environment
uv venv
source .venv/bin/activate
uv pip install ".[dev]"

# Run any pruning script
uv run experiments/pytorch_pruning.py
```
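For orientation before digging into the experiment scripts, here is a minimal, hypothetical example of one-shot global magnitude pruning using PyTorch's built-in `torch.nn.utils.prune` API. It only illustrates the general workflow (select parameters, prune globally, fold the masks in); it is not the method implemented in `experiments/pytorch_pruning.py`, and the toy model and 50% sparsity ratio are assumptions.

```python
# Hedged sketch: one-shot global L1 (magnitude) pruning with torch.nn.utils.prune.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Collect (module, parameter_name) pairs to prune globally.
params_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
]

# Remove the 50% smallest-magnitude weights across all listed layers.
prune.global_unstructured(
    params_to_prune, pruning_method=prune.L1Unstructured, amount=0.5
)

# Make the pruning permanent (fold each mask into its weight tensor).
for module, name in params_to_prune:
    prune.remove(module, name)

# Report the resulting global weight sparsity.
zeros = sum((m.weight == 0).sum().item() for m, _ in params_to_prune)
total = sum(m.weight.numel() for m, _ in params_to_prune)
print(f"Global weight sparsity: {zeros / total:.2%}")
```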
[1] TinyML and Efficient Deep Learning Computing (MIT-6.5940)
[2] A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations (IEEE'24)
[3] Pruning Deep Neural Networks from a Sparsity Perspective (ICLR'23, arXiv)
[4] APT: Adaptive Pruning and Tuning Pretrained LLM for Efficient Training and Inference (ICML'24)
[5] Fluctuation-based Adaptive Structured Pruning for Large Language Models (AAAI'24)
[6] Isomorphic Pruning for Vision Models (ECCV'24)
[7] How Well Do Sparse ImageNet Models Transfer? (CVPR'22)
[8] A Simple and Effective Pruning Approach for Large Language Models (ICLR'24)
[9] How to Quantize Transformer-based Models for TensorRT Deployment
[10] Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers (ICLR'25)
[11] DepGraph: Towards Any Structural Pruning (CVPR'23)
[12] Papers with Code : Pruning Benchmark
[13] Sparsity May Cry Benchmark (SMC-Bench, ICLR'23)
[14] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot (ICML'23)