Introducing My Pruning Library
LLM pruning research is often hindered by the engineering complexity of reproducing activation-aware methods, which usually require custom hooks and intricate layer-wise management. To lower the barrier for experimentation, I developed nn-pruning: a modular PyTorch toolkit that standardizes activation collection and benchmarking. By decoupling pruning logic from the underlying model infrastructure, the project allows researchers to implement and compare new algorithms like Wanda or SparseGPT with minimal boilerplate.
Currently supported:
- Sparsity Patterns: Unstructured and Semi-structured
N:M - Model Families: OPT (
facebook/opt)
Repository nn-pruning
To validate the toolkit, I reproduced the benchmarks for the OPT model family across three different sparsities: Unstructured (50%), Semi-structured 2:4, and 4:8.
WikiText-2 Perplexity Results (Calibration: 128 C4 sequences, 2048 tokens each. Sparsity applies to Attention and MLP linear weights.)
| Method | Sparsity | 125M | 350M | 1.3B | 2.7B | 6.7B | 13B |
|---|---|---|---|---|---|---|---|
| Dense | 0% | 27.65 | 22.02 | 14.63 | 12.46 | 10.86 | 10.13 |
| Magnitude | 50% | 197.38 | 97.11 | 1.6e3 | 255.16 | 959.48 | 1.2e4 |
| Wanda | 50% | 38.78 | 36.52 | 18.61 | 14.46 | 11.88 | 12.04 |
| SparseGPT | 50% | 38.31 | 32.31 | 17.97 | 13.77 | 11.71 | 11.14 |
| Magnitude | 2:4 | 347.51 | 416.56 | 444.39 | 1.1e3 | 265.80 | 468.95 |
| Wanda | 2:4 | 78.80 | 107.12 | 27.29 | 21.84 | 15.91 | 16.51 |
| SparseGPT | 2:4 | 63.69 | 56.36 | 24.18 | 16.87 | 13.83 | 12.96 |
| Magnitude | 4:8 | 171.28 | 160.52 | 256.32 | 155.48 | 214.14 | 459.81 |
| Wanda | 4:8 | 51.91 | 58.17 | 21.88 | 17.04 | 13.42 | 13.94 |
| SparseGPT | 4:8 | 46.91 | 40.20 | 20.18 | 14.80 | 12.53 | 11.86 |