Introducing My Pruning Library

LLM pruning research is often hindered by the engineering complexity of reproducing activation-aware methods, which usually require custom hooks and intricate layer-wise management. To lower the barrier for experimentation, I developed nn-pruning: a modular PyTorch toolkit that standardizes activation collection and benchmarking. By decoupling pruning logic from the underlying model infrastructure, the project allows researchers to implement and compare new algorithms like Wanda or SparseGPT with minimal boilerplate.
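To make the "custom hooks" point concrete, here is a minimal sketch of activation collection with a PyTorch forward hook, followed by a Wanda-style score (weight magnitude times the L2 norm of the matching input feature, compared within each output row). It is not nn-pruning's actual API: `InputNormCollector` and `wanda_prune_` are hypothetical names, and the random tensors stand in for real calibration batches.

```python
import torch
import torch.nn as nn


class InputNormCollector:
    """Accumulates the squared L2 norm of every input feature of a Linear layer.

    Hypothetical helper for illustration only; not part of nn-pruning.
    """

    def __init__(self, layer: nn.Linear):
        self.sq_norm = torch.zeros(layer.in_features)
        self.handle = layer.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        # Flatten (batch, seq, in_features) to (tokens, in_features).
        x = inputs[0].detach().reshape(-1, module.in_features)
        self.sq_norm += x.float().pow(2).sum(dim=0)

    def remove(self):
        self.handle.remove()


def wanda_prune_(layer: nn.Linear, sq_norm: torch.Tensor, sparsity: float = 0.5) -> None:
    """Zero the lowest-scoring weights in each output row, in place."""
    # Score = |W_ij| * ||X_j||_2, shape (out_features, in_features).
    score = layer.weight.detach().abs() * sq_norm.sqrt().unsqueeze(0)
    n_prune = int(layer.in_features * sparsity)
    idx = torch.topk(score, n_prune, dim=1, largest=False).indices
    mask = torch.ones_like(score, dtype=torch.bool)
    mask.scatter_(1, idx, False)
    with torch.no_grad():
        layer.weight.mul_(mask)


torch.manual_seed(0)
layer = nn.Linear(8, 4)
collector = InputNormCollector(layer)
with torch.no_grad():
    for _ in range(4):  # stand-in for calibration batches
        layer(torch.randn(2, 16, 8))
collector.remove()

wanda_prune_(layer, collector.sq_norm, sparsity=0.5)
zeros_per_row = (layer.weight == 0).sum(dim=1)
```

In a real model the same hook would be attached to every attention and MLP linear layer of a block, which is the bookkeeping a toolkit can hide behind a common interface.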

Currently supported:

Sparsity Patterns: Unstructured and Semi-structured N:M
Model Families: OPT (facebook/opt)
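For readers unfamiliar with the semi-structured pattern, the sketch below builds an N:M mask from an arbitrary importance score. It assumes the convention that N entries are kept in every group of M consecutive input weights; for 2:4 and 4:8 this gives 50% sparsity under either reading of N:M. The function name `nm_mask` is hypothetical and is not taken from nn-pruning.

```python
import torch


def nm_mask(score: torch.Tensor, n: int, m: int) -> torch.Tensor:
    """Boolean mask keeping the n highest-scoring entries in every group of
    m consecutive input columns. Hypothetical helper for illustration."""
    out_features, in_features = score.shape
    if in_features % m != 0:
        raise ValueError("in_features must be divisible by m")
    groups = score.reshape(out_features, in_features // m, m)
    keep = torch.topk(groups, n, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep, True)
    return mask.reshape(out_features, in_features)


weight = torch.tensor([[0.1, -0.9, 0.3, 0.05, 2.0, -0.2, 0.4, -1.5]])

# Magnitude scoring; a Wanda-style score could be passed instead.
mask_2_4 = nm_mask(weight.abs(), n=2, m=4)
pruned = weight * mask_2_4
```

Unstructured pruning differs only in the comparison group: the top-k selection runs over a whole row or layer instead of over each block of M weights.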

Repository: nn-pruning

To validate the toolkit, I reproduced the published benchmarks for the OPT model family under three sparsity settings: unstructured 50%, semi-structured 2:4, and semi-structured 4:8.

WikiText-2 Perplexity Results (Calibration: 128 C4 sequences, 2048 tokens each. Sparsity applies to Attention and MLP linear weights.)

Method	Sparsity	125M	350M	1.3B	2.7B	6.7B	13B
Dense	0%	27.65	22.02	14.63	12.46	10.86	10.13

Magnitude	50%	197.38	97.11	1.6e3	255.16	959.48	1.2e4
Wanda	50%	38.78	36.52	18.61	14.46	11.88	12.04
SparseGPT	50%	38.31	32.31	17.97	13.77	11.71	11.14

Magnitude	2:4	347.51	416.56	444.39	1.1e3	265.80	468.95
Wanda	2:4	78.80	107.12	27.29	21.84	15.91	16.51
SparseGPT	2:4	63.69	56.36	24.18	16.87	13.83	12.96

Magnitude	4:8	171.28	160.52	256.32	155.48	214.14	459.81
Wanda	4:8	51.91	58.17	21.88	17.04	13.42	13.94
SparseGPT	4:8	46.91	40.20	20.18	14.80	12.53	11.86
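The numbers above are perplexities, conventionally computed as the exponential of the mean next-token negative log-likelihood. The sketch below shows that calculation over non-overlapping fixed-length segments. It is an assumption about the usual evaluation recipe, not a copy of the toolkit's evaluation code; `perplexity` and `UniformLM` are hypothetical names, and the model is assumed to return raw logits (a Hugging Face model would need `.logits`).

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


@torch.no_grad()
def perplexity(model: nn.Module, token_ids: torch.Tensor, seq_len: int) -> float:
    """exp(mean next-token NLL) over non-overlapping segments of seq_len tokens.

    `model` is assumed to map (1, T) token ids to (1, T, vocab) logits.
    """
    n_segments = token_ids.numel() // seq_len
    total_nll, total_tokens = 0.0, 0
    for i in range(n_segments):
        segment = token_ids[i * seq_len:(i + 1) * seq_len].unsqueeze(0)
        logits = model(segment)
        # Position t predicts token t + 1, so shift logits and targets by one.
        nll = F.cross_entropy(logits[0, :-1], segment[0, 1:], reduction="sum")
        total_nll += nll.item()
        total_tokens += seq_len - 1
    return math.exp(total_nll / total_tokens)


class UniformLM(nn.Module):
    """Toy model that assigns equal probability to every token."""

    def __init__(self, vocab_size: int):
        super().__init__()
        self.vocab_size = vocab_size

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return torch.zeros(ids.shape[0], ids.shape[1], self.vocab_size)


torch.manual_seed(0)
ppl = perplexity(UniformLM(50), torch.randint(0, 50, (64,)), seq_len=16)
```

A uniform model over a 50-token vocabulary scores a perplexity of about 50, which is a handy sanity check before running the loop on a pruned checkpoint.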