cyberneticlibrary

Prune models to 50% sparsity for faster inference

model-pruning · skill · setup L3 · ★9,423

Orchestra-Research/AI-Research-SKILLs

What it does

Reduce LLM size via unstructured/structured pruning
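The sketch below shows the two pruning styles on a single `nn.Linear` layer. It uses PyTorch's built-in `torch.nn.utils.prune` utilities and nothing specific to this skill. The helper names `prune_unstructured` and `prune_structured_rows` are hypothetical, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

def prune_unstructured(layer: nn.Linear, sparsity: float) -> nn.Linear:
    # Unstructured: zero the `sparsity` fraction of individual weights with the smallest |w|.
    prune.l1_unstructured(layer, name="weight", amount=sparsity)
    prune.remove(layer, "weight")  # bake the mask into layer.weight permanently
    return layer

def prune_structured_rows(layer: nn.Linear, sparsity: float) -> nn.Linear:
    # Structured: zero whole output rows (neurons) with the smallest L2 norm (dim=0).
    prune.ln_structured(layer, name="weight", amount=sparsity, n=2, dim=0)
    prune.remove(layer, "weight")
    return layer

unstructured = prune_unstructured(nn.Linear(16, 8), 0.5)
structured = prune_structured_rows(nn.Linear(16, 8), 0.5)

print((unstructured.weight == 0).float().mean().item())  # 0.5
zero_rows = (structured.weight == 0).all(dim=1).sum().item()
print(zero_rows)  # 4
```

Both helpers only zero weights, so the tensors keep their dense shape. To actually save memory or time, unstructured zeros need a sparse storage format or kernel. Structured pruning needs the zeroed rows, and the matching input columns of the next layer, to be sliced out.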

Best for

Achieving 50% sparsity with minimal accuracy loss via one-shot pruning without retraining.

Inputs

· model
· pruning method (magnitude | wanda | sparse, i.e. SparseGPT)
· sparsity target: fraction of weights to zero, from 0 to 1 (e.g. 0.5)
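The Wanda paper (arXiv 2306.11695) scores each weight by its magnitude times the L2 norm of the corresponding input activation. It then removes the lowest-scoring weights within each output row, with no weight update. The sketch below shows that scoring rule for one layer; it is not this skill's code. `wanda_prune_` is a hypothetical helper, and the calibration activations are random stand-ins for the inputs you would collect by running the model on a calibration set.

```python
import torch

@torch.no_grad()
def wanda_prune_(weight: torch.Tensor, calib_inputs: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero weights in place using the Wanda score |W_ij| * ||X_j||_2.

    weight:       (out_features, in_features) weight of one linear layer
    calib_inputs: (num_tokens, in_features) inputs that layer saw on calibration data
    Returns the boolean keep-mask.
    """
    # L2 norm of each input feature over all calibration tokens: ||X_j||_2.
    act_norm = calib_inputs.norm(p=2, dim=0)
    # Wanda importance score, broadcast across output rows.
    score = weight.abs() * act_norm.unsqueeze(0)
    # Weights are compared within each output row, so drop the same count per row.
    k = int(weight.shape[1] * sparsity)
    mask = torch.ones_like(weight)
    if k > 0:
        drop_idx = torch.topk(score, k, dim=1, largest=False).indices
        mask.scatter_(1, drop_idx, 0.0)
    weight.mul_(mask)
    return mask.bool()

torch.manual_seed(0)
layer = torch.nn.Linear(16, 8)
calib = torch.randn(128, 16)  # hypothetical stand-in for real calibration activations
mask = wanda_prune_(layer.weight, calib, sparsity=0.5)
print(mask.sum(dim=1))  # every row keeps 8 of its 16 weights
```

If `act_norm` is replaced with a vector of ones, the same routine becomes row-wise magnitude pruning. That is one way to see how the `magnitude` and `wanda` options relate.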

Outputs

· pruned model
· sparsity metrics
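The card does not say how its sparsity metrics are computed. One common convention, assumed here, reports the fraction of exactly-zero entries in each linear layer's weight matrix plus an overall figure across those layers. `sparsity_report` is a hypothetical helper, not part of the skill.

```python
import torch
import torch.nn as nn

def sparsity_report(model: nn.Module) -> dict:
    """Fraction of exactly-zero weights in each nn.Linear layer and overall.

    Assumption: only Linear weight matrices are counted
    (biases, embeddings and norm layers are excluded).
    """
    per_layer = {}
    zeros = total = 0
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            w = module.weight
            z = int((w == 0).sum())
            per_layer[name] = z / w.numel()
            zeros += z
            total += w.numel()
    overall = zeros / total if total else 0.0
    return {"per_layer": per_layer, "overall": overall}

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
with torch.no_grad():
    model[0].weight[:, :2] = 0  # zero half of the first layer by hand
report = sparsity_report(model)
print(report["per_layer"])  # {'0': 0.5, '2': 0.0}
print(round(report["overall"], 3))  # 0.333
```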

Requires

· torch
· transformers

Preconditions

· model loaded
· GPU available (optional)

Failure modes

· accuracy drop exceeds the acceptable threshold
· out-of-memory (OOM) errors during pruning
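Both failure modes are usually handled outside the pruning rule itself. The sketch below assumes plain magnitude pruning and a higher-is-better evaluation score. It processes one linear layer at a time, so only that layer occupies GPU memory. It then compares the pruned model's score with the dense baseline against a relative threshold. The helper names and the 2% threshold in the usage line are hypothetical, not values taken from the skill.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_layerwise_(model: nn.Module, sparsity: float) -> None:
    """Magnitude-prune each nn.Linear in place, one layer at a time.

    Only the layer being pruned is copied to the GPU (if there is one), so
    peak GPU memory is roughly one weight matrix rather than the whole model.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for module in model.modules():
        if not isinstance(module, nn.Linear):
            continue
        w = module.weight.detach().to(device)
        k = int(w.numel() * sparsity)
        if k > 0:
            # Indices of the k smallest-magnitude weights in this layer.
            idx = torch.topk(w.abs().view(-1), k, largest=False).indices
            w.view(-1)[idx] = 0
        module.weight.copy_(w)  # write back to the layer's original device
        del w
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

def check_accuracy_drop(baseline: float, pruned: float, max_rel_drop: float) -> None:
    """Raise if a higher-is-better score fell by more than max_rel_drop (relative)."""
    drop = (baseline - pruned) / baseline
    if drop > max_rel_drop:
        raise RuntimeError(f"accuracy drop {drop:.1%} exceeds threshold {max_rel_drop:.1%}")

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
prune_layerwise_(model, sparsity=0.5)
check_accuracy_drop(baseline=0.80, pruned=0.79, max_rel_drop=0.02)  # 1.25% drop: passes
```

For a lower-is-better metric such as perplexity, compute the relative increase, `(pruned - baseline) / baseline`, instead.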

Trust signals

· Wanda method (arXiv 2306.11695)
· SparseGPT (arXiv 2301.00774)
· N:M sparsity support
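N:M sparsity constrains every block of M consecutive weights. The 2:4 pattern (two nonzero weights in each group of four) is the one that NVIDIA sparse tensor cores, starting with the Ampere generation, can accelerate. The sketch below builds a 2:4 mask by magnitude, for illustration only. `semi_structured_prune_` is a hypothetical helper, and a real pipeline could rank weights by a Wanda or SparseGPT score instead of |w|. The zeroed tensor is still stored densely, so a hardware speedup also requires a sparse kernel or storage format.

```python
import torch

@torch.no_grad()
def semi_structured_prune_(weight: torch.Tensor, keep: int = 2, group: int = 4) -> torch.Tensor:
    """Keep the `keep` largest-|w| weights in every `group` consecutive weights
    along the input dimension and zero the rest, in place (2:4 by default).
    Returns the boolean keep-mask."""
    out_f, in_f = weight.shape
    if in_f % group != 0:
        raise ValueError("in_features must be divisible by the group size")
    # View each row as in_f // group blocks of `group` consecutive weights.
    groups = weight.abs().reshape(out_f, in_f // group, group)
    # Within every block, find the (group - keep) smallest-magnitude weights.
    drop_idx = torch.topk(groups, group - keep, dim=-1, largest=False).indices
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)
    mask = mask.reshape(out_f, in_f)
    weight.mul_(mask)
    return mask.bool()

torch.manual_seed(0)
layer = torch.nn.Linear(16, 8)
mask = semi_structured_prune_(layer.weight)
print(mask.reshape(8, 4, 4).sum(dim=-1).unique())  # tensor([2])
```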