
Prune models to 50% sparsity for faster inference

model-pruning · skill · setup L3 · 9,423
Orchestra-Research/AI-Research-SKILLs
What it does

Reduce LLM size via unstructured/structured pruning
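The two pruning families differ in what gets zeroed. The pure-Python sketch below is an illustration only: the skill itself works on torch tensors, and the function names `magnitude_prune` and `row_prune` are hypothetical. It assumes a weight matrix stored as a list of rows (out_features x in_features). Unstructured pruning zeroes individual low-magnitude weights anywhere in the matrix, while structured pruning here removes whole output rows ranked by L2 norm.

```python
import math

# A weight matrix as a list of rows: shape (out_features, in_features).
Matrix = list[list[float]]


def _check(sparsity: float) -> None:
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")


def magnitude_prune(w: Matrix, sparsity: float) -> Matrix:
    """Unstructured: zero the `sparsity` fraction of weights with smallest |w|."""
    _check(sparsity)
    cols = len(w[0])
    flat = [abs(v) for row in w for v in row]
    k = int(len(flat) * sparsity)  # number of individual weights to zero
    # Rank flat positions by magnitude; exactly k positions are pruned.
    pruned = set(sorted(range(len(flat)), key=lambda i: flat[i])[:k])
    return [
        [0.0 if r * cols + c in pruned else v for c, v in enumerate(row)]
        for r, row in enumerate(w)
    ]


def row_prune(w: Matrix, sparsity: float) -> Matrix:
    """Structured: zero whole output rows with the smallest L2 norm."""
    _check(sparsity)
    k = int(len(w) * sparsity)  # number of rows to remove
    norms = [math.sqrt(sum(v * v for v in row)) for row in w]
    drop = set(sorted(range(len(w)), key=lambda i: norms[i])[:k])
    return [[0.0] * len(row) if i in drop else row[:] for i, row in enumerate(w)]


if __name__ == "__main__":
    w = [[0.1, -2.0, 0.3, 4.0], [0.05, 0.5, -0.6, 0.2]]
    print(magnitude_prune(w, 0.5))  # scattered zeros in both rows
    print(row_prune(w, 0.5))        # second row zeroed entirely
```

Unstructured masks usually preserve accuracy better at a given sparsity, but structured removal is what shrinks dense matrix shapes and speeds up inference without special sparse kernels.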

Best for

Achieving 50% sparsity with minimal accuracy loss via one-shot pruning without retraining.

Inputs
  • model
  • pruning method (magnitude|wanda|sparse)
  • sparsity target (fraction of weights to zero, 0-1)
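Of the listed methods, `wanda` differs from plain magnitude pruning by weighting each weight with the size of the activations it multiplies. The sketch below follows the scoring described in the Wanda paper (arXiv 2306.11695): score S_ij = |W_ij| * ||X_j||_2, where X holds calibration activations entering the layer, and weights are compared within each output row. It is a pure-Python illustration, not the skill's code; `wanda_prune` and the toy calibration data are hypothetical.

```python
import math

# Weights: (out_features, in_features). Activations: (n_tokens, in_features).
Matrix = list[list[float]]


def wanda_prune(w: Matrix, x: Matrix, sparsity: float) -> Matrix:
    """Zero the lowest-scoring `sparsity` fraction of weights in each output row,
    using score |W_ij| * ||X_j||_2. One-shot: surviving weights are not updated."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    in_features = len(w[0])
    # L2 norm of each input feature across all calibration tokens.
    col_norm = [math.sqrt(sum(tok[j] ** 2 for tok in x)) for j in range(in_features)]
    k = int(in_features * sparsity)  # weights pruned per output row
    out = []
    for row in w:
        scores = [abs(v) * col_norm[j] for j, v in enumerate(row)]
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:k])
        out.append([0.0 if j in drop else v for j, v in enumerate(row)])
    return out


if __name__ == "__main__":
    w = [[1.0, 0.2], [0.5, 0.4]]
    x = [[0.1, 10.0], [0.1, 10.0]]  # feature 1 has much larger activations
    print(wanda_prune(w, x, 0.5))
```

In this toy case magnitude pruning would drop the small weights in column 1, whereas Wanda keeps them because their input feature carries large activations and drops column 0 instead.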
Outputs
  • pruned model
  • sparsity metrics
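A typical sparsity metric is the fraction of exactly-zero weights, reported per layer and overall. The sketch below assumes layers are given as a name-to-matrix mapping; `sparsity_report` and the layer names are hypothetical, and the skill's actual metric output may include other fields.

```python
Matrix = list[list[float]]


def sparsity_report(layers: dict[str, Matrix]) -> tuple[dict[str, float], float]:
    """Return (per-layer fraction of zero weights, overall fraction of zeros)."""
    per_layer: dict[str, float] = {}
    total_zeros = total = 0
    for name, w in layers.items():
        n = sum(len(row) for row in w)
        z = sum(1 for row in w for v in row if v == 0.0)
        per_layer[name] = z / n
        total_zeros += z
        total += n
    return per_layer, total_zeros / total


if __name__ == "__main__":
    layers = {
        "q_proj": [[0.0, 1.0], [0.0, 2.0]],  # hypothetical layer names
        "k_proj": [[0.0, 0.0, 0.0, 1.0]],
    }
    print(sparsity_report(layers))
```

The overall figure is weighted by parameter count, so a few large layers dominate it; the per-layer view shows whether sparsity is spread evenly.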
Requires
  • torch
  • transformers
Preconditions
  • model loaded
  • GPU available (recommended, not required)
Failure modes
  • accuracy drop exceeds the acceptable threshold
  • out-of-memory (OOM) errors during pruning
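One way to catch the accuracy-drop failure mode is to compare a quality metric before and after pruning and reject the pruned model past a threshold. The sketch below assumes perplexity on held-out text as the proxy metric (perplexity = exp of the mean per-token negative log-likelihood) and a hypothetical 5% relative-increase limit; neither the metric nor the threshold is taken from the skill.

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def accept_pruned(
    base_nlls: list[float],
    pruned_nlls: list[float],
    max_rel_increase: float = 0.05,  # hypothetical threshold: 5% perplexity rise
) -> bool:
    """Accept the pruned model only if perplexity rises by at most the threshold."""
    base, pruned = perplexity(base_nlls), perplexity(pruned_nlls)
    return (pruned - base) / base <= max_rel_increase


if __name__ == "__main__":
    base = [2.0, 2.0]
    print(accept_pruned(base, [2.02, 2.02]))  # about +2% perplexity
    print(accept_pruned(base, [2.2, 2.2]))    # about +22% perplexity
```

Both models should be scored on the same tokens; otherwise the comparison mixes data differences with pruning damage.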
Trust signals
  • Wanda method (arXiv 2306.11695)
  • SparseGPT method (arXiv 2301.00774)
  • N:M sparsity support (e.g., 2:4)
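N:M sparsity constrains every group of M consecutive weights to keep at most N nonzeros; 2:4 gives exactly 50% sparsity and is the pattern NVIDIA's Ampere-generation sparse tensor cores can accelerate. The sketch below uses plain magnitude as the score for simplicity (Wanda and SparseGPT apply the same N:M constraint with their own criteria); `nm_prune` is a hypothetical name and the matrix layout is assumed to be (out_features, in_features).

```python
Matrix = list[list[float]]


def nm_prune(w: Matrix, n: int = 2, m: int = 4) -> Matrix:
    """Keep the n largest-magnitude weights in each group of m consecutive
    weights along every row; zero the rest (2:4 by default)."""
    out = []
    for row in w:
        if len(row) % m:
            raise ValueError("row length must be a multiple of m")
        new_row: list[float] = []
        for start in range(0, len(row), m):
            group = row[start:start + m]
            keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
            new_row.extend(v if i in keep else 0.0 for i, v in enumerate(group))
        out.append(new_row)
    return out


if __name__ == "__main__":
    w = [[0.1, -0.9, 0.3, 0.2, 1.0, 0.0, -0.5, 0.4]]
    print(nm_prune(w))  # two survivors in each group of four
```

Unlike free-form unstructured masks, the fixed per-group budget lets hardware store only the kept values plus small indices, which is where the inference speedup comes from.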