Prune models to 50% sparsity for faster inference
model-pruning · skill · setup L3 · ★9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Reduce LLM size via unstructured/structured pruning
Best for
Achieving 50% sparsity with minimal accuracy loss via one-shot pruning without retraining.
Inputs
- · model
- · pruning method (magnitude|wanda|sparse)
- · sparsity target (0-1)
Outputs
- · pruned model
- · sparsity metrics
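
The inputs and outputs above can be illustrated with a minimal sketch of the magnitude method, using PyTorch's built-in `torch.nn.utils.prune` utilities. This is not the skill's own implementation: the function name `magnitude_prune`, the choice to prune only `nn.Linear` layers, the per-layer (rather than global) threshold, and the toy model standing in for a loaded LLM are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> dict:
    """Zero the smallest-magnitude weights of every nn.Linear layer in place.

    Hypothetical helper: returns per-layer and overall sparsity
    (the fraction of weights that are exactly zero).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")

    metrics = {}
    zeros, total = 0, 0
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Mask the `sparsity` fraction of weights with the smallest |w|.
            prune.l1_unstructured(module, name="weight", amount=sparsity)
            # Fold the mask into the weight tensor and drop the reparametrization.
            prune.remove(module, "weight")
            layer_zeros = int((module.weight == 0).sum())
            metrics[name] = layer_zeros / module.weight.numel()
            zeros += layer_zeros
            total += module.weight.numel()
    metrics["overall"] = zeros / total if total else 0.0
    return metrics


# Toy stand-in for a loaded model (assumption: a real run would load an LLM).
torch.manual_seed(0)
toy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
report = magnitude_prune(toy, sparsity=0.5)
print(report)
```

The returned dictionary plays the role of the "sparsity metrics" output. Because the threshold is applied per layer in this sketch, every pruned layer ends up at the requested sparsity; a global threshold would instead let sparsity vary between layers.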
Requires
- · torch
- · transformers
Preconditions
- · model loaded
- · GPU available (optional)
Failure modes
- · accuracy drop exceeds threshold
- · OOM during pruning
Trust signals
- · Wanda method (arXiv 2306.11695)
- · SparseGPT (arXiv 2301.00774)
- · N:M sparsity support
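
To make the Wanda and N:M items concrete, here is a small dependency-free sketch. It scores each weight as the product of its magnitude and the L2 norm of the matching input feature over calibration data (the Wanda criterion), then keeps the `n` highest-scoring weights in each group of `m` consecutive inputs. The helper names `wanda_scores` and `nm_prune` and the toy numbers are hypothetical; plain Python lists are used only to keep the selection rule visible, and a real implementation would operate on tensors.

```python
import math


def wanda_scores(weight, activations):
    """Score each weight as |W[i][j]| * ||X[:, j]||_2.

    weight: rows are output neurons, columns are input features.
    activations: calibration inputs, one row per token.
    """
    col_norms = [
        math.sqrt(sum(row[j] ** 2 for row in activations))
        for j in range(len(weight[0]))
    ]
    return [[abs(w) * col_norms[j] for j, w in enumerate(row)] for row in weight]


def nm_prune(weight, scores, n=2, m=4):
    """Keep the n highest-scoring weights in each group of m consecutive inputs."""
    pruned = []
    for w_row, s_row in zip(weight, scores):
        if len(w_row) % m:
            raise ValueError("row length must be a multiple of m")
        new_row = []
        for start in range(0, len(w_row), m):
            group = range(start, start + m)
            # Indices of the n largest scores within this group.
            keep = set(sorted(group, key=lambda j: s_row[j], reverse=True)[:n])
            new_row.extend(w_row[j] if j in keep else 0.0 for j in group)
        pruned.append(new_row)
    return pruned


# One output neuron, eight inputs; input feature 1 has large activations.
W = [[0.9, -0.1, 0.4, 0.2, -0.3, 0.8, 0.05, -0.6]]
X = [[1.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
     [1.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
pruned = nm_prune(W, wanda_scores(W, X), n=2, m=4)
print(pruned)  # [[0.9, -0.1, 0.0, 0.0, 0.0, 0.8, 0.0, -0.6]]
```

In the first group the small weight `-0.1` survives because its input feature carries large activations, whereas magnitude-only pruning would have kept `0.4` instead. Every group of four ends up with exactly two nonzero weights, which is the 2:4 pattern.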