Quantize without calibration data
hqq-quantization · skill · setup · L2 · ★9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Quantize LLMs to 4/3/2-bit without calibration data via Half-Quadratic Quantization
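To make the bit-width trade-off concrete, here is a minimal sketch of group-wise quantization that needs no calibration data. It uses plain min/max round-to-nearest, which is a simpler stand-in and not the half-quadratic solver that HQQ itself uses; the function names and the synthetic Gaussian weights are illustrative assumptions.

```python
import random


def quantize_group(values, nbits):
    """Min/max affine round-to-nearest quantization of one group.

    This is a simplified stand-in, NOT HQQ's half-quadratic optimizer.
    """
    levels = 2 ** nbits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [c * scale + zero for c in codes]


def mean_abs_error(weights, nbits, group_size):
    """Average reconstruction error when each group is quantized on its own."""
    total = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, zero = quantize_group(group, nbits)
        restored = dequantize_group(codes, scale, zero)
        total += sum(abs(a - b) for a, b in zip(group, restored))
    return total / len(weights)


# Synthetic weights (assumption): small Gaussian values, as in a typical layer.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

# Reconstruction error at each bit-width listed in this card.
errors = {bits: mean_abs_error(weights, bits, group_size=64) for bits in (4, 3, 2)}
```

Only the weights themselves are used to pick each group's scale and zero-point, which is why no calibration set is involved. The error in `errors` grows as the bit-width drops.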
Best for
Fast model quantization when calibration data is unavailable, including cases where extreme (2-bit) compression is acceptable
Inputs
- · model
- · target bit-width (4, 3, or 2)
- · group size (64-128)
Outputs
- · quantized model
- · HQQ config JSON
Requires
- · hqq
- · transformers
- · torch
Preconditions
NVIDIA GPU; model loaded in memory; no calibration data needed
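One possible way to wire the inputs above together is the `HqqConfig` integration in `transformers`, sketched below. The helper name `load_hqq_model`, the float16 dtype, and the `device_map="cuda"` choice (which also needs `accelerate` installed) are assumptions for illustration. With this route the weights are quantized while the model loads, so it may differ from the workflow this skill actually follows.

```python
SUPPORTED_BITS = (4, 3, 2)  # bit-widths listed in this card's Inputs


def load_hqq_model(model_id, nbits=4, group_size=64):
    """Hypothetical helper: load a causal LM with HQQ quantization applied."""
    if nbits not in SUPPORTED_BITS:
        raise ValueError(f"nbits must be one of {SUPPORTED_BITS}, got {nbits}")
    if group_size <= 0:
        raise ValueError(f"group_size must be positive, got {group_size}")

    # Heavy imports are deferred so that bad arguments fail before
    # torch and transformers are loaded.
    import torch
    from transformers import AutoModelForCausalLM, HqqConfig

    quant_config = HqqConfig(nbits=nbits, group_size=group_size)
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: half-precision compute
        device_map="cuda",          # assumption: a single NVIDIA GPU
        quantization_config=quant_config,
    )
```

A call would look like `load_hqq_model("<model-id>", nbits=4, group_size=64)`, with the model id left to the user.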
Failure modes
2-bit quantization can cause severe accuracy loss; a group size that is too small can lead to out-of-memory errors; a bit-width mismatch causes dtype errors
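The link between group size and memory can be estimated with simple arithmetic. The sketch below assumes each group stores one 16-bit scale and one 16-bit zero-point (32 metadata bits per group); real libraries may pack or further quantize this metadata, so treat the numbers as rough estimates.

```python
def effective_bits_per_weight(nbits, group_size, meta_bits=32):
    """Storage cost per weight, including per-group metadata.

    meta_bits is an assumption: one 16-bit scale plus one 16-bit
    zero-point per group.
    """
    return nbits + meta_bits / group_size


def estimated_size_gb(num_params, nbits, group_size, meta_bits=32):
    """Rough size of the quantized weights in gigabytes (10^9 bytes)."""
    bits = num_params * effective_bits_per_weight(nbits, group_size, meta_bits)
    return bits / 8 / 1e9


# Halving the group size doubles the metadata overhead per weight.
for group_size in (128, 64, 16):
    print(group_size, effective_bits_per_weight(2, group_size))
```

Under these assumptions, a 2-bit model with a group size of 16 costs as much per weight as a plain 4-bit one, which is how very small groups can erase the expected memory savings.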
Trust signals
- · No calibration needed (vs GPTQ)
- · Supports 2-bit extreme compression
- · Paper published on arXiv