Quantize without calibration data

hqq-quantization · skill · setup · L29,423
Orchestra-Research/AI-Research-SKILLs
What it does

Quantize LLMs to 4/3/2-bit without calibration data via Half-Quadratic Quantization
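The reason no calibration data is needed is that the quantizer works from the weights alone. The sketch below illustrates that idea with plain round-to-nearest group quantization. This is not HQQ's half-quadratic solver, only the data-free affine scheme that such methods refine. The function names and the random test weights are assumptions made for illustration.

```python
import random

def quantize_group(values, nbits):
    """Round-to-nearest affine quantization of one group.

    Returns the dequantized values so the error can be measured.
    Only the weights are inspected; no calibration inputs are used.
    """
    levels = 2 ** nbits - 1                      # highest integer code
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return [c * scale + lo for c in codes]

def mean_abs_error(weights, nbits, group_size):
    """Average reconstruction error when each group gets its own scale and zero-point."""
    total = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        dequantized = quantize_group(group, nbits)
        total += sum(abs(a - b) for a, b in zip(group, dequantized))
    return total / len(weights)

# Hypothetical stand-in for one weight matrix, flattened.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

errors = {bits: mean_abs_error(weights, bits, group_size=64) for bits in (4, 3, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

The error grows as the bit-width shrinks, which is why the 2-bit setting is the one to validate most carefully.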

Best for

Fast model quantization when calibration data is unavailable and extreme compression (2-bit) is acceptable

Inputs
  • model
  • target bit-width (4, 3, or 2)
  • group size (64-128)
Outputs
  • quantized model
  • HQQ config JSON
Requires
  • hqq
  • transformers
  • torch
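A minimal sketch of how the inputs, outputs, and requirements listed above might fit together, assuming the `HqqConfig` integration in `transformers` is used. The helper names `make_hqq_settings` and `load_quantized`, and the exact JSON layout, are hypothetical and not taken from the skill itself. The heavy imports are deferred so that the settings can be validated without a GPU.

```python
import json

ALLOWED_BITS = (4, 3, 2)          # bit-widths listed under Inputs
GROUP_SIZE_RANGE = (64, 128)      # group-size range listed under Inputs

def make_hqq_settings(nbits, group_size=64):
    """Validate the two user inputs and return them as a plain dict (hypothetical helper)."""
    if nbits not in ALLOWED_BITS:
        raise ValueError(f"nbits must be one of {ALLOWED_BITS}, got {nbits}")
    low, high = GROUP_SIZE_RANGE
    if not low <= group_size <= high:
        raise ValueError(f"group_size must be in [{low}, {high}], got {group_size}")
    return {"nbits": nbits, "group_size": group_size}

def load_quantized(model_id, settings):
    """Load a model with on-the-fly HQQ quantization (needs hqq, transformers, torch, NVIDIA GPU)."""
    import torch
    from transformers import AutoModelForCausalLM, HqqConfig

    quant_config = HqqConfig(nbits=settings["nbits"], group_size=settings["group_size"])
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="cuda",
        quantization_config=quant_config,
    )

# The "HQQ config JSON" output, in an assumed minimal layout.
settings_json = json.dumps(make_hqq_settings(nbits=4, group_size=64), indent=2)
print(settings_json)
```

Calling `load_quantized("<model id>", make_hqq_settings(4, 64))` would then return the quantized model. The model id is left as a placeholder because the card does not name one.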
Preconditions

NVIDIA GPU; model loaded in memory; no calibration data needed

Failure modes

2-bit quantization can cause severe accuracy loss; a group size that is too small stores more scale and zero-point metadata and can lead to out-of-memory errors; a bit-width mismatch causes dtype errors
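The group-size failure mode can be estimated with simple arithmetic before quantizing. The sketch below assumes each group stores one fp16 scale and one fp16 zero-point (32 bits of metadata per group). Real HQQ builds may pack or further quantize this metadata, so treat the numbers as an upper-bound illustration. The function names and the 7-billion-parameter example are hypothetical.

```python
def effective_bits_per_weight(nbits, group_size, meta_bits=32):
    """Bits stored per weight once per-group metadata is amortized.

    meta_bits is the assumed size of scale + zero-point per group (two fp16 values).
    """
    return nbits + meta_bits / group_size

def weight_memory_gib(num_params, nbits, group_size):
    """Approximate memory for the quantized weights, in GiB."""
    total_bits = num_params * effective_bits_per_weight(nbits, group_size)
    return total_bits / 8 / 2**30

# Example: a hypothetical 7-billion-parameter model at 2-bit.
for group_size in (128, 64, 16):
    bits = effective_bits_per_weight(2, group_size)
    gib = weight_memory_gib(7_000_000_000, 2, group_size)
    print(f"group_size={group_size:>3}: {bits:.2f} bits/weight, about {gib:.2f} GiB")
```

Under these assumptions, a group size of 16 doubles the footprint of a nominally 2-bit model to 4 bits per weight, which is why very small groups can exhaust GPU memory.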

Trust signals
  • No calibration needed (vs GPTQ)
  • Supports 2-bit extreme compression
  • Paper published on arXiv